
Methods in

Molecular Biology 2611

Georgi K. Marinov
William J. Greenleaf Editors

Chromatin
Accessibility
Methods and Protocols
METHODS IN MOLECULAR BIOLOGY

Series Editor
John M. Walker
School of Life and Medical Sciences
University of Hertfordshire
Hatfield, Hertfordshire, UK

For further volumes:
http://www.springer.com/series/7651
For over 35 years, biological scientists have come to rely on the research protocols and
methodologies in the critically acclaimed Methods in Molecular Biology series. The series was
the first to introduce the step-by-step protocols approach that has become the standard in all
biomedical protocol publishing. Each protocol is provided in readily-reproducible step-by-
step fashion, opening with an introductory overview, a list of the materials and reagents
needed to complete the experiment, and followed by a detailed procedure that is supported
with a helpful notes section offering tips and tricks of the trade as well as troubleshooting
advice. These hallmark features were introduced by series editor Dr. John Walker and
constitute the key ingredient in each and every volume of the Methods in Molecular Biology
series. Tested and trusted, comprehensive and reliable, all protocols from the series are
indexed in PubMed.
Chromatin Accessibility

Methods and Protocols

Edited by

Georgi K. Marinov and William J. Greenleaf


Stanford University, Stanford, CA, USA
Editors
Georgi K. Marinov, Stanford University, Stanford, CA, USA
William J. Greenleaf, Stanford University, Stanford, CA, USA

ISSN 1064-3745 ISSN 1940-6029 (electronic)


Methods in Molecular Biology
ISBN 978-1-0716-2898-0 ISBN 978-1-0716-2899-7 (eBook)
https://doi.org/10.1007/978-1-0716-2899-7
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Science+Business Media, LLC, part
of Springer Nature 2023
Chapter 6 is licensed under the terms of the Creative Commons Attribution 4.0 International License
(http://creativecommons.org/licenses/by/4.0/). For further details see license information in the chapter.
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and
retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter
developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations
and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to
be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty,
expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been
made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Humana imprint is published by the registered company Springer Science+Business Media, LLC, part of Springer
Nature.
The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.
Preface

The genomic distribution patterns of chromatin accessibility and its dynamics are key
features of the regulation of gene expression and many other aspects of chromatin biology.
The genomes of eukaryotes are usually packaged by nucleosomal particles, which have a
generally strong inhibitory effect on transcription and on the occupancy of DNA by
regulatory proteins. It is typically active cis-regulatory regions (cREs) in the genome that
are characterized by depleted nucleosomal occupancy and increased chromatin accessibility,
which has in turn proven to be a highly useful property enabling the identification of
candidate cREs as well as the tracking of their activity across cell types and conditions as
accessible DNA can be preferentially enzymatically or chemically labeled in numerous ways.
Technological advances in the labeling and readout of accessible DNA have played a
major role in driving forward our understanding of chromatin and regulatory biology over
the last few decades. The last 15 years have seen a particularly dramatic explosion in the
variety and power of approaches for studying chromatin accessibility, driven by two sequential
technological revolutions: first, the development of high-throughput sequencing in the
mid-2000s, and then the advent of single-cell genomics in the 2010s. The current book aims
to provide a comprehensive resource covering the existing and state-of-the-art tools in the
field.
We have divided the protocols in the book into several sections, depending on the
different aspects of chromatin accessibility that they measure and/or approaches that they
take. In the first section, bulk-cell methods for profiling chromatin accessibility and nucleosome
positioning that rely on enzymatic cleavage of accessible DNA and produce information
about relative accessibility are covered. The second section is dedicated to methods that
use single-molecule and enzymatic approaches to solving the problem of mapping absolute
occupancy/accessibility. The third section covers the wide array of emerging tools for
mapping DNA accessibility and nucleosome positioning in single cells, as well as a number
of single-cell multiomics methods that simultaneously measure chromatin accessibility and
other features of the cell, such as the transcriptome, the methylome, and protein markers.
More recently, imaging-based methods for visualizing accessible chromatin in its nuclear
context have emerged; these are included in the fourth section. The final section features
computational methods for the processing and analysis of chromatin accessibility datasets.
This book will serve as an extensive and useful reference for researchers studying different
facets of chromatin accessibility in a wide variety of biological contexts.

Stanford, CA, USA
Georgi K. Marinov
William J. Greenleaf

Contents

Preface
Contributors

PART I BULK CLEAVAGE-BASED METHODS

1 Genome-Wide Mapping of Active Regulatory Elements Using ATAC-seq
Georgi K. Marinov, Zohar Shipony, Anshul Kundaje, and William J. Greenleaf
2 Mapping Nucleosome Location Using FS-Seq
Barry Milavetz, Brenna Hanson, Kincaid Rowbotham, and Jacob Haugen
3 Universal NicE-Seq: A Simple and Quick Method for Accessible Chromatin Detection in Fixed Cells
Hang Gyeong Chin, Udayakumar S. Vishnu, Zhiyi Sun, V. K. Chaithanya Ponnaluri, Guoqiang Zhang,
Shuang-yong Xu, Touati Benoukraf, Paloma Cejas, George Spracklin, Pierre-Olivier Estève,
Henry W. Long, and Sriharsa Pradhan
4 Measuring Inaccessible Chromatin Genome-Wide Using Protect-seq
George Spracklin, Liyan Yang, Sriharsa Pradhan, and Job Dekker
5 Determination of the Chromatin Openness in Bacterial Genomes
Mahmoud M. Al-Bassam and Karsten Zengler
6 Profiling Chromatin Accessibility on Replicated DNA with repli-ATAC-Seq
Kathleen R. Stewart-Morgan and Anja Groth
7 Analysis of Chromatin Interaction and Accessibility by Trac-Looping
Shuai Liu, Qingsong Tang, and Keji Zhao

PART II METHODS FOR MEASURING THE ABSOLUTE LEVELS OF OCCUPANCY/ACCESSIBILITY

8 Single-Molecule Mapping of Chromatin Accessibility Using NOMe-seq/dSMF
Michaela Hinks, Georgi K. Marinov, Anshul Kundaje, Lacramioara Bintu, and William J. Greenleaf
9 ORE-Seq: Genome-Wide Absolute Occupancy Measurement by Restriction Enzyme Accessibilities
Elisa Oberbeckmann, Michael Roland Wolff, Nils Krietenstein, Mark Heron, Andrea Schmid,
Tobias Straub, Ulrich Gerland, and Philipp Korber


PART III METHODS FOR PROFILING CHROMATIN ACCESSIBILITY AT THE SINGLE-CELL LEVEL

10 Single-Cell Joint Profiling of Open Chromatin and Transcriptome by Paired-Seq
Chenxu Zhu, Zhaoning Wang, and Bing Ren
11 Simultaneous Single-Cell Profiling of the Transcriptome and Accessible Chromatin Using SHARE-seq
Samuel H. Kim, Georgi K. Marinov, S. Tansu Bagdatli, Soon Il Higashino, Zohar Shipony,
Anshul Kundaje, and William J. Greenleaf
12 Simultaneous Measurement of DNA Methylation and Nucleosome Occupancy in Single Cells Using scNOMe-Seq
Michael Wasney and Sebastian Pott
13 Massively Parallel Profiling of Accessible Chromatin and Proteins with ASAP-Seq
Eleni Mimitou, Peter Smibert, and Caleb A. Lareau
14 Concomitant Sequencing of Accessible Chromatin and Mitochondrial Genomes in Single Cells Using mtscATAC-Seq
Leif S. Ludwig and Caleb A. Lareau

PART IV IMAGING METHODS FOR VISUALIZATION OF ACCESSIBLE DNA

15 ATAC-See: A Tn5 Transposase-Mediated Assay for Detection of Chromatin Accessibility with Imaging
Yonglong Dang, Ram Prakash Yadav, and Xingqi Chen
16 NicE-viewSeq: An Integrative Visualization and Genomics Method to Detect Accessible Chromatin in Fixed Cells
Pierre-Olivier Estève, Udayakumar S. Vishnu, Hang Gyeong Chin, and Sriharsa Pradhan

PART V COMPUTATIONAL ANALYSIS OF CHROMATIN ACCESSIBILITY DATASETS

17 ATAC-seq Data Processing
Daniel S. Kim
18 Deep Learning on Chromatin Accessibility
Daniel S. Kim

Index
Contributors

MAHMOUD M. AL-BASSAM • Department of Pediatrics, University of California, San Diego,
La Jolla, CA, USA
S. TANSU BAGDATLI • Department of Genetics, Stanford University, Stanford, CA, USA
TOUATI BENOUKRAF • Faculty of Medicine, Craig L. Dobbin Genetics Research Centre,
Memorial University of Newfoundland, St. John’s, NL, Canada
LACRAMIOARA BINTU • Department of Bioengineering, Stanford University, Stanford, CA,
USA
PALOMA CEJAS • Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute,
Boston, MA, USA
XINGQI CHEN • Department of Immunology, Genetics and Pathology, Uppsala University,
Uppsala, Sweden
HANG GYEONG CHIN • New England Biolabs Inc., Ipswich, MA, USA; Genome Biology
Division, New England Biolabs, Inc., Ipswich, MA, USA
YONGLONG DANG • Department of Immunology, Genetics and Pathology, Uppsala
University, Uppsala, Sweden
JOB DEKKER • Program in Systems Biology, University of Massachusetts Medical School,
Worcester, MA, USA; Howard Hughes Medical Institute, Boston, MA, USA
PIERRE-OLIVIER ESTÈVE • New England Biolabs Inc., Ipswich, MA, USA; Genome Biology
Division, New England Biolabs, Inc., Ipswich, MA, USA
ULRICH GERLAND • Department of Physics, Technical University of Munich, Garching,
Germany
WILLIAM J. GREENLEAF • Department of Genetics, Stanford University, Stanford, CA, USA;
Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA;
Department of Applied Physics, Stanford University, Stanford, CA, USA; Chan Zuckerberg
Biohub, San Francisco, CA, USA
ANJA GROTH • Novo Nordisk Foundation Center for Protein Research (CPR), Faculty of
Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark; Biotech
Research and Innovation Centre (BRIC), Faculty of Health and Medical Sciences,
University of Copenhagen, Copenhagen, Denmark
BRENNA HANSON • Department of Biomedical Sciences, School of Medicine, University of
North Dakota, Grand Forks, ND, USA
JACOB HAUGEN • Department of Biomedical Sciences, School of Medicine, University of North
Dakota, Grand Forks, ND, USA
MARK HERON • Quantitative and Computational Biology, Max Planck Institute for
Biophysical Chemistry, Göttingen, Germany; Gene Center, Faculty of Chemistry and
Pharmacy, Ludwig-Maximilians-Universität München, Munich, Germany
SOON IL HIGASHINO • Department of Genetics, Stanford University, Stanford, CA, USA
MICHAELA HINKS • Department of Genetics, Stanford University, Stanford, CA, USA
DANIEL S. KIM • Biomedical Informatics Program, Stanford University School of Medicine,
Stanford, CA, USA
SAMUEL H. KIM • Cancer Biology Program, School of Medicine, Stanford University,
Stanford, CA, USA; Medical Scientist Training Program, Stanford University, Stanford,
CA, USA


PHILIPP KORBER • Biomedical Center (BMC), Division of Molecular Biology, Faculty of
Medicine, LMU Munich, Martinsried, Germany
NILS KRIETENSTEIN • Novo Nordisk Foundation Center for Protein Research, University of
Copenhagen, Copenhagen, Denmark
ANSHUL KUNDAJE • Department of Genetics, Stanford University, Stanford, CA, USA;
Department of Computer Science, Stanford University, Stanford, CA, USA
CALEB A. LAREAU • Departments of Genetics and Pathology, Stanford University, Stanford,
CA, USA
SHUAI LIU • Laboratory of Epigenome Biology, Systems Biology Center, Division of
Intramural Research, National Heart, Lung and Blood Institute, National Institutes of
Health, Bethesda, MD, USA
HENRY W. LONG • Center for Functional Cancer Epigenetics, Dana-Farber Cancer
Institute, Boston, MA, USA
LEIF S. LUDWIG • Berlin Institute of Health at Charité Universitätsmedizin Berlin, Berlin,
Germany; Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association,
Berlin Institute for Medical Systems Biology, Berlin, Germany
GEORGI K. MARINOV • Department of Genetics, Stanford University, Stanford, CA, USA
BARRY MILAVETZ • Department of Biomedical Sciences, School of Medicine, University of
North Dakota, Grand Forks, ND, USA
ELENI MIMITOU • Immunai, New York, NY, USA
ELISA OBERBECKMANN • Department of Molecular Biology, Max Planck Institute for
Multidisciplinary Sciences, Göttingen, Germany
V. K. CHAITHANYA PONNALURI • New England Biolabs Inc., Ipswich, MA, USA
SEBASTIAN POTT • University of Chicago, Chicago, IL, USA
SRIHARSA PRADHAN • New England Biolabs Inc., Ipswich, MA, USA; Genome Biology
Division, New England Biolabs, Inc., Ipswich, MA, USA
BING REN • Ludwig Institute for Cancer Research, La Jolla, CA, USA; Department of
Cellular and Molecular Medicine, University of California San Diego, School of Medicine,
La Jolla, CA, USA; Center for Epigenomics, Institute of Genomic Medicine, Moores Cancer
Center, University of California San Diego, School of Medicine, La Jolla, California, USA
KINCAID ROWBOTHAM • Department of Biomedical Sciences, School of Medicine, University of
North Dakota, Grand Forks, ND, USA
ANDREA SCHMID • Biomedical Center (BMC), Division of Molecular Biology, Faculty of
Medicine, LMU Munich, Martinsried, Germany
ZOHAR SHIPONY • Department of Genetics, Stanford University, Stanford, CA, USA
PETER SMIBERT • 10x Genomics, Pleasanton, CA, USA
GEORGE SPRACKLIN • Program in Systems Biology, University of Massachusetts Medical School,
Worcester, MA, USA
KATHLEEN R. STEWART-MORGAN • Novo Nordisk Foundation Center for Protein Research
(CPR), Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen,
Denmark; Biotech Research and Innovation Centre (BRIC), Faculty of Health and
Medical Sciences, University of Copenhagen, Copenhagen, Denmark
TOBIAS STRAUB • Core Facility Bioinformatics, Biomedical Center (BMC), Faculty of
Medicine, LMU Munich, Martinsried, Germany
ZHIYI SUN • New England Biolabs Inc., Ipswich, MA, USA
QINGSONG TANG • Laboratory of Epigenome Biology, Systems Biology Center, Division of
Intramural Research, National Heart, Lung and Blood Institute, National Institutes of
Health, Bethesda, MD, USA

UDAYAKUMAR S. VISHNU • New England Biolabs Inc., Ipswich, MA, USA; Genome Biology
Division, New England Biolabs, Inc., Ipswich, MA, USA
ZHAONING WANG • Department of Cellular and Molecular Medicine, University of
California San Diego, School of Medicine, La Jolla, CA, USA
MICHAEL WASNEY • Genetics and Genomics Program, University of California, Los Angeles,
CA, USA
MICHAEL ROLAND WOLFF • Department of Physics, Technical University of Munich,
Garching, Germany
SHUANG-YONG XU • New England Biolabs Inc., Ipswich, MA, USA
RAM PRAKASH YADAV • Department of Immunology, Genetics and Pathology, Uppsala
University, Uppsala, Sweden
LIYAN YANG • Program in Systems Biology, University of Massachusetts Medical School,
Worcester, MA, USA
KARSTEN ZENGLER • Department of Pediatrics, University of California, San Diego, La Jolla,
CA, USA; Center for Microbiome Innovation, University of California, San Diego, La
Jolla, CA, USA; Department of Bioengineering, University of California, San Diego, La
Jolla, CA, USA
GUOQIANG ZHANG • New England Biolabs Inc., Ipswich, MA, USA
KEJI ZHAO • Laboratory of Epigenome Biology, Systems Biology Center, Division of
Intramural Research, National Heart, Lung and Blood Institute, National Institutes of
Health, Bethesda, MD, USA
CHENXU ZHU • Ludwig Institute for Cancer Research, La Jolla, CA, USA
Part I

Bulk Cleavage-Based Methods


Chapter 1

Genome-Wide Mapping of Active Regulatory Elements Using ATAC-seq

Georgi K. Marinov, Zohar Shipony, Anshul Kundaje, and William J. Greenleaf

Abstract
Active cis-regulatory elements (cREs) in eukaryotes are characterized by nucleosomal depletion and,
accordingly, higher accessibility. This property has turned out to be immensely useful for identifying
cREs genome-wide and tracking their dynamics across different cellular states and is the basis of numerous
methods taking advantage of the preferential enzymatic cleavage/labeling of accessible DNA. ATAC-seq
(Assay for Transposase-Accessible Chromatin using sequencing) has emerged as the most versatile and
adaptable of these methods and has been widely adopted as the standard tool for mapping open chromatin
regions. Here, we discuss the current optimal practices and important considerations for carrying out
ATAC-seq experiments, primarily in the context of mammalian systems.

Key words Enhancers, Promoters, Chromatin accessibility, ATAC-seq, High-throughput sequencing

1 Introduction

Eukaryotic chromatin is generally packaged by nucleosomes, octamer particles composed of
the four core nucleosomal histones H3, H4, H2A, and H2B [1]. Nucleosomal packaging has an
inhibitory effect on transcriptional activity and prevents the binding of most transcription
factors and other regulatory proteins. Active promoter and enhancer elements differ from the
rest of the genome in that they usually exist in a nucleosome-depleted, open chromatin state.
This property is highly useful in practice because, just as regulatory factors can access
active cREs, so can various enzymes whose action is otherwise inhibited by nucleosome
particles. That enhancers and promoters exhibit this property was already appreciated nearly
four decades ago, when their hypersensitivity to cleavage by DNase enzymes was first
reported [2–4].

Georgi K. Marinov and Zohar Shipony contributed equally to this work.

Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols,
Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_1,
© The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023


DNase remained the main tool for mapping active cREs into
the genomic era, initially coupled to microarray readouts [5–7]
and eventually adapted to a high-throughput sequencing format
[8–10]. In parallel to these developments as well as more recently,
a wide variety of alternative methods taking advantage of the
preferential enzymatic/chemical cleavage/modification of accessi-
ble DNA were also developed, employing methyltransferases
[11–15], restriction enzymes [16], nicking enzymes [17], small
molecules [18], viral integration [19], and others.
ATAC-seq, which is based on the preferential insertion into
unprotected DNA by a hyperactive mutant version of the Tn5
transposase [20] (Fig. 1), has emerged as the most convenient,
widely adaptable and straightforward to execute method for
profiling open chromatin. Treatment of chromatin with Tn5 results in the insertion into
accessible DNA of adapters that then enable the direct amplification of open chromatin
fragments. This eliminates much of the complex series of enzymatic steps that are unavoidable
features of previous methods such as DNase-seq, allows the protocol to be completed in just a
few hours, and dramatically lowers the input requirements, down to a few tens of thousands of
cells in bulk reactions, while also enabling single-cell (scATAC) assays [21, 22].
In this chapter, we describe the most important considerations
for carrying out successful ATAC-seq experiments in the context of
the Omni-ATAC protocol, an optimized version of the ATAC-seq
assay that produces high-quality ATAC libraries for most mamma-
lian cell lines and cell types, as well as for a number of other
eukaryotes.

2 Materials

Prepare a master stock of the ATAC-RSB buffer without detergents in a large volume (e.g., 50 mL)
and store it at 4 °C.
Prepare master stocks of 2× TD buffer (e.g., in 2-mL tubes) and keep those at -20 °C.

2.1 Transposition Buffers and Reagents

Prepare the ATAC-RSB-Lysis and ATAC-RSB-Wash buffers immediately before use by adding the
necessary detergents; keep on ice.
1. IGEPAL CA-630 detergent (Sigma Cat# 11332465001; supplied as a 10% solution)
2. Tween-20 detergent (Sigma Cat# 11332465001, supplied as a 10% solution; store at 4 °C)
3. Digitonin detergent (Promega Cat# G9441, supplied as a 2% solution in DMSO; store at -20 °C)
[Figure 1 workflow diagram: cells → nuclei isolation → tagmentation with Tn5 transposase (transposition, library building, sequencing) → purification and amplification → final library, sequencing]

Fig. 1 Outline of the ATAC-seq assay. Nuclei are isolated from cells and chromatin is incubated with an active
Tn5 transposase carrying PCR amplification adapter sequences. Tn5 preferentially inserts into accessible
chromatin, such as that found at active regulatory elements. After transposition, DNA is purified and PCR
amplification is carried out from the primer landing sites deposited by Tn5

4. ATAC-RSB buffer (master stock)
   10 mM Tris-HCl pH 7.4
   10 mM NaCl
   3 mM MgCl2
5. ATAC-RSB-Lysis buffer
   10 mM Tris-HCl pH 7.4
   10 mM NaCl
   3 mM MgCl2
   0.1% IGEPAL CA-630
   0.1% Tween-20
   0.01% Digitonin
6. Lysis Wash Buffer (ATAC-RSB-Wash)
   10 mM Tris-HCl pH 7.4
   10 mM NaCl
   3 mM MgCl2
   0.1% Tween-20
7. 2× TD buffer
   20 mM Tris-HCl pH 7.6
   10 mM MgCl2
   20% Dimethyl Formamide
8. Tn5 transposase (see Note 1)
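The detergent additions needed to turn the ATAC-RSB master stock into the Lysis and Wash buffers follow from the stock and final concentrations listed above (C1·V1 = C2·V2). The following is a minimal sketch of that arithmetic, not part of the published protocol; the function names are hypothetical, and it assumes the detergents are added directly to the master stock, which dilutes the salts by a few percent.

```python
# Sketch only: detergent volumes from C1*V1 = C2*V2.
# Stock concentrations (%) are those listed in Subheading 2.1.
STOCKS_PCT = {"IGEPAL CA-630": 10.0, "Tween-20": 10.0, "Digitonin": 2.0}

# Final detergent concentrations (%) from the buffer recipes.
LYSIS_FINAL_PCT = {"IGEPAL CA-630": 0.1, "Tween-20": 0.1, "Digitonin": 0.01}
WASH_FINAL_PCT = {"Tween-20": 0.1}


def stock_volume_ul(final_pct, stock_pct, total_ul):
    """Volume of stock (uL) giving final_pct in a total volume of total_ul."""
    if stock_pct <= 0 or final_pct < 0 or final_pct > stock_pct:
        raise ValueError("final concentration must lie between 0 and the stock")
    return final_pct / stock_pct * total_ul


def detergent_recipe(final_pcts, total_ul):
    """Return uL of each detergent stock plus the master-stock volume.

    Assumption (hypothetical helper): the remainder of the volume is made up
    with detergent-free ATAC-RSB master stock.
    """
    additions = {
        name: stock_volume_ul(pct, STOCKS_PCT[name], total_ul)
        for name, pct in final_pcts.items()
    }
    base = total_ul - sum(additions.values())
    return {"ATAC-RSB (master stock)": base, **additions}


if __name__ == "__main__":
    for label, spec in (("Lysis", LYSIS_FINAL_PCT), ("Wash", WASH_FINAL_PCT)):
        print(label, detergent_recipe(spec, 1000.0))
```

For 1 mL of lysis buffer this gives roughly 10 μL of each 10% detergent stock and 5 μL of the 2% digitonin stock, with the remainder made up by master stock.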

2.2 Library Building, Sequencing, and Quality Evaluation

1. 200-μL PCR tubes
2. Sequencing primers/adapters (see Note 2)
3. NEBNext High-Fidelity 2× PCR Master Mix (NEB, Cat# M0541S)
4. Qubit fluorometer or equivalent
5. Qubit tubes
6. Qubit dsDNA HS Assay Kit
7. TapeStation (Agilent) or equivalent, e.g., BioAnalyzer (Agilent)
8. TapeStation D1000 tape and reagents (Agilent)
9. 10 mM dNTP mix
10. 25× SYBR Green (Thermo Fisher Cat# S7563; supplied as 10,000×)
11. Phusion High-Fidelity DNA Polymerase (NEB, Cat# M0530L)

2.3 General Materials and Equipment

1. 1.5-mL microcentrifuge tubes, preferably low protein and DNA binding (see Note 4)
2. 2-mL, 15-mL, and 50-mL tubes
3. Incubator (37 °C) or a Thermomixer
4. Tabletop centrifuge
5. Thermal cycler
6. MinElute PCR Purification Kit (Qiagen Cat# 28004/28006), Zymo DNA Clean and Concentrator
   Kit (Zymo Cat# D4013/D4014), or equivalent (see Note 5)
7. Nuclease-free H2O
8. 1× PBS buffer solution
9. qPCR machine (Step One or equivalent)

3 Methods

The general outline of the ATAC-seq assay is shown in Fig. 1. Nuclei are isolated from cells,
a transposition reaction is carried out, DNA is purified, and sequencing libraries are
prepared. Here we discuss the Omni-ATAC protocol [23] in its most widely applicable version,
as it yields the best results in terms of reduced mitochondrial contamination (see Note 6)
compared to other versions of the assay [20, 24]. The Omni-ATAC protocol works as described
for the great majority of mammalian and insect cell lines, as well as for many other
eukaryotic cells without cell walls. Different protocols need to be applied for nuclei
isolation from other sources, such as tissues (see Note 7), plant cells (see Note 8), various
small metazoan animals (see Note 9), yeast (see Note 10), and others.
It is also in principle possible to carry out ATAC-seq on cross-
linked material but this generally produces suboptimal results and
we advise against it (see Note 11).

3.1 Removal of Non-viable Cells (Optional)

The presence of non-viable cells can negatively affect the quality of final ATAC-seq
libraries, as dead cells generate a general background of dechromatinized DNA, decreasing the
enrichment for open chromatin regions. Two strategies are usually used to address this
problem:
1. If the fraction of dead cells is not too high (i.e., 5–15%), cells are treated with DNase
   (200 U/mL) in culture media, usually for 30 min at 37 °C. Cells are then washed thoroughly
   with 1× PBS to remove the DNase.
2. If the fraction of dead cells is higher, live cells can be separated from dead cells using
   a Ficoll gradient (Sigma Cat# GE17-1440-02), with the exact conditions varying depending
   on the cell type.

3.2 Preparation of Nuclei

Once the quality of the input cells has been ensured, the next step is to prepare nuclei and
transpose them. The empirically determined optimum input number of cells for a species with a
mammalian-sized genome is 50,000 diploid cells. Scale appropriately according to expected
genome size and ploidy, and also change other parameters, such as centrifugation speeds, if
necessary.
1. Centrifuge 50,000 viable cells at 500 × g for 5 min at 4 °C.
2. Carefully aspirate the supernatant, avoiding the pellet.
3. Add 50 μL of cold ATAC-RSB-Lysis Buffer and pipette up and down several times.
4. Incubate on ice for 3 min.
5. Add 1 mL of cold ATAC-RSB-Wash Buffer and invert several times to mix well.
6. Centrifuge at 500 × g for 5 min at 4 °C.
7. Carefully aspirate the supernatant as fully as possible while avoiding the pellet.
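The advice to scale the input according to genome size and ploidy can be made concrete in more than one way. One simple reading, sketched below, is to keep the total amount of genomic DNA per reaction roughly equal to that of 50,000 diploid mammalian cells. This is an illustrative assumption rather than a rule stated in the protocol: the 3.0 Gb reference haploid genome size is a round placeholder, the function name is hypothetical, and the optimal input should still be determined empirically.

```python
# Sketch only: scale cell input so that total genomic DNA per reaction
# matches the reference of 50,000 diploid cells with a mammalian-sized genome.
REF_CELLS = 50_000       # from the protocol
REF_PLOIDY = 2           # diploid reference, from the protocol
REF_HAPLOID_BP = 3.0e9   # ASSUMPTION: round figure for a mammalian haploid genome


def scaled_cell_input(haploid_bp, ploidy):
    """Number of cells carrying about the same total DNA as the reference input."""
    if haploid_bp <= 0 or ploidy <= 0:
        raise ValueError("genome size and ploidy must be positive")
    reference_dna_bp = REF_CELLS * REF_PLOIDY * REF_HAPLOID_BP
    return round(reference_dna_bp / (haploid_bp * ploidy))


if __name__ == "__main__":
    # A tetraploid line with a mammalian-sized genome needs half as many cells.
    print(scaled_cell_input(3.0e9, 4))
    # A diploid organism with a genome half the reference size needs twice as many.
    print(scaled_cell_input(1.5e9, 2))
```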

3.3 Transposition

Carry out transposition as follows:
1. Immediately resuspend the pellet in the transposase reaction mix:
   25 μL 2× TD buffer
   2.5 μL Tn5
   22.5 μL nuclease-free H2O
2. Incubate at 37 °C for 30 min in a Thermomixer at 1000 RPM.
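Because the pellet must be resuspended immediately, it is convenient to have the transposase reaction mix ready before the final spin when several samples are processed together. The sketch below scales the per-reaction volumes given in step 1 to a master mix; the 10% pipetting overage and the function name are assumptions for illustration, not values from the protocol.

```python
# Sketch only: scale the 50-uL transposition reaction to a master mix.
# Per-reaction volumes (uL) are taken from step 1 above.
PER_REACTION_UL = {
    "2x TD buffer": 25.0,
    "Tn5": 2.5,
    "nuclease-free H2O": 22.5,
}


def master_mix(n_samples, overage=0.10):
    """Volumes (uL) for n_samples reactions plus a fractional overage.

    The default overage of 10% is an ASSUMPTION to cover pipetting losses.
    """
    if n_samples < 1 or overage < 0:
        raise ValueError("need at least one sample and a non-negative overage")
    factor = n_samples * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in PER_REACTION_UL.items()}


if __name__ == "__main__":
    mix = master_mix(8)
    for name, vol in mix.items():
        print(f"{name}: {vol} uL")
    print("dispense 50 uL per pellet")
```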

3.4 DNA Purification

1. Immediately stop the reaction using 250 μL (i.e., 5×) of PB buffer (if using MinElute) or
   DNA Binding Buffer (if using Zymo; also see Note 12).
2. Purify samples following the kit instructions.
3. Elute with 10 μL of Elution Buffer.

3.5 PCR Amplification and Library Generation

Typically, a dual-indexing approach is used when amplifying ATAC-seq libraries. The general
structure of an ATAC-seq library as well as the relevant adapter and primer sequences are
shown in Fig. 2. See Note 2 for further discussion.

[Figure 2, panel a: schematic of an insert flanked by adapters A and B, each with an ME segment; labels P5, i5, SR1 and SR2, i7, P7]
[Figure 2, panel b: adapter and primer sequences]
   Adapter A:  5’-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-3’
   ME strand:  3’-TCTACACATATTCTCTGTC-5’
   Adapter B:  5’-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-3’
   ME strand:  3’-TCTACACATATTCTCTGTC-5’
   i7 primer:  5’-CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGG-3’
   i5 primer:  5’-AATGATACGGCGACCACCGAGATCTACAC[i5]TCGTCGGCAGCGTC

Fig. 2 Structure of an ATAC-seq library. (a) After transposition, an original DNA fragment is
flanked by two Tn5 molecules with their adapters. Note that all three possible configurations
(A-A, B-B, and A-B/B-A) are produced, but only the A-B ones can be subsequently amplified and
sequenced under conventional protocols; here “A” and “B” indicate the two different adapters
carried by the Tn5 molecules used for transposition, which share a common “ME” segment. The A
and B adapters are used as landing sites for the PCR primers that add the i5 and i7 barcodes
and the P5 and P7 sequences needed for Illumina sequencing. (b) Typical sequences of A and B
adapters and of i5 and i7 PCR primers. The [i7] and [i5] sequences are typically 8 bp long
and should be chosen so as to maximize the sequence distance between each pair of indexes

1. Set up a PCR reaction as follows:
   10 μL eluted transposition reaction
   10 μL nuclease-free H2O
   2.5 μL of Adapter 1
   2.5 μL of Adapter 2
   25 μL NEBNext High-Fidelity 2× PCR Master Mix (see Note 3)
2. Optimization of PCR conditions, pre-amplification. Amplify DNA for 5 cycles as follows:
   72 °C for 3 min
   98 °C for 30 s
   5 cycles of:
      98 °C for 10 s
      63 °C for 30 s
      72 °C for 30 s
   Hold at 4 °C
3. Determining additional cycles using qPCR. Use 5 μL of the pre-amplified reaction in a
   total qPCR reaction of 15 μL as follows:
   3.76 μL nuclease-free H2O
   0.5 μL of Adapter 1
   0.5 μL of Adapter 2
   0.24 μL 25× SYBR Green (in DMSO)
   5 μL NEBNext High-Fidelity 2× PCR Master Mix
   5 μL pre-amplified sample
4. Determining additional cycles using qPCR. Run the qPCR reaction with the following
   settings in a qPCR machine:
   98 °C for 30 s
   20 cycles of:
      98 °C for 10 s
      63 °C for 30 s
      72 °C for 30 s
   Hold at 4 °C
5. Assess the amplification profiles and determine the required number of additional cycles
   to amplify. Typical results are shown in Fig. 3.
6. Carry out final amplification by placing the remaining 45 μL in a thermocycler and running
   the following program:
   Nadd cycles of:
      98 °C for 10 s
      63 °C for 30 s
      72 °C for 30 s
   Hold at 4 °C
   Where Nadd is the number of additional cycles.
   In practice, we have found that 8–10 cycles are usually sufficient to amplify a standard
   mammalian ATAC library, and, if a very large number of samples are being processed at a
   time, the following reaction can be run:
7. Single-step PCR.
   72 °C for 3 min
   98 °C for 30 s
   8–10 cycles of:
      98 °C for 10 s
      63 °C for 30 s
      72 °C for 30 s
   Hold at 4 °C
8. Purify the amplified library following the same procedure used for purifying the
   transposition reaction.
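The Fig. 2 legend asks that the [i5] and [i7] index sequences be chosen to maximize the sequence distance between each pair of indexes. A quick way to screen a candidate set is to compute pairwise Hamming distances, as in the sketch below. The example 8-mers are made up for illustration and are not real index sequences, and the minimum distance of 3 is an assumed threshold rather than a value from the protocol.

```python
# Sketch only: screen a set of equal-length index sequences for pairs that
# are too similar, using the Hamming distance (number of mismatched positions).
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("indexes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(indexes):
    """Smallest Hamming distance over all pairs in the set."""
    return min(hamming(a, b) for a, b in combinations(indexes, 2))


def too_close(indexes, min_dist=3):
    """Pairs closer than min_dist (ASSUMED threshold, adjust as needed)."""
    return [
        (a, b, hamming(a, b))
        for a, b in combinations(indexes, 2)
        if hamming(a, b) < min_dist
    ]


if __name__ == "__main__":
    # Hypothetical 8-bp indexes, for illustration only.
    candidates = ["ACGTACGT", "ACGTACGA", "TTGCAAGC"]
    print(min_pairwise_distance(candidates))
    print(too_close(candidates))
```

Any pair reported by `too_close` would be a candidate for replacement before the primers are ordered or pooled.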
[Figure 3, panel a: qPCR amplification curves, relative fluorescence (y-axis, 5,000–35,000) versus cycle (x-axis, 1–20), with curves annotated +6, +8, and +10]
[Figure 3, panel b: standard curve of Log Conc. (y-axis) versus Ct (x-axis) for the PhiX dilution series (200, 100, 50, 25, 12.5, 6.25, 3.125, and 1.56 pM); fit y = -1.0879x + 11.974, R² = 0.9916]

Fig. 3 Determination of additional PCR cycles (post pre-amplification) and library quantification using qPCR. (a)
Determination of additional PCR cycles; qPCR is performed to determine the number of extra cycles to perform
on the pre-amplified ATAC material without reaching saturation. To determine the number of extra cycles, find
the number of cycles needed to reach 1/3 of the maximum relative fluorescence, and then carry out this
number of additional PCR cycles. (b) Quantification of libraries; qPCR quantification is performed on diluted
ATAC-seq libraries (400×) against a serial dilution of PhiX (200–1.56 pM). A standard curve is generated based
on the PhiX dilutions and used to calculate the molarity of the ATAC-seq library

3.6 Library Quantification and Evaluation of Library Quality

Before libraries can be sequenced, they need to be properly quantified
and their quality evaluated. There are two components to this
process: first, evaluation of the insert distribution, and second,
quantification.
1. Examination of library size distribution. This step can be car-
ried out using a variety of instruments that are now available for
this purpose, such as a TapeStation or a BioAnalyzer. In our
practice we prefer to use a TapeStation (with the D1000 or HS
D1000 kits) due to its ease of use, flexibility, and rapid turnaround
time. Typical results are shown in Fig. 4. A successful
mammalian ATAC-seq library usually exhibits a clear nucleosomal
signature (though the reverse is not always true; see Note
13 for further discussion).

Fig. 4 Evaluation of ATAC-seq library size distribution. Shown is the fragment length distribution as evaluated
using a TapeStation instrument and a D1000 TapeStation kit for an ATAC-seq library for the human GM12878
cell line. When a clear nucleosomal signature is observed, as in the example shown here, the library is most
likely of high quality. Note that the nucleosomal signature can in some cases be obscured by the presence of
high levels of mitochondrial contamination or some other source of highly accessible DNA (see Note 13 for
further discussion)

2. Quantification of library concentration. For most high-
throughput sequencing applications, this step is standardly
carried out using a Qubit fluorometer. This works well for
most libraries as they exhibit a unimodal fragment length dis-
tribution, and the Qubit generally returns highly accurate and
reliable measurements.
However, ATAC libraries do not exhibit a unimodal fragment
length distribution and in fact often contain fragments longer
than what can be sequenced on standard Illumina
instruments. Thus the effective library concentration often
differs from the apparent library concentration measured
using Qubit (though Qubit measurements can still be used,
with that caveat in mind, if no other information can be
obtained).
3. Estimation of effective library concentration using qPCR.
A standard curve is generated using Illumina PhiX standard
(10 nM) by first making a 50× dilution to 200 pM, from which
seven additional serial 2× dilutions are made (to 100 pM,
50 pM, 25 pM, 12.5 pM, 6.25 pM, 3.125 pM, and 1.56 pM).
Set up 20 μL qPCR reactions as follows:
7.9 μL nuclease-free H2O
5 μL ATAC-seq 400× diluted library or PhiX standards
4 μL Phusion HF Buffer
1 μL 25 μM i7 primer
1 μL 25 μM i5 primer
0.4 μL 10 mM dNTP mix
0.5 μL 25× SYBR Green (in DMSO)
0.2 μL NEB Phusion HF
Run the qPCR reaction with the following settings in a
qPCR machine:
98 °C for 30 s
20 cycles of:
98 °C for 10 s
63 °C for 30 s
72 °C for 30 s
Hold at 4 °C
Create a standard curve based on the PhiX dilutions and use it to
estimate the true molarity of the ATAC-seq library, accounting for the 400× dilution.
Commercial kits such as NEBNext Library Quant Kit for
Illumina or KAPA Library Quantification Kits can also be used,
in a similar manner.
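
The standard-curve calculation in step 3 can be sketched as follows. A least-squares line of log10(concentration) against Ct is fitted to the PhiX dilutions, the Ct of the diluted library is converted to a concentration, and the result is multiplied by the dilution factor (400× here). Only the PhiX concentrations and the dilution factor come from the protocol; the function names and all Ct values are hypothetical placeholders, and any correction for fragment length differences between PhiX and the library is deliberately left out of this sketch.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def library_molarity_pm(ct_library, standards_pm, standards_ct, dilution=400):
    """Estimate the undiluted library molarity (pM) from a PhiX standard curve."""
    log_conc = [math.log10(c) for c in standards_pm]
    slope, intercept = fit_line(standards_ct, log_conc)
    diluted_pm = 10 ** (slope * ct_library + intercept)
    return diluted_pm * dilution


# PhiX serial dilution from the protocol, in pM (1.56 pM is 1.5625 unrounded)
phix_pm = [200, 100, 50, 25, 12.5, 6.25, 3.125, 1.5625]
# Hypothetical Ct values: an idealized run in which Ct rises by exactly
# one cycle per 2x dilution
phix_ct = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]

# A diluted library with Ct 7.0 matches the 50 pM standard in this toy run
estimate = library_molarity_pm(7.0, phix_pm, phix_ct)
print(f"{estimate / 1000:.1f} nM")  # 20.0 nM
```

In a real run the Ct values would be averaged over technical replicates before fitting, and points outside the linear range of the curve would be excluded.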

3.7 Sequencing

The protocol described here generates libraries designed to be
sequenced on Illumina sequencers. A decision usually needs to be
made regarding the format to be used when sequencing.
We strongly advise against sequencing ATAC-seq libraries in a
single-end format, for two reasons. First, analysis of the fragment
length distribution is an important part of the quality evaluation of
ATAC-seq datasets, and this is only truly possible in paired-end
format. Second, many analyses of ATAC-seq data (e.g., transcrip-
tion factor footprinting) operate at the level of examining insertion
points rather than read coverage; paired-end reads produce twice as
many such insertion points for the same cost.
In practice, we have observed that ATAC-seq insert length
distributions peak around 50–60 bp (Fig. 5). Therefore it is most
cost effective to sequence ATAC libraries in 2 × 36 bp or 2 × 50 bp
formats (depending on the exact sequencer and kits available), as
sequencing kits with more cycles are usually priced significantly
higher.
However, for some applications (for example, if aiming to study
the effects of sequence variation on chromatin accessibility), longer
reads can provide important additional information.
Thus the exact sequencing format should be chosen according to
the specific needs of the study being carried out.
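
Because fragment lengths are only observable from paired-end alignments, a quick look at the insert length distribution can be taken from the TLEN field (column 9) of a SAM file. The sketch below is illustrative only: the function name, the example records, and the choice to count each fragment once through the mate with a positive TLEN are assumptions, and a real analysis would normally rely on an established toolkit.

```python
from collections import Counter

PROPER_PAIR = 0x2  # SAM flag bit: each segment properly aligned


def fragment_lengths(sam_lines):
    """Count fragment lengths from SAM records of properly paired reads."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        # The leftmost mate carries a positive TLEN; using only that mate
        # counts each fragment once.
        if flag & PROPER_PAIR and tlen > 0:
            counts[tlen] += 1
    return counts


# Made-up records: two proper pairs (57 bp and 180 bp fragments) and one
# read whose mate maps to another chromosome (ignored)
seq, qual = "A" * 36, "I" * 36
records = [
    "@HD\tVN:1.6\tSO:coordinate",
    f"r1\t99\tchr1\t100\t60\t36M\t=\t121\t57\t{seq}\t{qual}",
    f"r1\t147\tchr1\t121\t60\t36M\t=\t100\t-57\t{seq}\t{qual}",
    f"r2\t99\tchr1\t500\t60\t36M\t=\t644\t180\t{seq}\t{qual}",
    f"r2\t147\tchr1\t644\t60\t36M\t=\t500\t-180\t{seq}\t{qual}",
    f"r3\t65\tchr1\t900\t60\t36M\tchr2\t50\t0\t{seq}\t{qual}",
]

lengths = fragment_lengths(records)
print(sorted(lengths.items()))  # [(57, 1), (180, 1)]
```

With single-end reads the TLEN field is 0, so no such distribution can be built, which is the first of the two reasons given above for preferring paired-end sequencing.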
Fig. 5 panels: (a) fragment length distribution (fraction of fragments versus fragment length, 0–500 bp); (b) average RPM versus position relative to the TSS; (c) genome browser view of hg38 chr8:127,700,000–127,850,000 (100-kb scale bar) showing CASC11, MYC, and PVT1.

Fig. 5 Expected results from a successful ATAC-seq experiment. (a) Shown is the insert length distribution of a
typical sequenced mammalian ATAC-seq library, showing a prominent subnucleosomal peak, as well as a
mononucleosomal and a less pronounced dinucleosomal peak. (b) Aggregate ATAC-seq signal profile around
transcription start sites (TSSs). (c) ATAC-seq profile in a 212-kb neighborhood around the human MYC gene.
The ENCODE Consortium [34] keratinocyte dataset with accession ID ENCSR798IJQ was used for this example

4 Expected Results

After sequencing, reads are mapped to the reference genome, and
several quality evaluation metrics are considered before proceeding
with downstream analysis. A typical ATAC-seq library exhibits a
nucleosomal signature as shown in Fig. 5a. Enhancer, promoter, and
insulator regions should be strongly enriched relative to the rest of
the genome (Fig. 5c shows the accessibility profile in the neighborhood
of the MYC gene, which highlights a number of candidate
distal regulatory elements). A quick way to evaluate the degree of
enrichment is to examine aggregate plots of ATAC-seq signal
around annotated TSSs, as shown in Fig. 5b. This can be
further formalized as a TSS ratio score, which is calculated by
dividing the average number of fragments within ±100 bp of the
TSS by the sum of the average number of fragments within the two
100-bp windows at the points +2 and −2 kb away from the TSS.
The advantage of this metric is that it is independent of peak calling
or sequencing depth; its disadvantage is that it is annotation-dependent
and well calibrated only for the human and mouse
genomes. In the latter two cases, good ATAC-seq libraries usually
exhibit TSS ratios ≥8.
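
The TSS ratio score described above can be written out as a short calculation. The sketch below assumes that an aggregate profile has already been computed as a mapping from position relative to the TSS (in bp) to the average fragment count at that position. The function names, this data layout, and the decision to center the two flanking 100-bp windows exactly on −2000 and +2000 are assumptions made for illustration.

```python
def mean_signal(profile, start, end):
    """Average profile value over positions start..end-1 (bp relative to TSS)."""
    values = [profile.get(pos, 0.0) for pos in range(start, end)]
    return sum(values) / len(values)


def tss_ratio(profile, flank=2000, center_half_width=100, flank_width=100):
    """Mean signal within +/-100 bp of the TSS divided by the sum of the
    mean signals in two 100-bp windows located -2 kb and +2 kb away."""
    center = mean_signal(profile, -center_half_width, center_half_width + 1)
    half = flank_width // 2
    left = mean_signal(profile, -flank - half, -flank + half)
    right = mean_signal(profile, flank - half, flank + half)
    return center / (left + right)


# Toy profile: flat background of 1.0 with a signal of 20.0 within +/-100 bp
profile = {pos: 1.0 for pos in range(-2100, 2101)}
for pos in range(-100, 101):
    profile[pos] = 20.0

print(tss_ratio(profile))  # 10.0
```

Because the score is a ratio of averages taken from the same library, scaling the whole profile by a constant (for example, sequencing twice as deeply) leaves it unchanged, which is the depth independence noted above.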

5 Notes

1. The Tn5 transposase can be obtained as part of the various
Nextera DNA Library Prep kits offered by Illumina, and also
from several other commercial vendors. It can also be made by
individual laboratories following previously published proto-
cols [25]. The latter approach is, although laborious, the most
cost effective, especially for large-scale projects. If homemade
Tn5 is used, its activity should ideally be well characterized
relative to standard enzymatic formulations.
2. PCR and indexing primers/adapters supplied with the Nextera
DNA Library Prep kits offered by Illumina can be used. Alternatively,
or if a larger number of indexing sequences is needed,
custom-designed and synthesized oligos can also be used with
equivalent success. The structure of an ATAC-seq library with
the relevant sequences is shown in Fig. 2. The i7 primer
sequence is:

5′-CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGG-3′

The i5 primer sequence is:

5′-AATGATACGGCGACCACCGAGATCTACAC[i5]TCGTCGGCAGCGTC-3′

Where [i7] and [i5] are the index sequences (typically
8-bp long). Dissolve and dilute to 25 μM.
3. The initial extension is very important when amplifying transposed
DNA as it is needed to fill in the gap left from the
transposition itself (see Fig. 2) and allow PCR primers to anneal
in subsequent amplification cycles. For this reason, it is not
recommended to use hot-start polymerase mixes, in which the
polymerase is only activated by exposure to denaturation
temperatures.
4. Low-binding tubes are preferable, though not absolutely
required, as a low number of cells (only 50,000) is usually
used as input to an ATAC reaction.
5. The MinElute kit can be replaced with other DNA purification
kits; for example, we have also had equivalent success using the
DNA Clean & Concentrator from Zymo. The important
variables regarding the DNA isolation procedure after transposition
are the efficiency of recovery and the lower size limit of
the recovered fragments. The insert length distribution of most
ATAC-seq libraries peaks around 50–60 bp, i.e., even including
Tn5 adapter sequences, many of the informative fragments are
shorter than 90–100 bp, and should ideally be preserved dur-
ing the DNA purification procedure.
6. Early versions of the ATAC-seq protocol [20] exhibited very
high proportions of reads originating from the mitochondrial
genome, often exceeding 80% of the total. This is due to the
fact that the mitochondrial genome is not packaged by nucleosomes,
and is therefore highly accessible to transposase insertion.
Decreasing the fraction of mitochondrial reads has been a key
part of the improvement of the ATAC-seq protocol in its
currently used variants relative to the original version, and has
been achieved thanks to the addition of the combination of
digitonin, Tween-20, and IGEPAL detergents during the cell
lysis and nuclei preparation step. As a result, ATAC-seq libraries
generated using modern protocols frequently contain
≤5% mitochondrial reads for many cell types.
We do note, however, that there are cells that simply con-
tain an extremely high number of mitochondria due to very
high levels of metabolic activity (e.g., some cancer cell lines),
and even with the optimized Omni-ATAC protocol mitochon-
drial fractions are still quite high for them. These are special
cases though.
We also note that high levels of mitochondrial contamina-
tion do not necessarily correspond to poor-quality ATAC-seq
datasets in terms of signal-to-noise ratios in the nuclear
genome. We have found no inverse correlation between the
fraction of reads mapping to the mitochondrial genome and
the levels of enrichment for open chromatin regions. The key
benefit of eliminating mitochondrial reads is to reduce
sequencing costs as fewer overall reads need to be sequenced
to achieve the necessary coverage over the nuclear genome.
7. Nuclei isolation from tissues, especially when frozen, is a multi-
step procedure involving tissue homogenization by douncing
followed by density gradient centrifugation. The reader is
referred to [23] for more details.
8. Plant cells are a challenging system to isolate nuclei from
because of their thick cellulose cell walls. Nuclei are isolated
by grinding tissue material in liquid nitrogen [26–28] and then
sorting nuclei by sucrose sedimentation or by FACS, if the cell
type of interest has been labeled accordingly, e.g., using the
INTACT approach [29].
9. It is also often necessary to carry out homogenization followed
by nuclei isolation when working with whole animals, e.g.,
C. elegans [30], with the exact protocol optimized according
to the specifics of the organism being studied.
10. Yeast (and fungal cells in general) have thick cell walls composed
of polysaccharides, lipids, and chitin in various proportions.
These walls present a barrier to the access of Tn5 to the
nucleus; thus ATAC-seq protocols tailored to such cells involve
treatment with zymolyase or chitinase enzymes [31], with the
exact details varying depending on the species studied.
11. It is in principle possible to carry out ATAC-seq experiments
and obtain enriched libraries from crosslinked sources. This has
in fact been how a number of scATAC-seq studies have been
executed in recent years [32, 33]. However, ATAC-seq
libraries generated from crosslinked cells are generally subopti-
mal, with lower signal-to-noise ratio than standard ATAC-seq
datasets, and they also tend to display a pronounced loss of
subnucleosomal fragments compared to the standard protocol.
We thus advise against using fixed material for ATAC-seq
except for special circumstances where this is the only available
option.
12. This is also a possible stopping point if necessary. DNA can be
stored in PB buffer at −20 °C before proceeding with
subsequent cleanup steps at a later time.
13. If a clear nucleosomal signature is observed in a TapeStation
profile, that in almost all cases indicates a successful ATAC-seq
experiment. The inverse is not always true, as the fragment
distribution can be dominated by the presence of large
amounts of strongly accessible DNA in the original sample.
For example, libraries with high levels of mitochondrial con-
tamination often exhibit an obscured nucleosomal signature,
and so do nearly all yeast libraries (in the latter case it is because
yeast genomes contain a large number of ribosomal DNA
copies, which are nearly nucleosome-free when being actively
transcribed, and often comprise half or even more of yeast
ATAC-seq libraries [13]). As discussed in Note 6, high levels
of mitochondrial contamination are undesirable in terms of the
efficient utilization of sequencing resources but do not neces-
sarily result in poor-quality datasets in the nuclear genome.
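
The primer structure given in Note 2, together with the recommendation in the Fig. 2 legend to maximize the sequence distance between indexes, can be sketched as follows. The constant flanking sequences are those listed in Note 2. The function names and the example 8-bp indexes are made up for illustration and are not a recommended index set, and the sketch does not address the orientation in which an index is read by the sequencer.

```python
from itertools import combinations

# Constant primer segments from Note 2
I7_PREFIX = "CAAGCAGAAGACGGCATACGAGAT"
I7_SUFFIX = "GTCTCGTGGGCTCGG"
I5_PREFIX = "AATGATACGGCGACCACCGAGATCTACAC"
I5_SUFFIX = "TCGTCGGCAGCGTC"


def build_primer(prefix, index, suffix):
    """Insert an index sequence between the constant primer segments."""
    if set(index) - set("ACGT"):
        raise ValueError(f"non-ACGT character in index {index!r}")
    return prefix + index + suffix


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("indexes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(indexes):
    """Smallest Hamming distance over all pairs in an index set."""
    return min(hamming(a, b) for a, b in combinations(indexes, 2))


# Hypothetical 8-bp indexes (illustrative only)
indexes = ["AACCGGTT", "CCAATTGG", "GGTTAACC", "TTGGCCAA"]
i7_primers = [build_primer(I7_PREFIX, idx, I7_SUFFIX) for idx in indexes]

print(i7_primers[0])
print(min_pairwise_distance(indexes))  # 8: every pair differs at all positions
```

A larger minimum pairwise distance means that more sequencing errors are needed before one index can be mistaken for another during demultiplexing.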

Acknowledgements

The authors thank members of the Greenleaf and Kundaje labs for
many helpful discussions. This work was supported by NIH grants
UM1HG009436 and P50HG007735 (to W.J.G.). W.J.G. is a Chan
Zuckerberg investigator. Z.S. is supported by EMBO Long-Term
Fellowship EMBO ALTF 1119-2016 and by Human Frontier
Science Program Long-Term Fellowship HFSP LT 000835/
2017-L. G.K.M. was supported by the Stanford School of Medi-
cine Dean’s Fellowship.

References
1. Luger K, Mäder AW, Richmond RK et al. (1997) Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 389:251–260
2. Wu C (1980) The 5′ ends of Drosophila heat shock genes in chromatin are hypersensitive to DNase I. Nature 286(5776):854–860
3. Keene MA, Corces V, Lowenhaupt K et al. (1981) DNase I hypersensitive sites in Drosophila chromatin occur at the 5′ ends of regions of transcription. Proc Natl Acad Sci U S A 78:143–146
4. McGhee JD, Wood WI, Dolan M et al. (1981) A 200 base pair region at the 5′ end of the chicken adult β-globin gene is accessible to nuclease digestion. Cell 27:45–55
5. Dorschner MO, Hawrylycz M, Humbert R et al. (2004) High-throughput localization of functional elements by quantitative chromatin profiling. Nat Methods 1:219–225
6. Sabo PJ, Humbert R, Hawrylycz M et al. (2004) Genome-wide identification of DNaseI hypersensitive sites using active chromatin sequence libraries. Proc Natl Acad Sci U S A 101:4537–4542
7. Sabo PJ, Kuehn MS, Thurman R et al. (2006) Genome-scale mapping of DNase I sensitivity in vivo using tiling DNA microarrays. Nat Methods 3:511–518
8. Crawford GE, Holt IE, Whittle J et al. (2006) Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res 16:123–131
9. Boyle AP, Davis S, Shulha HP et al. (2008) High-resolution mapping and characterization of open chromatin across the genome. Cell 132:311–322
10. Thurman RE, Rynes E, Humbert R et al. (2012) The accessible chromatin landscape of the human genome. Nature 489:75–82
11. Kelly TK, Liu Y, Lay FD et al. (2012) Genome-wide mapping of nucleosome positioning and DNA methylation within individual DNA molecules. Genome Res 22:2497–2506
12. Krebs AR, Imanci D, Hoerner L, Gaidatzis D et al. (2017) Genome-wide single-molecule footprinting reveals high RNA polymerase II turnover at paused promoters. Mol Cell 67:411–422.e4
13. Shipony Z, Marinov GK, Swaffer MP et al. (2018) Long-range single-molecule mapping of chromatin accessibility in eukaryotes. bioRxiv 504662
14. Wang Y, Wang A, Liu Z et al. (2019) Single-molecule long-read sequencing reveals the chromatin basis of gene expression. Genome Res 29:1329–1342
15. Aughey GN, Estacio Gomez A, Thomson J et al. (2018) CATaDa reveals global remodelling of chromatin accessibility during stem cell differentiation in vivo. Elife 7:e32341
16. Chereji RV, Eriksson PR, Ocampo J, Clark DJ (2019) DNA accessibility is not the primary determinant of chromatin-mediated gene regulation. bioRxiv 639971
17. Ponnaluri VKC, Zhang G, Estéve PO et al. (2017) NicE-seq: high resolution open chromatin profiling. Genome Biol 18(1):122
18. Umeyama T, Ito T (2017) DMS-seq for in vivo genome-wide mapping of protein-DNA interactions and nucleosome centers. Cell Rep 21:289–300
19. Timms RT, Tchasovnikarova IA, Lehner PJ (2019) Differential viral accessibility (DIVA) identifies alterations in chromatin architecture through large-scale mapping of lentiviral integration sites. Nat Protoc 14:153–170
20. Buenrostro JD, Giresi PG, Zaba LC et al. (2013) Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods 10:1213–1218
21. Buenrostro JD, Wu B, Litzenburger UM et al. (2015) Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523:486–490
22. Cusanovich DA, Daza R, Adey A et al. (2015) Multiplex single cell profiling of chromatin accessibility by combinatorial cellular indexing. Science 348:910–914
23. Corces MR, Trevino AE, Hamilton EG et al. (2017) An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat Methods 14:959–962
24. Corces MR, Buenrostro JD, Wu B et al. (2016) Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat Genet 48:1193–1203
25. Picelli S, Björklund AK, Reinius B et al. (2014) Tn5 transposase and tagmentation procedures for massively scaled sequencing projects. Genome Res 24:2033–2040
26. Lu Z, Hofmeister BT, Vollmers C et al. (2017) Combining ATAC-seq with nuclei sorting for discovery of cis-regulatory regions in plant genomes. Nucleic Acids Res 45:e41
27. Maher KA, Bajic M, Kajala K et al. (2018) Profiling of accessible chromatin regions across multiple plant species and cell types reveals common gene regulatory principles and new control modules. Plant Cell 30:15–36
28. Bajic M, Maher KA, Deal RB (2018) Identification of open chromatin regions in plant genomes using ATAC-seq. Methods Mol Biol 1675:183–201
29. Deal RB, Henikoff S (2010) A simple method for gene expression and chromatin profiling of individual cell types within a tissue. Dev Cell 18:1030–1040
30. Daugherty AC, Yeo RW, Buenrostro JD et al. (2017) Chromatin accessibility dynamics reveal novel functional enhancers in C. elegans. Genome Res 27:2096–2107
31. Schep AN, Buenrostro JD, Denny SK et al. (2015) Structured nucleosome fingerprints enable high-resolution mapping of chromatin architecture within regulatory regions. Genome Res 25:1757–1770
32. Cusanovich DA, Reddington JP, Garfield DA et al. (2018) The cis-regulatory dynamics of embryonic development at single-cell resolution. Nature 555:538–542
33. Cao J, Cusanovich DA, Ramani V et al. (2018) Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361:1380–1385
34. ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489:57–74
Chapter 2

Mapping Nucleosome Location Using FS-Seq


Barry Milavetz, Brenna Hanson, Kincaid Rowbotham, and Jacob Haugen

Abstract
The organization of nucleosomes in eukaryotic chromatin is thought to play a critical role in the regulation
of the biological function of the chromatin. Because of this potential role in regulation, a number of
techniques have been developed, which combine chromatin fragmentation around nucleosomes with next-
generation sequencing to map the location of nucleosomes in chromatin. In this section, a procedure using
a kit from New England Biolabs (NEBNext Ultra II FS DNA Library Prep Kit) to fragment chromatin in
preparation for next-generation sequencing is described and compared to other available procedures for
mapping nucleosome location.

Key words NGS, Nucleosomes, Chromatin, Sequencing, Phasing

1 Introduction

It has been known for many years that eukaryotic DNA is found
within the nucleus of a cell organized with histones to form chro-
matin. The basic building block of chromatin is the nucleosome,
which consists of approximately 145 base pairs of DNA wrapped
around a histone octamer core containing two copies each of
histone H2A, H2B, H3, and H4. As shown in Fig. 1 for the
eukaryotic virus Simian Virus 40 (SV40), the nucleosomes typically
appear as “beads on a string” in chromatin.
Figure 1 also shows a short region of DNA, indicated by an
arrow, that appears to lack at least one nucleosome. This region of
“naked” DNA is found in the SV40 regulatory region [1–3] and for
obvious reasons it has been referred to as a “nucleosome-free
region” (NFR). The presence of a specialized chromatin structure,
such as the NFR found in SV40 chromatin, which is characterized
by specific nucleosome location and/or histone modifications,
appears to be a general characteristic of genes that are poised for
transcription or are actively transcribing [4].

Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols,
Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_2,
© The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023

22 Barry Milavetz et al.

Fig. 1 SV40 minichromosome showing the “beads on a string” structure of
chromatin. The arrow indicates the location of a nucleosome-free region. SV40
chromosomes were prepared and analyzed by electron microscopy [20]

Initially, the presence of modified chromatin structure in regulatory
regions of cellular genes was determined by differences in
susceptibility to nucleases such as DNase I [5], since naked DNA
would be expected to digest more quickly than DNA present in
nucleosomes. While nuclease sensitivity has been a valuable tool to
map the location of putative regulatory regions, it is a relatively
low-resolution tool. However, with the development of robust
next-generation sequencing (NGS) techniques, it is now possible
to directly map the location of nucleosomes in chromatin to deter-
mine whether there is an NFR generated or whether the initiation
of transcription results in other changes in nucleosome location.
The NGS workflow in Fig. 2 consists of four steps: fragmenta-
tion of chromatin into nucleosomes, preparation of DNA libraries
from the DNA in the nucleosome-sized chromatin fragments, NGS
sequencing of the libraries, and bioinformatic analysis of the
sequencing data to determine the location of the DNA fragments
present in the library.
One of the keys to the NGS workflow is the procedure used to
fragment the chromatin into nucleosomes. Originally, chromatin was
fragmented by micrococcal nuclease and the strategy was referred
to as MN-Seq [6]. Fragmentation of chromatin by micrococcal
nuclease is based on the fact that micrococcal nuclease is a
double-strand-specific endonuclease that would be expected to
cleave DNA in the linker region of chromatin without digesting
the DNA present in a nucleosome [6]. While this is generally true,
the procedure has two major disadvantages. First, the specificity of
micrococcal nuclease for linker region DNA is relative and because
of this, it is necessary to titrate the amount of nuclease, temperature
of digestion, and length of digestion to optimize the generation of
nucleosome-sized fragments [6]. Second, for reasons that are not
completely understood, some nucleosomal DNA is much more
sensitive to digestion than other nucleosomal DNA [6].

[Fig. 2 plot: Reads versus Nucleotide number, positions 1–5069]

Fig. 2 Workflow for mapping nucleosomes using chromatin fragmentation and next-generation sequencing.
Blue circle = nucleosome; blue rectangle = adapter 1; gold rectangle = adapter 2

A second strategy for mapping nucleosomes is known as “Assay
for Transposase Accessible Chromatin” (ATAC)-Seq. ATAC-Seq
uses a transposase (Tagment from Illumina) in vitro to fragment
the chromatin by targeting open regions of the chromatin [7]. With
this procedure the chromatin is fragmented in linker regions and
libraries are prepared in one step, since the transposase introduces
the linkers needed for library amplification and sequencing. The
major disadvantage of the ATAC-Seq procedure is that transposi-
tion is very sensitive to higher order chromatin structure and
because of this, the nucleosomes generated tend to be located in
the regulatory region of active genes [7]. This result has led to
ATAC-Seq being used as an assay for open chromatin [7].
Chromatin Immunoprecipitation sequencing (ChIP-Seq) has
also been used to map nucleosomes but primarily for those nucleo-
somes containing a specific form of histone modification [8]. In this
procedure, an antibody is used to target an epitope on a histone, the
chromatin is fragmented, typically by sonication, and the fragments
containing the epitope are separated from other fragments. Libraries
are then prepared from the epitope-containing fragments and sequenced. The major
disadvantage with this procedure is that it is specific to the antibody
being used. Frequently, this results in only a small number of
nucleosomes being mapped in a particular target chromatin.
We have recently described a fourth method for mapping
nucleosomes in viral chromatin based on a kit (New England
Biolabs FS) that utilizes a proprietary procedure for fragmenting
chromatin [9]. We have used this procedure to map the location of
nucleosomes in SV40 chromatin and found that the procedure
(FS-Seq) yields nucleosome maps similar to but not identical to
the maps that we obtain using either MN-Seq, ATAC-Seq, or
ChIP-Seq. In this chapter, we describe the procedures that we
have used for mapping nucleosomes by FS-Seq.

1.1 Basic Protocol for Preparing Sequencing Libraries from Chromatin Fragmented Using the FS Kit

This protocol describes procedures for fragmenting SV40 chromatin
using the proprietary reagents in the New England Biolabs
NEBNext Ultra II FS DNA Library Prep Kit and for preparing DNA
sequencing libraries from the fragmented chromatin using the New
England Biolabs NEBNext Ultra II DNA Library Prep Kit and E7335S
Multiplex Oligos for Illumina. The protocol includes a procedure
using submerged agarose gel electrophoresis to select and purify
the subset of the library members that contain insert fragments of
SV40 DNA sized from approximately 60–200 base pairs.

2 Materials

1. NEB library kit NEBNext Ultra II FS DNA Library Prep Kit.
2. NEB library kit NEBNext Ultra II DNA Library Prep Kit and
Multiplex Oligos for Illumina (e.g., E7335S).
3. AMPure XP (Beckman Coulter, #A63880).
4. Ethanol (Sigma-Aldrich, #459844).
5. Nuclease-free water (Ambion, #AM9937).
6. Certified low-Melt Agarose (Bio-Rad, #1613101).
7. Certified low-Melt Agarose (Bio-Rad, #1613112).
8. Agarose (Sigma-Aldrich A6877).
9. DNA Size Markers.
10. SsoAdvanced Universal SYBR Green Supermix (Bio-Rad,
#172-5274).
11. Primers to target genomic DNA.
12. Agarose gel sample buffer (see Subheading 3.4).
13. GelGreen Nucleic Acid Stain (EmbiTec, EC-1995).
14. Zymoclean Gel DNA Recovery Kit (Zymo Research, #D400).
15. Monarch PCR & DNA Cleanup Kit (New England BioLabs
#T1030S).
16. MiSeq Reagent Kit v3 (150 cycle).
17. 10 μL Graduated Filter Tips.
18. 2 μL Pipetman.
19. 10 μL Pipetman.
20. 200 μL Graduated Filter Tips.
21. 200 μL Pipetman.
22. 1000 μL Filter Tips.
23. 1000 μL Pipetman.
24. 200 μL thin-wall PCR Tubes (VWR, #20170-012).
25. Eppendorf MasterCycler Personal PCR.
26. BioRad CFX Connect real-time PCR system.
27. Eppendorf snap-cap microcentrifuge flex tubes (Fisher Scien-
tific, #022264111).
28. Minicentrifuge LabDoctor 12 (MidSci).
29. Savant DNA 120 SpeedVac concentrator.
30. Power supply Bio-Rad Power Pac 3000 at 125 constant volts.
31. Submerged Agarose Gel Apparatus.
32. EmbiTec PrepOne Sapphire.

3 Methods

3.1 Fragmentation of SV40 Chromatin Using the New England Biolabs NEBNext Ultra II FS DNA Library Prep Kit

1. We have previously described in detail the procedures that we
use to prepare various forms of SV40 chromatin including
functionally active minichromosomes and the chromatin from
virus particles [10]. We have used similar procedures to successfully
prepare chromatin for analysis from other small DNA
viruses including bovine papillomavirus (BPV) and human
papillomavirus (HPV).
2. In preparation for the fragmentation procedure, we first set the
block on the Eppendorf MasterCycler Personal PCR to 4 °C
and cooled the appropriate number of thin-walled PCR
tubes for 10 min in the block. Typically, three or four samples
including controls are fragmented at the same time. When
working with a new source of chromatin, we would do three
samples at the same time: the first would be the input chromatin
alone, the second input chromatin with the supplied reaction
buffer, and the third input chromatin with reaction buffer and
enzyme.
3. We also set up a Monarch column for each sample and labeled
the columns and flex tubes to receive the purified DNA follow-
ing column purification of the fragmented chromatin.
4. Each PCR tube in a set then received 13 μL of SV40 chromatin
using the 200 μL Pipetman. The buffer control and enzyme
tubes then received 3.5 μL of the reaction buffer using the
10 μL Pipetman and the enzyme tube received an additional
1 μL of the enzyme mix from the FS kit using the 2 μL Pipet-
man. Finally, using the 10 μL Pipetman set to 10 μL, the liquid
in each tube was mixed by drawing the liquid into the 10 μL tip
followed by forcing the liquid back into the PCR tube.
5. Fragmentation was carried out for varying times and tempera-
tures to optimize the generation of fragments. Typically, frag-
mentation was done for 5–15 min at either 4 °C or 37 °C. The
lower temperature was tested first since at 4 °C nucleosomes
would not be expected to slide appreciably and the results
would be expected to most closely resemble the original orga-
nization of nucleosomes in the chromatin. However, if only
limited fragmentation occurred, we then tested the higher
temperature.
6. Following fragmentation, the reaction in each PCR tube was
stopped by the addition of 100 μL of binding buffer (Monarch
kit) using the 200 μL Pipetman. The binding buffer and sample
in each tube were mixed as above using the 200 μL Pipetman,
and then added to the Monarch column. The DNA was pur-
ified according to the protocol and reagents supplied in the
Monarch kit. The purified bound DNA was eluted with 13 μL
of nuclease-free water (Ambion) and stored at -20 °C until
used in subsequent steps.
7. The extent of fragmentation in each of the samples was deter-
mined by qPCR. The assay is based on the idea that PCR will
only amplify a particular region of DNA if the DNA is intact. By
comparing the amount of DNA amplification product from the
untreated sample to the amount of product following addition
of reaction buffer and a mixture of reaction buffer and enzyme
mix, it is possible to determine the extent of fragmentation that
occurred in the buffer alone or with the mixture of buffer and
enzymes at each of the fragmentation conditions. In order to
analyze SV40 chromatin, we have a number of sets of primers,
all of which yield amplification products between 200 and
400 base pairs in size. The primer sets recognize various
regions of the SV40 genome including the regulatory region,
the coding regions, and the region between the coding regions. We will
sometimes compare the extent of fragmentation in the regulatory
region to other genomic sites to determine if the regulatory
region is hypersensitive to fragmentation in certain forms of
SV40 chromatin. Routinely, we use the primer set that lies
between the two coding regions,
5′ AAAATGAAGATGGTGGGGAGAAGAA 3′ and
5′ GACTCGAGGTGAAATTTGTGATGCT 3′, which amplifies a fragment
approximately 250 base pairs in size. We prepare a master
PCR mix containing 10 μL of 2X BioRad amplification buffer
(SsoAdvanced Universal SYBR Green Supermix), 0.2 μL of
each primer (100 μM), and nuclease-free water to 20 μL total
volume per sample to be analyzed. A 1 μL aliquot of the
Monarch-purified sample DNA is added to a 20 μL volume of the PCR
mix and the tubes are placed in the BioRad CFX Connect real-time
PCR system. The mixture in each PCR tube is preheated in the
PCR machine for 10 min to activate the DNA polymerase and
then the DNA is amplified for 35 cycles at 95 °C for 1 min to
denature the DNA, 54 °C for 1 min to anneal the primers, and
72 °C for 1 min for DNA extension.
8. Following the completion of the PCR amplification, the cycle
threshold for the untreated sample is compared to the
corresponding cycle threshold for the sample containing only
the reaction buffer and the sample containing the reaction
buffer and enzyme mix. If typical fragmentation is occurring
with the FS kit, we would expect to observe from no change to
a one cycle increase in threshold for the sample containing only
buffer and a two to four cycle increase in threshold for the
sample containing buffer and enzyme mix. The changes in cycle
threshold for the sample containing the buffer and enzyme in
the example above would indicate that approximately 75–90%
of the input chromatin has been fragmented.
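As a rough check on the percentages quoted in step 8, the cycle-threshold shift can be converted into a fraction of template fragmented. The sketch below is not part of the published protocol. It assumes ideal amplification (product doubles every cycle, so each one-cycle increase halves the amount of intact template), and the function name and the efficiency parameter are hypothetical.

```python
def fraction_fragmented(ct_untreated, ct_treated, efficiency=2.0):
    """Estimate the fraction of template that is no longer amplifiable.

    Assumes each cycle multiplies product by `efficiency` (2.0 = perfect
    doubling), an idealization that real qPCR assays only approximate.
    """
    delta_ct = ct_treated - ct_untreated
    # Intact template remaining relative to the untreated sample, capped at
    # 1.0 so a slightly lower treated Ct is not read as negative fragmentation.
    intact = min(1.0, efficiency ** (-delta_ct))
    return 1.0 - intact


# Hypothetical untreated Ct of 20 with one- to four-cycle increases.
for shift in (1, 2, 3, 4):
    estimate = fraction_fragmented(20.0, 20.0 + shift)
    print(f"{shift}-cycle increase: {estimate:.1%} fragmented")
```

Under these assumptions a two-cycle increase corresponds to 75% fragmentation and a four-cycle increase to roughly 94%, which is in line with the approximate 75–90% range given above.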

3.2 Preparation of Sequencing Libraries from FS Fragmented DNA Using the New England Biolabs NEXT Ultra II DNA Library Prep Kit

1. The fragmented DNA obtained using the FS kit was then used
for the preparation of sequencing libraries using an NEB
NEXT Ultra II DNA library prep Kit designed for sequencing
on an Illumina sequencing platform. All biochemical manipulations
associated with the preparation of libraries with this kit
were performed in a BSL-II hood. An Eppendorf MasterCycler
Personal PCR located in the hood was set to a block temperature
of 4 °C and a lid temperature of 65 °C. With the heated lid
up, sterile thin-walled PCR tubes being used for library prepa-
ration were placed in the 4 °C block of the cycler and cooled for
at least 10 min. For most library preparations, we generated
eight libraries at one time. At the same time that the cycler was
being set up, we added 11 μL of adapter dilution buffer (NEB)
to another PCR tube using the 10 μL Pipetman and placed
this tube in a -20 °C freezer for later use.
2. Libraries were prepared according to the protocol supplied
with the NEXT Ultra II DNA library prep Kit with minor
modifications. In the first step of the protocol, the fragmented
DNA obtained from chromatin was end-repaired using an
enzyme mix supplied with the kit. The purified DNA from
the fragmentation step was diluted to 25 μL total volume
with nuclease-free water using the 200 μL Pipetman and
added to a thin-walled PCR tube precooled in the cycler block. To
each PCR tube, we then added 3.5 μL of repair buffer with the
10 μL Pipetman and 1.5 μL of repair enzymes using the 2 μL
Pipetman both of which were supplied in the kit. The tempera-
ture of the cycler block was raised to 20 °C and the DNA and
reagents were incubated for 30 min. The temperature was then
raised to 65 °C for 30 min to inactivate enzymes with the lid
closed to keep the temperature even throughout the tubes
followed by opening of the lid and cooling the block to 4 °C.
3. The cooled PCR tubes were centrifuged in the LabDoctor
12 minicentrifuge in the hood to ensure that all the liquid in
each tube was located at the bottom of the tubes to maintain
the proper concentration of reactants. The tubes were returned
to the block in the cycler at 4 °C. The next step in the protocol
from the kit was the ligation of adapters onto the ends of the
end-repaired DNA present in each of the tubes.
4. Using the 2 μL Pipetman, 0.5 μL of adapter was added to the
thawed 11 μL of adapter dilution buffer placed into the cycler
block and thoroughly mixed. Next, using the 10 μL Pipetman,
2.5 μL of adapter was added to each sample tube, followed by
1 μL of enhancer again using the 2 μL Pipetman. Finally, using
the 200 μL Pipetman, 15 μL of ligation mix from the kit was
added with thorough mixing to each tube, and tubes were
incubated in the block for 15 min at 20 °C.
5. The tubes in the block were cooled to 4 °C for the addition of
the USER enzyme. Using the 2 μL Pipetman, 1.5 μL of USER
(from the NEB primer kit) was added to each tube, the tubes
mixed using the 10 μL Pipetman set to 10 μL, and incubated at
37 °C for 15 min. At the end of this incubation, the tubes were
cooled to 4 °C, centrifuged in the LabDoctor 12 minicentri-
fuge, and stored in a -20 °C freezer before purification.
6. The libraries were then purified through a series of steps includ-
ing column purification, submerged agarose gel size selection,
and AMPure purification prior to sequencing. The libraries
were column purified using the Monarch kits in part to con-
centrate the libraries. The frozen libraries were thawed to room
temperature in a BSL-II hood, 250 μL of binding buffer
(Monarch kit) was added using a 1 mL Pipetman with mixing,
and the library in binding buffer was added to the column. The
sample was centrifuged for 1 min in the LabDoctor 12 minicentrifuge
and then the bound DNA was washed with 194 μL
of wash buffer (from the kit). The library DNA was eluted with
25 μL of nuclease-free water (200 μL Pipetman) and collected
in a sterile flex tube. The eluted libraries were then dried using a
Savant DNA 120 SpeedVac concentrator set to heat off and
slow dry. This typically took approximately 40 min.
7. Each library was size-selected on a mixture of low-melting
temperature agarose and standard agarose in order to maximize
the number of library members that had the correct size inserts.
For a 50 mL total gel volume, which was used in our gel
apparatus, 0.9 g of low-melt agarose (Certified Low Melt Agarose,
Bio-Rad, #1613112) and 0.1 g of regular agarose
(Certified Molecular Biology Agarose, Bio-Rad, #1613101) were added
to a 100 mL bottle. For each analysis, 350 mL of running buffer was
prepared in a 500 mL bottle by diluting a 50X stock buffer (see
Subheading 3.4) kept at 4 °C. A total of 50 mL of the running
buffer was added to the agarose in the bottle and the mixture
was heated in a microwave until all of the agarose dissolved.
When completely dissolved, 2 μL of GelGreen Nucleic Acid
Stain was added to the agarose and the gel was then poured
into the gel apparatus.
8. When the agarose was solidified (typically approximately 1 h at
room temperature), the gel was covered in running buffer and
the sample wells were loaded. Typically, a gel contained 10 sample
wells and was loaded with size markers in lane 2, lane 6, and
lane 10. For our work, we use PCR amplification products
approximately 120 and 300 base pairs in size. The libraries to
be size-selected were loaded into lanes 4 and 8: 10 μL
of sample buffer was added to each dried library to resuspend the DNA,
and the liquid was then placed in the sample well. The samples and
size markers were electrophoresed for approximately 1 h and
15 min with the voltage set to 125 volts or until the blue dye in
the sample buffer was at the end of the gel.
9. The gel was removed from the electrophoresis apparatus and
placed onto the EmbiTec PrepOne Sapphire. The Sapphire was
turned on to illuminate the gel and the center of each band in
the marker lanes was sliced with a blade. The Sapphire was
turned off and the gel was transferred to paper towels for cutting
out the libraries. Using the slices in the DNA marker lanes as
guides, the portion of the gel from the library lanes that
contained the same-sized DNA was cut out. The portion of
the gel containing the correct-sized DNA was sliced into small
pieces and added to 800 μL of binding buffer from the Zymo-
clean Gel DNA Recovery Kit in a flex tube. The tube was
shaken repeatedly until all of the agarose gel had dissolved.
10. The dissolved library was added to the Gel DNA recovery
column and purified according to the protocol supplied by
the kit. Following the required washes, the library DNA was
eluted in 25 μL of nuclease-free water. The eluted library was
dried as above in the Savant DNA 120 SpeedVac concentrator.
11. The dried library was resuspended in 5 μL of nuclease-free
water in the preparation for PCR amplification with appropri-
ate primers. Libraries were amplified in a total volume of
160 μL of amplification buffer. The buffer was prepared by
combining 80 μL of 2X SsoAdvanced Universal SYBR Green
Supermix (200 μL Pipetman), 1.6 μL of universal primer
(NEB Multiplex Oligos for Illumina; 2 μL Pipetman),
1.6 μL of an indexed primer (NEB Multiplex Oligos for
Illumina; 2 μL Pipetman), and 80 μL of nuclease-free water
(200 μL Pipetman).
12. Following thorough mixing of the amplification buffer, a
10 μL aliquot was transferred to a PCR tube with the 200 μL
Pipetman to be used as a non-DNA control. A total of 2.5 μL
of the library DNA was added to the remaining 150 μL of
amplification buffer using the 10 μL Pipetman and the DNA
was thoroughly mixed in the buffer. A 10 μL aliquot was
removed with the 200 μL Pipetman and placed into a PCR
tube. The remaining 140 μL of amplification buffer was stored
in a freezer at -20 °C until needed. The non-DNA control and
library DNA PCR tubes were then placed in a BioRad CFX
Connect real-time PCR system and amplified using 1 min
cycles of 60 °C, 72 °C, and 95 °C. Following amplification, the peak
for the amplification containing the library DNA was determined
from the cycle threshold data generated. The remaining
140 μL was then divided into four aliquots of approximately
35 μL each using a 200 μL Pipetman and amplified to the
empirically determined cycle threshold.
13. Following amplification the amplified library DNA was purified
by AMPure. All manipulation of the amplified libraries was
performed in a BSL-II hood. The contents of the
four tubes were combined, 95 μL of AMPure was added
using a 200 μL Pipetman, and the contents were mixed thoroughly
and then transferred to a flex tube. The combined contents
were incubated at room temperature for 10 min to allow
the library DNA to bind to the AMPure beads.
14. Following the incubation the tube was centrifuged to ensure
that the contents were all at the bottom of the tube, and the
tube was placed into a magnetic stand to separate the magnetic
beads with bound DNA from the DNA-depleted liquid. The
stand was placed on its side while the magnetic beads were
bound so that the bound beads would be located
approximately half way up the tube and not at the bottom. This
was done so that when the liquid was removed there was less
chance that the beads would be dislodged. After a 10-min
incubation to allow the beads to be separated from the liquid,
the stand was placed upright in order to allow the liquid to
collect at the bottom of the tube. The DNA-depleted liquid
was removed using a 200 μL Pipetman set to 200 μL, being
careful not to dislodge any of the magnetic beads.
15. The beads on the side of the tube were washed twice with
400 μL of a wash solution consisting of 80% ethanol and 20%
nuclease-free water, which was prepared right before use using
a 1 mL Pipetman. Following the removal of the second wash,
the beads were air-dried for 10 min in the hood.
16. The tube containing the magnetic beads was removed from the
magnetic rack and 16 μL of nuclease-free water was added
using a 10 μL Pipetman set to 8 μL. The beads and water
were thoroughly mixed by vortexing and incubated for
10 min at room temperature. Following the incubation, the
mixture was centrifuged in the LabDoctor 12 minicentrifuge for
10 s to force the beads and liquid to the bottom of the flex
tube. The tube was then placed back into the magnetic stand
and incubated for an additional 5 min to allow the beads to
bind to the side of the bottom of the tube and separate from
the nuclease-free water that contains the eluted library DNA.
17. A 12 μL aliquot of the nuclease-free water containing the
library DNA was very carefully removed from the tube using
a 10 μL Pipetman set to 6 μL and placed in a new sterile flex
tube. This aliquot is stored in the freezer at -20 °C and would
be submitted for DNA sequencing if it meets our quality
control. The remaining aliquot of the library (4 μL) is also
stored in the freezer and eventually analyzed by submerged
agarose gel electrophoresis to determine the size and amount
of nucleosome-sized DNA in the library.
18. The quality of the amplified libraries was determined using
submerged agarose gel electrophoresis. In preparation for anal-
ysis of the library DNA, we prepared running buffer and an
agarose gel. The running buffer was prepared by adding 7 mL
of a 50X stock TAE buffer to a 500 mL bottle and adding
350 mL of distilled purified water. To identify the location of
DNA in the gel, 17.5 μL of ethidium bromide was added to the
running buffer using a 200 μL Pipetman. In a 100 mL bottle,
we added 1.4 g agarose (Sigma-Aldrich) and 50 mL of the
running buffer and heated the mixture in the microwave to
dissolve the agarose. When the agarose was completely dissolved,
we added 2.5 μL of the ethidium bromide solution,
swirled the agarose, and poured it into the gel apparatus.
19. When the gel has cooled and hardened, it is covered with
running buffer. The library sample is suspended in 10 μL of
sample buffer using a 10 μL Pipetman and added to a sample
well. In a well adjacent to the library sample, we add a DNA
marker and subject the samples to submerged electrophoresis
for approximately 1 h and 15 min with the voltage set to
125 volts. The gel is removed and the DNA present in the
gel is visualized on a LI-COR Odyssey Fc. A high-quality library
would be expected to show only a fairly tight band around the
size of a nucleosome with added adapters at approximately
250 base pairs in size.
20. Libraries that are of sufficient quality are then used for DNA
sequencing.
21. Libraries are sequenced on an Illumina MiSeq using a MiSeq
Reagent Kit v3 (150 cycle) in the sequencing core at the
University of North Dakota. Typically, 20–25 individual
libraries are sequenced at the same time. Because of the small
size of the SV40 genome, this usually results in enough reads
per library to adequately cover the genome.

3.3 Bioinformatic Analyses

1. Following sequencing of libraries, the data files generated are
analyzed using standard bioinformatics software. First, the
FASTQ files generated by sequencing are subjected to an initial
quality control analysis using FastQC v0.11.2 [11]. Second,
the adapters attached to the ends of the insert DNA during the
preparation of the libraries were removed using Scythe v0.98
[12]. Third, quality trimming was performed using Sickle
v1.33 [13], and reads with a Phred score of less than
30 and reads smaller than 45 base pairs were discarded. Fourth,
the reads corresponding to African green monkey (Chlorocebus
sabaeus 1.1) and human (hg19) sequences were removed fol-
lowing alignment to their respective genomes. While we con-
tinue to do this we have found that it has little effect on the
actual virus reads. Fifth, the reads present in the FASTQ files
remaining after these treatments were then aligned to the SV40
genome (RefSeq ACC: NC_001669.1) cut between nucleo-
tides 2666 and 2667 using Bowtie2 v2.2.4 [14]. Cutting the
genome was necessary to display the data as a linear map
because the SV40 genome is normally found as a circle. Sixth,
duplicate reads were removed using the Picard Tools (Broad)
MarkDuplicates function. Seventh, BAM files were generated
for each biological library replicate using an awk script, with
filtering for specific size ranges of the DNA. Nucleosome-sized
DNA was identified using filtered reads from 100 to 150 base
pairs in size, while potential transcription factor binding sites
were identified using reads filtered between 60 and 99 base
pairs in size.
2. The BAM files generated are displayed for comparison purposes
as merged heatmaps. Typically, a minimum of four libraries
generated from different biological replicates are sequenced
to generate each heatmap. The individual BAM files are first
normalized so that all are weighted equally using Samtools v1.3.1
[15] and then merged using the R programming language.
Bedgraphs are normalized to 1X coverage from filtered dedu-
plicated reads using DeepTools v2.5.4 [16]. Finally, heatmaps
were generated from the Z-scores of the normalized coverage
and displayed using IGV v2.3.52 [17].
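The linearization of the circular SV40 genome described in step 1 can be illustrated with a short sketch. This is not the authors' code. It assumes that the linear map begins at nucleotide 2667 and ends at nucleotide 2666 (the text gives the cut site but not the orientation), and both function names are hypothetical.

```python
def linearize_circular(sequence, cut_after):
    """Open a circular sequence between 1-based positions cut_after and cut_after + 1."""
    if not 1 <= cut_after <= len(sequence):
        raise ValueError("cut_after must lie within the sequence")
    # Bases after the cut come first, followed by the bases before it.
    return sequence[cut_after:] + sequence[:cut_after]


def linear_to_circular(linear_pos, cut_after, genome_length):
    """Map a 1-based position on the linearized map back to the circular coordinate."""
    return (linear_pos - 1 + cut_after) % genome_length + 1


SV40_LENGTH = 5243  # genome size given in the Notes
CUT_AFTER = 2666    # cut between nucleotides 2666 and 2667

print(linear_to_circular(1, CUT_AFTER, SV40_LENGTH))            # 2667
print(linear_to_circular(SV40_LENGTH, CUT_AFTER, SV40_LENGTH))  # 2666
```

A mapping of this kind is useful when positions read from the linear heatmap need to be reported in the standard circular numbering of the reference sequence.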

3.4 Reagents and Solutions

Ethidium bromide stain (0.5 μg/μL): 50 mg ethidium bromide; add water to 100 mL.
TAE stock (50X): 242 g of Tris base dissolved in 750 mL water. Add 57.1 mL glacial acetic acid and 100 mL EDTA. Adjust final volume to 1 L. Bring the pH to 8.5.
TAE electrophoresis running buffer: to 20 mL TAE (50X) stock buffer, add 980 mL water.
Agarose gel sample buffer: 6.0 mL 10% SDS, 2.0 mL of 0.1 M EDTA, 50 mL glycerol, 1% Coomassie blue, and water to 100 mL.
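The dilutions in these recipes can be checked with simple arithmetic. The helper below is a minimal sketch with a hypothetical function name. It assumes volumes are additive and uses only the quantities stated in this chapter.

```python
def diluted_concentration(stock_conc, stock_volume, final_volume):
    """Concentration after diluting stock_volume of stock to final_volume (same units)."""
    if final_volume <= 0:
        raise ValueError("final_volume must be positive")
    return stock_conc * stock_volume / final_volume


# TAE running buffer: 20 mL of 50X stock plus 980 mL water (1000 mL total).
print(diluted_concentration(50, 20, 20 + 980))  # 1.0, i.e., 1X

# Running buffer as prepared for the quality-control gel in Subheading 3.2:
# 7 mL of 50X stock plus 350 mL water (357 mL total), slightly under 1X.
print(round(diluted_concentration(50, 7, 7 + 350), 3))

# Ethidium bromide stain: 50 mg in 100 mL, in mg/mL (numerically equal to μg/μL).
print(50 / 100)  # 0.5
```

The same function can be used to scale the recipes to a different final volume.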

4 Notes

Comparison of FS-Seq to other methods for mapping nucleosomes.


A comparison between the mapping results obtained from
FS-Seq and other mapping techniques for the SV40 chromatin
found in virus particles is shown in Fig. 3.

Fig. 3 Mapping nucleosomes using FS-Seq, ATAC-Seq, MN-Seq, and ChIP-Seq. The chromatin was obtained
from SV40 virus particles and analyzed by each of the different procedures as previously described [9]. ChIP-
Seq is shown using antibody targeting H3K9me3 and targeting H4K20me1 in nucleosomes
The figure compares the location of nucleosomes using FS-Seq
to ATAC-Seq, MN-Seq, and ChIP-Seq using antibodies that rec-
ognize nucleosomes containing H3K9me3 and H4K20me1. It is
apparent that many of the brighter yellow bands, which represent
favored nucleosome locations in the chromatin, are present in a
number of mapping techniques. For example, a very bright band is
located in the enhancer using the FS, ATAC, and ChIP-Seq with
antibody to H3K9me3 procedures. Similarly, many of the less
bright bands also appear to be present in the maps obtained by
more than one technique. We include the results of ChIP-Seq using
antibodies to H3K9me3 and H4K20me1 to demonstrate that the
pattern of nucleosomes obtained with FS-Seq most likely represents
the nucleosome locations in the major form of SV40 chromatin
that was present in the virus particles since the FS pattern closely
resembles the ChIP-Seq result with H3K9me3 antibody, which is
present in a significant fraction of viral chromatin [18] and does not
resemble as much the result with H4K20me1, which appears to be
present in only a small amount of the chromatin.
The figure also shows that each mapping technique has pre-
ferred sites of action. As is well known, ATAC-Seq targets open
chromatin, which typically is found in the regulatory region of
active genes [7]. In marked contrast, MN-Seq can under-represent
nucleosomes located in regulatory regions if digestion occurs at
higher temperatures or for longer periods of time [6]. This has led
to the suggestion that nucleosomes in certain regulatory regions
are “fragile” due to histone modifications or associated factors
that allow them to be targeted by MN-Seq more efficiently than the
rest of the nucleosomes in a gene [6]. Of course, ChIP-Seq results
will vary with the antibody used, since the nucleosomes containing
the antibody target are likely to be present at preferred sites in the
genome. Based on our experience with viral chromatin, FS-Seq
appears to generate maps that are most similar to ATAC-Seq,
with more nucleosomes present, or ChIP-Seq for target histone
modifications that are present more or less throughout the
genome.
The FS-Seq procedure is relatively rapid and simple. There are a
minimal number of manipulations and the overall procedure can be
completed over a 30 min to 1 h timeframe depending upon the
fragmentation time. The FS kit is designed as a one-step procedure
for fragmenting DNA and preparing libraries. The kit accomplishes
this by inactivating the fragmenting enzymes during an incubation
at 65 °C for 30 min. Following inactivation, adapters are ligated to
the fragmented DNA as in a usual NEXT Ultra procedure. We
chose not to do this because we were concerned that heating the
chromatin–enzyme mixture to 65 °C for 30 min would be likely to
result in over-fragmentation of the viral chromatin.
We have not used FS-Seq to map the location of nucleosomes
in cellular chromatin. However, we believe that a workflow similar
to that used for ATAC-Seq would likely work with FS-Seq for this
purpose as well. This workflow would consist of preparing nuclei
from cells followed by resuspension of the nuclei in reaction buffer
and addition of fragmentation enzymes to allow for fragmentation.
Fragmentation would be assayed by qPCR measurement of the
amount of a target gene or genes that is found in the buffer
following incubation at different temperatures or times. Since
FS-Seq is enzyme-based like ATAC-Seq and the two procedures
yield similar results with viral chromatin, it seems likely that it
would also tend to target open chromatin and might be an alterna-
tive strategy for analyzing open chromatin.
Specific considerations when preparing fragmented chromatin for
FS-Seq.
In working with viral chromatin we always determine the rela-
tive amount of DNA present in a sample using qPCR. In order to
ultimately obtain useful sequencing data following FS-Seq, we have
experimentally determined that for a genome the size of SV40
(5243 base pairs), we need the input chromatin to have a cycle
threshold of less than 20 cycles. This is due to the fact that the
fragmentation by FS typically results in a shift of the cycle threshold
to around 25 cycles. As noted below, as long as the amount of DNA
in the fragmented samples is in this range, useful libraries can be
prepared. With larger genomes, more input chromatin will probably
be needed to obtain sufficient coverage of the genome.
With SV40 used this way, we can obtain
anywhere from around 500 reads per library sample to 5000
reads. Since there are only about 24 nucleosomes in SV40, this is
sufficient coverage.
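The coverage implied by these read counts can be estimated as follows. This is a sketch rather than part of the published analysis. The mean insert length of 125 base pairs is an assumption (the midpoint of the 100 to 150 base pair range used for nucleosome-sized reads), and the function names are hypothetical.

```python
SV40_GENOME_BP = 5243   # genome size stated above
SV40_NUCLEOSOMES = 24   # approximate nucleosome count stated above


def mean_fold_coverage(n_reads, mean_insert_bp, genome_bp=SV40_GENOME_BP):
    """Average number of times each base is covered by sequenced inserts."""
    return n_reads * mean_insert_bp / genome_bp


def reads_per_nucleosome(n_reads, n_nucleosomes=SV40_NUCLEOSOMES):
    """Average number of reads available per nucleosome."""
    return n_reads / n_nucleosomes


for n_reads in (500, 5000):
    coverage = mean_fold_coverage(n_reads, mean_insert_bp=125)
    per_nucleosome = reads_per_nucleosome(n_reads)
    print(f"{n_reads} reads: {coverage:.1f}-fold coverage, "
          f"{per_nucleosome:.1f} reads per nucleosome")
```

With the assumed insert length, 500 reads give roughly 12-fold coverage and 5000 reads roughly 120-fold. For a larger genome the same read count yields proportionally lower coverage, consistent with the need for more input chromatin.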
Based on the relatively large numbers of samples analyzed, we
have noted that occasionally we will have a sample of disrupted
virus that does not appear to fragment very well. At this time we do
not know why this is the case, because we have successfully frag-
mented a number of other samples of chromatin from disrupted
virus. We believe that this may be due to the presence of inhibitors
remaining with the chromatin. For example, we use a high concen-
tration of dithiothreitol to disrupt the virus particles and this may
be the reason for the problem. We have not noted this issue with
other samples that were not prepared in the presence of dithio-
threitol. We are presently investigating whether the dithiothreitol is
responsible for this inhibition and if so whether there are alternative
ways to purify the chromatin from disrupted virus to prevent this
inhibition. This observation shows the importance of at least initi-
ally using the two controls listed, chromatin alone and chromatin
with buffer, since with qPCR of the samples it is possible to quickly
determine the extent of chromatin fragmentation. We have also
observed in some samples, but not all, a shift of approximately
1 cycle in the cycle threshold of the sample that contains only
added buffer. We believe that this is most likely due to the activation
of endogenous nucleases that are present in the biological prepara-
tions. This is somewhat variable and appears to depend on a num-
ber of parameters including the time point in SV40 infection and
the number of passages that the cells used for SV40 infections have
undergone. Typically, in these situations we would still observe a 2-
to 3-cycle increase in the cycle threshold number in the sample
containing buffer and enzyme compared to buffer alone.
For our fragmentation studies, we have usually tried to use the
lowest incubation temperature possible to try to limit any natural
movement of nucleosomes since nucleosome movement might
occur at higher temperatures. However, we have not studied this
closely and do not know if it is an important consideration.
Specific considerations when preparing libraries from FS-fragmented
chromatin.
We have found that as long as the cycle threshold of the
fragmented chromatin is less than or equal to approximately
25 cycles, we can obtain high-quality libraries. We judge this by
analyzing the quality of the library as described in the methods
section by agarose gel electrophoresis and looking for a single
relatively sharp band in the region of the gel where we would expect
to find nucleosome-sized DNA fragments with attached adapters.
When fragmenting a new form of chromatin (such as a new virus),
we generally will amplify the prepared library after column purifica-
tion and analyze it on a gel similarly. In this case we are looking for a
smear of amplification products from the correct size to a size
significantly larger but with a clear increase in the number of
products at the correct size. If the products are all larger than the
correct size, it indicates that fragmentation was not done for long
enough or at a high enough temperature. We would redo the
fragmentation taking this into account and adjust the fragmenta-
tion conditions accordingly. When the cycle threshold is greater
than 25 cycles, we find that the libraries sometimes meet our quality
requirements but in some cases do not. We have observed that if a
library does not show a single broad band of the correct size, but
instead shows multiple bands or a smear of library elements, it is
unlikely to yield good sequencing data. For those libraries we have
found it better to redo the fragmentation and library preparation
with more or a different sample instead of sequencing the available
library.
We have found that the purification of the proper sized frag-
ments by submerged gel electrophoresis is an important step.
Without this step the fraction of DNA fragments of the proper
size is very small and most of the DNA sequenced is discarded
during the bioinformatics analysis because it is too large. It is
important that the slice of gel used for purification is sufficiently
large to include the library elements containing inserts from
approximately 60–200 base pairs and not larger elements.
Specific considerations when analyzing sequencing data generated by
FS-Seq.
In our bioinformatic analysis, we include only inserts that are
from 100 to 150 base pairs for nucleosomes. It would be possible
to use reads that are somewhat larger but it makes it more difficult
to determine exactly where the center of the associated nucleosome
is supposed to be. This is because for the larger DNA inserts, it is
not possible to know whether the nucleosome is found in the
center of the DNA or at one of the ends.
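The size classes and the ambiguity in locating the nucleosome center can be made concrete with a short sketch. The size ranges are those given in this chapter. The 147 base pair core footprint is an assumption that does not appear in the text, and all names are hypothetical.

```python
NUCLEOSOME_RANGE = (100, 150)  # insert sizes used for nucleosomes
FACTOR_RANGE = (60, 99)        # insert sizes used for potential factor binding sites
CORE_FOOTPRINT_BP = 147        # assumed nucleosome core length (not from the text)


def classify_insert(length):
    """Return the size class of an insert, or None if it falls outside both ranges."""
    if NUCLEOSOME_RANGE[0] <= length <= NUCLEOSOME_RANGE[1]:
        return "nucleosome"
    if FACTOR_RANGE[0] <= length <= FACTOR_RANGE[1]:
        return "factor"
    return None


def insert_midpoint(start, length):
    """Midpoint of an insert given its 1-based start coordinate and its length."""
    return start + (length - 1) / 2


def center_ambiguity(length, footprint=CORE_FOOTPRINT_BP):
    """Width in base pairs of the window in which the nucleosome center could lie."""
    return max(0, length - footprint)


for length in (120, 147, 200):
    print(length, classify_insert(length), center_ambiguity(length))
```

For inserts no longer than the assumed footprint, the midpoint is the natural estimate of the nucleosome center. For a 200 base pair insert, the center could lie anywhere within a window of about 53 base pairs, which illustrates why larger inserts are excluded.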
We chose to display our data as normalized merged heatmaps.
We did this because in our studies on SV40, we noticed that one
characteristic of viral regulation in our system appeared to be
relatively small changes in nucleosome position [19] and thought
that the heatmaps showed this best. However, the data could also
be displayed as bedgraphs following normalizing and merging of
data. We have chosen to use normalizing and merging as a way to
remove some of the biological variability that appears in our system.
As indicated above, we also use a minimum of four biological
replicates in our studies so that the merged data is from at least
four samples. Frequently, we have used as many as ten biological
replicates if there appears to be variability. This variability can
be seen in the width of the bands in the
heatmaps. When a band appears relatively broad, it probably
lies at a site in the genome where there is more than one
preferred nucleosome location.
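The normalize, merge, and Z-score steps can be sketched in a few lines. This illustration is not the authors' pipeline, which used Samtools, R, and DeepTools. It assumes per-base coverage is already available as a list of numbers for each replicate, uses the population standard deviation for the Z-score, and all function names and example values are hypothetical.

```python
from statistics import mean, pstdev


def normalize_to_1x(coverage):
    """Scale a coverage track so that its mean equals 1."""
    average = mean(coverage)
    if average == 0:
        raise ValueError("coverage track has zero mean")
    return [value / average for value in coverage]


def merge_replicates(tracks):
    """Average equally weighted, position-matched coverage tracks."""
    return [mean(column) for column in zip(*tracks)]


def z_scores(values):
    """Convert values to Z-scores using the population standard deviation."""
    average = mean(values)
    spread = pstdev(values)
    if spread == 0:
        raise ValueError("values have zero variance")
    return [(value - average) / spread for value in values]


# Two made-up replicates with different sequencing depths.
replicate_a = [10, 40, 10, 20]
replicate_b = [1, 3, 2, 2]
merged = merge_replicates([normalize_to_1x(replicate_a),
                           normalize_to_1x(replicate_b)])
print([round(z, 2) for z in z_scores(merged)])
```

Normalizing each replicate before merging keeps a deeply sequenced library from dominating the merged track, which is the purpose of weighting all replicates equally.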
In our recent publication [9] describing the mapping of
nucleosomes on the SV40 genome, we also describe how we used
the sequencing data from shorter reads to look for the location of
transcription factor binding. With the shorter read analysis, we
found a number of reads that corresponded to the position of
SP1 binding at its cognate sequence. Of course, since we are only
looking at reads, it is not possible to exclude the possibility that
there are other factors also bound at this site. Nevertheless, an
analysis of shorter reads using the FS-Seq sequencing data may
help to identify sites of interest for binding by factors in chromatin.

Acknowledgments

This work was funded by a grant from the National Institutes of
Health, AI142011 (to B.M.). The authors thank Ms. Corina Murphy
and New England Biolabs for the generous gift of reagents.
References
1. Scott WA, Wigmore DJ (1978) Sites in simian 12. Buffalo V (2011) Scythe: a Bayesian adapter
virus 40 chromatin which are preferentially trimmer
cleaved by endonucleases. Cell 15(4): 13. Joshi NA, Fass JN (2011) Sickle – a windowed
1511–1518 adaptive trimming tool for FASTQ files using
2. Varshavsky AJ, Sundin OH, Bohn MJ (1978) quality
SV40 viral minichromosome: preferential 14. Langmead B, Trapnell C, Pop M, Salzberg SL
exposure of the origin of replication as probed (2009) Ultrafast and memory-efficient align-
by restriction endonucleases. Nucleic Acids Res ment of short DNA sequences to the human
5(10):3469–3477. PubMed PMID: 214758; genome. Genome Biol 10(3):R25. https://
PMCID: 342688 doi.org/10.1186/gb-2009-10-3-r25.
3. Waldeck W, Fohring B, Chowdhury K, PubMed PMID: 19261174; PMCID:
Gruss P, Sauer G (1978) Origin of DNA repli- 2690996
cation in papovavirus chromatin is recognized 15. Li H, Handsaker B, Wysoker A, Fennell T,
by endogenous endonuclease. Proc Natl Acad Ruan J, Homer N, Marth G, Abecasis G,
Sci U S A 75(12):5964–5968. PubMed PMID: Durbin R, Genome Project Data Processing S
216004; PMCID: 393097 (2009) The sequence alignment/map format
4. Parmar JJ, Padinhateeri R (2020) Nucleosome and SAMtools. Bioinformatics 25(16):
positioning and chromatin organization. Curr 2078–2079. https://doi.org/10.1093/bioin
Opin Struct Biol 64:111–118. https://doi. formatics/btp352. PubMed PMID:
org/10.1016/j.sbi.2020.06.021 19505943; PMCID: PMC2723002
5. Weintraub H, Groudine M (1976) Chromo- 16. Ramirez F, Ryan DP, Gruning B, Bhardwaj V,
somal subunits in active genes have an altered Kilpert F, Richter AS, Heyne S, Dundar F,
conformation. Science 193(4256):848–856. Manke T (2016) deepTools2: a next genera-
https://doi.org/10.1126/science.948749 tion web server for deep-sequencing data anal-
6. Voong LN, Xi L, Wang JP, Wang X (2017) ysis. Nucleic Acids Res 44(W1):W160–W165.
Genome-wide mapping of the nucleosome https://doi.org/10.1093/nar/gkw257.
landscape by micrococcal nuclease and chemi- PubMed PMID: 27079975; PMCID:
cal mapping. Trends Genet 33(8):495–507. PMC4987876
https://doi.org/10.1016/j.tig.2017.05.007. 17. Robinson JT, Thorvaldsdottir H, Winckler W,
PubMed PMID: 28693826; PMCID: Guttman M, Lander ES, Getz G, Mesirov JP
PMC5536840 (2011) Integrative genomics viewer. Nat Bio-
7. Sun Y, Miao N, Sun T (2019) Detect accessible technol 29(1):24–26. https://doi.org/10.
chromatin using ATAC-sequencing, from prin- 1038/nbt.1754. PubMed PMID: 21221095;
ciple to applications. Hereditas 156:29. PMCID: PMC3346182
https://doi.org/10.1186/s41065-019- 18. Milavetz B, Kallestad L, Gefroh A, Adams N,
0105-9. PubMed PMID: 31427911; PMCID: Woods E, Balakrishnan L (2012) Virion-
PMC6696680 mediated transfer of SV40 epigenetic informa-
8. Park PJ (2009) ChIP-seq: advantages and chal- tion. Epigenetics 7(6):528–534. https://doi.
lenges of a maturing technology. Nat Rev org/10.4161/epi.20057. PubMed PMID:
Genet 10(10):669–680. https://doi.org/10. 22507897; PMCID: 3398982
1038/nrg2641. PubMed PMID: 19736561; 19. Kumar MA, Kasti K, Balakrishnan L, Milavetz
PMCID: PMC3191340 B (2018) Directed nucleosome sliding in SV40
9. Milavetz B, Haugen J, Rowbotham K (2020) minichromosomes during the formation of the
Comparing a new method for mapping nucleo- virus particle exposes dna sequences required
somes in simian virus 40 chromatin to standard for early transcription. J Virol. https://doi.
procedures. Epigenetics. 1–10. https://doi. org/10.1128/JVI.01678-18
org/10.1080/15592294.2020.1814487 20. Kube D, Milavetz B (1996) Differential regu-
10. Balakrishnan L, Milavetz B (2017) Epigenetic lation by SV40 T-antigen binding at site I
analysis of SV40 minichromosomes. Curr Pro- defines two distinct classes of nucleosome-free
toc Microbiol 46:14F 3 1–1F 3 26. https:// promoter. Anat Rec 244(1):28–32. https://
doi.org/10.1002/cpmc.35 doi.org/10.1002/(SICI)1097-0185
11. Andrews S (2010) FastQC: a quality control (199601)244:1<28::AID-AR3>3.0.CO;2-B
for high throughput sequence data
Chapter 3

Universal NicE-Seq: A Simple and Quick Method for Accessible Chromatin Detection in Fixed Cells
Hang Gyeong Chin, Udayakumar S. Vishnu, Zhiyi Sun,
V. K. Chaithanya Ponnaluri, Guoqiang Zhang, Shuang-yong Xu,
Touati Benoukraf, Paloma Cejas, George Spracklin, Pierre-Olivier Estève,
Henry W. Long, and Sriharsa Pradhan

Abstract
Genome-wide accessible chromatin sequencing and identification has enabled deciphering the epigenetic
information encoded in chromatin, revealing accessible promoters, enhancers, nucleosome positioning,
transcription factor occupancy, and other chromosomal protein binding. The starting biological materials
are often fixed using formaldehyde crosslinking. Here, we describe accessible chromatin library preparation
from low numbers of formaldehyde-crosslinked cells using a modified nick translation method, where a
nicking enzyme nicks one strand of DNA and DNA polymerase incorporates biotin-conjugated dATP,
dCTP, and methyl-dCTP. Once the DNA is labeled, it can be isolated for NGS library preparation. We
term this method universal NicE-seq (nicking enzyme-assisted sequencing). We also demonstrate a
single-tube method that enables direct NGS library preparation from low cell numbers without DNA
purification. Furthermore, we demonstrate universal NicE-seq on an FFPE tissue section sample.

Key words Nicking enzyme, Methyl-dCTP, Biotin-14-dCTP/dATP, Nucleosome, Open chromatin
labeling, Epigenetic profiling, DNA library preparation, Formaldehyde-crosslinked cells

1 Introduction

Nucleosome-depleted regions of chromatin are accessible for binding
by transcription factors and other chromatin-associated proteins that
drive cellular processes. Early studies identified these nucleosome-
depleted regions as hypersensitive to DNase I and demonstrated
an association of these protein-depleted regions with gene
activation and DNA methylation in eukaryotic organisms [1–4].
This subsequently led to genome-wide mapping of DNase
hypersensitive sites (DHS), also known as “open chromatin,” by
DNase-chip or massively parallel DNA sequencing [5, 6]. Following
DNase-seq, another accessible chromatin determination method,
FAIRE-seq (Formaldehyde-Assisted Isolation of Regulatory Elements
Sequencing), was used for isolation and sequencing of
nucleosome-depleted regions of the genome. This method relied
on formaldehyde-crosslinked chromatin sheared by sonication and
phenol–chloroform extraction of the accessible DNA in the aqueous
phase, which could be sequenced or hybridized to a DNA microarray
[7]. However, both methods were cumbersome and needed
large numbers of cells. Recent improvements for DHS mapping
include the addition of circular carrier DNA to perform single-cell
DNase I-seq (scDNase I-seq), requiring an input of between 1 and
1000 cells. This technology revealed highly expressed genic regions
with multiple active histone marks displaying constitutive DNase I
hypersensitive sites among different single-cell analysis data.
However, in scDNase I-seq, the mappability of 1000 cells to the
reference genome was low, ranging from 2% to 40% at the single-cell
level [8]. Another complementary method, known as MNase-seq,
uses the nonspecific endo- and exonuclease activities of micrococcal
nuclease to cleave protein-unbound regions of DNA on chromatin.
Here, DNA bound to histones or other chromatin-associated proteins
remains undigested, and sequencing of these DNA fragments
yields genome-wide maps of bound proteins [9–11]. The undigested
DNA displays a mirror image of accessible chromatin.

Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols,
Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_3,
© The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023

In 2013, ATAC-seq (Assay for Transposase-Accessible Chromatin
using sequencing) was introduced to study chromatin accessibility
genome-wide. The method was simpler and easier to use
than MNase-seq (sequencing of micrococcal nuclease
sensitive sites), FAIRE-seq, and DNase-seq. The basic principle
relied on a prokaryotic Tn5 transposase, which is loaded with
sequencing adapters to create an active dimeric transposome complex.
The complex simultaneously cuts accessible chromatin and
ligates the adapter sequences. In the early days,
ATAC-seq also generated nonspecific amplification of nonnuclear
DNA, such as the mitochondrial genome, accounting for ~50% of
all reads [12]. Subsequent improvements to the method have
reduced mitochondrial DNA reads [13]. The application of ATAC-seq
to human cell lines and clinical samples has led to many landmark
studies [14–19]. Here we report an improved, robust, and
sensitive method, universal nicking enzyme-assisted sequencing
(universal NicE-seq), for epigenetic profiling of mammalian chromatin
that can provide in-depth open versus closed chromatin sequence
information from limited amounts of formaldehyde-fixed cells
[20, 21]. This method is suitable for identification of transcription
factor occupancy and is complementary to other accessible chromatin
methods.

2 Materials

Prepare all solutions using ultrapure water (e.g., Milli-Q water or
equivalent, prepared by purifying deionized water to attain a resistivity of
18 MΩ·cm at 25 °C) and analytical grade reagents. Prepare and
store all reagents at room temperature or at 4 °C (unless indicated
otherwise).

2.1 Harvesting and Crosslinking Cells

1. HCT116 cells are cultured in McCoy's 5A medium (Thermo
Fisher Scientific #16600082) supplemented with 10% Fetal
Bovine Serum (GemCell #100-500).
2. TrypLE (Thermo Fisher Scientific #12605028, store at R/T
before use).
3. 50 mL conical falcon tubes and pipette tips for automatic pipet.
4. Cell culture flasks.
5. Trypan Blue Solution, 0.4% (Thermo Fisher Scientific
#15250061).
6. Hemocytometer and inverted microscope.
7. 1.5 mL Eppendorf tube for cell harvest; 1.5 mL DNA LoBind
tube (Eppendorf AG #022431021).
8. 16% formaldehyde (Thermo Fisher Scientific #28908).
9. 1X PBS (from 10X PBS, Gibco #70011-044).
10. 2.5 M Glycine (Sigma #G7126).
11. End-over-end bench top rotator (VWR #10136-084).

2.2 Accessible Chromatin Labeling

1. Prepare Cytosolic Buffer: 15 mM Tris–HCl pH 7.5, 5 mM
MgCl2, 60 mM KCl, 15 mM NaCl, 1% NP40, and 300 mM
sucrose. Add 0.5 mM fresh DTT before use (NEB #B7705S).
(Note: Use freshly made Cytosolic buffer; without DTT, it can
be stored for up to 2–3 weeks at 4 °C.
For microbial stability, it may be sterile filtered through a
0.22 μm membrane).
2. Prepare a 10X dNTP mix: 240 μM dATP, 240 μM mdCTP
(NEB #N0356S), 60 μM biotin-14-dCTP (Thermo Fisher
Scientific #19518018), 300 μM dGTP, 300 μM dTTP, and
60 μM of biotin-14-dATP (Thermo Fisher Scientific
#19524016).
3. Prepare 2X Accessible Chromatin labeling mix (for one label-
ing reaction): 20 μL of NEB Buffer #2 10X (NEB #B7002S),
1 U of Nt.CviPII (NEB #R0626S), 10 U of DNA Polymerase I
(NEB #M0209S), with 20 μL of 10X dNTP mix, and adjust
the final volume to 100 μL with water.
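
The recipes above imply nucleotide concentrations that the text does not state. The following Python sketch works them out with C1V1 = C2V2. It assumes the 2X labeling mix is later combined 1:1 with nuclei in a 200 μL reaction, as in Subheading 3.2. The helper name `dilute` is illustrative only, and the derived values should be read as a check rather than as figures from the protocol.

```python
# Concentrations (micromolar) of the 10X dNTP mix listed in the recipe above.
DNTP_10X_UM = {
    "dATP": 240, "mdCTP": 240, "biotin-14-dCTP": 60,
    "dGTP": 300, "dTTP": 300, "biotin-14-dATP": 60,
}

def dilute(conc, added_volume_ul, total_volume_ul):
    """C1*V1 = C2*V2: concentration after bringing added_volume_ul to total_volume_ul."""
    return conc * added_volume_ul / total_volume_ul

# 20 uL of 10X dNTP mix in 100 uL of 2X labeling mix, then 100 uL of that
# mix in a 200 uL labeling reaction (volumes taken from the protocol).
in_2x_mix = {n: dilute(c, 20, 100) for n, c in DNTP_10X_UM.items()}
in_reaction = {n: dilute(c, 100, 200) for n, c in in_2x_mix.items()}

for name in DNTP_10X_UM:
    print(f"{name:>15}: {in_2x_mix[name]:5.1f} uM in 2X mix, "
          f"{in_reaction[name]:5.1f} uM in the labeling reaction")

# Fraction of the dCTP pool (mdCTP + biotin-14-dCTP) that carries biotin.
biotin_dctp_fraction = DNTP_10X_UM["biotin-14-dCTP"] / (
    DNTP_10X_UM["mdCTP"] + DNTP_10X_UM["biotin-14-dCTP"])
print(f"biotinylated fraction of dCTP pool: {biotin_dctp_fraction:.2f}")
```

With these numbers, the A and C pools (unmodified or methylated plus biotinylated nucleotide) each total 300 μM in the 10X mix, the same as dGTP and dTTP.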

2.3 Other Material and Equipment for Chromatin Labeling

1. 37 °C incubator, 65 °C heat-block, and bench-top centrifuge.
2. 0.5 M EDTA (Invitrogen #15575-038) and RNase A (Invitrogen
#12091021).
3. Proteinase K (NEB #P8107S) and 20% SDS (TEKNOVA
#S0295).
4. Thermolabile Proteinase K (NEB #P8111S).
5. Phenol:Chloroform:Isoamyl Alcohol (Invitrogen #15593-031).
6. Isopropanol (Pharmco-Aaper #231HPLC99).
7. Glycogen (Sigma-Aldrich #10901393001).
8. 80% ethanol in nuclease-free water (Thermo Fisher Scientific
#AM9932).
9. 1X TE buffer.

2.4 Material for Quality Control

1. Heat block at 95 °C.
2. Ice-water bath.
3. Positively charged nylon membranes (Amersham #RPN119B).
4. UV light.
5. Blotting-grade blocker (Bio-Rad #170-6404).
6. 1X PBS with 0.1% Tween 20.
7. Goat anti-biotin-HRP antibody (CST #7075).
8. LumiGLO reagent (CST #7003).

2.5 Material for Library Construction

1. Covaris S2 sonicator and Covaris microtubes (Covaris
#500330).
2. NEB Ultra II DNA library prep Kit for Illumina (NEB
#E7645).
3. Index primers set (NEB #E7335S or #E7500S).
4. Prepare 2X High Salt Buffer: 10 mM Tris–HCl pH 8.0, 2 M
NaCl, 1 mM EDTA, adjust with nuclease-free water.
5. Prepare 1X High Salt Buffer with 0.05% Triton X-100.
6. DNA LoBind Eppendorf tubes (Eppendorf AG #022431021).
7. Streptavidin Magnetic Beads (NEB #S1420S).
8. NEB Next Sample Purification Beads (E7104) or AMPure XP
beads (Beckman Coulter #A63881).
9. Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific
#Q32854).
10. End-over-end rotator (VWR #10136-084).
11. Bioanalyzer (Agilent 2100 Bioanalyzer with Agilent High Sen-
sitivity DNA Kit, #5067-4526).
12. Monarch Genomic DNA Purification Kit (NEB #T3010S).
13. PCR machine.

3 Methods

Carry out all procedures at room temperature unless otherwise
specified.

3.1 Harvesting and Crosslinking Cells

1. Take cells from the incubator and visually check their health
under the microscope. Remove the old medium from the flask and
transfer it to a sterile 50 mL conical centrifuge tube, then add
5–10 mL TrypLE to the adherent cells in the flask and incubate
for 5 min at 37 °C.
2. Gently tap to detach the cells and pipet them into the sterile conical
centrifuge tube containing the old medium (the old medium/serum will
inhibit the activity of the trypsin). Spin down for 5 min at room
temperature, 1500 rpm, and remove the supernatant (decant).
3. Wash in 5 mL 1X PBS, spin down (5 min at room temperature,
1500 rpm), remove supernatant, and resuspend in 5 mL
1X PBS.
4. Count cells: Dilute a small amount of cells 1:1 with Trypan
Blue Stain (e.g., take 100 μL cells and 100 μL stain), which gives
a dilution factor of 2. Pipet into the edge of the counting
chamber of the hemocytometer. Calculate the number of cells
per mL.
5. Calculate the total number of cells. Take 1 × 10^6 cells and transfer
them to a 1.5 mL DNA LoBind Eppendorf tube.
6. Add 62.5 μL of 16% formaldehyde and adjust with 1X PBS up
to 1 mL (the final concentration is 1% formaldehyde), then
incubate the cells for 10 min at room temperature on the end-
over-end rotator (see Note 1).
7. Quench the reaction by adding glycine to 125 mM (for a volume
of 1 mL, add 52 μL of 2.5 M stock) and incubate for 5 min at
room temperature on the end-over-end rotator.
8. Wash cells twice by resuspending in 1 mL of 1X PBS and
spinning down at 1500 rpm for 1 min at 4 °C. Remove the supernatant;
cells may be stored at -80 °C for later use. For immediate
use, resuspend cells in 1X PBS (0.5 mL), count the
number of cells, and make aliquots depending on how many
cells will be needed for downstream work (ideally 400 μL for
one million cells, which will give 250,000 in 0.1 mL). Note:
Cells may be lost during centrifugation steps. Therefore, the above
counting step is crucial for universal NicE-seq. In our experience,
up to ~40% of cells may be lost during the crosslinking step relative
to a count made before adding formaldehyde. This is a critical
step when cell numbers are limited (see Notes 1 and 2).
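
The counting and fixation arithmetic in this subheading can be sketched in Python as follows. The chamber factor of 10^4 assumes a standard hemocytometer in which one large square holds 0.1 μL. The counts are hypothetical and the function names are illustrative, not part of the protocol.

```python
def cells_per_ml(counts_per_square, dilution_factor=2, chamber_factor=1e4):
    """Cells/mL from hemocytometer counts.

    chamber_factor=1e4 assumes one large square holds 0.1 uL (standard chamber);
    dilution_factor=2 corresponds to the 1:1 Trypan Blue dilution in step 4.
    """
    mean_count = sum(counts_per_square) / len(counts_per_square)
    return mean_count * dilution_factor * chamber_factor

def stock_volume_ul(stock_conc, final_conc, final_volume_ul):
    """Stock volume giving final_conc in a fixed final volume (C1*V1 = C2*V2)."""
    return final_conc * final_volume_ul / stock_conc

def spike_volume_ul(stock_conc, final_conc, start_volume_ul):
    """Stock volume to add on top of start_volume_ul to reach final_conc."""
    return final_conc * start_volume_ul / (stock_conc - final_conc)

density = cells_per_ml([52, 48, 50, 50])        # hypothetical counts in 4 squares
volume_for_1e6_ul = 1e6 / density * 1000        # suspension volume holding 1e6 cells
formaldehyde_ul = stock_volume_ul(16, 1, 1000)  # 16% stock to 1% in 1 mL (step 6)
glycine_ul = spike_volume_ul(2500, 125, 1000)   # 2.5 M stock to 125 mM, added to 1 mL (step 7)

print(f"{density:.2e} cells/mL; {volume_for_1e6_ul:.0f} uL holds 1e6 cells")
print(f"formaldehyde: {formaldehyde_ul:.1f} uL; glycine: {glycine_ul:.1f} uL")
```

The formaldehyde result reproduces the 62.5 μL in step 6, and the glycine result (about 52.6 μL) agrees with the 52 μL given in step 7.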

3.2 Accessible Chromatin Labeling and Decrosslinking

1. Start with 250–250,000 crosslinked cells suspended in 1X PBS
(25 μL).
2. Add 400 μL Cytosolic buffer to the cells and incubate for
10–20 min on ice with occasional tapping for mixing. The
nuclei can be visualized under the microscope at this point
(circular with smooth edges) (see Note 3).
3. Spin the nuclei down 10 min at 4 °C, 3000 rpm. Discard the
supernatant.
4. Wash the nuclei pellet with 1X PBS and spin for 5 min at 4 °C,
3000 rpm. Discard the supernatant. If the cell number is small
(<2000), this step can be skipped. To ensure no loss of nuclei,
leave a small amount (10–20 μL) of supernatant in the tube.
5. Add 100 μL of cold 1X PBS buffer and tap/pipet gently to
resuspend the nuclei.
6. Add 100 μL of 2X Accessible Chromatin labeling mix (at this
point, total volume is 200 μL). Incubate for 2 h at 37 °C.
Occasionally, tap the reaction to mix.
7. Add 20 μL of 0.5 M EDTA and 2 μL of RNase A (1 mg/mL) to
each sample. Incubate for 30 min at 37 °C to digest RNA.
8. Add 20 μL of 20% SDS and 20 μL of Proteinase K. Incubate
overnight (O/N) at 65 °C.

3.3 Labeled Genomic DNA Extraction Using Phenol Chloroform/Spin Column Method

1. Add 250 μL of phenol/chloroform. Vortex 3 times, 5–10 s
each time.
2. Centrifuge for 10 min at 4 °C, 14,000 rpm. Transfer the aqueous
phase (upper part) to an Eppendorf tube. The maximum volume is
250 μL, but when starting from a large number of cells you do not
need to take all of it, as doing so increases the chance of collecting
the interphase, which contains proteins.
3. Add 0.7 volume of isopropanol and 2 μL of glycogen. Incubate
for 2 h at -80 °C or O/N at -20 °C.
4. Centrifuge for 10 min at 4 °C, 14,000 rpm.
5. Carefully wash DNA pellet with 500 μL of 80% ethanol. Cen-
trifuge for 10 min at 4 °C, 14,000 rpm.
6. Remove ethanol solution completely. Air dry sample and resus-
pend DNA in 50 μL of 1X TE buffer.
Optional: Extract biotin-labeled genomic DNA by column
purification using NEB Monarch Genomic DNA
Purification Kit.
7. Measure the DNA concentration with the Qubit High Sensitivity Kit
in dsDNA HS mode. If the concentration is too high, which can
happen when starting with a high number of cells (~250,000),
dilute the sample 10X.

3.3.1 Quality Control for Accessible Chromatin Labeling (Optional)

1. Denature genomic DNA by heating for 3 min at 95 °C and
incubate for 3 min in an ice-water bath.
2. Make a serial dilution (1, 0.5, 0.25, 0.125 μg) of genomic
DNA in MQ water on ice. Never exceed a total volume of
5 μL per spot.
3. Prepare a positively charged nylon membrane (Amersham
#RPN119B) by drawing circles with a pencil where the DNA
will be spotted. Spot the dilutions on the membrane and
let dry.
4. Cut the upper left corner to mark the orientation.
5. Wet the membrane by dripping MQ water on top of it so that it
is fully hydrated.
6. Crosslink the DNA to the membrane by UV: put the membrane in a
UV crosslinker machine.
7. Wash the membrane with 1X PBST. Transfer the membrane to
a square protein gel box (with lid).
8. Block the membrane with 5% skimmed milk in 1X PBST for
30–60 min at room temperature on a shaker.
9. Add HRP-conjugated goat anti-biotin antibody (1/2000 dilu-
tion) for 1 h.
10. Detect biotin signal with LumiGLO reagent.
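
For planning the dot blot, the two constraints in step 2 (the dilution series and the 5 μL per spot limit) can be combined as in the Python sketch below. It assumes equal volumes of a two-fold series are spotted. The function names are illustrative, and the minimum concentration is a derived value rather than one given in the protocol.

```python
SPOT_AMOUNTS_UG = [1, 0.5, 0.25, 0.125]  # dilution series from step 2
MAX_SPOT_VOLUME_UL = 5                   # per-spot volume limit from step 2

def min_stock_conc_ug_per_ul():
    """Lowest DNA concentration at which the largest spot still fits in 5 uL."""
    return max(SPOT_AMOUNTS_UG) / MAX_SPOT_VOLUME_UL

def twofold_series(stock_conc_ug_per_ul, spot_volume_ul=MAX_SPOT_VOLUME_UL, steps=4):
    """DNA (ug) per spot when equal volumes of a two-fold dilution series are spotted."""
    return [stock_conc_ug_per_ul * spot_volume_ul / 2 ** i for i in range(steps)]

print(min_stock_conc_ug_per_ul())                   # ug/uL needed for the 1 ug spot
print(twofold_series(min_stock_conc_ug_per_ul()))   # amounts per spot, in ug
```

If the extracted DNA is more dilute than this minimum, the top spot of the series cannot be loaded within 5 μL, and a shorter series starting at a lower amount would be the practical choice.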

3.4 DNA Fragmentation for NGS Library

1. For fragmentation of genomic DNA, take 200 ng of genomic
DNA, transfer to a Covaris microtube, and add up to 50 μL of
1X TE buffer (if genomic DNA is less than 200 ng, the entire
DNA can be used for the sonication). Use the following settings
to obtain 150 bp fragments: intensity: 5, duty cycle: 10%,
cycles per burst: 200, treatment time: 2 min.
2. After the sonication step, transfer the content to 1.5 mL DNA
LoBind Eppendorf tube.
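
A small Python sketch of the input calculation for step 1 is given below. It assumes the Qubit reading from Subheading 3.3 is available in ng/μL and that TE buffer is used to bring the sample to a total of 50 μL, which is one reading of "add up to 50 μL". The function name and example concentrations are hypothetical.

```python
def sonication_input(qubit_ng_per_ul, available_ul, target_ng=200, total_ul=50):
    """Return (DNA volume in uL, TE volume in uL, ng loaded) for one Covaris microtube.

    Assumes TE tops the sample up to total_ul. If less than target_ng is
    available, all of the DNA is used, as the protocol allows.
    """
    dna_ul = min(target_ng / qubit_ng_per_ul, available_ul, total_ul)
    te_ul = total_ul - dna_ul
    return dna_ul, te_ul, dna_ul * qubit_ng_per_ul

print(sonication_input(20, 50))  # enough DNA: 10 uL DNA + 40 uL TE = 200 ng
print(sonication_input(2, 50))   # limiting DNA: the whole 50 uL (100 ng) is used
```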

3.5 Universal NicE-seq Library Construction

1. For DNA pull-down, add 1 mL of 2X High Salt Buffer to the
fragmented DNA tube and add 50 μL of streptavidin magnetic
beads. If the quantity of DNA is below 50 ng or the number of
starting cells is below 2,000, the bead amount can be reduced to
15–20 μL.
2. Incubate for 2 h at 4 °C using end-over-end rotator.
3. Place the tube on magnetic rack. When the solution is clear,
remove the liquid using a pipette and resuspend the beads for
5 min with 1 mL of 1X High Salt Buffer containing 0.05%
Triton X-100 at room temperature.
4. Repeat wash steps 3 times more (4 times in total).
5. Wash the beads once with 1 mL of 1X TE buffer by inverting
5 times.
6. Resuspend the beads in 50 μL of 1X TE buffer. These beads
will be used for the PCR amplification step of library making.
Therefore, care must be taken not to lose the beads (see
Note 4).
7. For end-repair/dA tailing, add 7 μL of NEB Next Ultra II End Prep
Reaction Buffer and 3 μL of NEB Next Ultra II End Prep Enzyme Mix
to the above beads (the final volume is 60 μL).
8. Mix well, incubate for 30 min at 20 °C, and then incubate
for 30 min at 65 °C.
9. Wash the bead with 1 mL of 1X High Salt Buffer with 0.05%
Triton X-100 and incubate it on end-over-end rotator for
5 min at room temperature. Use magnetic rack to collect the
beads.
10. Repeat the above wash step. Use magnetic rack to collect the
beads.
11. Add 1 mL of 1X TE buffer and incubate on the end-over-end
rotator for 5 min at room temperature.
12. Resuspend the bead in 60 μL of 1X TE buffer.
13. For adaptor ligation, add 30 μL of NEB Next Ultra II Ligation
Master Mix, 1 μL of NEB Next Ligation Enhancer, and 1 μL of
1:10 diluted NEB Next Adaptor for Illumina. (Note: If the amount
of starting DNA is very low, <50 ng, the adaptor can be
diluted up to 1:30).
14. Incubate for 2–16 h or O/N at room temperature.
15. Collect the magnetic beads using a magnetic rack and remove
the solution.
16. Wash the bead with 1 mL of 1X High Salt Buffer with 0.05%
Triton X-100 and incubate it on the end-over-end rotator for
5 min at room temperature.
17. Repeat the wash step once more.
18. Add 1 mL of 1X TE buffer and incubate on the end-over-end
rotator for 5 min at room temperature.
19. Capture the bead and remove the solution, resuspend the bead
in 16 μL of 1X TE buffer.
20. Add 3 μL of USER enzyme and incubate for 20 min at 37 °C.
21. For PCR amplification, transfer all the 19 μL into the
PCR tube.
22. Add 3 μL of Index primer (10 μM), 3 μL of Universal primer
(10 μM), and 25 μL of NEB Ultra II Q5 Master Mix. The total
volume of the final reaction is 50 μL.
23. Set up PCR with 8 cycles for the sonication method or 12 cycles for
the enzymatic digestion method: initial denaturation, 30 s at 98 °C;
denaturation, 10 s at 98 °C; annealing, 30 s at 65 °C; extension,
45 s at 65 °C; final extension, 5 min at 72 °C; hold at 4 °C.
Note: Depending on the amount of DNA, the number of PCR cycles can
be modified. If the amount of starting DNA is below 50 ng, the
number of PCR cycles can be increased up to 12–13.
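
The PCR setup in steps 21–23 can be written down as data for a quick check of volumes and run time. The Python sketch below uses one reading of the cycling conditions in step 23, in which each time and temperature pair precedes its label. The function names are illustrative, and the run time ignores ramping and the final hold.

```python
# Reaction components from steps 21-22 (volumes in uL).
PCR_MIX_UL = {
    "bead-bound library after USER treatment": 19,
    "index primer (10 uM)": 3,
    "universal primer (10 uM)": 3,
    "NEB Ultra II Q5 Master Mix": 25,
}

def cycling_program(n_cycles):
    """(step, seconds, temperature in C, repeats) as read from step 23."""
    return [
        ("initial denaturation", 30, 98, 1),
        ("denaturation", 10, 98, n_cycles),
        ("annealing", 30, 65, n_cycles),
        ("extension", 45, 65, n_cycles),
        ("final extension", 300, 72, 1),
    ]

def programmed_time_min(n_cycles):
    """Sum of programmed step times in minutes, ignoring ramping and the 4 C hold."""
    return sum(sec * reps for _, sec, _, reps in cycling_program(n_cycles)) / 60

total_ul = sum(PCR_MIX_UL.values())
primer_final_um = 10 * 3 / total_ul  # final concentration of each primer

print(f"total volume: {total_ul} uL; each primer at {primer_final_um:.1f} uM")
for cycles in (8, 12):  # sonication vs enzymatic digestion libraries
    print(f"{cycles} cycles: about {programmed_time_min(cycles):.1f} min")
```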

3.6 PCR Cleanup Using AMPure Beads

1. Bring the AMPure bead solution to room temperature before
the cleanup steps (performing the cleanup with cold beads
may reduce the efficiency of DNA recovery). After the PCR
reaction is over, vortex the PCR tubes for 2–3 s, followed by a
quick spin down. Put the PCR reaction tubes on the magnetic rack
for 1 min, transfer the solution (containing the library) to a new
DNA LoBind tube, and add 0.9 volume (45 μL) of AMPure
beads (see Note 5).
2. Incubate for 10–15 min at room temperature.
3. Put samples on magnetic rack. When the solution looks clear,
remove it and wash the beads twice with 200 μL of freshly
prepared 80% ethanol by slowly pipetting the ethanol on the
beads without removing the Eppendorf tube off the rack. Wait
10 s, remove the liquid from the beads, and repeat. After
removing the ethanol for a second time, quickly spin down
the tubes, put them back on magnetic rack, and remove the
remaining ethanol at the bottom of the tube.
4. Resuspend the bead in 10 μL of 0.1X TE buffer and incubate
for 10 min at room temperature.
5. Put the tube back on the magnetic rack and transfer the library-
containing solution into a new DNA LoBind tube. This is the final
library DNA.
6. Measure the amount of the library DNA using the Qubit HS
DNA Kit set at ds High Sensitivity mode. If the concentration
is >1 ng/μL, the yield of library is acceptable for NGS
(e.g., Illumina NextSeq 500/550).
7. Analyze the library DNA on the Bioanalyzer with a High
Sensitivity DNA chip to check the actual library quantity. (Note: In
case the ligated adaptor is still visible on the Bioanalyzer, the library
pool can be re-purified using AMPure beads after the libraries
are combined.)
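
For pooling and loading, the Qubit concentration (ng/μL) is usually converted to molarity with the mean fragment size from the Bioanalyzer trace. The Python sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA. The 300 bp mean size is a hypothetical example, not a value from the protocol, and the function names are illustrative.

```python
def library_molarity_nm(conc_ng_per_ul, mean_fragment_bp, da_per_bp=660):
    """Molarity in nM of a dsDNA library: (ng/uL) / (660 g/mol/bp * bp) * 1e6."""
    return conc_ng_per_ul / (da_per_bp * mean_fragment_bp) * 1e6

def passes_yield_check(conc_ng_per_ul, threshold_ng_per_ul=1.0):
    """Yield criterion from step 6: concentration above 1 ng/uL."""
    return conc_ng_per_ul > threshold_ng_per_ul

conc = 1.5  # hypothetical Qubit reading, ng/uL
print(passes_yield_check(conc))
print(f"{library_molarity_nm(conc, 300):.2f} nM at a mean size of 300 bp")
```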

3.7 Optional Method A: Sonication Free Labeled DNA Enrichment for NicE-seq Library Preparation Using Nicking Enzyme Digestion

1. Take 200 ng of genomic DNA into a 1.5 mL DNA LoBind
Eppendorf tube, add 10 μL of 10X NEB Buffer #2, 1 U of
Nt.CviPII, and adjust with MQ water up to 100 μL.
2. Incubate for O/N at 37 °C.
3. After O/N digestion, heat-inactivate Nt.CviPII by incubating
the reaction tube at 65 °C for 10 min.
Note: The enzyme amounts can be adjusted for digestion.
For 100 ng of genomic DNA from HCT116 cells, 0.5 U of
Nt.CviPII can be used for accessible chromatin enrichment.
4. For the library preparation, use steps of Subheadings 3.5
and 3.6.
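
The two enzyme amounts given above (1 U per 200 ng and 0.5 U per 100 ng) are consistent with linear scaling, which the Python sketch below assumes for other inputs. The enzyme stock concentration of 2 U/μL is inferred from "5 U of Nt.CviPII (2.5 μl)" in Subheading 3.8.3. The function name and the example DNA concentration are hypothetical.

```python
ENZYME_STOCK_U_PER_UL = 5 / 2.5  # inferred from "5 U (2.5 uL)" in Subheading 3.8.3

def nicking_reaction(dna_ng, dna_conc_ng_per_ul, total_ul=100):
    """Component volumes for one Nt.CviPII digestion.

    Assumes enzyme units scale linearly with DNA mass (1 U per 200 ng),
    which matches both amounts given in the protocol.
    """
    dna_ul = dna_ng / dna_conc_ng_per_ul
    buffer_ul = total_ul / 10            # 10X NEB Buffer #2 to 1X
    units = dna_ng / 200
    enzyme_ul = units / ENZYME_STOCK_U_PER_UL
    water_ul = total_ul - dna_ul - buffer_ul - enzyme_ul
    return {"dna_ul": dna_ul, "buffer_10x_ul": buffer_ul,
            "enzyme_units": units, "enzyme_ul": enzyme_ul, "water_ul": water_ul}

print(nicking_reaction(200, 10))  # 200 ng at a hypothetical 10 ng/uL
print(nicking_reaction(100, 10))  # 100 ng uses 0.5 U, as in the note
```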

3.8 Optional Method B: One Tube NicE-seq

This method prepares a sonication-free, one tube NicE-seq library from
cultured cells and is recommended for low cell numbers, i.e., fewer
than 1000 cells. However, starting material of between 250 and 5,000
cells is suitable for the one tube NicE-seq method.

3.8.1 Harvesting and Crosslinking Cells

Harvest and crosslink cells by the same procedure described in
Subheading 3.1, Harvesting and Crosslinking Cells. Care must be
taken to wash cells with 1X PBS before fixation with 1% formaldehyde,
followed by two washes with 1X PBS. Note: Residual
formaldehyde and glycine may inhibit downstream reactions.

3.8.2 Accessible Chromatin Labeling and Decrosslinking

1. Start with 250–100,000 crosslinked cells suspended in 25 μL
of 1X PBS.
2. Add 400 μL Cytosolic buffer to the cells and incubate for
10–20 min with occasional mixing by tapping. The nuclei can
be visualized under the microscope (circular with smooth
edges).
3. Spin the nuclei down for 10 min at 4 °C, 3000 rpm. Discard
the supernatant. Wash the nuclei with 1X PBS and spin for 5 min at
4 °C, 3000 rpm. Discard the supernatant. Note: If the cell number is
small (<2000 cells), the wash step can be skipped. To avoid the loss of
the nuclei, leave a small amount (10–20 μL) of supernatant in
the tube.
4. Add 100 μL of cold 1X PBS buffer and tap/pipet gently to
resuspend the nuclei.
5. Add 100 μL of 2X Accessible Chromatin labeling mix (at this
point, total volume is 200 μL). Incubate for 2 h at 37 °C.
Occasionally, tap to mix the reaction.
6. Add 2 μL of 0.5 M EDTA, 18 μL of MQ water, and 2 μL of
RNase A (1 mg/mL) to each sample. Incubate for 30 min at
37 °C to digest RNA.
Note: A high concentration of EDTA can inhibit
nicking enzyme digestion of DNA. Hence, the amount of
0.5 M EDTA is reduced in this method, compared with
Subheading 3.2, Accessible Chromatin Labeling and Decrosslinking,
to avoid leaving DNA undigested.
7. Add 2.0 μL of 10% SDS and 2 μL of Thermolabile Proteinase K
(TLPK). The final SDS concentration would be 0.1%.
8. Incubate for O/N at 37 °C.
9. Incubate for 15 min at 55 °C to inactivate TLPK.
10. Add 22 μL of 10% Triton X-100, mix well, and leave the tube
for 5–10 min at room temperature. The final Triton X-100
concentration would be 1%.

3.8.3 Fragmentation of Labeled Chromatin by the Nicking Enzyme Digestion

1. Add 80 μL of 10X NEB Buffer #2, 669.5 μL of MQ water, and 5 U
of Nt.CviPII (2.5 μL) to the reaction tube and mix well, to a
final volume of 1 mL.
2. Incubate for 4–16 h at 37 °C on the end-over-end rotator.
3. Incubate for 10 min at 65 °C to inactivate Nt.CviPII.
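
Because everything in the one tube method is added to the same tube, it can help to track the running volume. The Python sketch below adds up the volumes from Subheadings 3.8.2 and 3.8.3 and checks the stated concentrations. It assumes the labeling reaction starts at exactly 200 μL, ignoring the 10–20 μL of supernatant that may be left on the nuclei. The variable names are illustrative.

```python
labeling_ul = 100 + 100                         # nuclei in PBS + 2X labeling mix
after_rnase_ul = labeling_ul + 2 + 18 + 2       # 0.5 M EDTA, MQ water, RNase A
after_tlpk_ul = after_rnase_ul + 2 + 2          # 10% SDS, Thermolabile Proteinase K
after_triton_ul = after_tlpk_ul + 22            # 10% Triton X-100
digest_ul = after_triton_ul + 80 + 669.5 + 2.5  # 10X buffer, MQ water, Nt.CviPII

sds_percent = 10 * 2 / after_tlpk_ul        # stated as about 0.1%
triton_percent = 10 * 22 / after_triton_ul  # stated as about 1%

# 100 uL of 2X labeling mix carries 20 uL of 10X NEB Buffer #2 (Subheading 2.2),
# so with the 80 uL added here the digestion is at 1X buffer.
buffer_x = 10 * (20 + 80) / digest_ul
edta_mm_in_digest = 500 * 2 / digest_ul     # 0.5 M stock, 2 uL

print(f"SDS {sds_percent:.3f}%, Triton X-100 {triton_percent:.2f}%")
print(f"digestion: {digest_ul:.0f} uL, {buffer_x:.1f}X buffer, "
      f"{edta_mm_in_digest:.1f} mM EDTA")
```

Under these assumptions the additions sum to exactly 1 mL, and the detergent values are close to the rounded figures given in the text.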

3.8.4 DNA Pull-Down

1. Add 50 μL of streptavidin magnetic beads, 300 μL of 5 M NaCl,
and MQ water up to 1.5 mL.
2. Incubate for 2 h at 4 °C.
3. Wash the bead with 1X High Salt Buffer containing 0.05%
Triton X-100 for 5 min at the room temperature by the end-
over-end rotator and remove the solution carefully.
4. Repeat the above step 3 times more (4 times in total).
5. Wash the bead once with 1 mL of 1X TE buffer by inverting a
couple of times.
6. Resuspend the bead in 50 μL of 1X TE buffer. These beads will
be used for the library making and PCR amplification steps (see
Note 4).
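
The volumes in step 1 of this pull-down can be checked with the short Python sketch below. It assumes the digestion from Subheading 3.8.3 contributes 1 mL and ignores the small amount of salt already present in the reaction buffer and bead storage buffer. The function name is illustrative.

```python
def pull_down_mix(digest_ul=1000, beads_ul=50, nacl_5m_ul=300, final_ul=1500):
    """Return (MQ water to add in uL, final NaCl in M) for the one tube pull-down."""
    water_ul = final_ul - digest_ul - beads_ul - nacl_5m_ul
    nacl_m = 5 * nacl_5m_ul / final_ul
    return water_ul, nacl_m

water_ul, nacl_m = pull_down_mix()
print(f"add {water_ul} uL MQ water; final NaCl is about {nacl_m:.1f} M")
```

The resulting NaCl concentration matches that of 1X High Salt Buffer, which is half of the 2 M NaCl in the 2X recipe in Subheading 2.5.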

3.8.5 End-Repair/dA Tailing

Follow the same procedure described in steps 7–12 of Subheading 3.5,
Universal NicE-seq Library Construction.

3.8.6 Adaptor Ligation

Follow the same procedure described in steps 13–20 of Subheading 3.5,
Universal NicE-seq Library Construction. If the starting
material is <1000 cells, the adaptor can be diluted up to 1:30.

3.8.7 PCR Amplification and PCR Cleanup by AMPure Beads

Follow the same procedure described in Subheading 3.5,
Universal NicE-seq Library Construction, and in Subheading 3.6,
PCR Cleanup Using AMPure Beads. The number of PCR cycles can be
increased up to 13 while avoiding high PCR duplication. For very low
cell numbers (<250 cells), the number of PCR cycles can be increased
up to 20 (see Note 5).

3.9 Optional Method C: One Tube NicE-seq of the Human FFPE Samples from 5 to 10 μm Tissue Section

Start with a 5–10 μm FFPE section on the slide.

3.9.1 Removal of Paraffin from the Tissue Section Slide

1. Add 500 μL of mineral oil (Sigma-Aldrich #330760) to the
area of tissue on the slide.
2. Incubate for 20–30 min at 52 °C.
3. Remove the mineral oil carefully, transfer the slide to Coplin
jars/plate with 100% ethanol, and incubate for 5 min at
R/T.
4. Transfer the slide sequentially to 90% ethanol, 80% ethanol, and 70%
ethanol, each step for 5 min at R/T, and hydrate in MQ water
for 5 min.
5. Transfer the slide to 1X PBST buffer and incubate for 1 h at
65 °C. Gradually cool the slide down to R/T.
6. Leave the slide to cool down at R/T.
7. Transfer the slide to 1X PBST buffer containing Proteinase K
(45 μL of Proteinase K in 50 mL of 1X PBST) in a square Petri
dish and incubate for 15 min at R/T.
8. Transfer the slide to 1X PBS buffer for 2 min at R/T and
repeat once more.
9. Incubate the slide with 1X PBS buffer for 2 min at R/T.
10. Incubate the slide with Cytosolic buffer for 20 min at 4 °C.
11. Incubate the slide with 1X PBS buffer for 5 min at R/T.

3.9.2 Accessible Chromatin Labeling and Decrosslinking

1. Circle the area of tissue with a liquid-repellent slide
marker pen and carefully drop 200 μL of 1X Accessible
Chromatin labeling mix onto the circled area (see Note 6).
2. Incubate for 2 h at 37 °C in a humidified chamber.
3. Transfer the slide to 1X PBS containing 0.5 M EDTA and
incubate for 10 min at R/T.
4. Collect the tissue carefully into a 1.5 mL DNA LoBind Eppendorf
tube by using a surgical scalpel and add 200 μL of ATL
buffer (QIAGEN, QIAamp DNA FFPE Tissue Kit #56404).
5. Add 2 μL of RNase A (1 mg/mL) and incubate for 30 min at
37 °C.
6. Add 20 μL of Proteinase K and incubate for O/N at 65 °C.
7. Incubate for 2 min at 95 °C for heat-inactivation of
Proteinase K.
8. Cool down for 15 min at R/T.

3.9.3 Enrichment of Labeled Chromatin by the Nicking Enzyme Digestion

1. Add 50 μL of 10X NEB Buffer #2, 2.5 U of Nt.CviPII to the
tube and adjust with MQ water up to 500 μL, and mix well.
2. Incubate for O/N at 37 °C.
3. Incubate for 15 min at 65 °C for heat-inactivation of
Nt.CviPII.

3.9.4 DNA Pull-Down

1. Add 50 μL of streptavidin magnetic beads and 1 mL of 2X
High Salt Buffer and incubate for 2 h at 4 °C on the end-over-end
rotator.
2. Wash the bead with 1 mL of 1X High Salt Buffer containing
0.05% Triton X-100 for 5 min at R/T.
3. Repeat the above wash step 3 times more (4 times in total).
4. Wash the bead with 1X TE buffer for 5 min at R/T.
5. Resuspend the bead in 50 μL of 1X TE buffer (see Note 4).

3.9.5 End-Repair/dA Tailing

Follow the same procedure described in steps 7–12 of Subheading 3.5,
Universal NicE-seq Library Construction.

3.9.6 Adaptor Ligation

Follow the same procedure described in steps 13–20 of Subheading 3.5,
Universal NicE-seq Library Construction.

3.9.7 PCR Amplification and PCR Cleanup by AMPure Beads

Follow the same procedures described in Subheadings 3.5 and
3.6 for PCR amplification and PCR cleanup by AMPure beads
(see Note 5).

4 Notes

1. For harvesting and crosslinking cells, always use fresh formaldehyde
solution and ensure removal of residual formaldehyde
during the washing step.
2. The actual number of cells used for the labeling reaction must be
counted for accurate determination of cell numbers. In our
observation, cells are lost during the washing step after
crosslinking.
3. The Cytosolic buffer incubation period can be extended up to
30 min after addition.
4. After DNA pull-down, all streptavidin bead-bound DNA may
be used for DNA library preparation. Loss of beads may result in
lower library quantity.
5. During the PCR cleanup steps, it is important that the AMPure beads
are warmed to room temperature before use.
6. One tube NicE-seq of FFPE tissue sections is highly dependent
on DNA quality and length. In our case, samples with an average
DNA length of 0.8–1 kbp yielded NGS libraries of acceptable
quality. High molecular weight DNA gave better quality
libraries.

Acknowledgments

This work was partly supported by NIH SBIR grant
R44HG011006 and New England Biolabs, Inc. to S.P. We thank
C. Carlow for critical reading of the manuscript, T. Evans,
D. Comb, Sir R.J. Roberts, and J. Ellard for encouragement.
Basic research support for H.G.C., P.O.E, U.S.V., and S.P. was
provided by New England Biolabs, Inc.

References
1. Keene MJ, Corces V, Lowenhaupt K, Elgin SC (1981) DNase I hypersensitive sites in Drosophila chromatin occur at the 5′ ends of regions of transcription. PNAS 78:143–146
2. Babiss LE, Bennett A, Friedman JM, Darnell JE Jr (1986) DNase I-hypersensitive sites in the 5′-flanking region of the rat serum albumin gene: correlation between chromatin structure and transcriptional activity. PNAS 83:6504–6508
3. Winter BB, Arnold HH (1987) Tissue-specific DNase I-hypersensitive Sites and Hypomethylation in the Chicken Cardiac Myosin Light Chain Gene (L2-A). JBC 262:13750–13757
4. Tsompana M, Buck MJ (2014) Chromatin accessibility: a window into the genome. Epigenetics Chromatin 7:33
5. Crawford GE, Davis S, Scacheri PC et al (2006) DNase-chip: a high-resolution method to identify DNase I hypersensitive sites using tiled microarrays. Nat Methods 3:503–509
6. Boyle AP, Davis S, Shulha HP, Meltzer P, Margulies EH et al (2008) High-resolution mapping and characterization of open chromatin across the genome. Cell 132:311–322
7. Giresi PG, Kim J, McDaniell RM, Iyer VR, Lieb JD (2007) FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. Genome Res 17:877–885
8. Jin W, Tang Q, Wan M et al (2015) Genome-wide detection of DNase I hypersensitive sites in single cells and FFPE tissue samples. Nature 528:142–146
9. Schones DE, Cui K, Cuddapah S, Roh TY, Barski A, Wang Z et al (2008) Dynamic regulation of nucleosome positioning in the human genome. Cell 132:887–898
10. Kuan PF, Huebert D, Gasch A, Keles S (2009) A non-homogeneous hidden-state model on first order differences for automatic detection of nucleosome positions. Stat Appl Genet Mol Biol 8:Article29. https://doi.org/10.2202/1544-6115.1454. PMC 2861327
11. Klein DC, Hainer SJ (2019) Genomic methods in profiling DNA accessibility and factor localization. Chromosom Res 28:69–85
12. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ (2013) Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods 10:1213–1218
13. Corces M, Trevino A, Hamilton E et al (2017) An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat Methods 14:959–962
14. Liu C, Wang M, Wei X, Wu L, Xu J et al (2019) An ATAC-seq atlas of chromatin accessibility in mouse tissues. Sci Data 6:65
15. Bysani M, Agren R, Davegårdh C, Volkov P, Rönn T, Unneberg P, Bacos K, Ling C (2019) ATAC-seq reveals alterations in open chromatin in pancreatic islets from subjects with type 2 diabetes. Sci Rep 9:7785
16. Bentsen M, Goymann P, Schultheis H, Klee K, Petrova A, Wiegandt R, Fust A, Preussner J, Kuenne C, Braun T, Kim J, Looso M (2020) ATAC-seq footprinting unravels kinetics of transcription factor binding during zygotic genome activation. Nat Commun 11:4267
17. Davie K, Jacobs J, Atkins M, Potier D, Christiaens V, Halder G, Aerts S (2015) Discovery of transcription factors and regulatory regions driving in vivo tumor development by ATAC-seq and FAIRE-seq open chromatin profiling. PLoS Genet 11:e1004994
18. Buenrostro JD, Wu B, Litzenburger UM, Ruff D, Gonzales ML, Snyder MP, Chang HY, Greenleaf WJ (2015) Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523:486–490
19. Corces MR, Granja JM, Shams S, Louie BH, Seoane JA, Zhou W, Silva TC, Groeneveld C, Wong CK, Cho SW, Satpathy AT, Mumbach MR, Hoadley KA, Robertson AG, Sheffield NC, Felau I, Castro MAA, Berman BP, Staudt LM, Zenklusen JC, Laird PW, Curtis C, Cancer Genome Atlas Analysis Network, Greenleaf WJ, Chang HY (2018) The chromatin accessibility landscape of primary human cancers. Science 362:eaav1898
20. Ponnaluri VKC, Zhang G, Estève PO, Spracklin G, Sian S, Xu SY, Benoukraf T, Pradhan S (2017) NicE-seq: high resolution open chromatin profiling. Genome Biol 18:122
21. Chin HG, Sun Z, Vishnu US, Hao P, Cejas P, Spracklin G, Estève PO, Xu SY, Long HW, Pradhan S (2020) Universal NicE-seq for high-resolution accessible chromatin profiling for formaldehyde-fixed and FFPE tissues. Clin Epigenetics 12:143

Chapter 4

Measuring Inaccessible Chromatin Genome-Wide Using Protect-seq

George Spracklin, Liyan Yang, Sriharsa Pradhan, and Job Dekker

Abstract
Chromatin accessibility has been an immensely powerful metric for identifying and understanding regu-
latory elements in the genome. Many important regulatory elements, such as enhancers and transcriptional
start sites, are characterized by “open” or nucleosome-free regions. Understanding the areas of the genome
that are not considered open chromatin has been more difficult. Protect-seq is a genomics technique that
aims to identify inaccessible chromatin associated with the nuclear periphery. These regions are enriched for
histone modifications associated with transcriptional repression and correlate with loci identified by other
techniques measuring heterochromatin and peripheral localization. Here, we discuss the protocol and best
practices to perform Protect-seq.

Key words Protect-seq, Chromatin, Heterochromatin, Nuclear periphery, Genomics

1 Introduction

Mammalian genomes are highly organized inside the nucleus.
Accessible chromatin regions are often nucleosome-free (“open
chromatin”) and have a strong correlation with regulatory elements
such as active promoters and enhancers [1]. Almost all genomics
techniques which probe chromatin structure rely on enzyme acces-
sibility. Conversely, very few genomics techniques are designed to
measure the inaccessible and/or heterochromatic regions of the
genome [2–5] and some that do require complex genome editing
[6]. Thus, there is a need for more genomics techniques to probe
specific genome structures, especially the inactive genome.
Protect-seq aims to measure the inaccessible chromatin by
degrading accessible chromatin with saturating levels of nucleases
for excessive time periods (Fig. 1a) [4]. Techniques to measure
open chromatin, such as MNase-seq [7], DNase-seq [8], and
ATAC-seq [9], digest the genome for short periods of time
(<15 min). In contrast, Protect-seq uses a higher concentration

Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols,
Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_4,
© The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023

[Figure 1, panels A–E. Panel C: gel, lanes 1–6, ladder marks from 100 bp to 3 kb. Panel D: Protect-seq library Bioanalyzer trace, RFU versus size (bp). Panel E: chr8, 63.0–74.0 Mb, with Protect-seq (-1.1 to 1.1), DNaseI-seq (0.0 to 6.7), and RefSeq Genes tracks.]

Fig. 1 Overview of Protect-seq in HCT116 cells. (a) Cartoon schematic of Protect-seq. Chromatin in gray/black
and nucleases depicted in red scissors. (b) Microscopy images of untreated (non-digested) and treated
(digested) nuclei stained with DAPI. Reproduced from [4] by permission from Oxford University Press. (c) Gel
electrophoresis image of a typical Protect-seq experiment. Lane 1 is a 2-log DNA Ladder (NEB# N3200S). Lane
2 is empty. Lane 3 is undigested genomic DNA. Lane 4 is nuclei digested with DNase only. Lane 5 is nuclei
digested with MNase only. Lane 6 is nuclei digested with both DNaseI and MNase (Protect-seq method). (d)
Bioanalyzer trace of a typical experiment after DNA purification and NGS library preparation. Note: NEB
adapters are around 120 nucleotides. LM lower marker, UM upper marker, both are internal standards. (e)
Genome browser of example region (chr8: 60–75 M) of Protect-seq and DNaseI-seq, representing inaccessi-
ble and accessible chromatin, respectively. Protect-seq signal track (GSE135580) is represented as log2ratio
(treatment/input) and DNaseI-seq signal track is reprocessed from ENCODE (ENCSR000ENM) and represented
as reads per genomic coverage
of enzyme, and a longer digestion time (>60 min). After allowing
nucleases to overdigest nuclei in HCT116, the remaining DNA is
enriched at the nuclear periphery in DAPI dense foci (Fig. 1b).
Importantly, the nuclei are still intact at this stage, making Protect-
seq amenable to single-cell sequencing. Interestingly, in HCT116
cells, Protect-seq identifies two types of domains: strong and weak
domains, which correspond to H3K9me3-HP1 and H3K27me3
chromatin domains, respectively. Taken together, Protect-seq is a
simple, reliable, and robust technique to identify chromatin that is
either resistant to nuclease and/or tethered to a nuclear body and is
thus insoluble.
In this chapter, we describe a simplified Protect-seq protocol
and the necessary experimental considerations and quality control
measures.

2 Materials

Protect-seq uses standard laboratory reagents. Lysis buffer can be
prepared ahead of time and stored at 4 °C. The original published
protocol did not perform the nuclease digestion in the lysis buffer.
The detergent in the lysis buffer helps prevent cell clumping
and does not interfere with enzyme activity (see Note 1).

2.1 Buffers and Reagents
1. 37% Formaldehyde (Fisher Scientific Cat# BP531-25).
2. Glycine (2.5 M stock) (Sigma Cat# G8898-1KG).
3. IGEPAL CA-630 detergent (MP, Cat# 198596).
4. Lysis Buffer:
10 mM Tris–HCl pH 8.0.
10 mM NaCl.
0.2% Igepal CA-630 (NP40).
5. Stop Buffer:
1X NEB2 (NEB Cat# B7002S).
5 mM EDTA.
5 mM EGTA.
6. DNaseI (NEB Cat# M0303).
7. Micrococcal Nuclease (NEB Cat# M0247S).
8. RNaseA 20 mg/mL (Invitrogen Cat# 12091021).
9. Low TE:
10 mM Tris–HCl pH 8.0.
0.1 mM EDTA.

2.2 DNA Purification, Library Building, and Quality Control
1. 200 μL PCR tube.
2. 96-well qPCR plate and seal.
3. NEB Monarch DNA cleanup kit (NEB Cat# T1030).
4. NEB Next Ultra II DNA library prep Kit for Illumina (NEB
Cat# E7645).
5. NEB Next Library Quant Kit for Illumina (NEB Cat# E7630).
6. AMPure XP beads (Beckman Cat# A63881).
7. Qubit 4 Fluorometer (Ref# Q33226) or equivalent.
8. Qubit assay tubes (Ref# Q32856).
9. Qubit dsDNA High-Sensitivity Assay Kit (Ref# Q32854).
10. qPCR machine (Thermo StepOne, BioRad CFX or
equivalent).

2.3 Microscopy
DAPI staining is a useful control to determine the localization of
the remaining (or protected) chromatin (optional).
1. ProLong Gold antifade reagent with DAPI (Ref# P36931).
2. Microscope (equipped with the DAPI excitation and emissions
filters).

2.4 General Material and Equipment
1. Eppendorf DNA LoBind tube 1.5 mL (Cat# 022431021).
2. Tabletop centrifuge.
3. Thermal cycler.
4. ThermoMixer.
5. Nuclease-free water.

3 Methods

Protect-seq is robust and reliable in HCT116 cells, which can be
used as a positive control (see Note 2). The original published
protocol performed two rounds of nuclease digestion with a wash
step in between in NEB#2 [4]. In this streamlined protocol version,
we perform one round of nuclease digestion carried out in Lysis
Buffer with the addition of MgCl2 and CaCl2. In HCT116 cells,
the two protocols yield nearly identical results. Carry out all proce-
dures at room temperature unless otherwise specified.

3.1 Cell Culture and Crosslinking
1. Culture HCT116 cells in McCoy's 5A medium supplemented
with 10% fetal bovine serum (FBS) at 37 °C and 5% CO2.
Once cells reach 75% confluency, trypsinize, wash twice
with 1X phosphate-buffered saline (PBS), and store at -80 °C.
Note: These conditions will vary depending on the cell type.
2. Resuspend 1–5 million cells in 500 μL of 1% formaldehyde (see
Note 3 on additional crosslinkers).
3. Rotate for 10 min.
4. Add 25 μL 2.5 M Glycine (final conc. 0.125 M Glycine) to
quench the reaction.
5. Rotate for 5 min.
6. Incubate on ice for 15 min.
7. Spin at 1000 × g for 5 min.
8. Wash twice with 1 mL PBS (1X).

3.2 Nuclei Purification
Keeping nuclei in single-cell suspension is important to ensure all
nuclei are permeable to the nucleases. Note: In some scenarios, 1%
Triton X-100 after SDS can induce clumping, in which case 0.1%
Triton X-100 should alleviate issues.
1. Resuspend 1–5 M cells in 200 μL cold lysis buffer.
(a) Add 2 μL of protease inhibitor cocktail (100X).
2. Incubate on ice for 15 min.
3. Add 2 μL 10% SDS (final conc. 0.1% SDS).
4. Incubate for 10 min at 65 °C, put on ice.
5. Add 20 μL 10% Triton X-100 (final conc. 1%).
(a) Optional incubation at 37 °C for 15 min.
6. Save input aliquots (non-digested sample).
(a) ~4.4 μL Microscopy/DAPI stain
(b) ~4.4 μL NGS input (2%).

3.3 Nuclease Digestion
The nuclease digestion leads to complete genome fragmentation, as
shown in Fig. 1c. The addition of MgCl2 and CaCl2 can induce cell
clumping, which may reduce the permeability of the nuclei to the
enzymes. For this reason, we do not rotate the cells but leave them
on the benchtop. The issue of cell clumping is further discussed in Note 1.
1. Add MgCl2 and CaCl2 to final concentrations of 2.5 mM and 2 mM, respectively.
0.55 μL 1 M MgCl2
4.5 μL 100 mM CaCl2.
2. Add 5 μL DNaseI and 5 μL MNase.
3. Incubate for 1 h at room temp.
4. Add 2 μL RNaseA.
5. Incubate at 37 °C for 30 min.
6. Spin at 2000 × g for 5 min.
7. Wash with 400 μL Lysis Buffer.
8. Spin at 2000 × g for 5 min.
9. Repeat wash ×2.
10. Resuspend in 100 μL Stop Buffer.
11. Save aliquot (~4 μL) of nuclei for a DAPI stain (digested
sample) (example shown in Fig. 1b) (see Note 4).
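
The stock volumes in step 1 can be cross-checked with a short calculation. The sketch below is only illustrative: the function name is hypothetical, and the starting volume is an assumption reconstructed from Subheading 3.2 (200 μL lysis buffer, 2 μL protease inhibitor, 2 μL SDS, and 20 μL Triton X-100, minus two input aliquots of about 4.4 μL). Under that assumption the computed volumes come out close to the 0.55 μL and 4.5 μL listed above.

```python
def stock_volume_ul(sample_ul, stock_mm, final_mm):
    """Volume of stock (uL) to add to sample_ul so the mixture reaches final_mm."""
    if not 0 < final_mm < stock_mm:
        raise ValueError("final concentration must lie between 0 and the stock concentration")
    # C_stock * V_add = C_final * (V_sample + V_add), solved for V_add.
    return final_mm * sample_ul / (stock_mm - final_mm)

# Assumed volume of the nuclei suspension before adding salts (see lead-in).
sample_ul = 200 + 2 + 2 + 20 - 2 * 4.4

mg_ul = stock_volume_ul(sample_ul, stock_mm=1000, final_mm=2.5)
# CaCl2 is added after MgCl2, so the MgCl2 volume is included in the sample.
ca_ul = stock_volume_ul(sample_ul + mg_ul, stock_mm=100, final_mm=2.0)

print(f"1 M MgCl2: {mg_ul:.2f} uL; 100 mM CaCl2: {ca_ul:.2f} uL")
```

If the MgCl2 and CaCl2 concentrations are lowered as suggested in Note 1, the same function can be called with the reduced `final_mm` values.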

3.4 Reverse Crosslinking and DNA Purification
1. Add 20 μL Proteinase K and 10 μL 10% SDS.
2. Incubate at 65 °C on ThermoMixer overnight (shaking
optional).
3. Purify DNA using NEB Monarch DNA cleanup kit or equiva-
lent using manufacturer’s instructions.
4. Quantify DNA for NGS Library Prep using Qubit.
(a) No sonication necessary.

3.5 NGS Library Preparation
NGS libraries are constructed using the NEB Next Ultra II DNA
library kit following the manufacturer’s protocol. We typically use
200 ng for library construction (ranging from 1 ng to 1 μg). NGS
libraries have also been generated using individual enzymes for
end-repair, dA-tailing, and adapter ligation as opposed to a
commercial kit.
1. Add end-repair and dA-tailing enzymes.
7 μL NEB Next Ultra II End Prep Reaction Buffer (E7647AA)
3 μL NEB Next Ultra II End Prep Enzyme Mix (E7646AA)
50 μL Purified DNA fragments.
2. Incubate for 30 min at RT.
3. Incubate for 30 min at 65 °C.
4. Add Ligation Enzymes.
60 μL End-Repair/dA-tailed DNA fragments
2.5 μL NEB loop adapter
30 μL Ligation Master mix
1 μL Ligation enhancer.
5. Incubate for 1 h (see Note 5).
6. Add 3 μL USER.
(a) Note this step is specific to NEB loop adapters.
7. Incubate for 30 min at 37 °C.
8. Cleanup DNA fragments and remove unligated adapter with
0.9X AMPure.
9. Resuspend in 15 μL low-TE.
10. Setup PCR reaction.
15 μL Resuspended DNA
5 μL Universal F primer
5 μL Index R primer
25 μL Q5 Master mix (2X).
11. PCR amplify library to add index barcodes.
98 °C for 30 s
5 Cycles of:
98 °C for 10 s
65 °C for 75 s
65 °C for 5 min.
Hold at 4 °C.
12. Cleanup PCR product with 0.9X AMPure or equivalent fol-
lowing manufacturer’s instructions.
13. Measure DNA by Qubit.
14. Store DNA library at -20 °C (long-term) or 4 °C (short-
term).
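
The 0.9X AMPure ratio in step 8 is relative to the reaction volume at that point. As a rough sketch, the volumes listed in steps 1–6 can be tallied to estimate the bead volume. The dictionary and function names below are hypothetical, and the calculation assumes no volume is lost to evaporation or pipetting.

```python
# Component volumes (uL) as listed in steps 1, 4, and 6 of this subheading.
END_PREP_UL = {
    "end prep reaction buffer": 7.0,
    "end prep enzyme mix": 3.0,
    "purified DNA": 50.0,
}
LIGATION_ADDITIONS_UL = {
    "loop adapter": 2.5,
    "ligation master mix": 30.0,
    "ligation enhancer": 1.0,
}
USER_UL = 3.0

def total_volume(components):
    """Sum the volumes (uL) of a dict of reaction components."""
    return sum(components.values())

end_prep_total = total_volume(END_PREP_UL)
ligation_total = end_prep_total + total_volume(LIGATION_ADDITIONS_UL) + USER_UL
bead_ul = 0.9 * ligation_total  # 0.9X bead-to-sample ratio from step 8

print(f"end prep: {end_prep_total} uL, after USER: {ligation_total} uL, beads: {bead_ul:.1f} uL")
```

The end-prep total matches the 60 μL of end-repaired DNA carried into step 4.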

3.6 NGS Library Quantification and Quality Control
Before NGS libraries are sequenced, we perform three quality
controls.
1. Examination of library fragment size distribution. Gel electro-
phoresis and/or Bioanalyzer are suitable for approximating
fragment length. On average, our fragments are 50–200 bp as
shown in Fig. 1d.
2. Estimation of effective library concentration for optimal cluster
density. Following the manufacturer’s instructions, we use
NEB Next Library Quant Kit for Illumina, which includes
P5/P7 primers for amplification and DNA standards for quan-
titative PCR (qPCR).
3. Compare control (non-digested) and treatment (digested)
nuclei using DAPI staining. In HCT116 cells, the treatment
(or digested) sample consistently results in DAPI dense foci
around the nuclear periphery, whereas the input will have signal
throughout the nucleus as shown in Fig. 1b from [4].

3.7 Sequencing and Expected Results
The above protocol is designed to generate sequencing libraries for
Illumina platforms. This version of the Protect-seq protocol results
in small DNA fragments (<200 bp) after nuclease digestion. Therefore,
we recommend the use of paired-end 2 × 50 bp or 2 × 75 bp sequencing
kits. After sequencing, paired-end reads are mapped to the
reference genome. We do not apply a MAPQ threshold because
Protect-seq signal is enriched at transposable and repetitive elements
and is evenly enriched across the centromeres. Signal tracks can be
represented as either fold-change or log2 ratio using MACS2 [10] or
deepTools [11], as shown in Fig. 1e. Thus far, Protect-seq is strongly
correlated with constitutive heterochromatin. Therefore, we
typically compare Protect-seq with other genomics techniques,
such as ChIP-seq for H3K27me3 and H3K9me3 when available.
In addition, examining Protect-seq signal across chromosome 19 in
humans can be a good positive control as it contains conserved
heterochromatin domains over zinc-finger repeats [12].
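
To make the log2 ratio representation concrete, the sketch below computes a log2(treatment/input) value per genomic bin from raw read counts. It is not the MACS2 or deepTools implementation: the function names, the counts-per-million scaling, the pseudocount of 1, and the example bin counts are all assumptions chosen for illustration.

```python
import math

def cpm(counts):
    """Scale per-bin read counts to counts per million."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads in track")
    return [c * 1e6 / total for c in counts]

def log2_ratio(treatment, control, pseudocount=1.0):
    """Per-bin log2(treatment/control) after depth normalization."""
    if len(treatment) != len(control):
        raise ValueError("tracks must have the same number of bins")
    t_norm, c_norm = cpm(treatment), cpm(control)
    # The pseudocount avoids division by zero and log of zero in empty bins.
    return [math.log2((t + pseudocount) / (c + pseudocount))
            for t, c in zip(t_norm, c_norm)]

# Hypothetical counts for four bins: digested (treatment) versus input.
treatment_bins = [10, 200, 400, 5]
input_bins = [150, 160, 140, 150]
track = log2_ratio(treatment_bins, input_bins)
print([round(v, 2) for v in track])
```

Positive values mark bins where DNA survives digestion relative to input (protected, inaccessible chromatin), and negative values mark bins depleted by the nucleases.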

4 Notes

1. In human cell lines, we found quite a high degree of variability
in nuclei extractions with some cell lines tending to strongly
clump while others less so. Therefore, nuclei extraction proto-
cols may need to be optimized depending on the cell line of
interest. The MgCl2 and CaCl2 can also be lowered if needed.
2. Protect-seq was developed using HCT116 and has been the
most consistent in HCT116. Expanding and interpreting
Protect-seq data on cell lines with few and/or small hetero-
chromatin domains (i.e., stem cells and blood lineages) has
been less successful. One potential improvement may be the
addition of high salt washes after digestion to remove soluble
chromatin [13–16].
3. Protect-seq is compatible with other types of crosslinkers such
as disuccinimidyl glutarate (DSG) and ethylene glycol bis(suc-
cinimidyl succinate) (EGS). Below is an example of a double
crosslinker protocol with formaldehyde (FA). Prepare a
300 mM stock DSG or EGS (dissolve DSG or EGS powder
in DMSO). After FA crosslinking (see step 8 in Subheading
3.1), cell pellets are resuspended in 5 mL PBS. Add 50 μL
300 mM DSG/EGS (3 mM final conc.). Rotate 40 min at
room temperature. Quench the reaction with 962 μL 2.5 M
Glycine (0.4 M final conc.). Rotate for 5 min, spin 15 min at
2000 × g, wash double crosslinked cell pellets with
PBS + 0.05% BSA, spin at 4 °C, 2000 × g for 15 min, and
remove supernatant completely. For long-term storage, snap-freeze
the dry pellets in liquid nitrogen and store at -80 °C.
4. This aliquot volume might need to be adjusted depending on
the nuclei concentration.
5. Ligation times can range from 30 min to 4 h.

References
1. Klemm SL, Shipony Z, Greenleaf WJ (2019) Chromatin accessibility and the regulatory epigenome. Nat Rev Genet 20:207–220
2. Becker JS, McCarthy RL, Sidoli S et al (2017) Genomic and proteomic resolution of heterochromatin and its restriction of alternate fate genes. Mol Cell 68:1023–1037.e15
3. Sebestyén E, Marullo F, Lucini F et al (2020) SAMMY-seq reveals early alteration of heterochromatin and deregulation of bivalent genes in Hutchinson-Gilford Progeria Syndrome. Nat Commun 11:6274
4. Spracklin G, Pradhan S (2020) Protect-seq: genome-wide profiling of nuclease inaccessible domains reveals physical properties of chromatin. Nucleic Acids Res 48:e16
5. van Schaik T, Vos M, Peric-Hupkes D et al (2020) Cell cycle dynamics of lamina-associated DNA. EMBO Rep 21:e50636
6. Guelen L, Pagie L, Brasset E et al (2008) Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature 453:948–951
7. Schones DE, Cui K, Cuddapah S et al (2008) Dynamic regulation of nucleosome positioning in the human genome. Cell 132:887–898
8. Boyle AP, Davis S, Shulha HP et al (2008) High-resolution mapping and characterization of open chromatin across the genome. Cell 132:311–322
9. Buenrostro JD, Giresi PG, Zaba LC et al (2013) Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods 10:1213–1218
10. Zhang Y, Liu T, Meyer CA et al (2008) Model-based analysis of ChIP-Seq (MACS). Genome Biol 9:R137
11. Ramírez F, Ryan DP, Grüning B et al (2016) deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res 44:W160–W165
12. Vogel MJ, Guelen L, de Wit E et al (2006) Human heterochromatin proteins form large domains containing KRAB-ZNF genes. Genome Res 16:1493–1504
13. Berezney R, Funk LK, Crane FL (1970) The isolation of nuclear membrane from a large-scale preparation of bovine liver nuclei. Biochim Biophys Acta 203:531–546
14. Ueda K, Matsuura T, Date N, Kawai K (1969) The occurrence of cytochromes in the membranous structures of calf thymus nuclei. Biochem Biophys Res Commun 34:322–327
15. Kay RR, Fraser D, Johnston IR (1972) A method for the rapid isolation of nuclear membranes from rat liver. Characterisation of the membrane preparation and its associated DNA polymerase. Eur J Biochem 30:145–154
16. Berezney R, Coffey DS (1974) Identification of a nuclear protein matrix. Biochem Biophys Res Commun 60:1410–1417

Chapter 5

Determination of the Chromatin Openness in Bacterial Genomes

Mahmoud M. Al-Bassam and Karsten Zengler

Abstract
The hyperactive Tn5 transposase in the ATAC-seq method has been widely used to determine the open
DNA regions and understand the overall epigenomic regulation in the chromatins of eukaryotic cells. Here,
we describe POP-seq (Prokaryotic chromatin Openness Profiling sequencing), an adaptation of the ATAC-
seq method, to interrogate changes in the openness of prokaryotic nucleoids.

Key words Nucleoid structure, Tn5 transposase, Nucleoid-associated proteins, Chromatin structure,
H-NS, HiC, Transcription factor binding sites, POP-seq

1 Introduction

In eukaryotes, histone oligomers organize the chromosomal DNA
into nucleosomes of defined sizes, which are the basic building
blocks of the highly ordered chromosome. In addition to their
role in organizing the structure of the eukaryotic genome, histones
and other structuring proteins play a pivotal role in the regulation
of gene expression and the functional differentiation of cells
[1]. Many tools have been developed, including MNase-seq [2],
FAIRE-seq [3], and DNase-seq [4], which showed that regions of
DNA not occupied by histones (open chromatin) are highly asso-
ciated with active transcription. The ATAC-seq method, developed
by the Greenleaf lab [5], utilizes the hyperactive Tn5 transposase
and represents a major improvement to the previous methods.
ATAC-seq is simpler, can be applied at high-throughput, and is
highly sensitive, thus allowing the study of the open chromatin at a
single cell level [6]. Unlike in eukaryotes, well-defined nucleosome
structures are lacking in bacteria. The bacterial nucleoid is
organized by conserved nucleoid-associated proteins (NAPs),

Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols,
Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_5,
© The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023

including HU and SMC [7, 8], as well as poorly conserved NAPs.
The dynamic organization of the nucleoid directly affects how
genetic information is accessed, interpreted, and implemented [9–
11]. The lack of methods to study the open chromatin in prokar-
yotes has motivated us to develop POP-seq (Prokaryotic chromatin
Openness Profiling sequencing), which probes the open DNA in
the bacterial nucleoid that is pre-fixed with formaldehyde. POP-seq
enables the determination of the nucleoid openness and provides
new insights into understanding nucleoid structure, gene regula-
tion, and phenotype.

2 Materials

1. Zr BashingBead Lysis Tubes (0.1 and 0.5 mm) (Zymo
Research).
2. Vortex Genie 2 (Scientific Industries Inc.) with horizontal
shaker 24-tube adapter (Orbital Shakers).
3. Nextera XT DNA library preparation kit (Illumina
FC_131_1096). DNA/RNA UD Indexes Set A, Tagmenta-
tion (96 Indexes, 96 Samples, Illumina 20027214).
4. AMPure XP DNA beads for PCR cleanup (Beckman Coulter
A63880).
5. Nuclease-free water (Millipore-Sigma W4502-1L).
6. Qubit DNA high sensitivity kit (ThermoFisher).
7. Agilent High Sensitivity D1000 ScreenTape System (Agilent).
8. Hardware: 64-bit computer running Linux or Mac OS with at
least 8 GB of RAM.
9. Software: Trim Galore!, a wrapper around Cutadapt [12] and
FastQC.
10. Software: Bowtie2 [13] for short read sequencing alignment.
11. Software: SAMtools [14] for downstream analysis of alignment
files.
12. Software: deepTools [15] for the calculation of sequencing
depth and generation of dense and continuous data tracks.
13. Software: Integrated Genome Browser (IGB) [16] for viewing
BigWig files generated by deepTools.
14. Software: The bioinformatic tools required for the basic analy-
sis of the POP-seq data are found in https://github.com/
maxmicrobe/POP-seq.

3 Methods

3.1 Preparation of the Crosslinked Cell Lysate
1. Grow bacterial cultures to mid-exponential phase
(OD600 = 0.3–0.5 for Escherichia coli) (see Note 1). The
volume of the culture depends on the density of the culture
(see Note 2).
2. Crosslink the cells by treatment with 1% formaldehyde for
10 min (see Note 3).
3. Quench the crosslinking reaction with 250 mM glycine (final
concentration) for 5 min. Centrifuge at maximum speed (i.e.,
13,000 × g), remove the supernatant, and place pellet on ice.
4. Resuspend cell pellets in 300 μL lysis buffer (75 mM NaCl,
25 mM EDTA pH 8, 20 mM Tris–HCl pH 8) and transfer the
solution to the Zr BashingBead tube.
5. Place the BashingBead tubes into the horizontal tube shaker
preincubated at 4 °C and run at maximum speed for 10 min.
6. Centrifuge the lysate for 10 min at 14,000 rpm and 4 °C to
remove the debris. Incubate on ice.
7. Take 1 μL of the lysate and measure the DNA concentration by
Qubit DNA high sensitivity kit (ThermoFisher) (see Note 4).
8. Dilute the DNA to 0.7 ng/μL. Remeasure the DNA concen-
tration to determine the exact concentration using Qubit (see
Note 5). Do not exceed 0.7 ng/μL, which is the total amount
of DNA input required for the library preparation kit.
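
The dilution in step 8 follows the usual C1V1 = C2V2 relation. The sketch below shows the arithmetic with the target concentration as a parameter, so it can be set to whatever the library kit requires. The function name, the example lysate concentration of 12.5 ng/μL, and the 50 μL final volume are hypothetical.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul, final_ul):
    """Return (lysate_ul, diluent_ul) needed to reach the target concentration."""
    if stock_ng_per_ul <= 0 or target_ng_per_ul <= 0:
        raise ValueError("concentrations must be positive")
    if target_ng_per_ul > stock_ng_per_ul:
        raise ValueError("lysate is already more dilute than the target")
    # C1 * V1 = C2 * V2, solved for the lysate volume V1.
    lysate_ul = target_ng_per_ul * final_ul / stock_ng_per_ul
    return lysate_ul, final_ul - lysate_ul

# Hypothetical Qubit reading of 12.5 ng/uL, diluted to the 0.7 ng/uL of step 8.
lysate_ul, diluent_ul = dilution_volumes(12.5, 0.7, final_ul=50.0)
print(f"{lysate_ul:.1f} uL lysate + {diluent_ul:.1f} uL diluent")
```

Because the Qubit reading carries some error, the diluted sample should still be remeasured as described in the step.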

3.2 Tagmentation and Preparation of the PCR Mix
1. In a 0.2 mL PCR tube, mix 10 μL of Tagmentation DNA
Buffer (TD), 5 μL of Amplicon Tagment Mix (ATM), and
5 μL of the DNA input (0.7 ng total in 5 μL) (see Note 6).
2. Incubate the mix at 55 °C for 7 min in a preheated PCR
machine with the lid heated at 100 °C. Incubate on ice imme-
diately and add 5 μL of Neutralize Tagment buffer (NT) to
stop the Tn5 reaction.
3. Add 4.5 μL of the i7 adaptor and 4.5 μL of the i5 adaptor.
4. In a separate tube, make a master mix containing 15 μL of
Nextera PCR Master Mix (NPM) and 1 μL of 50× SYBR
Green. Add 16 μL to each sample and mix by pipetting.
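
As a quick consistency check, the additions in steps 1–4 can be tallied to confirm that they add up to the 50 μL amplification mix used in Subheading 3.3. The list and function names in this sketch are hypothetical. The volumes are those given in the steps above.

```python
# Additions in the order given in this subheading (volumes in uL).
ADDITIONS_UL = [
    ("Tagmentation DNA Buffer (TD)", 10.0),
    ("Amplicon Tagment Mix (ATM)", 5.0),
    ("DNA input", 5.0),
    ("Neutralize Tagment buffer (NT)", 5.0),
    ("i7 adaptor", 4.5),
    ("i5 adaptor", 4.5),
    ("NPM + SYBR Green master mix", 16.0),
]

def running_totals(additions):
    """Return (component, cumulative volume in uL) after each addition."""
    totals = []
    volume = 0.0
    for name, ul in additions:
        volume += ul
        totals.append((name, volume))
    return totals

totals = running_totals(ADDITIONS_UL)
for name, volume in totals:
    print(f"after {name}: {volume:.1f} uL")
final_volume_ul = totals[-1][1]
```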

3.3 Amplification of the POP-seq Libraries
1. Program the qPCR machine according to the following:
(a) 72 °C for 3 min
(b) 95 °C for 30 s
(c) 95 °C for 10 s
(d) 60 °C for 30 s
(e) 72 °C for 15 s + read tubes
(f) 72 °C for 15 s
(g) Return to (c) for 20 times.
2. Transfer the 50 μL amplification mix into a qPCR tube and
start the amplification.
3. Watch the progress of the amplification and stop the reaction at
the end of the exponential phase when the curve starts to
plateau by simply pausing the program at 72 °C after the
reading in step “e” and removing the tube from the qPCR
machine (see Note 7). Incubate on ice.
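
Step 3 relies on judging by eye when the curve starts to plateau. Purely as an illustration of that judgment, the sketch below flags the first cycle at which the per-cycle fluorescence gain falls below a fraction of the largest gain seen so far. The function name, the `fraction` and `min_gain` thresholds, and the example readings are hypothetical and are not part of the published protocol.

```python
def plateau_cycle(readings, fraction=0.25, min_gain=0.05):
    """Return the 1-based cycle where amplification starts to plateau, or None.

    readings: fluorescence values, one per cycle (read in step "e").
    A cycle is flagged once the curve has clearly risen (largest gain so far
    is at least min_gain) and the current gain drops below
    fraction * largest gain.
    """
    max_gain = 0.0
    for i in range(1, len(readings)):
        gain = readings[i] - readings[i - 1]
        max_gain = max(max_gain, gain)
        if max_gain >= min_gain and gain < fraction * max_gain:
            return i + 1  # readings[i] is the reading of cycle i + 1
    return None

# Hypothetical SYBR Green readings for 14 cycles.
example = [0.01, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64,
           1.20, 1.90, 2.40, 2.60, 2.65, 2.66]
print(plateau_cycle(example))
```

A flat trace returns `None`, which corresponds to the situation in Note 7 where the plateau has not been reached and amplification continues.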

3.4 Purification of the Libraries
1. Mix 1.8× AMPure XP beads with PCR product in either 0.2 or
1.5 mL tubes (depending on the size of the magnet available in
your lab). For example, 90 μL of AMPure XP beads + 50 μL of
PCR product. Incubate the mix for at least 5 min at room
temperature.
2. Place on magnet till the solution is completely clear. While on
the magnet, remove the supernatant and add 170 μL of freshly
prepared 80% ethanol. Incubate for at least 30 s.
3. Remove the 80% ethanol and add another 170 μL of 80%
ethanol. Incubate for at least 30 s and remove the supernatant.
Remove as much ethanol as possible, take the tubes off the
magnet, and allow the beads to dry at room temperature for
3–4 min to remove any traces of ethanol.
4. Finally, resuspend the dried beads with 25 μL of DNase-free
water, mix well, and incubate at room temperature for 2 min.
Place on magnet.
5. Once the solution is clear, take 23 μL of the purified library and
transfer into 1.5 mL tube.

3.5 Measuring the Library Concentration and Checking the Quality of the Libraries
1. Measure the concentration of the library of each sample using
the Qubit high sensitivity DNA kit as described in Subheading
3.1, step 7.
2. Use a TapeStation (Agilent) to determine the average size and
the quality of the libraries. Both the Qubit concentration and
the library size are required to determine the molar concentra-
tion of each library. See Fig. 1 for an example of the POP-seq
library.
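
The molar concentration mentioned in step 2 is commonly estimated from the mass concentration and the average fragment length, taking about 660 g/mol per base pair of double-stranded DNA. The sketch below applies that standard conversion. The function name and the example values (2 ng/μL, 400 bp) are hypothetical.

```python
def library_molarity_nm(conc_ng_per_ul, mean_fragment_bp, bp_mass=660.0):
    """Convert a dsDNA library concentration from ng/uL to nM.

    bp_mass is the approximate molar mass of one base pair (g/mol).
    """
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment size > 0")
    return conc_ng_per_ul * 1e6 / (bp_mass * mean_fragment_bp)

# Hypothetical library: 2 ng/uL by Qubit, 400 bp average size by TapeStation.
molarity = library_molarity_nm(2.0, 400)
print(f"{molarity:.2f} nM")
```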

3.6 Alignment of the Library Sequencing Reads to the Reference Genome
1. Trim the libraries using specialized trimming software such as
Cutadapt [12].
2. Determine the quality of the libraries by using FastQC.
3. Use the trimmed fastq files as input for fastq2wig2.pl, a
custom Perl script that outputs “.wig” files with the genome
coverage normalized as counts per million (CPM).
[Figure 1: TapeStation electropherogram. Y-axis: sample intensity (normalized FU, 0–3000). X-axis: size (bp, 25–1500), with lower and upper markers.]

Fig. 1 A snapshot from the TapeStation analysis software showing the lower and upper sizes of a typical
POP-seq library. The average library size is automatically calculated by the software

(a) Download the script from
https://github.com/maxmicrobe/POP-seq/blob/main/fastq2wig2.pl
(b) To run the script, first generate a Bowtie2 index file spe-
cific for the genome of interest, using this command:
bowtie2-build genome.fasta name_of_index.
(c) The fasta file should contain the entire genome sequence
in one entry. The header should be the accession number
of the genome.
(d) If the fasta file is not available, download the functions.py
tools from
https://github.com/maxmicrobe/POP-seq/blob/main/functions.py
and run the “write_full_seq()” function, which takes two
arguments: the genbank file path and the output path.
(e) The output of the fastq2wig2.pl script, a “.wig” file, can
be readily viewed via Integrated Genome Browser (IGB)
[16]. See Fig. 2 for an example of the output on IGB.
(f) A “.bed” file is typically required to view the starts and the
ends of genes or any other features. To generate the bed
file, download the gbk2bed.pl script from
https://github.com/maxmicrobe/POP-seq/blob/main/gbk2bed.pl.
This script only takes the genbank file as input and prints
the output on the screen. To save the output into a file,
add “> file_name” at the end of the command.
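
To illustrate what a CPM-normalized “.wig” track contains, the sketch below scales per-base coverage to counts per million and formats it as a fixedStep wiggle track. It is not the fastq2wig2.pl script, and the actual output of that script may differ. The function name, the placeholder chromosome name `genome_accession`, and the example numbers are assumptions.

```python
def cpm_wig_lines(coverage, total_reads, chrom):
    """Format per-base coverage as CPM-normalized fixedStep wiggle lines.

    coverage: read depth at each base, starting at position 1.
    total_reads: number of mapped reads used for normalization.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    scale = 1e6 / total_reads  # counts per million mapped reads
    lines = [f"fixedStep chrom={chrom} start=1 step=1"]
    lines.extend(f"{depth * scale:.4f}" for depth in coverage)
    return lines

# Hypothetical three-base stretch from a library of 2 million mapped reads.
wig = cpm_wig_lines([0, 5, 10], total_reads=2_000_000, chrom="genome_accession")
print("\n".join(wig))
```

The chromosome name in the track header has to match the fasta header (the genome accession number, as noted in step c) for the browser to place the track correctly.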

[Figure 2: IGB tracks WT1, WT2, WT3, IHF1, IHF2, and IHF3; genome coordinate axis labeled 1065–1084.]


Fig. 2 A snapshot from the IGB software showing a section of the E. coli BW25113 genome. The three tracks
with black signals represent biological replicates of POP-seq experiments in the wild type. The three tracks
with red signals are three biological replicates of congenic ihfB deletion mutants. The experiments are highly
reproducible for both strains. The genes colored in green have significantly higher Tn5 accessibility in the
wild-type strain compared to ihfB (p-value <0.05). The locus_tags are shown for four genes

4 Notes

1. Harvesting cells at exponential phase is likely to give better results because the cell growth is more synchronized. However, stationary phase POP-seq has been performed in E. coli and is equally successful.
2. In rich media, culture volumes can be as low as 200 μL. A total
of 5 μL of 0.16 ng/μL input is required for the Nextera XT
sequencing library preparation step.
3. The final concentration of formaldehyde can vary between 1%
and 3%. The fixation time can also vary between 10 and 30 min
depending on the bacterial strain. A good starting point is 1%
formaldehyde for 10 min.
4. We do not recommend using Nanodrop due to the lack of
sensitivity. The exact DNA concentration is very important
for reproducible results and successful library preparations.
5. We recommend using 5 μL of input into the Qubit reaction as
follows:
(a) 5 μL diluted DNA
(b) 1 μL Qubit dsDNA HS reagent
(c) 194 μL Qubit dsDNA HS buffer.
6. It is well documented that Tn5 has increased accessibility
towards high A/T regions. Therefore, including a pure DNA
negative control is highly recommended to normalize the
accessibility bias.

7. If you miss the 72 °C step, an extra cycle of amplification will do no harm, but avoid over-amplification, as this will cause uneven amplification of the amplicons and could affect the overall outcome. Usually, the PCR ends by cycles 15–16; however, amplification to cycle 20 is still acceptable if the plateau is not reached.
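
As a worked example for Notes 2, 4, and 5, the dilution needed to reach the 0.16 ng/μL input can be computed from a Qubit reading. This is a minimal Python sketch; the function name `dilution_volumes`, the example stock concentration, and the 50 μL working volume are illustrative assumptions, not values from the protocol.

```python
TARGET_NG_PER_UL = 0.16  # input concentration stated in Note 2
INPUT_VOLUME_UL = 5.0    # input volume stated in Note 2


def dilution_volumes(stock_ng_per_ul, final_volume_ul, target_ng_per_ul=TARGET_NG_PER_UL):
    """Return (stock µL, diluent µL) that give final_volume_ul at the target concentration."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is more dilute than the target concentration")
    # C1 * V1 = C2 * V2, solved for V1
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical Qubit reading of 4 ng/µL, diluted to a 50 µL working stock.
stock_ul, diluent_ul = dilution_volumes(stock_ng_per_ul=4.0, final_volume_ul=50.0)
print(f"mix {stock_ul:.2f} µL DNA with {diluent_ul:.2f} µL diluent")
print(f"total input mass: {TARGET_NG_PER_UL * INPUT_VOLUME_UL:.2f} ng")
```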

References

1. Bannister AJ, Kouzarides T (2011) Regulation of chromatin by histone modifications. Cell Res. https://doi.org/10.1038/cr.2011.22
2. Schep AN, Buenrostro JD, Denny SK, Schwartz K, Sherlock G, Greenleaf WJ (2015) Structured nucleosome fingerprints enable high-resolution mapping of chromatin architecture within regulatory regions. Genome Res 25:1757–1770. https://doi.org/10.1101/gr.192294.115
3. Giresi PG, Kim J, McDaniell RM, Iyer VR, Lieb JD (2007) FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. Genome Res. https://doi.org/10.1101/gr.5533506
4. Boyle AP, Davis S, Shulha HP, Meltzer P, Margulies EH, Weng Z, Furey TS, Crawford GE (2008) High-resolution mapping and characterization of open chromatin across the genome. Cell. https://doi.org/10.1016/j.cell.2007.12.014
5. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ (2013) Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods. https://doi.org/10.1038/nmeth.2688
6. Buenrostro JD, Wu B, Litzenburger UM, Ruff D, Gonzales ML, Snyder MP, Chang HY, Greenleaf WJ (2015) Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523:486–490. https://doi.org/10.1038/nature14590
7. Stojkova P, Spidlova P, Stulik J (2019) Nucleoid-associated protein Hu: a lilliputian in gene regulation of bacterial virulence. Front Cell Infect Microbiol. https://doi.org/10.3389/fcimb.2019.00159
8. Nolivos S, Sherratt D (2014) The bacterial chromosome: architecture and action of bacterial SMC and SMC-like complexes. FEMS Microbiol Rev. https://doi.org/10.1111/1574-6976.12045
9. Dame RT, Rashid F-ZM, Grainger DC (2019) Chromosome organization in bacteria: mechanistic insights into genome structure and function. Nat Rev Genet. https://doi.org/10.1038/s41576-019-0185-4
10. McArthur M, Bibb M (2006) In vivo DNase I sensitivity of the Streptomyces coelicolor chromosome correlates with gene expression: implications for bacterial chromosome structure. Nucleic Acids Res. https://doi.org/10.1093/nar/gkl649
11. Tran NT, Laub MT, Le TBK (2017) SMC progressively aligns chromosomal arms in Caulobacter crescentus but is antagonized by convergent transcription. Cell Rep. https://doi.org/10.1016/j.celrep.2017.08.026
12. Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. https://doi.org/10.14806/ej.17.1.200
13. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods. https://doi.org/10.1038/nmeth.1923
14. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079. https://doi.org/10.1093/bioinformatics/btp352
15. Ramírez F, Ryan DP, Grüning B, Bhardwaj V, Kilpert F, Richter AS, Heyne S, Dündar F, Manke T (2016) deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. https://doi.org/10.1093/nar/gkw257
16. Freese NH, Norris DC, Loraine AE (2016) Integrated genome browser: visual analytics platform for genomics. Bioinformatics 32:2089–2095. https://doi.org/10.1093/bioinformatics/btw069

Chapter 6

Profiling Chromatin Accessibility on Replicated DNA with repli-ATAC-Seq
Kathleen R. Stewart-Morgan and Anja Groth

Abstract
Open or accessible chromatin typifies euchromatic regions and helps define cell type-specific transcription
programs. DNA replication massively disorders chromatin composition and structure, and how accessible
regions are affected by and recover from this disruption has been unclear. Here, we present repli-ATAC-seq,
a protocol to profile accessible chromatin genome-wide on replicated DNA starting from 100,000 cells. In
this method, replicated DNA is labeled with a short 5-ethynyl-2′-deoxyuridine (EdU) pulse in cultured
cells and isolated from a population of tagmented fragments for amplification and next-generation
sequencing. Repli-ATAC-seq provides high-resolution information on chromatin dynamics after DNA
replication and reveals new insights into the interplay between DNA replication, transcription, and the
chromatin landscape.

Key words repli-ATAC-seq, DNA replication, Transcription, Accessible chromatin, Chromatin assembly, Nucleosome-free regions

1 Introduction

Transcriptionally active chromatin is characterized by an open structure that permits entry of transcription factors and RNA polymerases to sites of transcription initiation, such as promoters and enhancers [1]. Such sites are characterized by nucleosome depletion, which renders DNA more accessible to transcription machinery [2]. These regions are important regulatory features of the chromatin landscape and, like the transcription programs they reflect, are cell type-specific [3]. These “accessible” regions can be profiled genome-wide using a number of strategies, including DNase-seq, MNase-seq, and FAIRE-seq [4]. An attractive alternative to these methods, which are labor-intensive and require large amounts of input material, is ATAC-seq, or assay for transposase-accessible chromatin [5]. This method profiles accessible regions in high resolution from low amounts of starting material and, with deep sequencing, can also inform on nucleosome positioning and occupancy, especially in organisms with smaller genomes.

Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols, Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_6, © The Author(s) 2023

DNA replication compromises the chromatin landscape by
evicting proteins from DNA, temporarily disrupting chromatin
structure [6]. This includes nucleosomes, which are evicted prior
to replication fork passage and rapidly reassembled on the new
DNA strands. Recycled parental histones are distributed largely
symmetrically between daughter strands [7, 8], and in parallel,
newly synthesized histones are deposited to restore nucleosome
density [9]. Following replication, the chromatin landscape is
extensively remodeled and modified to restore the prereplicative
state [6].
DNA replication proceeds in a regulated, cell type-specific
manner, with euchromatic regions replicated early in S phase and
heterochromatic regions replicated late in S phase [10]. However,
this process is inherently heterogeneous, driven by stochastic repli-
cation initiation from a large number of possible initiation zones
(themselves found in accessible regions). This heterogeneity in the
replication program, and its duration across many hours in most
cell types, makes it impossible to use standard cell synchronization
methods, including cell sorting and drug-based approaches, to
address the transient changes in chromatin dynamics that occur in
the wake of replication in high resolution.
To investigate how these local effects manifest genome-wide,
multiple groups have developed methods that combine metabolic
labeling of DNA with next-generation sequencing approaches
[6]. Additionally, by chasing the DNA label, restoration of chro-
matin accessibility after replication can be investigated.
These novel methods have mainly adapted MNase-seq to inves-
tigate nucleosome positioning and occupancy in organisms with
small genomes, including Saccharomyces cerevisiae [11–13] and
Drosophila melanogaster [14]. In addition, analysis of subnucleoso-
mal MNase fragments was used to assess transcription factor
(TF) occupancy post-replication in D. melanogaster [14].
Here we describe replication-coupled ATAC-seq, or repli-ATAC-seq [15] (Fig. 1), a novel method established in mammalian cells to profile accessible chromatin genome-wide on replicated DNA. Repli-ATAC-seq produces genome-wide accessibility profiles with thorough coverage of open chromatin sites, which show high signal and generate clear peaks even in large genomes. By filtering in silico specifically for subnucleosomal-length fragments, repli-ATAC-seq can inform on transcription factor occupancy in replicated chromatin. The protocol described here was developed in mouse embryonic stem cells cultured in serum; cell culturing and lysis conditions may require testing in other cell types. This procedure, which can be completed in one day, starts from 100,000 asynchronous, cultured cells labeled with the thymidine analog 5-ethynyl-2′-deoxyuridine (EdU) for as little as 10 min, enabling high-resolution profiling of the nascent chromatin landscape. We recommend using low-binding plasticware throughout the protocol to minimize sample loss. To follow chromatin accessibility throughout chromatin maturation, we include a pulse/chase EdU labeling strategy. To compare baseline accessibility between samples, an option to include native D. melanogaster S2 cell chromatin as an internal spike-in control is described. Although classic ATAC-seq in asynchronous cells can be informative for comparison, we recommend that analyses compare nascent repli-ATAC-seq datasets to fully mature datasets to control for any technical differences introduced in the Click biotinylation and streptavidin pulldown steps. Repli-ATAC-seq offers a novel approach to address the interplay between DNA replication and transcription and provides the means to profile DNA replication-induced changes in rare and dynamic cell populations.

[Figure: workflow schematic. EdU pulse; nuclei isolation (with optional EdU-labelled Drosophila S2 spike-in); tagmentation; Click-IT + biotin; streptavidin pulldown; next-generation sequencing]

Fig. 1 Schematic of repli-ATAC-seq protocol. The cell type of interest is pulsed with EdU and harvested. If using spike-in, freshly harvested, 100% EdU-labeled D. melanogaster S2 cells are mixed with the cells of interest prior to nuclei isolation and lysis. DNA is digested with Tn5 transposase and EdU+ DNA fragments are isolated through Click biotinylation and streptavidin conjugation. These fragments are then amplified and sequenced using next-generation sequencing. (Adapted from Ref. [15])

2 Materials

2.1 Equipment

1. Cell culture hood.
2. Hemacytometer or automated cell counter such as Countess (ThermoFisher).
3. Microcentrifuge.
4. Thermomixer.
5. Magnetic 1.5 mL tube rack.
6. Rotator with side movement.
7. Qubit 2.0 Fluorometer (Life Technologies).
8. BioAnalyzer (Agilent).
9. Thermocycler.

2.2 Reagents

1. Cell culture reagents (cell type-specific).
2. 1X Phosphate-buffered saline (PBS).
3. Trypsin.
4. PCR-grade H2O.
5. 100% ethanol.
6. 1.5 mL LoBinding tubes.
7. LoBinding pipette tips.
8. Illumina Tagment DNA Enzyme and Buffer Kit (Illumina).
9. Qiagen MinElute PCR Purification Kit (Qiagen).
10. EdU, 20 mM stock dissolved in DMSO and aliquoted at -20 °C (ThermoFisher).
11. Click-IT cell reaction buffer kit (ThermoFisher).
• Click-IT cell buffer additive (Component C) (80 mg): Dissolve in 400 μL deionized H2O (100X) and store in 200 μL aliquots at -20 °C for up to 1 year.
12. THPTA, 50 mM stock in PCR-grade H2O (Sigma).
13. Picolyl-Azide-PEG4-Biotin, 100 mM stock dissolved in
DMSO and stored at 4 °C (Jena Bioscience).
14. Qiagen MinElute Reaction Cleanup kit (Qiagen) (optional,
only for making unbound libraries).
15. Agencourt AMPure XP beads (Beckman Coulter).
16. Myone T1 streptavidin beads (ThermoFisher).
17. NEB Next High-Fidelity 2X PCR Master Mix (NEB).
18. Qubit HS assay (ThermoFisher).
19. Agilent HS DNA kit (Agilent).
20. Triton-X 100.
21. Tween 20.
22. 10 mM Tris–HCl pH 7.5.

2.2.1 For Thymidine Chase to Study Chromatin Maturation

Thymidine, 10 mM stock dissolved in deionized H2O and aliquoted at -20 °C (Sigma).

2.2.2 For D. melanogaster Spike-in

1. D. melanogaster S2 cells.
2. Shields and Sang M3 Insect Medium (Sigma).
3. KHCO3 (Sigma).
4. Yeast Extract (Sigma).
5. Bactopeptone (BD).
6. Fetal Calf Serum (GE Hyclone).
7. Penicillin/Streptomycin (GIBCO).
8. Cell incubator at 25 °C.

2.3 Buffers

1. Buffer A without protease inhibitors: 10 mM HEPES pH 7.9, 10 mM KCl, 1.5 mM MgCl2, 0.34 M sucrose, 10% glycerol, 0.05% Triton-X. Store filtered and without Triton-X at 4 °C for long-term use and add Triton-X just before use.
2. Lysis buffer: 10 mM Tris–HCl pH 7.4, 10 mM NaCl, 3 mM
MgCl2, 0.3% Igepal. Store without Igepal at 4 °C for up to 24 h
before use and add Igepal just before use. Discard any remain-
ing buffer after using.
3. Tagmentation Stop Buffer (TSB): 50 mM Tris pH 8, 10 mM
EDTA pH 8, 1% SDS. Store at RT.
4. 2× B & W buffer: 10 mM Tris–HCl pH 7.5, 1 mM EDTA, 2 M
NaCl, 0.1% Tween 20.
5. 1× B & W buffer: 5 mM Tris–HCl pH 7.5, 0.5 mM EDTA, 1 M NaCl, 0.05% Tween 20.
6. Elution Buffer: 10 mM Tris–HCl pH 8.5.
7. EBT Buffer: 10 mM Tris–HCl pH 8.5, 0.05% Tween 20.

3 Methods

3.1 EdU Labeling

1. Seed 2 × 10⁶ mESCs per sample on a 10 cm gelatinized dish in 10 mL of appropriate media and grow them for 24 h at 37 °C, 5% CO2 (see Note 1).
2. After preparing reagents (see Note 2), begin EdU labeling.
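
The stock volumes needed for the labeling and chase media follow from C1V1 = C2V2. The sketch below is a minimal Python illustration using the stock and final concentrations given in Subheading 2.2 and Note 2; the function name `stock_volume_ul` is hypothetical.

```python
def stock_volume_ul(stock_mM, final_uM, final_volume_mL):
    """Volume of stock (µL) needed to reach final_uM in final_volume_mL (C1*V1 = C2*V2)."""
    stock_uM = stock_mM * 1000.0                # mM -> µM
    final_volume_ul = final_volume_mL * 1000.0  # mL -> µL
    return final_uM * final_volume_ul / stock_uM


# EdU media: 20 mM stock, 20 µM final, 10 mL per 10 cm dish.
edu_ul = stock_volume_ul(stock_mM=20.0, final_uM=20.0, final_volume_mL=10.0)
# Thymidine chase media (Option 2): 10 mM stock, 10 µM final, 10 mL per dish.
thymidine_ul = stock_volume_ul(stock_mM=10.0, final_uM=10.0, final_volume_mL=10.0)

print(f"EdU stock per dish: {edu_ul:.1f} µL")
print(f"Thymidine stock per dish: {thymidine_ul:.1f} µL")
```

Both cases are a 1:1000 dilution of the stock.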

Option 1: Nascent Repli-ATAC-seq


1. Replace culturing media with 10 mL EdU media (final concen-
tration: 20 μM) in the corresponding dish, swirl to disperse,
and incubate at 37 °C for 10 min (see Notes 3 and 4).
2. After 10 min, aspirate media and wash twice in 10 mL warmed
1X PBS.
3. Add 2 mL trypsin and return to 37 °C for 1–2 min. Take the
cold media out of the fridge and put it under the hood.
4. Check under a microscope for cell detachment. When cells are
detached, add 6 mL of COLD media to stop trypsin digestion.
5. Transfer cells to a 15 mL Falcon tube.
6. Take 10 μL of the cell suspension for counting using a hema-
cytometer or an automated cell counter.
(a) Goal: 100,000 cells for repli-ATAC-seq per sample.
7. Transfer 100,000 cells into a low-binding 1.5 mL tube. Spin
cells at 500 × g for 5 min at 4 °C. From this step, keep the tubes
on ice (see Note 5).

Option 2: Mature Repli-ATAC-seq


1. Replace culturing media with 10 mL EdU media (final concen-
tration: 20 μM) in the corresponding dish, swirl to disperse,
and incubate at 37 °C for 10 min (see Notes 3 and 4).
2. After 10 min, aspirate media and wash twice in 10 mL
warmed PBS.
3. Add 10 mL warmed media with 10 μM thymidine to the plate.
Return to incubator for the desired maturation time.
4. After the desired maturation interval (see Note 6), aspirate
media and wash twice in 10 mL warmed PBS.
5. Add 2 mL trypsin and return to 37 °C for 1–2 min. Take the
cold media out of the fridge and put it under the hood.
6. Check under a microscope for cell detachment. When cells are
detached, add 6 mL of COLD media to stop trypsin digestion.
7. Transfer cells to a 15 mL Falcon tube.
8. Take 10 μL of the cell suspension for counting using a hema-
cytometer or an automated cell counter.
(a) Goal: 100,000 cells for ATAC-seq per sample.
9. Transfer 100,000 cells into a low-binding 1.5 mL tube. Spin
cells at 500 × g for 5 min at 4 °C. From this step, keep the tubes
on ice (see Note 5).
Option 3: D. melanogaster Spike-in
It may be useful to include an internal control of accessibility on
replicated DNA. This can be done by EdU-labeling cultured
D. melanogaster S2 cells for 40 h (see Note 7) prior to labeling
repli-ATAC-seq dishes and adding 100 labeled cells to trypsinized
and aliquoted mESCs prior to lysis:
1. Culture D. melanogaster S2 cells in suspension following gen-
eral procedures as described by the Drosophila Genomics
Resource Center [16].
2. Replace culturing media with EdU media (final concentration:
10 μM) in the corresponding dish, swirl to disperse, and incu-
bate at 25 °C for 40 h. Time the EdU labeling such that S2 cells
will be fully labeled at the time of labeling nascent and any
mature repli-ATAC-seq samples.
3. Transfer cells to 50 mL tubes and spin at 300 × g for 5 min.
4. Aspirate supernatant and wash cells in an equivalent volume of
ice-cold 1X PBS.
5. Repeat wash.

6. Take 10 μL of the cell suspension for counting using a hemacytometer or an automated cell counter.
(a) Goal: 100 cells per repli-ATAC-seq sample.
7. Transfer 100 fresh, EdU-labeled S2 cells into tubes containing
cells prepared in parallel for repli-ATAC-seq. Place tubes on ice
and proceed to cell lysis.
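
The cell counts in Options 1–3 translate into pipetting volumes once the suspension density is known. The following is a minimal Python sketch; the function name and the example densities are hypothetical, and diluting the S2 suspension before aliquoting is an assumption made here for practicality, not a step stated in the protocol.

```python
def volume_for_cells_ul(target_cells, cells_per_ml):
    """Volume (µL) of suspension that contains target_cells at the counted density."""
    if cells_per_ml <= 0:
        raise ValueError("cell density must be positive")
    return target_cells * 1000.0 / cells_per_ml


MESC_PER_SAMPLE = 100_000  # cells per repli-ATAC-seq sample (Options 1 and 2)
S2_PER_SAMPLE = 100        # spike-in cells per sample (Option 3)

# Hypothetical counts: mESCs at 1e6 cells/mL; S2 cells pre-diluted to 1e5 cells/mL.
mesc_ul = volume_for_cells_ul(MESC_PER_SAMPLE, cells_per_ml=1.0e6)
s2_ul = volume_for_cells_ul(S2_PER_SAMPLE, cells_per_ml=1.0e5)
spike_fraction = S2_PER_SAMPLE / MESC_PER_SAMPLE

print(f"mESC suspension per sample: {mesc_ul:.1f} µL")
print(f"S2 suspension per sample: {s2_ul:.1f} µL")
print(f"spike-in fraction of cells: {spike_fraction:.3%}")
```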

3.2 Cell Lysis

1. To prepare, aliquot 995 μL of Buffer A into a 1.5 mL tube and add 5 μL of 10% Triton-X 100 to the aliquot. Invert to mix. For steps 2 and 3, work in a 4 °C room.
2. Carefully remove media (see Note 8) from labeled cells and add
200 μL of cold 1X Buffer A with Triton-X. Pipet up and down
5 times to resuspend, being careful to avoid creating bubbles.
3. Incubate on ice for 7 min. Lay tube on ice in the cold room, but
avoid burying the tube in the ice (see Note 9).
4. Pellet nuclei by spinning at 1300 × g for 5 min at 4 °C and
carefully remove lysis buffer as in step 2, here using a P200 set
to 198 μL and gel-loading tips to aspirate all supernatant.
5. Add 100 μL of cold 1X lysis buffer and pipet up and down
10 times to resuspend, being careful to avoid creating bubbles.
6. Split sample into two 1.5 mL low-binding tubes each contain-
ing 50 μL lysate (equivalent to 50,000 cells) each.
7. Vortex samples for 10 s on medium-high strength.
8. Incubate for 15 min with the tubes buried in ice; this step is carried out at room temperature (RT) rather than in the 4 °C room.
9. After incubation, vortex tubes for 10 s at medium-high
strength again. Pellet nuclei by spinning at 600 × g for
10 min at 4 °C and carefully remove lysis buffer as in steps
2 and 4, here using a P200 set at 48 μL and gel-loading tips to
aspirate all supernatant.

3.3 Transposase Digestion

1. Combine 2.5 μL 2X TD buffer and 2.5 μL transposase (TDE1) (from Illumina kit) per tube or as a master mix and aliquot into tubes on ice.
2. Pipet up and down 10 times to resuspend. Vortex on medium-high strength for 10 s.
3. Incubate for exactly 30 min at 37 °C in a thermomixer shaking at 1200 rpm.
4. After incubation, combine the two transposase digestions from
each sample into one 1.5 mL low-binding tube (final volume:
10 μL).
5. Add 90 μL TSB to the combined digestions (final volume:
100 μL).

6. Purify with Qiagen MinElute PCR Purification Kit. To elute, add 80 μL PCR-grade H2O to columns and incubate for 5 min at RT. Spin at maximum speed for 1 min and then re-elute samples by adding the eluate back onto its respective column and incubating for a further 5 min at RT. Spin at maximum speed for 1 min and proceed to Click biotinylation (total volume, accounting for loss: approximately 78 μL) (see Note 10).

3.4 Click Biotinylation

1. Prepare THPTA–CuSO4 premix by mixing 1 μL 50 mM THPTA and 0.1 μL 100 mM CuSO4 per sample in a separate 1.5 mL low-binding tube.
2. Prepare 10X buffer additive by mixing 1 μL 100X buffer addi-
tive and 9 μL PCR-grade H2O per sample in a separate tube.
3. Set up the Click reaction by adding the reagents to the purified
DNA in the following order: 10 μL 10X Click-IT buffer, 0.5 μL
100 mM picolyl-azide-PEG4-biotin, 1.1 μL THPTA–CuSO4
premix, 10 μL 10X buffer additive (see Note 11).
4. Incubate for 30 min at RT.
5. During incubation, equilibrate AMPure beads at RT for
30 min prior to use. Keep AMPure beads at RT to use after
Library Amplification (see Subheading 3.6).
6. To purify DNA, add 55 μL equilibrated AMPure beads to each
sample (0.55:1 bead ratio).
7. Mix thoroughly by vortexing.
8. Incubate the tube(s) at RT for 10 min to bind large, unwanted
DNA fragments to the beads.
9. During incubation, prepare another siliconized 1.5 mL tube with 245 μL AMPure beads.
10. During incubation, prepare 400 μL of 80% ethanol per sample.
11. During incubation, warm a thermoblock to 37 °C.
12. Place the tube(s) on the magnet to capture the beads. Incubate
until the liquid is clear.
13. Carefully remove the supernatant and transfer it to the
corresponding prepared tube containing AMPure beads (3:1
final ratio). Discard tube(s) containing used beads (see
Note 12).
14. Incubate tube(s) at RT for 10 min to bind the desired DNA
fragments to the beads.
15. Place the tube(s) on the magnet to capture the beads. Incubate
until the liquid is clear.
16. Carefully remove and discard the supernatant.
17. Keeping the tube(s) on the magnet, add 200 μL of freshly prepared 80% ethanol. On the rack, turn the tubes 180°, forcing the beads through the ethanol to the opposite wall of the tube.
18. Incubate the tube(s) on the magnet at RT for ≥30 s.
19. Carefully remove and discard the ethanol.
20. Repeat steps 17–19 once. Try to remove all residual ethanol
without disturbing the beads, using a P10 pipette if necessary.
21. Dry the beads at RT for 1–2 min. Caution: Avoid overdrying of
the beads, as it may result in dramatic yield loss.
22. Remove the tube(s) from the magnet. Resuspend the beads in
52 μL of Elution Buffer.
23. Put the tube(s) with lid(s) open in a warmed thermoblock at
37 °C. Cover with a top of a tip box or a piece of aluminum foil
to prevent contamination of open tubes.
24. Incubate for 5–10 min to elute DNA and evaporate residual
ethanol.
25. Place the tube(s) on the magnet to capture the beads. Incubate
until the liquid is clear.
26. Carefully transfer 50 μL of the supernatant to a new
low-binding tube.
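
The bead ratios quoted in steps 6 and 13 can be reproduced with simple arithmetic, which is useful when the Click reaction volume differs from 100 μL. This is a minimal Python sketch with a hypothetical function name; it assumes, as the stated ratios imply, that both ratios are expressed relative to the original sample volume.

```python
def double_size_selection(sample_ul, first_beads_ul, second_beads_ul):
    """Return (first ratio, final ratio) of bead volume to original sample volume."""
    first_ratio = first_beads_ul / sample_ul
    # Beads from the first addition stay in the supernatant's buffer, so the
    # cumulative bead volume sets the final ratio.
    final_ratio = (first_beads_ul + second_beads_ul) / sample_ul
    return first_ratio, final_ratio


# Volumes from Subheading 3.4: ~100 µL Click reaction, 55 µL beads, then 245 µL beads.
first, final = double_size_selection(sample_ul=100.0, first_beads_ul=55.0, second_beads_ul=245.0)
print(f"first cut: {first:.2f}:1, final: {final:.2f}:1")
```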

3.5 Streptavidin Pulldown

1. Resuspend the stock of Myone T1 streptavidin beads by vortexing.
2. Pipet 20 μL of bead suspension per sample into a 1.5 mL DNA
low-binding tube. Pellet the beads using a magnetic rack
(≥30 s). Remove and discard the supernatant.
3. Remove the tube from the magnetic rack and add 200 μL of 1X
B & W buffer. Mix by pipetting. Place the tube back to the
magnetic rack to pellet the beads. Remove and discard the
supernatant.
4. Repeat 1X B & W buffer wash 3 times.
5. Resuspend washed streptavidin beads in 50 μL 2X B & W
buffer per sample.
6. Add 50 μL resuspended streptavidin beads to each sample (final
B & W concentration 1X). Mix by pipetting.
7. Incubate tubes for 30 min at RT on a tube rotator. Ensure
beads are continually in suspension.
8. Spin tubes briefly. Pellet beads on a magnetic rack. Remove the
supernatant (see Note 13).
9. Wash beads with 200 μL 1X B & W buffer and mix by
pipetting.
10. Pellet the beads using a magnetic rack (≥30 s). Remove and
discard the supernatant.

11. Repeat steps 9 and 10 3 times, waiting 1 min off the magnetic
rack between washes. Perform washes on maximum 4–6 reac-
tions at a time to avoid overdrying the beads.
12. Wash beads as in steps 9 and 10 twice with 200 μL EBT Buffer.
13. Wash beads as in steps 9 and 10 once with 200 μL 10 mM
Tris–HCl pH 7.5.
14. Pellet the beads on a magnetic rack and carefully remove all
supernatant.
15. Resuspend the beads in 10 μL PCR-grade H2O, transfer to a
0.2 mL low-binding tube, and keep on ice. Proceed to Library
Amplification (see Note 14).

3.6 Library Amplification (on Beads)

1. Set up the PCR reaction by adding the following reagents to bead-bound DNA in 0.2 mL tubes: 1.25 μL 25 μM Primer 1, 1.25 μL 25 μM Primer 2 (see Note 15), 12.5 μL NEB Next High-Fidelity 2X PCR Master Mix.
2. Vortex to mix and spin down briefly.
3. Amplify libraries using the following conditions: 72 °C, 5 min;
98 °C, 30 s; 12 cycles of: 98 °C, 10 s; 63 °C, 30 s; 72 °C, 30 s;
4 °C hold.
4. Add 25 μL PCR-grade H2O to each library (final volume:
50 μL).
5. To purify libraries, add 80 μL equilibrated AMPure beads to
each sample (1.6:1 bead ratio).
6. Mix thoroughly by vortexing.
7. Incubate the tube(s) at RT for 10 min to bind DNA fragments
to the beads.
8. During incubation, prepare 400 μL of 80% ethanol per sample.
9. During incubation, warm a thermoblock to 37 °C.
10. Place the tube(s) on the magnet to capture the beads. Incubate
until the liquid is clear.
11. Carefully remove and discard supernatant.
12. Keeping the tube(s) on the magnet, add 200 μL of freshly
prepared 80% ethanol.
13. Incubate the tube(s) on the magnet at RT for ≥30 s, turning
the tubes 180° to ensure all beads pass through the ethanol.
14. Carefully remove and discard the ethanol.
15. Repeat steps 12–14 once. Try to remove all residual ethanol
without disturbing the beads, using a P10 pipette if necessary.
16. Dry the beads at RT for 1–2 min. Caution: Avoid overdrying of
the beads, as it may result in dramatic yield loss.
17. Remove the tube(s) from the magnet. Resuspend the beads in
12 μL of Elution Buffer.

Fig. 2 Representative BioAnalyzer profiles of a repli-ATAC-seq library (top) and its matched, unbound control library (bottom)

18. Put the tube(s) with open lids in a warmed thermoblock at 37 °C. Cover with a top of a tip box or a piece of aluminum foil to prevent anything from falling into the open tubes.
19. Incubate for 5–10 min to elute DNA and evaporate residual
ethanol.
20. Place the tube(s) on the magnet to capture the beads. Incubate
until the liquid is clear.
21. Carefully transfer 10 μL of the supernatant to a new 1.5 mL
low-binding tube.

3.7 Quality Control

Prior to sequencing, perform quality control of repli-ATAC-seq libraries by quantifying library concentration with Qubit and checking library fragment length distribution using an Agilent BioAnalyzer or an equivalent fragment analyzer (Fig. 2).

3.8 Sequencing and Analysis

Libraries can be sequenced on an appropriate sequencing platform (repli-ATAC-seq was developed using Illumina NextSeq 500). Paired-end sequencing will provide both positional information and fragment length for all reads; single-end sequencing will provide positional information only. To enable analysis specifically of subnucleosomal fragments, we therefore prefer paired-end sequencing, though this may not be necessary for all research questions. After sequencing, quality assessment using FastQC and adaptor trimming using Trim Galore! or similar software are recommended. PCR duplicates, reads mapping to the mitochondrial genome, and reads mapping to any relevant sequencing blacklists (e.g., the mm10 sequencing blacklist [17]) should be discarded, as should reads with a MAPQ score < 20. The remaining reads can be processed using peak-calling and other standard bioinformatic analyses. If D. melanogaster spike-in was employed, map to both the target and D. melanogaster genomes. From the D. melanogaster Binary Alignment Map (BAM) file, calculate the total number of unique reads. Calculate a spike-in normalization factor for each sample by dividing 10⁶ by the total number of unique reads from D. melanogaster. To quantify spiked-in samples by reference-adjusted reads per million (RRPM), calculate the coverage of each bin or region of interest in the target genome by computing the number of unique reads per bin. Then, multiply by the spike-in normalization factor, prior to any log transformation or further manipulation of the data.
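
The read filters and the RRPM calculation described above can be sketched as follows. This is an illustrative Python outline on plain dictionaries rather than a BAM parser; the record keys, the mitochondrial contig name "chrM", and the example counts are assumptions, and a real pipeline would apply the same logic with a dedicated alignment toolkit.

```python
def keep_read(read, mito_name="chrM", min_mapq=20):
    """Apply the filters from the text: duplicates, mitochondrial, blacklist, MAPQ < 20."""
    return (
        not read["duplicate"]
        and read["chrom"] != mito_name
        and not read["blacklisted"]
        and read["mapq"] >= min_mapq
    )


def spike_in_factor(unique_spike_in_reads):
    """Normalization factor: 10^6 divided by the unique D. melanogaster read count."""
    if unique_spike_in_reads <= 0:
        raise ValueError("spike-in read count must be positive")
    return 1e6 / unique_spike_in_reads


def rrpm(bin_counts, unique_spike_in_reads):
    """Scale unique target-genome read counts per bin by the spike-in factor."""
    factor = spike_in_factor(unique_spike_in_reads)
    return [count * factor for count in bin_counts]


# Hypothetical example: 2,000,000 unique spike-in reads and two target-genome bins.
print(spike_in_factor(2_000_000))
print(rrpm([10, 40], 2_000_000))
```

As stated in the text, scaling is applied to raw per-bin counts before any log transformation.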

4 Notes

1. When setting up repli-ATAC-seq experiments, including an unlabeled, EdU-negative dish processed in parallel can be useful to confirm that repli-ATAC-seq libraries are free from contaminating unlabeled DNA fragments.
2. Prior to beginning EdU labeling, ensure the following equipment and reagents are ready and at the appropriate temperature:
• Microcentrifuge is cooled to 4 °C.
• Thermomixer is warmed to 37 °C.
• 6 mL of appropriate media per dish is cooled to 4 °C.
• 2 mL trypsin per dish is warmed to 37 °C.
• 20 mL 1X sterile PBS per dish is warmed to 37 °C.
• Lysis buffer is freshly prepared and kept cold on ice.
• EdU media is prepared: 10 mL appropriate media with
20 μM EdU per 10 cm dish, prepared and warmed to 37 °C.
• If generating mature samples (Option 2), prepare 10 mL of
appropriate media with 10 μM thymidine per dish and warm
to 37 °C.
3. For pulse/chase experiments, it is recommended to stagger
your EdU labeling such that all dishes are ready for collection
at the same time because the first pause point following EdU
labeling is after 2 h of processing.
4. The length of EdU labeling may need to be optimized depend-
ing on the proliferation rate of the cell type of interest.

5. The remaining cells can be kept short-term on ice and processed, e.g., as FACS controls during the transposase digestion.
6. One cell cycle post-pulse, EdU-labeled loci will replicate again;
maturation times should therefore be well below one cell cycle
length for the cell type of interest.
7. Labeling S2 cells with EdU for 40 h will ensure genome-wide
labeling.
8. It is critical to avoid disturbing or losing the cell pellet. To do
this, first pipet out 900 μL supernatant using a P1000, then
switch to a P200 set to 98 μL and, using gel-loading tips to
pipet from the bottom of the sample while not disturbing the
pellet, aspirate the remaining supernatant.
9. During incubation, the remaining, saved cells can be spun at 500 × g for 5 min and resuspended in 300 μL PBS. To fix, add 700 μL ice-cold 100% EtOH dropwise while vortexing on low and save at 4 °C for FACS labeling.
10. Purified DNA can be quantitated for quality control at this
stage. It can be kept short-term (1–2 days) at 4 °C or frozen at
-20 °C for long-term storage, but library quality seems poorer
when this is done. The recommendation is to continue imme-
diately to Click biotinylation.
11. The order in which the reagents for Click biotinylation are
added to samples is important because the addition of the
buffer additive starts the Click reaction. It is not recommended
to make a master mix of Click-IT reagents.
12. This double size-selection removes fragments larger than
approximately 600 bp, since longer fragments are both more
difficult to sequence on Illumina platforms and generally less
informative for accessibility studies. If these fragments would
be informative in specific experiments, a 1.8:1 ratio in a single
size-selection could be used to preserve and purify fragments of
all lengths.
13. If desired, the supernatant can be removed and saved in a new
1.5 mL tube. This contains tagmented, non-EdU-labeled
DNA fragments and can be used to generate “unbound”
libraries. Unbound libraries are virtually interchangeable in
terms of coverage with standard bulk ATAC-seq libraries, and
can complement repli-ATAC-seq libraries. To create unbound
libraries, purify supernatant using the MinElute Reaction
Cleanup Kit, eluting in 12 μL PCR-grade H2O or EB buffer.
Amplify and purify libraries as described in Subheading 3.6,
except use 8 cycles of PCR instead of 12 to amplify libraries.
14. Streptavidin-captured DNA can be stored at 4 °C for a short
time, but it is recommended to proceed directly to PCR after
streptavidin capture.
15. Primer sequences are from ref. [5], except that in repli-ATAC-
seq the final two nucleotides in each primer are joined by a
phosphorothioate bond. Primer 2 is indexed for multiplexing
sequencing lanes.

References
1. Core LJ, Martins AL, Danko CG et al (2014) Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat Genet 46:1311–1320
2. Li W, Notani D, Rosenfeld MG (2016) Enhancers as non-coding RNA transcription units: recent insights and future perspectives. Nat Rev Genet 17:207–223
3. Boyle AP, Davis S, Shulha HP et al (2008) High-resolution mapping and characterization of open chromatin across the genome. Cell 132:311–322
4. Furey TS (2013) ChIP-seq and beyond: new and improved methodologies to detect and characterize protein-DNA interactions. Nat Rev Genet 13:840–852
5. Buenrostro JD, Giresi PG, Zaba LC et al (2013) Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods 10:1213
6. Stewart-Morgan KR, Petryk N, Groth A (2020) Chromatin replication and epigenetic cell memory. Nat Cell Biol 22:361–371
7. Petryk N, Dalby M, Wenger A et al (2018) MCM2 promotes symmetric inheritance of modified histones during DNA replication. Science 361:1389–1392
8. Yu C, Gan H, Serra-Cardona A et al (2018) A mechanism for preventing asymmetric histone segregation onto replicating DNA strands. Science 361:1386–1389
9. Annunziato AT (2013) Assembling chromatin: the long and winding road. Biochim Biophys Acta 1819:196–210
10. Marchal C, Sima J, Gilbert DM (2019) Control of DNA replication timing in the 3D genome. Nat Rev Mol Cell Biol 20:721–737
11. Fennessy RT, Owen-Hughes T (2016) Establishment of a promoter-based chromatin architecture on recently replicated DNA can accommodate variable inter-nucleosome spacing. Nucleic Acids Res 44:7189–7203
12. Vasseur P, Tonazzini S, Ziane R et al (2016) Dynamics of nucleosome positioning maturation following genomic replication. Cell Rep 16:2651–2665
13. Gutierrez MP, MacAlpine HK, MacAlpine DM (2019) Nascent chromatin occupancy profiling reveals locus- and factor-specific chromatin maturation dynamics behind the DNA replication fork. Genome Res 29:1123–1133
14. Ramachandran S, Henikoff S (2016) Transcriptional regulators compete with nucleosomes post-replication. Cell 165:580–592
15. Stewart-Morgan KR, Reverón-Gómez N, Groth A (2019) Transcription restart establishes chromatin accessibility after DNA replication. Mol Cell 75:284–297.e6
16. Luhur A, Klueg KM, Roberts J, Zelhof AC (2019) Thawing, culturing, and cryopreserving Drosophila cell lines. JoVE 146. https://doi.org/10.3791/59459
17. ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489:57–74

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International
License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution
and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative Commons license,
unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative
Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use,
you will need to obtain permission directly from the copyright holder.
Chapter 7

Analysis of Chromatin Interaction and Accessibility by Trac-Looping
Shuai Liu, Qingsong Tang, and Keji Zhao

Abstract
Spatial organization of the genome modulates pivotal biological processes. Emerging technologies
have provided novel insights into genome structure and its role in regulating cell activities. To examine
genome-wide chromatin interactions at accessible chromatin regions, we developed a DNA transposase-
mediated analysis of chromatin looping (Trac-looping) method for simultaneously detecting chromatin
interactions and chromatin accessibility. Here, we describe a detailed protocol for generating Trac-looping
libraries.

Key words Genome structure, Tn5, Trac-looping, Chromatin looping, Chromatin accessibility

1 Introduction

Genome architecture plays an important regulatory role in orchestrating
spatial and temporal gene expression [1–4]. Our understanding
of genome organization has been revolutionized by the
development of cutting-edge technologies for probing chromatin
structures [5–22]. The spatial organization of the genome has a
hierarchical pattern [5, 23]. Each chromosome resides preferentially
in a separate territory [11]. Based on their openness,
chromatin regions are separated into “compartment A”
(open) and “compartment B” (closed) [11, 12, 24]. At the scale
of hundreds of kilobases to several megabases, chromatin folds into
topologically associating domains (TADs), which are enriched for
chromatin interactions within the domain [25–27]. Spatial chromatin
interactions are realized by the formation of chromatin loops
[28–32].
The temporospatial gene expression essential for cellular differentiation
is controlled by key transcription regulatory elements
such as enhancers, which form chromatin loops with their target promoters
in order to function. To specifically examine genome-wide
chromatin looping among promoters and enhancers, we developed
a DNA transposase-mediated analysis of chromatin looping (Trac-looping)
method [33] that uses the DNA transposase Tn5, which
preferentially targets accessible chromatin regions [34]. Briefly,
Trac-looping first uses Tn5 to insert a bivalent linker into two spatially
proximal accessible chromatin regions, which covalently joins
the two regions together. After the genomic DNA is digested with
restriction enzymes, DNA fragments carrying the inserted linker are
circularized and subjected to rolling circle amplification. The
library is further amplified and indexed by PCR, then sequenced
on NGS platforms (Fig. 1). With relatively low sequencing depth,
Trac-looping can produce high-resolution genome-wide chromatin
interaction maps together with chromatin accessibility information.
In this chapter, we describe the detailed experimental
procedure of Trac-looping.

Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols,
Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_7,
© The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023

2 Materials

2.1 Reagents

1. Competent cells BL21 Gold (DE3) (Agilent, cat# 230132).
2. pET15b-His6Tnp (Addgene plasmid #79807).
3. Nuclease-free water (Life Technologies, cat# AM9930).
4. 1 M Tris–HCl pH 7.4 (Quality Biological, cat# 351-006-101).
5. 1 M Tris–HCl pH 8.0 (KD medical, cat# RGF-3360).
6. 1× PBS (Corning Incorporated – Life Sciences, cat#
21-040-CV).
7. 0.5 M EDTA (Quality Biological, cat# 351-027-721).
8. Ethyl alcohol (Warner-Graham, cat# 64-17-5). Caution:
Highly flammable.
9. Isopropanol (MG Scientific, cat# 6810008227637). Caution:
Highly flammable.
10. 16% Formaldehyde (w/v), Methanol-free (Thermo Fisher Sci-
entific, cat# 28906 or 28908).
11. Phenol–Chloroform (Amresco, cat# 0883-100ML). Caution:
Use in chemical hood.
12. Ni-NTA agarose bead slurry (Qiagen, cat#1018244).
13. Imidazole (Sigma, cat# I5513-25G).
14. ATP (Sigma, cat# A7699-1G).
15. 100% Glycerol (Invitrogen, cat# 15514-011).
16. 1 mg/mL Tn5 transposase (homemade, as described in
Subheading 3.1).
17. 5 M Sodium Chloride, Molecular Biology Grade (Promega,
cat# V4221).

Fig. 1 Schematic representation of the main steps of Trac-looping

18. 10% SDS (KD medical, cat# RGE-3230).
19. 20 mg/mL Proteinase K (Sigma, cat# 3115828001).
20. 20 mg/mL RNase A (Fisher Scientific, cat# 12091021).
21. 20 mg/mL Glycogen (Millipore Sigma, cat# 10901393001).
22. 3 M Sodium Acetate (pH 5.2) (Quality Biological, cat#
351-035-721).
23. dNTP Mix (10 mM each, Thermo Fisher Scientific, cat#
18427089).
24. EB buffer (Qiagen, cat# 19086).
25. AMPure XP beads (Beckman Coulter, cat# A63880).
26. NlaIII (10 U/μL, New England BioLabs, cat# R0125L).
27. MluCI (10 U/μL, New England BioLabs, cat# R0538L).
28. Dynabeads™ MyOne™ Streptavidin C1 beads (Fisher Scien-
tific, cat# 65001).
29. 100% Triton X-100 (Sigma, cat# T8787-250ML).
30. 10% Nonidet P40 (Sigma, cat# 11332473001).
31. 100% Tween 20 (Sigma, cat# P9416-100ML).
32. T7 DNA ligase with 2× T7 DNA ligase buffer (3000 U/μL
NEB, cat# M0318L).
33. 15 mg/mL GlycoBlue (Thermo Fisher, cat# AM9515).
34. TempliPhi Amplification Kit (Sigma, cat# GE25-6400-10).
35. Qubit dsDNA HS Assay Kit (Fisher Scientific, cat# Q32851).
36. E-Gel® EX Gel, 2%, 20-Pak (Thermo Fisher Scientific, cat#
G402002).
37. Phusion® High-Fidelity PCR Master Mix with HF Buffer
(New England BioLabs, cat# M0531S).
38. NEBuffer 2 (New England BioLabs, cat# B7002S).
39. FBS (fetal bovine serum, heat inactivated) (Sigma-Aldrich, cat#
F4135-500ML).
40. 1 kb Plus DNA Ladder (Invitrogen, cat# 10488090).
41. MinElute Gel Extraction Kit (Qiagen, cat# 28604).

2.2 Buffers

1. Bacteria lysis buffer (50 mM Tris–HCl pH 8.0, 300 mM NaCl,
20 mM Imidazole, 0.1% Triton X-100, 10 μg/mL Pepstatin A
(Calbiochem, cat# 516481), 10 μg/mL Leupeptin Hemisul-
fate (Calbiochem, cat# 108975), 10 μg/mL Chymostatin
(Calbiochem, cat# 230790), 6 μg/mL Antipain Dihydrochlor-
ide (Sigma, cat# A6191), 1 mg/mL lysozyme (Millipore, cat#
4403)).
2. Ni-NTA beads wash buffer (50 mM Tris–HCl pH 8.0, 1 M
NaCl, 20 mM Imidazole, 0.1% Triton® X-100).
3. Tnp elution buffer (50 mM Tris–HCl pH 8.0, 1 M NaCl,
250 mM Imidazole, 0.1% Triton® X-100).
4. 10× annealing buffer (0.5 M Tris–acetate pH 7.5, 1.5 M potas-
sium acetate, 40 mM spermidine).
5. 1.25 M glycine: dissolve 4.68 g glycine (Sigma, cat# G8898) in
50 mL ddH2O and filter-sterilize (Millipore, cat# SLHA033SS).
6. Lysis buffer: 10 mM Tris–HCl pH 7.4, 10 mM NaCl, 3 mM
MgCl2, 0.5% NP-40.
7. 10× Tn5 reaction buffer: 0.5 M Tris–acetate (pH 7.5), 1.5 M
potassium acetate, 100 mM magnesium acetate, 40 mM
spermidine.
8. EB: 10 mM Tris–HCl pH 8.0.
9. 1× B/W/T buffer: 5 mM Tris–HCl pH 7.5, 0.5 mM EDTA,
1 M NaCl, 0.1% Triton X-100.
10. 2× B/W buffer: 10 mM Tris–HCl pH 7.5, 1 mM EDTA,
2 M NaCl.
11. E-Gel® EX Agarose Gels, 2% (Invitrogen, cat# G401002).
12. 2× T7 DNA ligase reaction buffer: prepared according to New
England BioLabs (NEB) 1× buffer. 1× T7 DNA Ligase Reac-
tion Buffer: 66 mM Tris–HCl, 10 mM MgCl2, 1 mM ATP
(Sigma, cat# A7699-1G), 1 mM DTT, 7.5% Polyethylene
glycol (PEG 6000, Sigma, cat# 81260), pH 7.6 at 25 °C.
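
The buffer recipes above reduce to two small calculations: the molarity obtained from a weighed solid, and the stock volume needed for a target concentration (C1V1 = C2V2). The sketch below illustrates both for the 1.25 M glycine solution and a 50 mL batch of lysis buffer. It assumes a glycine molar mass of 75.07 g/mol, a final glycine volume of 50 mL, and a 1 M MgCl2 stock (no MgCl2 stock is listed in Subheading 2.1); the function names are illustrative only.

```python
# Molar mass of glycine (C2H5NO2), g/mol.
GLYCINE_MW = 75.07

def molarity(mass_g, mw_g_per_mol, volume_l):
    """Molar concentration for mass_g of solute in a final volume of volume_l."""
    return mass_g / mw_g_per_mol / volume_l

def stock_volume_ml(final_conc, stock_conc, final_volume_ml):
    """Solve C1*V1 = C2*V2 for the stock volume V1 (C1 and C2 in the same units)."""
    return final_conc * final_volume_ml / stock_conc

glycine_m = molarity(4.68, GLYCINE_MW, 0.050)

# Lysis buffer, 50 mL batch: (final, stock) pairs in matching units.
# The 1 M MgCl2 stock is an assumption; the other stocks are in the reagent list.
lysis_recipe = {
    "1 M Tris-HCl pH 7.4": (10, 1000),   # mM
    "5 M NaCl": (10, 5000),              # mM
    "1 M MgCl2 (assumed)": (3, 1000),    # mM
    "10% NP-40": (0.5, 10),              # percent
}
volumes = {name: stock_volume_ml(final, stock, 50.0)
           for name, (final, stock) in lysis_recipe.items()}
water_ml = 50.0 - sum(volumes.values())

print(f"glycine: {glycine_m:.3f} M")
for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} mL")
print(f"ddH2O: {water_ml:.2f} mL")
```

The same two helpers can be reused for the other buffers in this subheading by swapping in their final and stock concentrations.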

2.3 Equipment

1. Thermal mixer (Eppendorf thermal mixer R).
2. Rotating mixer or Tube revolver (Thermo Scientific, Model
No. 88881001).
3. Vortex mixer (Benchmark Scientific Inc., cat# BV1003).
4. Microcentrifuge (Eppendorf centrifuge 5424).
5. Sonicator (Misonix4000).
6. Magnetic Rack for microcentrifuge tubes (Invitrogen, cat#
123210).
7. Nanodrop One spectrophotometer (Thermo Scientific, Model:
Nanodrop One).
8. Qubit fluorometer (Life Technologies, cat# Q32866).
9. MJ Research PTC-200 Thermal Cycler (MJ Research, cat#
8252-30-0001).
10. ProFlex™ 3× 32-well PCR System (Thermo Fisher Scientific,
cat# 4484073).

2.4 Oligos

1. Half_adapter_top: /5Phos/CTGTCTCTTATACACATCTCTGATGGCGCGAGGGA/3ddCTP/.
2. Half_adapter_bottom: /5AmC6/TCCCTCGCGCCATCAGAGATGTGTATAAGAGACAG.
3. Bivalent_linker_top: /5Phos/CTGTCTCTTATACACATCTCCGAGCCCACGAGAC/iBiodT/CGTCGGCAGCGTCAGATGTGTATAAGAGACAG.
4. Bivalent_linker_bottom: /5Phos/CTGTCTCTTATACACATCTGACGCTGCCGACGAGTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG.
5. Illumina_Nextera_PE_PCR_primer_F: AATGATACGGCGACCACCGAGATCTACAC [8 bp i5 barcode] TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG. For example (barcode set off by spaces):
N501: AATGATACGGCGACCACCGAGATCTACAC TAGATCGC TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG.
N502: AATGATACGGCGACCACCGAGATCTACAC CTCTCTAT TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG.
6. Illumina_Nextera_PE_PCR_primer_R: CAAGCAGAAGACGGCATACGAGAT [8 bp i7 barcode] GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG. For example (barcode set off by spaces):
N701: CAAGCAGAAGACGGCATACGAGAT TCGCCTTA GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG.
N702: CAAGCAGAAGACGGCATACGAGAT CTAGTACG GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG.
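
Before ordering oligos, it can be useful to confirm that the strands of each adapter are reverse complements and that an indexed primer is assembled correctly. The following is a minimal sketch of such a check, not part of the published protocol. It writes only the bases: the 5' phosphate, amino, and 3' dideoxy modifications are omitted, and the internal biotin-dT is treated as a plain T. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]

# Base sequences only; terminal modifications are omitted and /iBiodT/ is written as T.
half_top = "CTGTCTCTTATACACATCTCTGATGGCGCGAGGGA"
half_bottom = "TCCCTCGCGCCATCAGAGATGTGTATAAGAGACAG"
linker_top = ("CTGTCTCTTATACACATCTCCGAGCCCACGAGAC" "T"
              "CGTCGGCAGCGTCAGATGTGTATAAGAGACAG")
linker_bottom = ("CTGTCTCTTATACACATCTGACGCTGCCGACGA"
                 "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG")

def indexed_primer(prefix, barcode, suffix):
    """Join the constant primer parts around an 8-base barcode."""
    if len(barcode) != 8 or set(barcode) - set("ACGT"):
        raise ValueError("barcode must be 8 bases of A/C/G/T")
    return prefix + barcode + suffix

I5_PREFIX = "AATGATACGGCGACCACCGAGATCTACAC"
I5_SUFFIX = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
n501 = indexed_primer(I5_PREFIX, "TAGATCGC", I5_SUFFIX)

strands_pair = (reverse_complement(half_top) == half_bottom,
                reverse_complement(linker_top) == linker_bottom)
# NlaIII (CATG) and MluCI (AATT) sites are palindromic, so checking one strand suffices.
site_free = all(site not in linker_top for site in ("CATG", "AATT"))

print(strands_pair, site_free)
print(n501)
```

The restriction-site check mirrors the requirement in Subheading 3.6 that neither enzyme cuts within the linker.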

3 Methods

3.1 Expression and Purification of Hyperactive Tn5 and Annealing of Adapters

1. Transform competent cells BL21 Gold (DE3) with pET15b-His6Tnp.
Plate the transformed bacterial cells on LB agar plates
containing 100 μg/mL Carbenicillin and incubate the plates at
37 °C overnight.
2. Inoculate 60 mL LB containing Carbenicillin (100 μg/mL)
with one single colony and incubate at 37 °C with shaking at
200 rpm overnight.
3. Dilute 10 mL of the above culture to 0.6 L of the same media
in a 2 L flask and continue to grow until OD600 reaches 0.8 at
37 °C while shaking at 200 rpm. Use two flasks for a total of
1.2 L media.
4. When the OD600 reaches 0.8, transfer the flasks to an ice-water
bath and cool them down for 10 min. Add 300 μL
1 M IPTG (MPbio, cat# 114064112) to each flask (0.5 mM final
concentration) and shake the culture at 200 rpm at room
temperature for 4 h.
5. Cool down the culture in ice-water, pellet the cells by spinning
at 3700 rpm at 4 °C for 15 min. Remove the supernatant and
resuspend the cell pellets in 30 mL bacteria lysis buffer; incu-
bate on ice for 30 min. Split into six 15 mL tubes.
6. Lyse the cells in each tube by sonication with a microtip on a
Misonix4000 (total process time of 105 s with 15 s ON, 30 s
OFF, 90% amplitude). Pool the lysates into one
centrifuge tube.
7. Spin the lysates at 15,000 rpm for 10 min at 4 °C, transfer the
supernatant to a new tube, and add 2ME (2-mercaptoethanol,
final concentration 5 mM), PMSF (1 μM), and NaCl
(1 M). Note: It is important to maintain the 1 M NaCl
concentration to reduce DNA contamination.
8. Wash 2 mL 50% Ni-NTA agarose bead slurry with 30 mL of
50 mM Tris–HCl pH 8.0, 300 mM NaCl, and 20 mM Imid-
azole. Spin at 1320 rpm at 4 °C for 5 min. Remove the
supernatant.
9. Add the cleared lysates to the Ni-NTA agarose beads and rotate
at 4 °C for 1 h. Spin the beads at 1320 rpm for 5 min at 4 °C
and remove the supernatant. Wash the beads once with 15 mL
Ni-NTA beads wash buffer.
10. Transfer the beads to a 1 mL syringe with glass wool at the
bottom. Wash the beads with 20 mL wash buffer. Elute the
bound Tn5 with 3 mL of Tnp elution buffer by loading
0.5 mL elution buffer six times, collecting the eluates into
six Eppendorf tubes. To each tube of eluate, add dithiothreitol
(final concentration 1 mM) and glycerol (final concentration
50%). Check the purity of the eluted Tn5 protein by SDS-PAGE.
Store the purified Tnp at -80 °C.
11. To anneal the adapters, dissolve all the primers at 100 μM in
EB. Mix 250 μL 20 μM Bivalent_linker_top with 250 μL
20 μM Bivalent_linker_bottom and 55 μL 10× annealing buffer. Heat
at 98 °C for 5 min, then cool slowly to room temperature
to form the 9 μM bivalent linker. Mix 250 μL 100 μM
Half_adapter_top and 250 μL 100 μM Half_adapter_bottom.
Heat at 98 °C for 5 min, then cool slowly to room
temperature to form the 50 μM half adapter. Store the annealed
adapters at -20 °C.
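
The concentrations quoted in this subheading follow from simple dilution arithmetic. The sketch below recomputes the IPTG induction concentration and the annealed adapter concentrations. It assumes that each flask holds 0.6 L of culture and that every strand anneals, so the duplex concentration equals the concentration of either strand in the mix; the helper name is illustrative.

```python
def mixed_conc(stock_conc, stock_vol, total_vol):
    """Concentration after diluting stock_vol of stock into total_vol (same volume units)."""
    return stock_conc * stock_vol / total_vol

# IPTG: 300 uL of 1 M (1000 mM) stock into 600 mL of culture, in mL.
iptg_mM = mixed_conc(1000.0, 0.300, 600.0 + 0.300)

# Bivalent linker: 250 uL of each 20 uM strand plus 55 uL of 10x annealing buffer.
linker_total_ul = 250 + 250 + 55
linker_uM = mixed_conc(20.0, 250, linker_total_ul)
annealing_buffer_x = mixed_conc(10.0, 55, linker_total_ul)

# Half adapter: 250 uL of each 100 uM strand, no buffer volume stated.
half_uM = mixed_conc(100.0, 250, 250 + 250)

print(f"IPTG: {iptg_mM:.2f} mM")
print(f"bivalent linker: {linker_uM:.1f} uM in {annealing_buffer_x:.2f}x annealing buffer")
print(f"half adapter: {half_uM:.0f} uM")
```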

3.2 Cell Fixation

1. Harvest 5 × 10^7 cells and resuspend in 50 mL culture medium
containing 10% FBS in a 50 mL tube.
2. Add 3.33 mL of 16% formaldehyde to the cell suspension (1%
final concentration) and mix by inverting the tube gently.
Incubate at room temperature for 10 min, with rotation on a
tube revolver (see Note 1).
3. Split the cells equally into two 50 mL tubes. Quench the cross-
linking reaction by adding 2.65 mL 1.25 M glycine (0.125 M
final concentration) into each tube. Mix well and incubate at
room temperature for 5 min by rotation on a tube revolver.
4. Collect cells by centrifugation at 370 × g for 10 min at 4 °C.
Discard the supernatant by aspiration and resuspend the cells
in 25 mL ice-cold 1× PBS. Pool the cells into one tube. Pellet
the cells at 370 × g for 10 min at 4 °C. Repeat the PBS wash
once. The fixed cells can be resuspended at 1 × 10^7 cells per
100 μL 1× PBS and stored at -80 °C.
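
The fixation and quenching concentrations can be checked from the volumes given in steps 1 to 3. This is a rough sketch that assumes volumes are additive and that the fixed suspension is split exactly in half before glycine is added; the variable names are illustrative. Under these assumptions the formaldehyde comes out at 1.0% and the glycine at about 0.11 M, close to the nominal 0.125 M.

```python
CULTURE_ML = 50.0        # cell suspension, step 1
FORMALDEHYDE_ML = 3.33   # 16% (w/v) stock, step 2
GLYCINE_ML = 2.65        # 1.25 M stock added per tube, step 3
TOTAL_CELLS = 5e7

fixed_total_ml = CULTURE_ML + FORMALDEHYDE_ML
formaldehyde_percent = 16.0 * FORMALDEHYDE_ML / fixed_total_ml

# The suspension is split equally into two tubes before quenching.
per_tube_ml = fixed_total_ml / 2
glycine_M = 1.25 * GLYCINE_ML / (per_tube_ml + GLYCINE_ML)

cells_per_ml = TOTAL_CELLS / CULTURE_ML

print(f"formaldehyde: {formaldehyde_percent:.2f}%")
print(f"glycine: {glycine_M:.3f} M")
print(f"cell density at fixation: {cells_per_ml:.1e} cells/mL")
```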

3.3 Assemble the Tn5 Complex and DNA Transposition Reaction

1. Prepare the Tn5 transposase complex in a 1.5 mL tube by
mixing 16 μL 100% glycerol, 4.5 μL 50 μM annealed half
adapter, and 12.5 μL 9 μM annealed bivalent linker. Mix well
before adding Tn5. Add 30 μL 1 mg/mL Tn5 transposase into
the mixture, then mix well gently by pipetting up and down
several times. Incubate at room temperature for 20 min to
assemble the transposase complex. Store the complex on ice.
2. Thaw the fixed 5 × 10^7 cells in ice-water. Spin down the cells at
1500 rpm for 3 min in a microfuge and remove the supernatant.
Resuspend the cell pellet in 50 mL lysis buffer and incubate on
ice for 15 min to permeabilize the cells.
3. Centrifuge at 370 × g for 10 min at 4 °C, then remove
supernatant.
4. Resuspend the cell pellet in 1.8 mL lysis buffer, then add
200 μL 10× Tn5 reaction buffer. Mix well by pipetting up and
down several times, then dispense the cell suspension into
100 μL aliquots in twenty 1.5 mL tubes (2.5 × 10^6 cells/tube).
5. Add 1.6 μL Tn5 complex into 100 μL cell suspension and mix
well (see Notes 2 and 3). Incubate on a thermal mixer at 37 °C
for 2 h with interval shaking (shaking at 800 rpm 30 sec ON,
5 min OFF). Then add 1.5 μL of Tn5 complex again into the
reaction and continue with the shaking/incubation overnight.
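
As a bookkeeping aid, the volumes in this subheading can be tallied to confirm that one batch of Tn5 complex covers all twenty reactions. The sketch below also includes a helper for scaling the complex to fewer cells (see Note 2); the linear scaling is an assumption for illustration, not a validated rule, and the names are hypothetical.

```python
# Volumes (uL) combined in step 1 of this subheading.
complex_parts_ul = {
    "100% glycerol": 16.0,
    "50 uM annealed half adapter": 4.5,
    "9 uM annealed bivalent linker": 12.5,
    "1 mg/mL Tn5 transposase": 30.0,
}
complex_total_ul = sum(complex_parts_ul.values())

N_TUBES = 20
TOTAL_CELLS = 5e7
per_tube_ul = 1.6 + 1.5                      # first and second additions, step 5
complex_needed_ul = N_TUBES * per_tube_ul
spare_ul = complex_total_ul - complex_needed_ul
cells_per_tube = TOTAL_CELLS / N_TUBES

def complex_for_cells(n_cells):
    """Hypothetical linear scaling of Tn5 complex volume with cell number."""
    return complex_needed_ul * n_cells / TOTAL_CELLS

print(f"complex prepared: {complex_total_ul:.1f} uL, needed: {complex_needed_ul:.1f} uL, "
      f"spare: {spare_ul:.1f} uL")
print(f"cells per tube: {cells_per_tube:.1e}")
print(f"complex for 1e7 cells (assumed linear): {complex_for_cells(1e7):.1f} uL")
```

With only about 1 μL to spare, careful pipetting of the complex matters when all twenty tubes are run.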

3.4 Reverse Cross-Linking and Purify Genomic DNA

1. Stop the reaction by adding 5 μL 0.5 M EDTA (25 mM final
concentration) to each tube. Pool the reaction mixtures from every two
tubes into one tube (216 μL total volume in each of the final
ten tubes).
2. To each tube, add 6 μL 10% SDS (0.3% final concentration) and
5 μL 20 mg/mL Proteinase K (0.5 mg/mL final concentration).
Incubate on a thermal mixer at 55 °C for 2 h with shaking, then
incubate at 65 °C overnight to reverse the cross-links.
3. Add 5 μL 20 mg/mL RNase A (0.5 mg/mL final concentration)
into each tube and incubate at 37 °C for 30 min.
4. Add 250 μL Phenol–Chloroform into each tube, then vortex
vigorously for 30 sec. Spin at 12,000 rpm in a microcentrifuge
at 4 °C for 10 min.
5. Transfer the upper aqueous phase into a new tube. Add 1 μL
20 mg/mL glycogen, 25 μL 3 M Sodium Acetate (pH 5.2),
and 650 μL ethanol into each tube and mix well. Keep the
tubes on dry ice for 30 min, then spin at 12,000 rpm for 15 min
at 4 °C.
6. Aspirate to remove the supernatant, then wash the DNA pellet
twice with 70% ethanol. Remove the supernatant after final
wash, then air dry the pellet for 3 min.
7. Add 50 μL EB into each tube to resuspend the DNA pellet.

3.5 Repair DNA Gaps Between the Bivalent Adapter and Genomic DNA

1. To each tube, add 10 μL 10× NEBuffer 2.1, 34 μL ddH2O,
2 μL 10 mM dNTPs, and 4 μL T4 DNA Polymerase (3 units/μL).
Mix well and incubate at room temperature for 1 h. Then
add 5 μL 0.5 M EDTA to each tube to stop the reaction.
2. Add 61 μL AMPure XP beads to each tube (volume ratio of
beads to DNA is 0.6). Mix well and incubate at room tempera-
ture for 30 min. DNA fragments over 200 bp are captured and
free linkers are removed at this step.
3. Collect beads on a magnet and remove the supernatant. Wash
beads with 1 mL 70% ethanol twice. Air dry the beads briefly
after final wash.
4. Elute bound DNA from beads with 188 μL 1× CutSmart
buffer.
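
The bead-to-sample ratio in step 2 depends on the total volume after the repair reaction is stopped. The sketch below tallies that volume and the resulting ratio, assuming the DNA pellet was resuspended in 50 μL EB as in Subheading 3.4 and that volumes are additive; `bead_volume` is a hypothetical helper for recomputing the bead volume if the reaction volume changes.

```python
# Per-tube gap-repair reaction (uL).
repair_mix_ul = {
    "DNA resuspended in EB": 50.0,
    "10x NEBuffer": 10.0,
    "ddH2O": 34.0,
    "10 mM dNTPs": 2.0,
    "T4 DNA polymerase (3 U/uL)": 4.0,
}
repair_total_ul = sum(repair_mix_ul.values())
dntp_mM = 10.0 * repair_mix_ul["10 mM dNTPs"] / repair_total_ul
polymerase_units = 3.0 * repair_mix_ul["T4 DNA polymerase (3 U/uL)"]

stopped_ul = repair_total_ul + 5.0        # after adding 0.5 M EDTA
bead_ratio = 61.0 / stopped_ul            # beads : sample, step 2

def bead_volume(sample_ul, ratio=0.6):
    """Bead volume (uL) giving the requested beads-to-sample ratio."""
    return sample_ul * ratio

print(f"reaction: {repair_total_ul:.0f} uL, dNTPs: {dntp_mM:.1f} mM each, "
      f"polymerase: {polymerase_units:.0f} U")
print(f"bead ratio: {bead_ratio:.2f}")
print(f"beads for an exact 0.6x ratio: {bead_volume(stopped_ul):.0f} uL")
```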

3.6 DNA Restriction Enzyme Digestion and Enrichment via the Biotinylated Bivalent Adapter

1. Half of the eluted DNA (five tubes) is subjected to NlaIII
digestion, and the other half (five tubes) is subjected to
MluCI digestion. Add 12 μL of the respective restriction enzyme
(10 units/μL) to each tube. Mix well and incubate at 37 °C
for 3 h. Note: There are no NlaIII or MluCI sites in the adapter,
so only the genomic DNA will be cut.
2. Prepare 40 μL Streptavidin C1 beads for each tube (400 μL for
ten tubes). Wash beads twice with 500 μL 1× B/W buffer, then
resuspend beads with 400 μL 2× B/W buffer.
3. Add 40 μL washed Streptavidin C1 beads to each restriction
digestion tube. Incubate at room temperature for 30 min with
gentle rotation.
4. Collect beads on a magnet and remove the supernatant. Wash
beads 5 times with 1 mL 1× B/W buffer plus 0.1% Triton
X-100 by rotating at room temperature for 5 min. At the final
wash step, transfer the beads in wash buffer to a new tube to
avoid carryover contamination. Note: It is important to get rid
of as much unbound DNA as possible.
5. Rinse beads once with 1 mL EB. Collect beads and remove the
supernatant.
6. Prepare Streptavidin C1 beads elution buffer by mixing
870 μL EB, 30 μL 10% SDS, and 100 μL 20 mg/mL Proteinase
K. Add 100 μL Streptavidin C1 beads elution buffer to each
tube and incubate at 55 °C for 4 h with 800 rpm shaking.
7. Collect beads on a magnet and transfer eluates to a new tube.
Wash beads with 100 μL EB plus 0.5 M NaCl. Pool the eluates
from the same tube.
8. Add 200 μL Phenol–Chloroform to each tube, then vortex
vigorously for 30 sec. Spin at 12,000 rpm in a microcentrifuge
at 4 °C for 10 min. Transfer the upper phase to a new tube.
9. Add 1 μL 20 mg/mL glycogen and 0.5 mL ethanol to each
tube. Mix well and keep on dry ice for 30 min. Then spin at
12,000 rpm for 15 min at 4 °C. Remove the supernatant and
wash the DNA pellets with 1 mL 70% ethanol twice. Air dry the
pellets briefly, then resuspend with 200 μL EB.

3.7 Self-Circularization of Genomic DNA Fragments in a Large Volume

1. Pool the DNA from all ten tubes (2 mL in total) into a
50 mL tube. Add 10 mL 2× T7 DNA ligase buffer and 8 mL
ddH2O, and mix well. Add 40 μL T7 DNA ligase (3000 units/μL)
to the tube. Mix well and incubate at room temperature
overnight.
2. Add 20 mL Phenol–Chloroform to the ligation reaction. Vor-
tex vigorously for 30 s, then spin at 4200 rpm in a centrifuge
for 30 min.
3. Transfer the upper phase to 24 × 1.5 mL tubes (about 0.83 mL
each tube). Spin at 12,000 rpm in a microcentrifuge for
10 min. Transfer the upper phase to a new tube. Add 0.2 μL
GlycoBlue (15 mg/mL), 80 μL 3 M Sodium Acetate (pH 5.2),
and 0.8 mL isopropanol to each tube. Mix well and keep on dry
ice for 30 min.
4. Spin at 12,000 rpm for 30 min at 4 °C. Remove the superna-
tant and wash the pellets twice with 70% ethanol. Air dry the
pellets briefly after final wash. Note: Do not overdry the pellets
before adding the sampling buffer in the next step.

3.8 RCA Reaction in a Small Volume

1. Resuspend the pellet from each tube with 11 μL sampling
buffer from the TempliPhi Amplification Kit, then split into two
PCR tubes in an eight-tube strip (5.5 μL each tube). The
samples now fill six eight-tube PCR strips.
2. Heat at 97 °C for 3 min to denature DNA on a thermocycler,
then chill on ice.
3. Prepare RCA reaction master mix by mixing 5 μL RCA reaction
buffer and 0.2 μL enzyme from the TempliPhi amplification kit
for each tube. Add 5.2 μL master mix to denatured DNA. Put
the tubes on a thermocycler and run the following cycle: 30 °C
16 h, 65 °C 15 min, and 4 °C hold.
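
The tube and volume bookkeeping from the large-volume ligation through the RCA reactions can be sketched as follows. The 10% master mix overage is an assumption added for illustration, not part of the protocol, and volumes are treated as additive.

```python
# Ligation (Subheading 3.7, step 1), in mL.
ligation_ml = 2.0 + 10.0 + 8.0                        # DNA + 2x buffer + ddH2O
ligase_units_per_ul = 40 * 3000 / (ligation_ml * 1000)

# Precipitation in 24 tubes (Subheading 3.7, step 3), per tube in mL.
N_PRECIP_TUBES = 24
aqueous_ml = ligation_ml / N_PRECIP_TUBES
precip_total_ml = aqueous_ml + 0.0002 + 0.080 + 0.8   # + GlycoBlue, NaOAc, isopropanol

# RCA reactions (this subheading): each pellet is split into two PCR tubes.
rca_tubes = N_PRECIP_TUBES * 2
strips = rca_tubes // 8
rca_reaction_ul = 11.0 / 2 + 5.0 + 0.2                # template + buffer + enzyme

OVERAGE = 1.10                                        # assumed 10% excess
master_mix_ul = {
    "RCA reaction buffer": 5.0 * rca_tubes * OVERAGE,
    "enzyme": 0.2 * rca_tubes * OVERAGE,
}

print(f"ligation: {ligation_ml:.0f} mL at about {ligase_units_per_ul:.0f} U/uL ligase")
print(f"per precipitation tube: {aqueous_ml:.2f} mL aqueous, {precip_total_ml:.2f} mL total")
print(f"{rca_tubes} RCA tubes in {strips} strips, {rca_reaction_ul:.1f} uL each")
print(master_mix_ul)
```

The summed precipitation volume comes to roughly 1.7 mL per tube under these figures, which is worth checking against the capacity of the tubes in use.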

3.9 Library Indexing and Amplification

1. Pool RCA reactions from eight PCR tubes in one strip into one
1.5 mL tube. Rinse the PCR tubes with 10 μL ddH2O and
pool with RCA reactions (six tubes, 160 μL total for each tube;
only use 20 μL in the next step).
2. Transfer 20 μL from each tube to a new 1.5 mL tube. Add
20 μL AMPure XP beads to each tube. Mix well and incubate at
room temperature for 30 min (see Note 4).
3. Collect beads on a magnet, then wash beads with 500 μL 70%
ethanol twice. Air dry beads briefly after final wash, then add
40 μL EB to elute DNA from beads.
4. Measure DNA concentration of eluates using NanoDrop Spec-
trophotometer. Usually the concentration is 30–40 ng/μL (see
Note 5).
5. Prepare PCR reaction by mixing 0.5 μL purified RCA product
as template (about 20 ng), 25 μL 2× NEB Phusion HF master
mix, 22.5 μL ddH2O, 1 μL 10 μM
Illumina_Nextera_PE_PCR_primer_F (such as N501), and 1 μL 10 μM
Illumina_Nextera_PE_PCR_primer_R (such as N701). Run the
following PCR program: 98 °C 30 sec; 11 cycles of 98 °C
10 sec, 65 °C 30 sec, and 72 °C 8 sec; 72 °C 5 min; 4 °C hold.
6. After gel electrophoresis, excise the gel slices containing DNA
fragments between 220 and 700 bp. Purify DNA using MinE-
lute Gel Extraction Kit.
7. Measure the DNA concentration of the library using Qubit
dsDNA HS Kit following manufacturer’s instructions.
8. Proceed to sequencing on an Illumina HiSeq or NovaSeq in
paired-end 50-8-8-50 format.
9. Map the sequencing reads to the expected genomes and ana-
lyze chromatin accessibility and interaction as described
previously [33].
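
The indexing PCR and the sequencing run format above can be written down as plain data, which makes the totals easy to verify. This is a sketch under stated assumptions: the template mass uses the 30–40 ng/μL range from step 4, the timing counts programmed hold times only (no ramping), and the 50-8-8-50 format is read as read 1, index 1, index 2, read 2, which is the usual Illumina order but is not spelled out in the text.

```python
pcr_mix_ul = {
    "purified RCA product": 0.5,
    "2x Phusion HF master mix": 25.0,
    "ddH2O": 22.5,
    "10 uM forward primer (e.g. N501)": 1.0,
    "10 uM reverse primer (e.g. N701)": 1.0,
}
total_ul = sum(pcr_mix_ul.values())
primer_final_uM = 10.0 * 1.0 / total_ul
template_ng = tuple(0.5 * conc for conc in (30.0, 40.0))

# (label, temperature in C, seconds)
CYCLE = [("denature", 98, 10), ("anneal", 65, 30), ("extend", 72, 8)]
program = ([("initial denaturation", 98, 30)]
           + CYCLE * 11
           + [("final extension", 72, 300)])
hold_seconds = sum(seconds for _, _, seconds in program)

def parse_run_format(fmt):
    """Split an 'R1-I1-I2-R2' cycle string into named read lengths (assumed order)."""
    read1, index1, index2, read2 = (int(part) for part in fmt.split("-"))
    return {"read1": read1, "index1": index1, "index2": index2, "read2": read2}

run = parse_run_format("50-8-8-50")
total_cycles = sum(run.values())

print(f"PCR volume: {total_ul:.0f} uL, primers at {primer_final_uM:.1f} uM each")
print(f"template: {template_ng[0]:.0f}-{template_ng[1]:.0f} ng")
print(f"programmed hold time: {hold_seconds} s")
print(run, total_cycles)
```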

4 Notes

1. Different types of cells may require different fixation conditions.
We use this protocol for fixing mammalian suspension
cells [33]. For other cell types, the conditions may need to be
optimized.
2. To achieve optimal complexity of Trac-looping libraries, we
started with 5 × 10^7 cells [33]. Fewer cells can also be used,
in which case the amount of Tn5 transposase complex should be
adjusted accordingly.
3. Performing Tn5-mediated linker integration and RCA reactions
in multiple separate tubes may increase library diversity.
If a centrifuge that can hold large tubes is available,
there is no need to split into small tubes in step 3 in
Subheading 3.7.
4. When using AMPure XP beads to purify genomic DNA, due to
the large size of DNA, beads will aggregate and stick to tubes.
Vortexing can increase the efficiency of eluting DNA from
beads.
5. Check the size of RCA products by gel electrophoresis. RCA
products should be over 10 kb.

References
1. Bonev B, Cavalli G (2016) Organization and function of the 3D genome. Nat Rev Genet 17:661–678. https://doi.org/10.1038/nrg.2016.112
2. Zheng H, Xie W (2019) The role of 3D genome organization in development and cell differentiation. Nat Rev Mol Cell Biol 20:535–550. https://doi.org/10.1038/s41580-019-0132-4
3. Stadhouders R, Filion GJ, Graf T (2019) Transcription factors and 3D genome conformation in cell-fate decisions. Nature 569:345–354. https://doi.org/10.1038/s41586-019-1182-7
4. Schoenfelder S, Fraser P (2019) Long-range enhancer-promoter contacts in gene expression control. Nat Rev Genet 20:437–455. https://doi.org/10.1038/s41576-019-0128-0
5. Yu M, Ren B (2017) The three-dimensional organization of mammalian genomes. Annu Rev Cell Dev Biol 33:265–289. https://doi.org/10.1146/annurev-cellbio-100616-060531
6. Kempfer R, Pombo A (2020) Methods for mapping 3D chromosome architecture. Nat Rev Genet 21:207–226. https://doi.org/10.1038/s41576-019-0195-2
7. Schmitt AD, Hu M, Ren B (2016) Genome-wide mapping and analysis of chromosome architecture. Nat Rev Mol Cell Biol 17:743–755. https://doi.org/10.1038/nrm.2016.104
8. Dekker J, Marti-Renom MA, Mirny LA (2013) Exploring the three-dimensional organization of genomes: interpreting chromatin interaction data. Nat Rev Genet 14:390–403. https://doi.org/10.1038/nrg3454
9. McCord RP, Kaplan N, Giorgetti L (2020) Chromosome conformation capture and beyond: toward an integrative view of chromosome structure and function. Mol Cell 77:688–708. https://doi.org/10.1016/j.molcel.2019.12.021
10. Agbleke AA et al (2020) Advances in chromatin and chromosome research: perspectives from multiple fields. Mol Cell 79:881–901. https://doi.org/10.1016/j.molcel.2020.07.003
11. Lieberman-Aiden E et al (2009) Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326:289–293. https://doi.org/10.1126/science.1181369
12. Rao SS et al (2014) A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159:1665–1680. https://doi.org/10.1016/j.cell.2014.11.021
13. Ren G et al (2017) CTCF-mediated enhancer-promoter interaction is a critical regulator of cell-to-cell variation of gene expression. Mol Cell 67:1049–1058 e1046. https://doi.org/10.1016/j.molcel.2017.08.026
14. Ma W et al (2015) Fine-scale chromatin interaction maps reveal the cis-regulatory landscape of human lincRNA genes. Nat Methods 12:71–78. https://doi.org/10.1038/nmeth.3205
15. Hsieh TS et al (2020) Resolving the 3D landscape of transcription-linked mammalian chromatin folding. Mol Cell 78:539–553 e538. https://doi.org/10.1016/j.molcel.2020.03.002
16. Krietenstein N et al (2020) Ultrastructural details of mammalian chromosome architecture. Mol Cell 78:554–565 e557. https://doi.org/10.1016/j.molcel.2020.03.003
17. Mifsud B et al (2015) Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nat Genet 47:598–606. https://doi.org/10.1038/ng.3286
18. Hughes JR et al (2014) Analysis of hundreds of cis-regulatory landscapes at high resolution in a single, high-throughput experiment. Nat Genet 46:205–212. https://doi.org/10.1038/ng.2871
19. Fullwood MJ et al (2009) An oestrogen-receptor-alpha-bound human chromatin interactome. Nature 462:58–64. https://doi.org/10.1038/nature08497
20. Kieffer-Kwon KR et al (2013) Interactome maps of mouse gene regulatory domains reveal basic principles of transcriptional regulation. Cell 155:1507–1520. https://doi.org/10.1016/j.cell.2013.11.039
21. Mumbach MR et al (2016) HiChIP: efficient and sensitive analysis of protein-directed genome architecture. Nat Methods 13:919–922. https://doi.org/10.1038/nmeth.3999
22. Mumbach MR et al (2017) Enhancer connectome in primary human cells identifies target genes of disease-associated DNA elements. Nat Genet 49:1602–1612. https://doi.org/10.1038/ng.3963
23. Gibcus JH, Dekker J (2013) The hierarchy of the 3D genome. Mol Cell 49:773–782. https://doi.org/10.1016/j.molcel.2013.02.011
24. Wang S et al (2016) Spatial organization of chromatin domains and compartments in single chromosomes. Science 353:598–602. https://doi.org/10.1126/science.aaf8084
25. Dixon JR et al (2012) Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485:376–380. https://doi.org/10.1038/nature11082
26. Nora EP et al (2012) Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485:381–385. https://doi.org/10.1038/nature11049
27. Sexton T et al (2012) Three-dimensional folding and functional organization principles of the Drosophila genome. Cell 148:458–472. https://doi.org/10.1016/j.cell.2012.01.010
28. Sanborn AL et al (2015) Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc Natl Acad Sci U S A 112:E6456–E6465. https://doi.org/10.1073/pnas.1518552112
29. Fudenberg G et al (2016) Formation of chromosomal domains by loop extrusion. Cell Rep 15:2038–2049. https://doi.org/10.1016/j.celrep.2016.04.085
30. Davidson IF et al (2019) DNA loop extrusion by human cohesin. Science 366:1338–1345. https://doi.org/10.1126/science.aaz3418
31. Ganji M et al (2018) Real-time imaging of DNA loop extrusion by condensin. Science 360:102–105. https://doi.org/10.1126/science.aar7831
32. Vian L et al (2018) The energetics and physiological impact of cohesin extrusion. Cell 173:1165–1178 e1120. https://doi.org/10.1016/j.cell.2018.03.072
33. Lai B et al (2018) Trac-looping measures genome structure and chromatin accessibility. Nat Methods 15:741–747. https://doi.org/10.1038/s41592-018-0107-y
34. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ (2013) Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods 10:1213–1218. https://doi.org/10.1038/nmeth.2688
Part II

Methods for Measuring the Absolute Levels of Occupancy/Accessibility

Chapter 8

Single-Molecule Mapping of Chromatin Accessibility Using NOMe-seq/dSMF

Michaela Hinks, Georgi K. Marinov, Anshul Kundaje, Lacramioara Bintu,
and William J. Greenleaf

Abstract
The bulk of gene expression regulation in most organisms is accomplished through the action of transcrip-
tion factors (TFs) on cis-regulatory elements (CREs). In eukaryotes, these CREs are generally characterized
by nucleosomal depletion and thus higher physical accessibility of DNA. Many methods exploit this
property to map regions of high average accessibility, and thus putative active CREs, in bulk. However,
these techniques do not provide information about coordinated patterns of accessibility along the same
DNA molecule, nor do they map the absolute levels of occupancy/accessibility. SMF (Single-Molecule
Footprinting) fills these gaps by leveraging recombinant DNA cytosine methyltransferases (MTase) to mark
accessible locations on individual DNA molecules. In this chapter, we discuss current methods and
important considerations for performing SMF experiments.

Key words Enhancers, Promoters, Chromatin accessibility, SMF, High-throughput sequencing

1 Introduction

The development of assays such as ChIP-seq [1, 2] enabled the
direct mapping of genome-wide TF binding, while methods such as
ATAC-seq [3], DNase-seq [4, 5], and MNase-seq [6] have
provided unbiased global mapping of accessible DNA and nucleosome
positioning, with open chromatin generally being a proxy
indicator of TF occupancy. These methods have enabled identification
of CREs and the profiling of the average occupancy of TFs
across the genome. While these approaches are powerful, identifying
genome-wide TF binding in bulk across tens of thousands of cells is
insufficient to fully understand mechanisms of TF action. In contrast,
single-molecule methods such as NOMe-seq [7] (Nucleosome Occupancy
and Methylome sequencing) and SMF [8] (single-molecule
footprinting) enable profiling of accessible DNA and TF occupancy
within individual molecules, thus potentially providing invaluable
information about binding cooperativity and dependencies
between individual accessibility states. The core principle underlying
all SMF assays is the use of DNA methyltransferases to deposit
methyl groups on accessible DNA, followed by detection of the
methylation on individual molecules of interest.
Several different versions of the SMF assay can be carried out,
based on which DNA MTase, DNA methylation detection method,
sequencing modality, and sequence enrichment strategy are used.
In this chapter, we provide important considerations for
performing SMF experiments intended for sequencing on Illumina
instruments, in either an unbiased genome-wide or targeted
manner.

Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols,
Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_8,
© The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023

2 Materials

Prepare a master stock of the ATAC-RSB buffer without detergents
in a large volume (e.g., 50 mL) and store it at 4 °C.

2.1 Methylation Buffers and Reagents

Prepare the RSB-Lysis and RSB-Wash buffers immediately before
use by adding the necessary detergents; keep on ice:
1. IGEPAL CA-630 detergent (Sigma Cat# 11332465001; supplied
as a 10% solution).
2. Tween-20 detergent (Sigma Cat# 11332465001, supplied as a
10% solution; store at 4 °C).
3. Digitonin detergent (Promega Cat# G9441, supplied as a 2%
solution in DMSO; store at −20 °C).
4. RSB buffer (master stock)
10 mM Tris-HCl pH 7.4
10 mM NaCl
3 mM MgCl2
5. RSB-Lysis buffer
10 mM Tris-HCl pH 7.4
10 mM NaCl
3 mM MgCl2
0.1% IGEPAL CA-630
0.1% Tween-20
0.01% Digitonin
6. Lysis Wash Buffer (RSB-wash)
10 mM Tris-HCl pH 7.4
10 mM NaCl
3 mM MgCl2
0.1% Tween-20
7. GpC Methyltransferase (M.CviPI) Reaction Buffer (NEB Cat #
B0227SVIAL). This buffer is supplied with GpC Methyltransferase
(NEB Cat # M0227S) as a 10× stock without
S-adenosylmethionine. Its final composition (1×) is as follows:
50 mM NaCl
50 mM Tris-HCl (pH 8.5)
10 mM DTT
32 mM S-adenosylmethionine (SAM) (NEB Cat #
B9003SVIAL, supplied with all NEB DNA methyltransfer-
ase enzymes). SAM is to be added immediately prior to use.
Avoid repeated freeze–thawing of SAM as it is an unstable
reagent.
8. GpC MTase (M.CviPI) (NEB Cat # M0227S, supplied at 4000
units/mL).
9. CpG MTase (M.SssI) (NEB Cat # M0226S, supplied at 4000
units/mL).
10. MgCl2 (Thermo Fisher Scientific Cat # AM9530G).
11. 2 M Sucrose solution (Sigma Aldrich Cat # S0389).

2.2 Library Building, Sequencing, and Quality Evaluation

1. Monarch Genomic DNA Purification Kit (NEB, Cat #
T3010L) or equivalent
2. NEBNext Enzymatic Methyl-seq Kit (EM-seq, NEB, Cat #
E7120L) and associated reagents or EZ-DNA Methylation-Gold
Kit (Zymo Research Cat# D5005) or equivalent,
depending on the exact type of SMF experiment being performed
(see more details below)
3. Optional, required if doing probe-hybridization enrichment of
genomic locations: SureSelectXT Methyl-Seq Library Prepara-
tion kit (Agilent, Cat# G9651A) and associated reagents
4. Optional, required if doing probe-hybridization enrichment of
genomic locations: SureSelectXT Mouse Methyl-Seq target
enrichment panel and associated reagents (Agilent, Cat#
931052) (or equivalent)
5. Optional, required if doing probe-hybridization enrichment of
genomic locations—Dynabeads MyOne Streptavidin T1
(Thermo Fisher Scientific Cat# 65601)
6. Agencourt AMPure XP Kit (Beckman Coulter Genomics Cat#
A63880)
7. 10 M NaOH, molecular biology grade (Sigma Cat# 72068)
8. 100% Ethanol, molecular biology grade (Sigma-Aldrich Cat#
E7023)
9. 1× Low TE Buffer (10 mM Tris-HCl, pH 8.0, 0.1 mM EDTA)
(Thermo Fisher Scientific Cat# 12090015)
10. 200-μL PCR tubes


11. Sequencing primers/adapters
12. NEBNext High-Fidelity 2× PCR Master Mix (NEB, Cat#
M0541S)
13. Qubit fluorometer or equivalent
14. Qubit tubes
15. Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific Cat#
Q328500)
16. TapeStation (Agilent) or equivalent, e.g., BioAnalyzer
(Agilent)
17. TapeStation D1000 tape and reagents (Agilent)

2.3 General Materials and Equipment

1. 1.5-mL microcentrifuge tubes, preferably low protein and
DNA binding
2. 2-mL, 15-mL, and 50-mL tubes
3. Incubator (37 °C), or a ThermoMixer
4. Tabletop centrifuge
5. Thermal cycler
6. MinElute PCR Purification Kit (QIAGEN Cat# 28004/
28006), Zymo DNA Clean and Concentrator Kit (Zymo
Cat# D4013/D4014), or equivalent
7. Nuclease-free H2O
8. 1× PBS buffer solution
9. qPCR machine (StepOne or equivalent)
10. Covaris E220 or equivalent method for shearing genomic
DNA (gDNA)

2.4 Software Packages

1. UCSC Genome Browser [9, 10] utilities: http://hgdownload.cse.ucsc.edu/admin/exe/.
2. R: https://www.r-project.org/.
3. Python (version 2.7 or higher) https://www.python.org/.
4. TGL Kmeans: https://github.com/tanaylab/tglkmeans.
5. SciPy: https://www.scipy.org/.
6. Matplotlib: https://matplotlib.org/.
7. Trimmomatic [11]: http://www.usadellab.org/cms/?page=trimmomatic.
8. Cutadapt [12]: https://cutadapt.readthedocs.io/en/stable/.
9. TrimGalore: https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/.
10. bwa-meth [13]: https://github.com/brentp/bwa-meth.
11. samtools [14]: http://www.htslib.org/.
12. PicardTools: https://broadinstitute.github.io/picard/.
13. MethylDackel: https://github.com/dpryan79/MethylDackel.
14. Additional scripts: https://github.com/georgimarinov/GeorgiScripts.
Contains Python scripts used in the examples shown below; some of
the scripts depend on having pysam (https://pysam.readthedocs.io/en/latest/index.html)
and pyBigWig (https://github.com/deeptools/pyBigWig) installed.

3 Methods

The general outline of the dSMF assay is shown in Fig. 1. Nuclei are
first isolated from cells, then chromatin is methylated using a 5mC
methyltransferase, and genomic DNA is purified. Next, base con-
version of unmethylated cytosines into uracils is carried out and
sequencing libraries are prepared. In most cases, a GpC methyl-
transferase is used, e.g., M.CviPI, which methylates cytosines in a
GpC dinucleotide context. This is because the genomes of mam-
mals, plants, and many other species contain endogenous methyla-
tion in CpG context. However, if endogenous CpG methylation is
not present in the samples being analyzed (e.g., yeast, Drosophila,
specially engineered mammalian cells that lack endogenous meth-
ylation [15], and others), an additional CpG methyltransferase can
be used, e.g., M.SssI. This improves the resolution of the assay as
the number of informative positions can be increased by a factor of
two. Historically, the difference between NOMe-seq [7] and dSMF
[8] (dual-enzyme SMF) has been that the latter uses both enzymes.
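
To make the notion of informative positions concrete, the following minimal Python sketch classifies the cytosines on the top strand of a sequence by dinucleotide context. The function name is hypothetical and is not part of the scripts used later in this chapter. Reporting cytosines that are in both contexts (GCG) as a separate class is an assumption made here, since such positions cannot be assigned unambiguously to either enzyme or to endogenous methylation.

```python
def informative_cytosines(seq):
    """Classify top-strand cytosines by dinucleotide context.

    Returns a dict mapping the 0-based position of each informative C
    to 'GpC', 'CpG', or 'GCG' (a C that is in both contexts).
    Cytosines in neither context are not reported.
    """
    seq = seq.upper()
    contexts = {}
    for i, base in enumerate(seq):
        if base != "C":
            continue
        in_gpc = i > 0 and seq[i - 1] == "G"             # G immediately 5' of the C
        in_cpg = i + 1 < len(seq) and seq[i + 1] == "G"  # G immediately 3' of the C
        if in_gpc and in_cpg:
            contexts[i] = "GCG"
        elif in_gpc:
            contexts[i] = "GpC"
        elif in_cpg:
            contexts[i] = "CpG"
    return contexts
```

Running the function on the reverse complement of the same sequence gives the informative positions on the bottom strand. Comparing the number of GpC-only positions with the total shows how much the addition of M.SssI can contribute at a given locus.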
There are several ways to create a dSMF sequencing library,
including via hybridization-based probe enrichment of genomic
regions [15], targeted PCR amplification of specific loci, or by
unbiased whole-genome sequencing of methylated DNA. Here
we describe a generalized protocol for creating dSMF libraries
following these approaches using commercially available kits.
We also note that it is possible to carry out SMF on crosslinked
material, but we advise that the exact parameters of any such
protocol be individually optimized depending on the specifics of
the experiment. The protocol described here is for native
chromatin.

[Figure 1 schematic: chromatin is methylated with M.CviPI (GpC 5mC) and, optionally, M.SssI (CpG 5mC), followed by DNA extraction, DNA shearing, and base conversion. Readout is by whole-genome short-read sequencing, targeted enrichment (biotinylated probe pulldown), or amplicon sequencing. Legend: unmethylated, methylated.]

Fig. 1 Outline of the NOMe-seq/dSMF assay. As a first step, nuclei are isolated from cells, and chromatin is
incubated with the M.CviPI (GpC) and/or M.SssI (CpG) DNA methyltransferases (CpG can usually only be used
in biological contexts in which there is no endogenous CpG DNA methylation). DNA is methylated where it is
accessible, i.e., where it is not protected by nucleosomes and bound transcription factors. DNA is then purified
and fragmented, and chemical or enzymatic conversion is carried out. Three different readout strategies can
be applied subsequently: unbiased whole-genome sequencing (left), targeted enrichment using probe-hybridization
pulldown, or amplicon sequencing (see the text for more details). After sequencing, single-molecule
accessibility maps are generated based on the methylation status of informative positions along DNA

3.1 Preparation of Nuclei

The first step of the SMF procedure is to prepare nuclei for methylation.
The nuclei lysis delineated here is different from most previously
published SMF protocols and identical to the Omni-ATAC
cell lysis procedure [16] as we have found that optimal and consistent
results are obtained that way. It will work well for most
mammalian and insect cell lines. Note that tissues and eukaryotic
cells with cell walls (e.g., yeast and plant cells) will require different
lysis and nuclei isolation procedures:


1. Count 1 × 10^6 live cells (if working with a mammalian-sized
genome) and aliquot into a microcentrifuge tube (see Note 1).
2. Centrifuge cells at 500 × g for 5 min at 4 °C.
3. Carefully aspirate the supernatant avoiding the pellet.
4. Wash cells by resuspending in 200 μL ice cold 1× PBS.
5. Centrifuge cells at 500 × g for 5 min at 4 °C.
6. Aspirate supernatant.
7. Add 200 μL of cold RSB-Lysis Buffer and pipette up and down
several times.
8. Incubate on ice for 3 min.
9. Add 1.2 mL cold RSB-Wash Buffer and invert several times to
mix well.
10. Centrifuge at 500 × g for 10 min at 4 °C.
11. Carefully aspirate the supernatant as fully as possible while
avoiding the pellet. Proceed immediately to methylation.
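
Step 1 and Note 1 state that the input cell number should be scaled according to genome size and ploidy. The sketch below shows one way to do that arithmetic in Python. The function name and the assumption that input should scale linearly with DNA content per cell are ours, and the reference values (1 × 10^6 diploid cells with a 3-Gbp haploid genome) are taken from the protocol above.

```python
def scaled_cell_input(genome_size_bp, ploidy,
                      ref_cells=1_000_000,
                      ref_genome_bp=3_000_000_000,
                      ref_ploidy=2):
    """Number of cells carrying the same total amount of DNA as the
    reference input (1e6 diploid cells with a 3-Gbp haploid genome).

    Assumes that the enzyme-to-chromatin ratio is what matters and that
    it scales linearly with DNA content per cell.
    """
    ref_dna_bp = ref_cells * ref_genome_bp * ref_ploidy
    return round(ref_dna_bp / (genome_size_bp * ploidy))
```

For a haploid 12-Mbp fission yeast genome this gives 5 × 10^8 cells, matching the 500-fold difference mentioned in Note 1. In practice it may be more convenient to scale down the reaction or the amount of enzyme rather than to process that many cells; that choice would need to be optimized for each system.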

3.2 Methylation Treatment

Carry out methylation as follows:

1. Without resuspending, add 100 μL of CviPI Reaction Buffer to
cells.
2. To each sample, add 50 μL of GpC MTase M.CviPI (4 U/μL).
Pipette gently 6× to mix (see Note 2).
3. Incubate at 37 °C for 7.5 min in a ThermoMixer at 1000 rpm.
4. Add more GpC MTase. Add an additional 25 μL of
low-concentration M.CviPI (4 U/μL) and 2.4 μL more
32 mM SAM to the same tube, pipette 3× to mix, and return
to 37 °C with shaking for another 7.5 min.
5. (Optional, see Note 3 and discussion above). Add CpG MTase.
Add 3 μL of high-concentration (20 U/μL) M.SssI and 2.4 μL
more SAM to the same tube, pipette 3× to mix, and return to
37 °C with shaking for another 7.5 min.
6. At this point, proceed immediately to DNA purification or
freeze cells in methylation solution at −20 °C.

3.3 DNA Purification

Quench MTase by adding 175 μL of the lysis buffer from the NEB
Monarch gDNA extraction kit, along with 3 μL RNase A and 1 μL
Proteinase K (supplied with Monarch gDNA kit). See Note 4.
Purify gDNA following the NEB Monarch gDNA extraction
kit instructions.
Following elution, quantify gDNA using Qubit.

3.4 Library Preparation: Whole-Genome SMF

If carrying out whole-genome SMF, libraries are to be generated
from this point using standard methods for carrying out whole-genome
cytosine methylation profiling, in which unmethylated
cytosines are converted into uracils and final sequencing libraries
are generated from the converted DNA. Two main approaches
exist: bisulfite conversion or enzymatic conversion.
For bisulfite conversion, we recommend the EZ-DNA Methyl-
ation-Gold Kit. However, bisulfite conversion leads to fragmenta-
tion of DNA, often to shorter fragments than what is desired for
SMF experiments, where a key objective is to obtain molecules as
long as can be sequenced on a short-read platform and thus maxi-
mize the information contained within each single molecule. Bisul-
fite conversion generally leads to fragments shorter than 200 bp,
often considerably shorter, which has historically necessitated care-
ful size selection of the subsequently generated libraries in order to
maximize the coverage of long fragments [8].
Enzymatic conversion using the NEB EM-seq kit offers an
attractive alternative as it does not degrade DNA and fragment
size can be carefully controlled. As a first step before entering the
EM-seq procedure, DNA needs to be sheared to the desired size.
The Covaris E220 instrument offers a convenient way to
control fragment length, but other shearing methods can
be used too.
Note that the EM-seq kit contains important positive and
negative controls, pUC19 and Lambda DNA, which are fully
CpG-methylated and unmethylated, respectively, and are invaluable
for monitoring the efficiency of conversion. Either add those to
your samples as a 1% spike-in before shearing, or maintain a stock
of pre-sheared pUC19 and Lambda DNA to be mixed with sheared
samples prior to conversion. If using bisulfite conversion, use
unsheared controls.
Depending on the exact kit used, follow the manufacturer’s
instructions for final sequencing library generation.
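
The logic shared by bisulfite and enzymatic conversion can be illustrated with a short in silico sketch: unmethylated cytosines are read as T after conversion and amplification, while methylated cytosines are still read as C. The function names below are hypothetical, and the sketch assumes an ungapped, top-strand comparison of a read against its reference; it is not a substitute for the methylation-aware tools used in Subheading 3.9.

```python
def convert_read(seq, methylated_positions):
    """Simulate conversion of a top-strand molecule: every C that is
    not in `methylated_positions` (0-based) is read out as T."""
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def call_methylation(reference, read):
    """Compare a converted read to its reference, position by position.

    Returns {position: True/False} for each reference C, where True
    means the C was retained (methylated) and False means it was read
    as T (unmethylated). Other mismatches are ignored.
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base == "C" and read_base in ("C", "T"):
            calls[i] = read_base == "C"
    return calls
```

Applied to the spike-in controls, the same comparison should return almost exclusively False for Lambda DNA and True at the CpG positions of pUC19 if conversion has worked well.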

3.5 Library Preparation: Probe-Hybridization Enrichment

A significant practical challenge to the application of single-molecule
footprinting approaches to mammalian genomes is the
very high sequencing depth that needs to be achieved in order to
fully take advantage of the wealth of information contained in
single molecules. These reads need to be as long as possible too
(see further discussion below). Consequently, sequencing costs
quickly become a major consideration when working with large
genomes.
However, given that most of the genome is inaccessible and
active CREs comprise only a small portion of it, it is possible to
greatly reduce costs by using hybridization capture to enrich for a
desired subset of the genome. As an example, this approach has
been previously successfully used to apply dSMF to many
thousands of promoters and enhancers in the mouse genome
[15], using the SureSelectXT Mouse Methyl-Seq target enrichment
panel from Agilent. Other probe sets and enrichment protocols are
likely to work as well.
The exact details of the protocol will vary depending on the
specifics of the kit used. A general outline of the procedure is as
follows:
For a probe-hybridization enrichment dSMF experiment, footprinted
DNA is sheared using a Covaris sonicator and end-repaired,
and methylated adapters are ligated, creating a pre-capture library.
Adapters need to be methylated in order to block their conversion
during the subsequent steps and enable PCR amplification. The
pre-capture library is then hybridized with a biotinylated set of
target probes and purified using streptavidin bead capture. The
captured molecules are subjected to bisulfite conversion and then
PCR-amplified.

3.6 Library Preparation: Amplicon-Targeted SMF

Even greater levels of enrichment and depth of coverage can be
obtained by selectively amplifying individual loci. This approach
works best together with the EM-seq conversion kit because, as
discussed above, it provides better preserved DNA compared to
bisulfite treatment. Footprinted whole-genome DNA is used as
input and carried through the EM-seq procedure up to the
final library amplification step. Then PCR primers specific for a
locus (or loci) of interest are used to make the final targeted library.
The challenge when using this approach is that PCR primers need
to be selected and/or designed in such a way that they work on
converted DNA; the exact specifics of that selection will vary
depending on the particulars of the experiment carried out.
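
One possible starting point for primer selection on converted DNA is to build an in silico converted template and look for stretches that do not depend on the footprinting outcome. The sketch below is an illustration under our own assumptions, not the procedure used by the authors: cytosines outside GpC and CpG contexts are assumed to be always converted (written as T), cytosines in GpC or CpG contexts are written as the IUPAC code Y (C or T) because their state varies between molecules, and only the top strand is considered. Function names are hypothetical.

```python
def converted_template(seq):
    """Top-strand template after conversion.

    C outside GpC/CpG context -> 'T' (assumed always converted);
    C in GpC or CpG context   -> 'Y' (state differs between molecules).
    """
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        if base != "C":
            out.append(base)
            continue
        in_gpc = i > 0 and seq[i - 1] == "G"
        in_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
        out.append("Y" if (in_gpc or in_cpg) else "T")
    return "".join(out)


def unambiguous_windows(template, length):
    """Start positions of all windows of `length` that contain no 'Y',
    i.e., candidate primer sites whose sequence is the same on every
    molecule regardless of its methylation state."""
    return [
        i for i in range(len(template) - length + 1)
        if "Y" not in template[i:i + length]
    ]
```

Candidate windows still need to be checked for melting temperature, specificity on the converted genome, and strand (the primer annealing to the opposite end of the amplicon has to be designed against the corresponding converted sequence), which this sketch does not attempt.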

3.7 Library Quantification and Evaluation of Library Quality

Before libraries can be sequenced, they need to be properly quantified,
and their quality evaluated. There are two components to this
process: first, evaluation of the insert distribution, and second,
quantification:
1. Examination of library size distribution. This step can be car-
ried out using a variety of instruments that are now available for
this purpose, such as a TapeStation or a BioAnalyzer. In our
practice, we prefer to use a TapeStation (with the D1000 or HS
D1000 kits) due to its ease of use, flexibility, and rapid
turnaround time.
2. Quantification of library concentration. For most high-
throughput sequencing applications, where fragment size is
unimodal, this step can be carried out with a sufficient degree
of accuracy using a Qubit fluorometer. Typically, dSMF falls in
that category. For libraries with complex fragment distribu-
tions, as well as for higher accuracy of quantification, qPCR
can be used. Commercial kits, such as the NEBNext Library
Quant Kit for Illumina or the KAPA Library Quantification
Kits, exist for that purpose, and custom in-house quantification
methods can also be used (see the first chapter in this book
on ATAC-seq for details).
3.8 Sequencing

The protocol described here generates libraries designed to be
sequenced on Illumina sequencers. Because every molecule in an
SMF library contains information about its unique accessibility
state throughout the sequence, it is advisable to perform longer
read-length sequencing than is necessary to simply align the frag-
ments. It is best to sequence all molecules completely (e.g., a
300-bp insert would be sequenced with 150 cycles in Read 1 and
150 cycles in Read 2). Paired-end sequencing is preferable to
single-end sequencing to improve quality, though single end will
also work provided the reads are sufficiently long. It is recom-
mended to sequence SMF libraries to high depth, i.e., 1000×
coverage of the size of the probed portion of the genome. This
leads to, on average, 1000 unique molecules per genomic locus that
are each read once. Sequencing depth can be adjusted based on the
user probe set and the frequency of accessibility states observed.
Due to the relatively high cost of longer-read Illumina sequenc-
ing, users may wish to perform quality control checks on the library
prior to full sequencing. A useful way to verify that the library is
complex and captures chromatin accessibility is to sequence it at a
fraction of the optimal depth using shorter read-length sequencing
(e.g., 2 × 36 bp reads). This way, the user can check that methylation is
detected at GpC locations and ensure that there is a diversity of
probed regions represented.
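
The depth recommendation above translates into a number of fragments as follows. This is a back-of-the-envelope sketch with a hypothetical function name; it assumes that every fragment is on target, unique, and fully sequenced, so real experiments will need more reads to account for off-target fragments and duplicates.

```python
def read_pairs_needed(target_bp, coverage=1000, insert_bp=300):
    """Fragments (read pairs) needed to cover `target_bp` bases
    `coverage` times with fully sequenced inserts of `insert_bp`.

    Uses integer ceiling division so that the result is never rounded
    down below the requested coverage.
    """
    return (target_bp * coverage + insert_bp - 1) // insert_bp
```

For example, for a hypothetical 30-Mbp capture target, 1000× coverage with 300-bp inserts corresponds to 1 × 10^8 read pairs, whereas the same coverage of a whole 3-Gbp genome would require 1 × 10^10 read pairs, which illustrates why enrichment strategies are attractive for large genomes.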

3.9 Computational Analysis

The overall outline of NOMe-seq/dSMF data processing is shown
in Fig. 2. Briefly, reads are trimmed of adapters and then aligned
against the genome or a set of target amplicons. These alignments
are then used to evaluate bulk-level accessibility and to carry out
analysis at the level of individual single molecules.

3.9.1 Adapter Trimming

If working with EM-seq datasets, Trimmomatic can be used to trim
adapters as follows:

java -jar trimmomatic-0.36.jar PE \
  EM-seq.read1.fastq.gz EM-seq.read2.fastq.gz \
  EM-seq.read1.paired.fastq.gz \
  EM-seq.read1.unpaired.fastq.gz \
  EM-seq.read2.paired.fastq.gz \
  EM-seq.read2.unpaired.fastq.gz \
  ILLUMINACLIP:Trimmomatic-0.36/adapters/adapters.fa:2:30:10 \
  LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

For bisulfite data, we recommend clipping the first 9 and the last 6
bases of reads due to their usually lower quality in addition to adapter
removal, using trim_galore, as follows:

trim_galore \
  --path_to_cutadapt ./cutadapt \
  --clip_R1 9 --clip_R2 9 \
  --three_prime_clip_R1 6 \
  --three_prime_clip_R2 6 \
  --paired EM-seq.read1.fastq.gz \
  EM-seq.read2.fastq.gz

[Figure 2 flowchart: raw reads; adapter trimming; methylation-aware alignment; aggregate coverage/methylation and single-molecule maps; downstream analysis.]

Fig. 2 Outline of NOMe-seq/dSMF computational processing. Raw sequencing reads are first trimmed of
adapters (note that it is important to do this properly depending on the type of conversion protocol used for
making the libraries). They are then aligned against the target genome or amplicons in a methylation-aware
manner. Subsequently, alignments are used to make aggregate methylation tracks (if data is to be used to
evaluate bulk accessibility) and single-molecule plots (for actual footprinting)

3.9.2 Read Mapping and Alignment Filtering

We use BWAmeth for alignment of base-converted datasets:

1. The first step is to prepare a reference, as follows:

python bwameth.py index bwameth-indexes/genome.fa

2. Next, reads are mapped against the reference (while filtering
out low-quality alignments and unmapped reads):

python bwameth.py --reference bwameth-indexes/genome.fa \
  EM-seq.read1.paired.fastq.gz EM-seq.read2.paired.fastq.gz \
  | samtools view -F 1804 -q 30 -bT bwameth-indexes/genome.fa - \
  | samtools sort - EM-seq.bwameth
3. The next step is to remove potential PCR duplicates. Note that
this step generally applies only to whole-genome and probe-capture
libraries where there is a diversity of fragment
coordinates.

java -Xmx4G -jar picard-tools-1.99/MarkDuplicates.jar \
  INPUT=EM-seq.bwameth.bam OUTPUT=EM-seq.bwameth.dedup.bam \
  METRICS_FILE=EM-seq.bwameth.dedup.metric \
  VALIDATION_STRINGENCY=LENIENT ASSUME_SORTED=true \
  REMOVE_DUPLICATES=true

4. Use samtools to index the final BAM file:

samtools index EM-seq.bwameth.dedup.bam
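
The `-F 1804` option in the mapping command excludes every alignment that has any of the listed SAM flag bits set. The sketch below decodes such a value using the bit definitions from the SAM specification; the helper names are ours and the code is only meant to make the filter easier to read and adjust.

```python
# FLAG bits as defined in the SAM format specification
SAM_FLAGS = {
    1: "paired",
    2: "proper pair",
    4: "read unmapped",
    8: "mate unmapped",
    16: "read reverse strand",
    32: "mate reverse strand",
    64: "first in pair",
    128: "second in pair",
    256: "not primary alignment",
    512: "fails quality checks",
    1024: "PCR or optical duplicate",
    2048: "supplementary alignment",
}


def decode_flag(value):
    """Names of all flag bits that are set in `value`."""
    return [name for bit, name in SAM_FLAGS.items() if value & bit]


def is_filtered(read_flag, exclude=1804):
    """True if `samtools view -F <exclude>` would drop this read."""
    return (read_flag & exclude) != 0
```

Decoding 1804 shows that the filter removes unmapped reads, reads with an unmapped mate, secondary alignments, reads failing quality checks, and reads already marked as duplicates. The `-q 30` option separately removes alignments with mapping quality below 30.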

3.9.3 Methylation Conversion Assessment

In order to evaluate the efficiency of methylation conversion, use
the same procedure as described above to map reads against the
Lambda and pUC19 genomes.
Then use the custom MethylationPercentageContext.py
script to calculate the average methylation levels in GpC and
CpG contexts, e.g., for pUC19:

python MethylationPercentageContext.py \
  EM-seq.pUC19.dedup.bam pUC19.fa CG,GC \
  EM-seq.pUC19.dedup.CG-GC-meth-perc

The pUC19 plasmid is the methylated positive control and
should show very high (90%+) levels of specifically CpG methylation,
while Lambda DNA is the unmethylated negative control and
should exhibit minimal levels of methylation.
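
The control check described above can be expressed as a simple calculation over per-cytosine counts. The following sketch is not the MethylationPercentageContext.py script; the function names are hypothetical, and the 1% upper limit used for Lambda DNA is an assumed placeholder for "minimal" that should be adjusted to the conversion kit and laboratory experience.

```python
def context_methylation_percent(counts):
    """Pooled methylation percentage for one sequence context.

    `counts` is an iterable of (n_methylated, n_unmethylated) read
    counts, one pair per cytosine in that context.
    """
    counts = list(counts)
    methylated = sum(m for m, _ in counts)
    total = sum(m + u for m, u in counts)
    return 100.0 * methylated / total if total else float("nan")


def controls_pass(puc19_cpg_percent, lambda_percent,
                  min_puc19=90.0, max_lambda=1.0):
    """True if the positive control is at least `min_puc19` percent
    methylated at CpGs and the negative control is at most
    `max_lambda` percent methylated (assumed threshold)."""
    return puc19_cpg_percent >= min_puc19 and lambda_percent <= max_lambda
```

A low pUC19 value suggests loss of methylation signal during conversion, while a high Lambda value suggests incomplete conversion, which would inflate apparent accessibility in the experimental sample.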

3.9.4 Methylation Calling

The next step is to extract methylation calls, using MethylDackel:

MethylDackel extract --CHG --CHH genome.fa \
  EM-seq.bwameth.dedup.bam

Note the --CHG and --CHH parameters, which are needed so that
cytosines outside of CpG context, and thus GpC positions, are
included in the output. However, further filtering is needed in
order to specifically obtain GpC positions, described further below.

3.9.5 Bulk Accessibility or Methylation Profile Generation

For the purpose of generating bulk accessibility profiles (this is
often useful for genome browser visualization of results), execute
the following steps:

1. Compress the MethylDackel output:

gzip EM-seq.bwameth.dedup_CHG.bedGraph
gzip EM-seq.bwameth.dedup_CHH.bedGraph
gzip EM-seq.bwameth.dedup_CpG.bedGraph

2. Extract GpC positions from the MethylDackel output
using the MethylationPercentageSmooth-dSMF.py custom script:

python MethylationPercentageSmooth-dSMF.py \
  EM-seq.bwameth.dedup_CHH.bedGraph.gz \
  genome.fa GpC 10 -MethylDackel -minCov 10 > \
  EM-seq.bwameth.dedup_CHH.GpC-only.minCov10.wig

3. Do the same for CpG positions:

python MethylationPercentageSmooth-dSMF.py \
  EM-seq.bwameth.dedup_CpG.bedGraph.gz \
  genome.fa CpG 10 -MethylDackel -minCov 10 > \
  EM-seq.bwameth.dedup_CHH.CpG-only.minCov10.wig

Note that in this case we also apply a minimal coverage
cutoff of 10 reads. This can be adjusted as needed.
4. These steps create wiggle files from which BigWig files to be
used on a genome browser can be generated:

UCSC-utils/wigToBigWig \
  EM-seq.bwameth.dedup_CHH.GpC-only.minCov10.wig \
  genome.chrom.sizes \
  EM-seq.bwameth.hg38.dedup_CHH.GpC-only.minCov10.bigWig

Note that for this step a chrom.sizes file is needed. This
file can be created using the makeChromSizesFromFasta.py
custom script.
5. It is also often useful to know what the raw read coverage is
along the genome. This can be generated using many different
existing tools; in this case, we use the custom
makewigglefromBAM-NH.py script:

python makewigglefromBAM-NH.py title \
  EM-seq.dedup.bam genome.chrom.sizes \
  EM-seq.dedup.coverage.wig -uniqueBAM

Convert into a BigWig file as above.
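
The minimal coverage cutoff used in the steps above can be illustrated with a small parser for MethylDackel per-cytosine output. This is a sketch and not a reimplementation of MethylationPercentageSmooth-dSMF.py. It assumes the bedGraph layout written by `MethylDackel extract` (a `track` header line followed by chromosome, start, end, methylation percentage, number of methylated reads, and number of unmethylated reads), and the function name is hypothetical.

```python
def filter_bedgraph(lines, min_cov=10):
    """Yield (chrom, start, end, methylated_fraction) for positions
    covered by at least `min_cov` reads.

    `lines` is any iterable of text lines, e.g., an open file handle
    from gzip.open(path, "rt").
    """
    for line in lines:
        if line.startswith("track") or not line.strip():
            continue  # skip the header line and blank lines
        chrom, start, end, _percent, n_meth, n_unmeth = line.split()[:6]
        n_meth, n_unmeth = int(n_meth), int(n_unmeth)
        coverage = n_meth + n_unmeth
        if coverage >= min_cov:
            yield chrom, int(start), int(end), n_meth / coverage
```

Recomputing the fraction from the read counts, rather than using the percentage column, avoids the rounding applied to that column. For accessibility tracks, 1 minus this fraction corresponds to the protected (unmethylated) fraction plotted in Fig. 3.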



3.9.6 Metaprofile Evaluation

It is often useful to generate metaplots over a defined set of genomic
features, for quality evaluation (e.g., assessing how strong the
methylation levels are around active promoters) and for other
analysis tasks (e.g., measuring average footprinting by TFs at their
occupancy sites).

1. As a first step, extract the wanted sequence contexts from the
MethylDackel output using the BismarckSequenceContextFilter.py
custom script, e.g., as follows for GpC:

python BismarckSequenceContextFilter.py \
  EM-seq.bwameth.hg38.dedup_CHH.bedGraph.gz GC \
  genome.fa | gzip > \
  EM-seq.bwameth.hg38.dedup_CHH.GpC-only.bedGraph.gz

2. Then generate the metaprofile using the signalAroundPeaks-nano.py
custom script. This script can be run with a
wide variety of inputs and window lengths around the desired
viewpoints. In this example, we use it to generate a metaprofile
around annotated transcription start sites:

python signalAroundPeaks-nano.py \
  annotation.TSS-0bp.bed 0 1 3 1000 10 \
  EM-seq.bwameth.hg38.dedup_CHH.GpC-only.bedGraph.gz \
  EM-seq.bwameth.hg38.dedup_CHH.GpC-only.TSS_profile -bismark.cov

The annotation.TSS-0bp.bed file can be generated from a
GTF file using the TSS_bed_FromGTF.py custom script.
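
Conceptually, a metaprofile is the average signal at each offset from a set of anchor points, with the offsets flipped for features on the minus strand. The sketch below shows that calculation on small in-memory inputs. It is an illustration under our own assumptions (hypothetical function name, signal held in a dictionary) and is not the signalAroundPeaks-nano.py script, which works on the files shown above.

```python
def metaprofile(signal, anchors, radius):
    """Average signal at offsets -radius..+radius around anchors.

    signal:  dict mapping (chrom, position) -> value, e.g., the
             methylated fraction at GpC positions
    anchors: list of (chrom, position, strand) tuples, e.g., TSSs
    Returns a list of length 2 * radius + 1; offsets with no data
    are None.
    """
    size = 2 * radius + 1
    sums = [0.0] * size
    counts = [0] * size
    for chrom, pos, strand in anchors:
        for offset in range(-radius, radius + 1):
            # on the minus strand, downstream means decreasing coordinates
            genomic = pos + offset if strand == "+" else pos - offset
            value = signal.get((chrom, genomic))
            if value is not None:
                sums[offset + radius] += value
                counts[offset + radius] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Because only GpC (or CpG) positions carry signal, individual offsets are sparsely covered at any single site, which is why averaging over many sites and smoothing over a small window are commonly applied before plotting.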

3.9.7 Generating Single-Molecule Maps

Finally, we illustrate the generation of single-molecule maps. This is
done using the dSMF-footprints.py script, which has as a
dependency the heatmap.py custom script, and has a wide variety
of options for color adjustment, minimal read coverage filtering,
and others.
It takes as input a BAM file, a BED file with the windows over
which single molecules are to be plotted, the genome sequence,
and the sequence context(s) (GC, CG, or both).

python dSMF-footprints.py EM-seq.bwameth.hg38.dedup.bam \
  genome.fa GC region.bed 0 1 2 3 EM-seq \
  -heatmap heatmap.py 10 10 binary 10,100 -minCov 0.9


In this case, we filter out all alignments that do not cover at least
90% of the input regions and plot the single molecules using the
binary Matplotlib colormap, meaning that methylated positions
will be shown as light, while protected unmethylated positions will
be shown in black.
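
The filtering and color convention just described can be sketched as follows. This is an illustration under our own assumptions and not the dSMF-footprints.py implementation: molecules are represented as hypothetical (start, end, calls) tuples with half-open coordinates, and values are encoded so that they match the Matplotlib "binary" colormap, in which 0 is drawn in white and 1 in black.

```python
def molecule_matrix(molecules, window_start, window_end, positions, min_cov=0.9):
    """Build a molecules-by-positions matrix for one window.

    molecules: list of (start, end, calls), where calls maps a genomic
               position to True (methylated) or False (unmethylated)
    positions: informative positions (e.g., GpCs) inside the window
    min_cov:   minimal fraction of the window a molecule must cover

    Encoding: 0 = methylated/accessible (light), 1 = unmethylated/
    protected (black), None = position not observed on that molecule.
    """
    width = window_end - window_start
    rows = []
    for start, end, calls in molecules:
        overlap = min(end, window_end) - max(start, window_start)
        if overlap < min_cov * width:
            continue  # molecule covers too little of the window
        rows.append([
            None if p not in calls else (0 if calls[p] else 1)
            for p in positions
        ])
    return rows
```

Each row of the resulting matrix is one molecule, so sorting or clustering the rows before plotting groups molecules that share the same accessibility state, for example molecules bound by a TF versus molecules occupied by a nucleosome.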

4 Expected Results

Figure 3a shows bulk accessibility, CpG methylation maps, and raw
read coverage tracks for a previously published [15] probe-capture
dataset in mouse.

[Figure 3 panels: (a) genome browser view of chr6:122,690,000-122,745,000 (mm10) around Nanog and Slc2a3, showing read coverage, CpG methylation, and GpC tracks together with ENCODE cCREs; (b) and (c) metaplots for ERR4165154 (TKO_ES159-SMF_MM_TKO_DE_R1) showing 1-methylation for CpG and GpC against position relative to TSS (-1,000 to 1,000) and position relative to CTCF motif (-500 to 500).]

Fig. 3 Aggregate accessibility analysis of NOMe-seq/dSMF datasets. (a) Read coverage and aggregate
CpG/GpC methylation genome browser tracks show accessibility and/or endogenous methylation levels
around the genome. In this case, reduced representation probe-capture dSMF datasets (obtained from
ArrayExpress accessions E-MTAB-9033 and E-MTAB-9123) are shown, thus the uneven coverage. (b)
Metaplot showing average accessibility levels around TSSs in the mouse genome. (c) Metaplot showing
average footprinting levels at occupied CTCF motifs
Figures 3b and c show typical CpG (endogenous methylation)
and GpC (accessibility) metaprofiles around transcription start sites
and around occupancy sites of the CTCF transcription factor
(which is well known to be a strong driver of nucleosomal occupancy
in the vicinity of its occupancy sites [17]) for the same
dataset.
Examples of single-molecule maps showing footprint protec-
tion around binding sites for the CTCF transcription factor are
shown in Fig. 4.

5 Notes

1. Note that the efficiency of the methylation reaction is potentially
dependent on the ratio between the amount of enzyme
present and the quantity of input material. Therefore, one
should be careful to avoid using too many cells as this could
lead to suboptimal levels of methylation in open chromatin. The
input cell number should be scaled according to genome size
and ploidy, e.g., a fission yeast cell (a 12-Mbp haploid
genome) contains 500× less chromatin than a typical
human cell (a diploid cell with a 3-Gbp haploid genome).
2. We have found that the high concentration of glycerol in the
final methylation reaction is important for maintaining cell
solubility and has little or no adverse effect on methyltransferase
function. As a result, we recommend using the low concentration of
methyltransferase supplied by NEB at the time of writing. If using
higher concentrations of enzymes from another source, adding extra
glycerol to the methylation reaction may help to prevent cells from
clumping together. Low concentrations of non-ionic detergents such
as Tween-20 may also prevent cell clumping, but further optimization
would be required.
3. Single-molecule footprinting intended for Illumina sequencing can
be performed with either one or both of the GpC and CpG
methyltransferases. The particular application will determine which
option is advisable. When working in organisms such as Drosophila
that contain no endogenous DNA methylation, using both enzymes is
recommended for maximal footprinting resolution. CpG methylation
exists endogenously in mammalian cells, so users may opt to use only
the GpC methyltransferase in order to distinguish between natural
and synthetic DNA methylation.
4. In our experience, using the Monarch Genomic DNA Purifica-
tion Kit is the easiest way to obtain high-quality, purified

[Figure 4 image. Panels a and b: single-molecule maps, each with a 50-bp scale bar. Motif labels shown in the panels: Spic, CTCF, MyoD, Nfib, FoxA2, E2F, Zfp238, Nr2e1.]

Fig. 4 Examples of single-molecule accessibility measurements. Shown are dSMF single-molecule maps
(obtained from ArrayExpress accessions E-MTAB-9033 and E-MTAB-9123 [15]). (a) High levels of occupancy
by the CTCF transcription factor (middle). (b) CTCF (middle) and possible nucleosome (left) footprints

genomic DNA after methylation. However, other methods, such as
phenol–chloroform extraction, have been demonstrated to work. It is
likely other kits for genomic DNA extraction also perform well. If
using a different column-based gDNA purification kit, care should be
taken to increase the amount of DNA binding and cell lysis buffers
to ensure the ratio of kit buffer volume to sample buffer volume
remains the same as in the manufacturer’s instructions.
Inappropriate buffer volume ratios may lead to poor DNA binding to
the column and subsequent low yield.
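
The scaling rule in Note 1 can be sketched as a small calculation. This is only an illustration: the function name and the reference cell number are hypothetical and not part of the protocol, and the sketch assumes that the total amount of chromatin simply scales with genome size times ploidy.

```python
def scaled_cell_number(reference_cells, reference_genome_bp, reference_ploidy,
                       target_genome_bp, target_ploidy):
    """Cell number of the target organism that carries the same total
    amount of DNA as `reference_cells` of the reference organism."""
    reference_dna_bp = reference_genome_bp * reference_ploidy
    target_dna_bp = target_genome_bp * target_ploidy
    return reference_cells * reference_dna_bp / target_dna_bp


# Hypothetical reference input (NOT a value from the protocol).
human_cells = 250_000

# Genome sizes and ploidies as given in Note 1.
fission_yeast_cells = scaled_cell_number(
    human_cells,
    reference_genome_bp=3e9, reference_ploidy=2,   # human, 3-Gbp diploid
    target_genome_bp=12e6, target_ploidy=1,        # fission yeast, 12-Mbp haploid
)
print(f"{fission_yeast_cells:.3g} fission yeast cells")
```

With the genome sizes from Note 1, the scaling factor is 500, matching the 500× difference in chromatin content stated there.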

Acknowledgements

The authors thank members of the Greenleaf, Bintu and Kundaje labs
for many helpful discussions. This work was supported by NIH
grants (P50HG007735, RO1 HG008140, U19AI057266, and
UM1HG009442 to W.J.G., 1UM1HG009436 to W.J.G. and A.K.,
1DP2OD022870-01 and 1U01HG009431 to A.K., and
HG006827 to C.H.), the Rita Allen Foundation (to W.J.G.), the
Baxter Foundation Faculty Scholar Grant, and the Human Frontiers
Science Program grant RGY006S (to W.J.G.). W.J.G. is a Chan
Zuckerberg Biohub investigator and acknowledges grants
2017-174468 and 2018-182817 from the Chan Zuckerberg Initiative.
Fellowship support was provided by the Stanford School of Medicine
Dean’s Fellowship (G.K.M.).

References

1. Johnson DS, Mortazavi A, Myers RM, Wold B (2007) Genome-wide mapping of in vivo protein-DNA interactions. Science 316(5830):1497–1502
2. Mikkelsen TS, Ku M, Jaffe DB et al. (2007) Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448(7153):553–560
3. Buenrostro JD, Giresi PG, Zaba LC et al. (2013) Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods 10:1213–1218
4. Crawford GE, Holt IE, Whittle J et al. (2006) Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res 16:123–131
5. Boyle AP, Davis S, Shulha HP et al. (2008) High-resolution mapping and characterization of open chromatin across the genome. Cell 132:311–322
6. Schones DE, Cui K, Cuddapah S et al. (2008) Dynamic regulation of nucleosome positioning in the human genome. Cell 132(5):887–898
7. Kelly TK, Liu Y, Lay FD et al. (2012) Genome-wide mapping of nucleosome positioning and DNA methylation within individual DNA molecules. Genome Res 22:2497–2506
8. Krebs AR, Imanci D, Hoerner L, Gaidatzis D et al. (2017) Genome-wide Single-Molecule Footprinting Reveals High RNA Polymerase II Turnover at Paused Promoters. Mol Cell 67:411–422.e4
9. Kuhn RM, Haussler D, Kent WJ (2013) The UCSC genome browser and associated tools. Brief Bioinform 14:144–161
10. Kent WJ, Zweig AS, Barber G et al. (2010) BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics 26:2204–2207
11. Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30(15):2114–2120
12. Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.Journal 17(1):10–12
13. Pedersen BS, Eyring K, De S et al. (2014) Fast and accurate alignment of long bisulfite-seq reads. arXiv 1401.1129
14. Li H, Handsaker B, Wysoker A et al. (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079
15. Sönmezer C, Kleinendorst R, Imanci D, Barzaghi G, Villacorta L, Schübeler D, Benes V, Molina N, Krebs AR (2021) Molecular co-occupancy identifies transcription factor binding cooperativity in vivo. Mol Cell 81(2):255–267.e6
16. Corces MR, Trevino AE, Hamilton EG et al. (2017) An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat Methods 14:959–962
17. Fu Y, Sinha M, Peterson CL, Weng Z (2008) The insulator binding protein CTCF positions 20 nucleosomes around its binding sites across the human genome. PLoS Genet 4(7):e1000138
Chapter 9

ORE-Seq: Genome-Wide Absolute Occupancy Measurement by Restriction Enzyme Accessibilities
Elisa Oberbeckmann, Michael Roland Wolff, Nils Krietenstein,
Mark Heron, Andrea Schmid, Tobias Straub, Ulrich Gerland,
and Philipp Korber

Abstract
Digestion with restriction enzymes is a classical approach for probing DNA accessibility in chromatin. It
allows monitoring of both the cut and the uncut fraction and thereby the determination of accessibility or
occupancy (= 1 - accessibility) in absolute terms as the percentage of cut or uncut molecules, respectively,
out of all molecules. The protocol presented here takes this classical approach to the genome-wide level.
After exhaustive restriction enzyme digestion of chromatin, DNA is purified, sheared, and converted into
libraries for high-throughput sequencing. Bioinformatic analysis counts uncut DNA fragments as well as
DNA ends generated by restriction enzyme digest and derives from these the fraction of accessible DNA. This
straightforward principle is technically challenging because preparation and sequencing of the libraries lead to
biased scoring of DNA fragments. Our protocol includes two orthogonal approaches to correct for this
bias, the “corrected cut–uncut” and the “cut–all cut” method, so that accurate measurements of absolute
accessibility or occupancy at restriction sites throughout a genome are possible. The protocol is presented
for the example of S. cerevisiae chromatin but may be adapted for any other species.

Key words Chromatin, DNA accessibility, Absolute occupancy, Restriction enzyme, High-throughput
sequencing

1 Introduction

Nucleases are often used to measure DNA accessibility in chromatin.
DNA accessibility and the corresponding cleavability by nucleases
are mainly modulated by DNA binding factors, of which histones are
the most common in chromatin. DNA binding of a factor is
characterized by the position along the DNA sequence where the
factor binds (“peak position”) and the occupancy,
i.e., which fraction of DNA molecules is occupied by the factor at

Authors Elisa Oberbeckmann and Michael Roland Wolff have equally contributed to this chapter.

Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols,
Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_9,
© The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023


this position (“peak height” or, more exactly, “area under the
peak”). Occupancy is related to accessibility as the sum of both
amounts to 100%. Many techniques that map DNA binding of a
factor are good at determining the position but are limited
regarding occupancy measurements. This is because they are often
yield methods, i.e., they score either the bound (e.g., MNase-seq
[1]) or unbound (e.g., ATAC-seq [2, 3]) subpopulation but not both.
This allows, at best, comparison of occupancies under different
conditions relative to each other (relative occupancy) but not the
measurement of occupancy in absolute terms. Absolute occupancy is
defined as the percentage of DNA molecules bound by a factor. The
measurement of absolute occupancy requires monitoring (1) the bound
and unbound states simultaneously, (2) under saturating conditions,
and (3) with binding-unbinding dynamics sufficiently frozen to avoid
a shift of the bound:unbound ratio during mapping. A prominent
counterexample where mapping of absolute occupancy is not pos-
sible is the genome-wide mapping of nucleosomes, i.e., binding of
histone octamers along DNA, by digestion of chromatin with
micrococcal nuclease (MNase). This technique relies on a carefully
titrated and limited MNase digestion degree that removes most
non-nucleosomal DNA while not yet cleaving the DNA wrapped
around the histone octamers. This protected DNA is detected by
high-throughput sequencing (MNase-seq). While MNase-seq
readily determines nucleosome positions, it does not track the
unbound state and does not work at saturation and therefore
cannot measure absolute nucleosome occupancy.
Our ORE-seq protocol presented here offers a genome-wide
and complementary technique for the determination of absolute
occupancies. It employs type II restriction endonucleases (restric-
tion enzymes, REs) that cleave DNA as dimers by embracing [4]
defined short DNA sequences (RE sites) with high specificity. This
has the advantages that their cleavage is prevented by most DNA
binders, leads to double strand breaks (DSBs) at predictable sites
and with predictable DNA end properties and can be carried out to
saturation without losing any DNA. Saturated digestion can be
ensured by following a digestion time course or by comparing
digestion with two sufficiently different RE concentrations over
the same digestion time. The combination of both can be used to
ensure that chromatin dynamics are sufficiently frozen, i.e., if two
different RE concentrations yield similar accessibilities at two dif-
ferent time points each. After cleavage of all accessible RE sites on
the level of chromatin, the DNA is purified so that all occluding
DNA binders are removed, and DSBs flanking the RE sites are
quantitatively introduced in a secondary cleavage step. This gener-
ates DNA fragments with all combinations of either one or two
ends generated by the RE (RE end) and/or by the secondary
cleavage. From the viewpoint of one particular RE site, there can
be DNA fragments that span the RE site (uncut fragments) or that

start/end with this RE site (cut fragments). As RE cleavage of one
molecule generates two cut fragments, compared with only one uncut
fragment per uncleaved molecule, the absolute accessibility at a
certain RE site is given by the formula: “number of cut
fragments/(number of cut fragments + 2× number of uncut fragments)”
and the absolute occupancy is “1 - absolute accessibility.”
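
The formula above can be written out as a short sketch. The function names and the example counts are illustrative assumptions; the arithmetic is the one stated in the text, without any of the bias corrections discussed later in this chapter.

```python
def absolute_accessibility(n_cut, n_uncut):
    """Uncorrected cut-uncut accessibility at one RE site.

    n_cut:   number of fragments that start or end at the RE site
    n_uncut: number of fragments that span the RE site
    Each cleaved molecule contributes two cut fragments, so uncut
    fragments are weighted by 2 to put both on the same scale.
    """
    total = n_cut + 2 * n_uncut
    if total == 0:
        raise ValueError("no fragments observed at this RE site")
    return n_cut / total


def absolute_occupancy(n_cut, n_uncut):
    """Occupancy is defined as 1 - accessibility."""
    return 1.0 - absolute_accessibility(n_cut, n_uncut)


# Hypothetical counts at a single RE site (not measured data).
print(absolute_accessibility(40, 30))
print(absolute_occupancy(40, 30))
```

For example, 40 cut fragments and 30 uncut fragments correspond to 20 cleaved and 30 uncleaved molecules, i.e., an accessibility of 40/(40 + 60).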
This RE accessibility approach that relies on the ratio between
cut and uncut fragments was established early on for single loci via
Southern blot detection [5]. Here we adopt this approach for
genome-wide measurements and call it the “cut–uncut” method.
Secondary cleavage is achieved by DNA shearing to a degree that
the DNA fragments are short enough for scoring their ends by
Illumina high-throughput sequencing. To allow a fine-grained
determination of absolute occupancy, the sequencing coverage
has to be at least 40-fold. In theory, this should measure absolute
accessibility or occupancy by counting the respective DNA ends
right away.
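
To get a feeling for what 40-fold coverage means in sequencing depth, the following sketch estimates the number of sequenced fragments required. It assumes that coverage is simply the number of fragments times the fragment length divided by the genome size, a genome of roughly 12 Mbp for S. cerevisiae, and fragments of about 150 bp (the shearing size shown in Fig. 1). The function name is hypothetical, and fragments lost to filtering or duplicates are not considered.

```python
import math


def fragments_for_coverage(coverage, genome_size_bp, fragment_length_bp):
    """Fragments needed so that fragments * length / genome >= coverage."""
    return math.ceil(coverage * genome_size_bp / fragment_length_bp)


# Assumed values: ~12 Mbp genome, ~150 bp fragments, 40-fold coverage.
n_fragments = fragments_for_coverage(40, 12_000_000, 150)
print(f"{n_fragments:,} fragments")
```

Because the required number scales linearly with genome size, the same coverage on a metazoan genome requires several hundred times more fragments.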
However, in practice, we identified confounding factors and
established ways to correct for them in our first application of this
protocol [6]. Scoring too many RE ends due to fortuitous shearing
at RE sites is controlled for by analysis of genome regions far away
from RE sites and is usually not a major problem. Conversely, scoring
too few RE ends due to DNA end resection by contaminating
exonucleases in chromatin is quite common but taken into account
by scoring RE ends around RE sites within a window size deter-
mined from the fragment end distributions around RE sites. The
Illumina sequencing platform shows a bias against long fragments
(>500 bp). Therefore, only fragments of <500 bp size are included
in the bioinformatics analysis. Finally, by using calibration samples
consisting of mixtures between undigested and totally digested
genomic DNA (gDNA) at known percentages, we observed a bias
that may reflect, at least in part, a preference for RE ends over DNA
ends generated by shearing (shearing ends). Such a preference may
be due to different ligation efficiencies at RE versus shearing ends as
they differ in DNA sequence and as shearing ends generated by
sonication may have an unusual chemistry [7] and be no or poor
substrates for the enzymes used in DNA end polishing and adapter
ligation during the preparation of sequencing libraries. In contrast,
the REs directly provide 5′-phosphate DNA ends, which, like
blunt ends, are required by the ligase. Our speculation that the bias
observed with the calibration samples is linked to end polishing and
adapter ligation is supported as this bias became less pronounced
when we used a commercial and highly optimized sequencing
library preparation kit (NEB) in the protocol presented here in
contrast to our own enzyme mix used in our prior study [6]. How-
ever, this could not completely eliminate this bias, which was also
true in a similar approach that used a commercial DNA repair
enzyme kit in addition [8]. Nonetheless, this bias can be compen-
sated by a correction factor derived from the calibration curve.
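
How a correction factor could be derived from calibration mixtures is sketched below. This is not the procedure used in this chapter (that is described in Subheading 4.1); it is a deliberately simple model, assumed here for illustration only, in which RE ends are scored with a constant relative efficiency `bias` compared with shearing ends. Under that assumption the measured accessibility is bias·a/(bias·a + 1 - a) for a true accessibility a, so the bias is the odds ratio between measured and prepared cut fractions. All names and the simulated numbers are hypothetical.

```python
import math


def estimate_end_bias(prepared_cut_fractions, measured_cut_fractions):
    """Geometric mean of the odds ratios (measured vs. prepared).

    Mixtures prepared at exactly 0 or 1 are skipped because their
    odds are undefined.
    """
    log_ratios = []
    for a_true, a_meas in zip(prepared_cut_fractions, measured_cut_fractions):
        if not (0 < a_true < 1 and 0 < a_meas < 1):
            continue
        odds_true = a_true / (1 - a_true)
        odds_meas = a_meas / (1 - a_meas)
        log_ratios.append(math.log(odds_meas / odds_true))
    if not log_ratios:
        raise ValueError("need at least one mixture strictly between 0 and 1")
    return math.exp(sum(log_ratios) / len(log_ratios))


def correct_accessibility(a_measured, bias):
    """Invert the assumed bias model to recover the true accessibility."""
    return a_measured / (a_measured + bias * (1 - a_measured))


# Simulated calibration series (NOT measured data): a 1.5-fold
# preference for RE ends is applied to known prepared fractions.
prepared = [0.25, 0.5, 0.75]
simulated = [1.5 * a / (1.5 * a + 1 - a) for a in prepared]
bias = estimate_end_bias(prepared, simulated)
print(round(bias, 3), [round(correct_accessibility(m, bias), 3) for m in simulated])
```

Note that Fig. 2 plots the prepared fraction of uncut molecules, so the cut fraction used here would be 1 minus that value.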

Alternatively, this bias can be circumvented by what we call the
“cut–all cut” method. For this, a bioinformatically tractable
spike-in gDNA, e.g., from a sufficiently different species that was
completely digested with an REspike-in different from the RE used
on chromatin, is added to the purified DNA after RE digestion of
chromatin and prior to DNA shearing. This mixture is divided into
two aliquots and one aliquot is digested again with the same RE
(“second RE cleavage,” not to be confused with the secondary
cleavage by sonication) as was used at the level of chromatin. As
all DNA binders that could block cleavage are gone at this stage,
this second RE cleavage will cleave all RE sites (“all cut” or “100%
accessibility” or just “1”). The other aliquot receives only a mock
second RE digest and therefore retains only the RE cuts that were
generated in the presence of chromatin, which represent the unknown
“X%” accessibility to be measured. Both the “100%” and the
“X%” samples are processed in parallel by Illumina sequencing and
the accessibility at a given RE site corresponds to the number of cut
RE ends normalized to the number of cut REspike-in ends in the “X
%” sample divided by the number of cut RE ends per number of cut
REspike-in ends in the “100%” sample. This approach compares only
the same kind of RE ends so that DNA end biases in the down-
stream processing pipeline cancel out.
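
The ratio described above can be sketched as follows. The function and argument names are hypothetical, and the example counts are invented for illustration. The sketch assumes that the spike-in counts are sample-wide totals of cut REspike-in ends, which is one plausible reading of the normalization.

```python
def cut_all_cut_accessibility(re_ends_x, spike_ends_x,
                              re_ends_100, spike_ends_100):
    """Accessibility at one RE site by the cut-all cut method.

    re_ends_x / re_ends_100:       cut RE ends at the site in the "X%"
                                   and "100%" samples
    spike_ends_x / spike_ends_100: cut spike-in RE ends in the same
                                   samples (used for normalization)
    """
    if spike_ends_x <= 0 or spike_ends_100 <= 0 or re_ends_100 <= 0:
        raise ValueError("normalization counts must be positive")
    normalized_x = re_ends_x / spike_ends_x
    normalized_100 = re_ends_100 / spike_ends_100
    return normalized_x / normalized_100


# Hypothetical counts (not measured data).
print(cut_all_cut_accessibility(30, 1000, 100, 2000))
```

Because only RE ends enter the calculation, any constant scoring efficiency of RE ends appears in both numerator and denominator and cancels.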
Nonetheless, one source of bias remains, as RE sites may have
closely (<500 bp) neighboring RE sites. In the “100%” sample, the
respective DNA fragments are delimited by the RE at both ends and
may be shorter on average than in the “X%” sample, where 100% - X%
of the corresponding DNA fragments have only one RE end while the
other end is generated by shearing at variable and often longer
distances than that of the neighboring RE site. This bias is due to
differences in fragment lengths, and possibly also to a preference
for RE over shearing ends, and is corrected for by excluding such
“next neighbor” RE sites.
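
A minimal sketch of such a “next neighbor” filter is shown below. It assumes that RE site positions are available as integer coordinates on a single chromosome; the function name and the example coordinates are hypothetical, and the 500 bp threshold is taken from the text.

```python
def sites_without_close_neighbors(positions, min_distance=500):
    """Keep RE sites whose nearest neighboring RE site on the same
    chromosome is at least `min_distance` bp away."""
    ordered = sorted(positions)
    kept = []
    for i, pos in enumerate(ordered):
        left_ok = i == 0 or pos - ordered[i - 1] >= min_distance
        right_ok = i == len(ordered) - 1 or ordered[i + 1] - pos >= min_distance
        if left_ok and right_ok:
            kept.append(pos)
    return kept


# Hypothetical site coordinates on one chromosome.
print(sites_without_close_neighbors([100, 300, 1200, 2500, 2900]))
```

In this example, the sites at 100/300 and 2500/2900 lie within 500 bp of each other and are all excluded, so only the isolated site remains.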
An overview of the “cut–uncut” versus “cut–all cut” protocols
is given in Fig. 1, the performance of both methods is shown with
their respective calibration curves in Fig. 2, and an example of
ORE-seq measurements with three different REs for a Saccharomyces
cerevisiae genome region is shown in Fig. 3.
Our ORE-seq protocol is presented here at a scale for use of
one RE, including the mock digest control and using two different
RE concentrations. This can be scaled up for using several REs as
the resolution of the absolute occupancy map increases with the
number of REs, and especially with the use of REs with 4 bp RE
sites.
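
Why REs with 4 bp sites increase resolution most can be illustrated with a rough estimate of site density. The sketch assumes equal base frequencies and independent positions, which real genomes do not follow exactly, and a genome size of roughly 12 Mbp for S. cerevisiae; the function names are hypothetical. Of the REs used as examples in this chapter, AluI recognizes a 4 bp site (AGCT), whereas BamHI (GGATCC) and HindIII (AAGCTT) recognize 6 bp sites.

```python
def expected_site_spacing_bp(site_length_bp):
    """Mean distance between occurrences of one specific sequence,
    assuming all four bases are equally likely at every position."""
    return 4 ** site_length_bp


def expected_site_count(genome_size_bp, site_length_bp):
    """Expected number of occurrences in a genome of the given size."""
    return genome_size_bp / expected_site_spacing_bp(site_length_bp)


genome_size = 12_000_000  # assumed approximate S. cerevisiae genome size
for name, length in [("4 bp site (e.g., AluI)", 4), ("6 bp site (e.g., BamHI)", 6)]:
    print(name, expected_site_spacing_bp(length), round(expected_site_count(genome_size, length)))
```

Under these assumptions a 4 bp cutter samples the genome 16 times more densely than a 6 bp cutter.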
The protocol is described for chromatin prepared from
S. cerevisiae and for a Schizosaccharomyces pombe gDNA spike-in.
It can be readily adapted for chromatin preparations and spike-in
gDNA from other species. Especially for metazoan genomes, the

[Figure 1 flow chart. Both methods: chromatin RE digestion (e.g., AluI, BamHI, HindIII) at RE cleavage sites in individual cells, followed by DNA purification. Cut–all cut only: addition of RE-digested S. pombe spike-in, then splitting of the sample into “No 2nd digest (X%)” and “2nd digest (100%)”. Both methods: DNA shearing to ~150 bp, sequencing library preparation, Illumina sequencing. Accessibility calculation: cut–uncut, cut f. / (2 × uncut f. + cut f.); cut–all cut, (cut f. X% / S. pombe spike-in) / (cut f. 100% / S. pombe spike-in).]

Fig. 1 Flow chart for cut–uncut and cut–all cut method. For details see text. “f.” abbreviates “fragments”

required high sequencing coverage becomes rather costly.
Nonetheless, we suggest that sequencing costs can be reduced if
biotinylated adapters are ligated only to the RE ends prior to shearing
and if the biotinylated fragments are enriched by immunoprecipita-
tion after shearing and prior to sequencing library preparation. In
this case the “cut–all cut” method must be used. This enrichment
of RE ends over shearing ends via ligation with tagged adapters is
not part of our protocol but an interesting modification and akin to
methods that were developed to measure DSBs in the context of
DNA repair studies [9].

2 Materials

2.1 Cells and Buffers for Preparation of S. cerevisiae Chromatin

1. S. cerevisiae strain of interest and media and growth conditions
according to your requirements.
2. Cold (0–4 °C) distilled or deionized water (dH2O).
3. Preincubation solution: 0.7 M β-mercaptoethanol, 2.8 mM
EDTA pH 8 (see Note 1). Add 2.5 mL 14.3 M
β-mercaptoethanol and 278 μL 0.5 M EDTA pH 8 to a
50 mL tube. Add dH2O to yield a final volume of 50 mL.

[Figure 2 plots. Columns: AluI, BamHI, HindIII. Rows: cut–uncut, not corrected; cut–uncut, individual correction factor; cut–uncut, combined correction factor; cut–all cut. x-axis: prepared fraction of uncut DNA molecules / % (0 to 100). y-axis: measured mean absolute occupancy / % (25 to 100).]

Fig. 2 Calibration curves for the indicated REs. Mixtures of given percentages of S. cerevisiae gDNA cut with
the indicated REs were prepared as in Subheading 3.8 and analyzed both via the cut–uncut method
(uncorrected or corrected by individual factor derived for each RE or by a combined correction factor derived
from the combined RE calibration samples, see Subheading 4.1) and via the cut–all cut method. Circles denote
the average and error bars denote the standard deviation of all RE sites in the S. cerevisiae genome included in
the analysis. Data are deposited at GEO under the accession number GSE189142

Wipe off any spills from the tube and wrap it with parafilm as
this solution smells. Store at -20 °C.
4. 1 M sorbitol: Dissolve 182.2 g sorbitol in 1 L dH2O. Store at
-20 °C.
5. Sorbitol + β-mercaptoethanol solution: 1 M sorbitol, 5 mM
β-mercaptoethanol. Top up 17.5 μL 14.3 M β-mercaptoetha-
nol with 1 M sorbitol to 50 mL. Prepare freshly.

[Figure 3 genome browser view. Region: ChrII:437,298-451,553. Tracks: ORE-seq (combined), ORE-seq (BamHI), ORE-seq (HindIII), ORE-seq (AluI), Chemical Cleavage (GSM2561057). Genes: VPS15, MMS4, FES1, EXO64, SIF2, YMC2, VID24.]

Fig. 3 Absolute occupancies as measured for the indicated REs by ORE-seq for an exemplary region of the
S. cerevisiae genome. Data for ORE-seq with BamHI, HindIII, AluI, and their combination are taken from
[6]. Nucleosome mapping data by chemical cleavage [18] is shown for reference

6. 0.1 M EGTA pH 8.0: Add 3.8 g EGTA (Titriplex VI) to a
100 mL beaker. Add 50 mL dH2O. Adjust to pH 8 with 5 M
KOH, otherwise EGTA will not dissolve completely. Adjust
volume in a measuring cylinder to 100 mL with dH2O. Store
at room temperature (RT).
7. Zymolyase: Weigh in Zymolyase (ICN Biochemicals, 100,000
u/mg) in a 1.5 mL tube. Top up with dH2O (RT) to obtain a
concentration of 20 mg/mL. Prepare freshly just before use.
Zymolyase does not dissolve, therefore mix this suspension
gently before pipetting.
8. Ficoll solution: 18% Ficoll (Sigma Ficoll Type 400 F4375),
20 mM KH2PO4, 1 mM MgCl2, 0.25 mM EGTA, 0.25 mM
EDTA. Add 90 g Ficoll, 1.36 g KH2PO4, 0.5 mL 1 M MgCl2,
1.25 mL 0.1 M EGTA pH 8, and 250 μL 0.5 M EDTA pH 8 to
a 0.5 L beaker. Add 400 mL dH2O and stir the solution
overnight at RT as it dissolves very slowly. Cover the beaker
with parafilm or foil. After dissolution, adjust pH to 6.8 with
KOH. Use a 5 M KOH stock in the beginning and switch to a
1 M stock later to prevent overtitration. Adjust volume to 0.5 L
with dH2O in a 0.5 L measuring cylinder. Mix and aliquot into
50 mL tubes. Store at -20 °C.
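
The stock volumes given in the recipes above follow from the usual dilution relation (stock concentration × stock volume = final concentration × final volume). The following sketch, with a hypothetical helper name, recalculates two of them as a plausibility check. It assumes that both concentrations are given in the same unit.

```python
def stock_volume_ml(stock_conc, final_conc, final_volume_ml):
    """Volume of stock solution needed for a dilution (C1 * V1 = C2 * V2).

    stock_conc and final_conc must use the same unit (e.g., both in mM).
    """
    if final_conc > stock_conc:
        raise ValueError("final concentration exceeds stock concentration")
    return final_conc * final_volume_ml / stock_conc


# Preincubation solution, 50 mL final volume:
# 0.7 M beta-mercaptoethanol from a 14.3 M stock (concentrations in M)
print(round(stock_volume_ml(14.3, 0.7, 50), 2), "mL beta-mercaptoethanol")
# 2.8 mM EDTA from a 0.5 M (= 500 mM) stock (concentrations in mM)
print(round(stock_volume_ml(500, 2.8, 50) * 1000), "uL EDTA")
```

The results (about 2.45 mL and 280 μL) agree with the 2.5 mL and 278 μL stated in the recipe within rounding.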

2.2 Cells and Buffers for Preparation of S. cerevisiae and S. pombe Genomic DNA (gDNA)

The S. pombe spike-in gDNA is only necessary if the cut–all cut
method is applied.

1. S. cerevisiae and S. pombe wild-type strains (in our case BY4741
and h-972).
2. S. cerevisiae YPD medium: 20 g/L Bacto Peptone (Becton,
Dickinson and Company), 10 g/L yeast extract (Biolife),
20 g/L D-glucose, autoclave, and store at RT.
3. S. pombe YES medium: 5 g/L yeast extract (Difco), 30 g/L
D-glucose, 0.7 g/L amino acid mix (0.1 g/L each of adenine,
leucine, histidine, uracil, lysine, arginine, glutamate), use
Millipore treated or equivalent water quality, filter sterilize, and
store at RT.

4. Blood & Cell Culture DNA Midi Kit (QIAGEN), including
Buffer Y1 (including 14 mM β-mercaptoethanol, see Note 1),
Buffer G2 (including RNaseA), Buffer QBT, Buffer QC, Buffer
QF, and QIAGEN Proteinase K.

2.3 Buffers and Enzymes for Digestion of Chromatin and S. cerevisiae/S. pombe gDNA with Restriction Enzymes and DNA Purification

1. Suitable restriction enzymes (REs; e.g., NEB) and
corresponding RE-buffers, e.g., BamHI-HF and 10×
CutSmart-Buffer. Store at -20 °C. Dilute 10× CutSmart-Buffer
with double distilled water (ddH2O) to 1× CutSmart-Buffer
(50 mM potassium acetate, 20 mM Tris–acetate, 10 mM
magnesium acetate, 100 μg/mL BSA pH 7.9). Store at -20 °C.
2. 10× STOP-Buffer: 4% SDS, 100 mM EDTA, 50 mM Tris–HCl
pH 7.5. Add 2.5 mL 1 M Tris–HCl pH 7.5, 10 mL 0.5 M
EDTA pH 8 and 27.5 mL ddH2O to a 50 mL tube. Mix and
add 10 mL 20% SDS. Store at RT.
3. 20 mg/mL proteinase K (Genaxxon) solution in ddH2O. Store
at -20 °C. Aliquots may be refrozen.
4. 10 mg/mL RNase A (Roche) in ddH2O. Remove DNases by
incubation at 95 °C for 15 min. Store at -20 °C. Aliquots may
be refrozen.
5. 5 M NaClO4 in ddH2O. Store at RT.
6. 100% and 70% ethanol. Store at RT.
7. Isopropanol. Store at RT.
8. Phenol for DNA extraction, equilibrated at pH ~8 (Sigma).
Store 50 mL aliquots at -20 °C.
9. Chloroform/isoamylalcohol (24:1): Under the fume hood,
add 20 mL isoamylalcohol to 480 mL chloroform. Store at
RT under the hood.
10. TE-buffer: 5 mM Tris–HCl pH 8, 1 mM EDTA. Store at RT.
11. 1 M KOAc in ddH2O. Store at RT.
12. 0.2 M EDTA pH 8. Store at RT.

2.4 DNA Shearing and Purification After RE Digestion and Preparation of DNA Sequencing Libraries

1. MicroTUBE AFA Fiber Pre slit 6 × 16 mm.
2. Covaris S220 sonicator.
3. AMPure XP beads (Beckman Coulter).
4. 5 mM Tris–HCl pH 8.0.
5. Magnetic rack (Invitrogen).
6. Agarose gel electrophoresis or Bioanalyzer or TapeStation or
Fragmentanalyzer.
7. Qubit™ Fluorometer and Qubit™ dsDNA HS Assay Kit.
8. NEBNext® Ultra™ II DNA Library Prep Kit for Illumina®,
including NEBNext Ultra II End Prep Enzyme Mix, NEBNext
Ultra II End Prep Reaction Buffer, NEBNext Ultra II Ligation
Master Mix, NEBNext Ultra II Ligation Enhancer, NEBNext
Ultra II Q5 Master Mix.
9. Thermocycler.
10. NEBNext® Multiplex Oligos, including NEBNext Adaptor for
Illumina and USER Enzyme.
11. Illumina sequencer, including all sequencing reagents.

3 Methods

3.1 Preparation of S. cerevisiae Chromatin (“Nuclei,” See Note 2)

1. Grow 1.5 × 10^9 cells (corresponding to ca. 30 μg gDNA) of
your chosen S. cerevisiae strain at your chosen biological
condition (see Note 3).
2. Harvest cells by centrifugation for 10 min at 8959 × g (e.g.,
6000 rpm, Beckman JLA 8.1 rotor) at 4 °C (see Note 4).
Discard the supernatant.
3. Resuspend cell pellet(s) in 45 mL cold dH2O per liter of cell
culture. Twirling with an inoculation loop and vortexing helps
to resuspend the often rather compact pellets.
4. Centrifuge for 10 min at 3220 × g (e.g., 4000 rpm Eppendorf
5810R) and 4 °C, discard the supernatant, and determine mass
of this wet cell pellet. This is the “g wet cell pellet” that is
referred to in the following steps (see Note 5).
5. Resuspend cell pellet in 2 mL preincubation solution (see Note
1) per g wet cell pellet and incubate for 30 min at 30 °C while
shaking.
6. Centrifuge for 5 min as in step 4, discard the supernatant, and
resuspend in 40 mL cold 1 M sorbitol per g wet cell pellet.
7. Centrifuge for 5 min as in step 4, discard the supernatant, and
resuspend in 5 mL 1 M sorbitol + β-mercaptoethanol per g wet
cell pellet.
8. Dilute an aliquot (e.g., 50 μL) of this suspension with dH2O
such that OD600 can be measured in a reasonable range in your
photometer, e.g., around 0.5.
9. Add 100 μL 20 mg/mL zymolyase suspension per g wet cell
pellet and incubate for 30 min at 30 °C while shaking.
10. Check spheroplasting efficiency by repeating step 8; the OD600
should be at most 40% of the value measured in step 8 before
zymolyase addition (see Note 6).
11. Centrifuge for 8 min as in step 4, discard the supernatant, and
resuspend in 40 mL cold 1 M sorbitol per g wet cell pellet (see
Note 7).

12. Centrifuge for 8 min as in step 4, discard the supernatant, and
resuspend in 7 mL cold Ficoll solution per g wet cell pellet.
13. Make aliquot(s) corresponding to 0.3 g wet cell pellet and
centrifuge each aliquot for 30 min at 21,546 × g
(15,000 rpm Beckman, JA20.1) at 4 °C.
14. Decant the supernatant (see Note 8), close tube with parafilm
or lid, and shock freeze chromatin pellets for 10 min in dry
ice/ethanol bath (see Note 9).
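
The steps above state centrifugation conditions both as relative centrifugal force (× g) and as rpm for a specific rotor. For other rotors, the two can be interconverted with the standard relation RCF = 1.118 × 10^-5 × r × rpm^2, where r is the rotor radius in cm. The sketch below is illustrative: the function names are hypothetical, and the radii used in the example are back-calculated from the values stated in the steps, not taken from manufacturer specifications.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Speed (rpm) needed to reach a given RCF at a given rotor radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Radii below are assumptions inferred from the stated rpm / x g pairs.
print(round(rcf_from_rpm(6000, 22.26)))   # step 2: about 8959 x g
print(round(rcf_from_rpm(4000, 18.0)))    # step 4: about 3220 x g
print(round(rpm_from_rcf(3220, 15.0)))    # rpm needed on a hypothetical 15 cm rotor
```

When adapting the protocol to a different centrifuge, the × g value is the quantity to keep constant, and the rpm setting follows from the radius of the rotor at hand.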

3.2 Preparation of S. cerevisiae and S. pombe gDNA (See Note 10)

1. Grow 7 × 10^9 cells of S. cerevisiae or S. pombe wild-type strain in
YPD or YES medium, respectively, to a density of approx.
3 × 10^8 cells/mL.
2. Purify gDNA with Blood & Cell Culture DNA Midi Kit
(QIAGEN) using 100/G QIAGEN Genomic-tip (see Note 11).
Start by harvesting yeast by centrifuging at 3000–5000 × g
for 5–10 min at 4 °C.
3. Remove the supernatant and resuspend cell pellet in 4 mL of
TE-buffer by vortexing.
4. Centrifuge as in step 2, discard the supernatant, and thor-
oughly resuspend pellet in 4 mL Buffer Y1 supplemented
with β-mercaptoethanol by vigorous vortexing.
5. Add 250 μL zymolyase suspension and incubate for 30 min at
30 °C.
6. Centrifuge at 5000 × g for 10 min at 4 °C.
7. Remove the supernatant and thoroughly resuspend cell pellet
in 5 mL Buffer G2 (including RNaseA) by inverting tube or
vortexing.
8. Add 100 μL of QIAGEN Proteinase K and incubate for 30 min
at 50 °C.
9. Centrifuge as in step 6.
10. Keep the supernatant and discard the pellet.
11. Equilibrate QIAGEN Genomic-tip 100/G with 4 mL Buffer
QBT by gravity flow.
12. Vortex the supernatant of step 10 for 10 s at top speed and
load onto the QIAGEN Genomic-tip equilibrated in step 11.
13. Wash QIAGEN Genomic-tip with 2× 7.5 mL Buffer QC by
gravity flow.
14. Elute with 5 mL Buffer QF by gravity flow into a clean tube.
15. Add 3.5 mL (0.7 volumes) isopropanol to the eluate of step 14
and mix.
16. Centrifuge at >5000 × g for at least 15 min at 4 °C.
17. Remove the supernatant without disturbing or losing the
glassy DNA pellet.

18. Wash DNA pellet with 2 mL cold 70% ethanol by brief
vortexing.
19. Centrifuge at >5000 × g for 10 min at 4 °C.
20. Repeat step 17.
21. Resuspend air-dried (5–10 min) DNA in 0.5 mL of TE-buffer
and incubate overnight at RT or for 1–2 h at 55 °C for
complete dissolution.
22. Determine gDNA concentration with Qubit.

3.3 Optional: Restriction Enzyme Digest of S. pombe gDNA

Digestion of S. pombe gDNA is necessary only for the cut–all cut
method to obtain the S. pombe gDNA spike-in.

1. Digest 20 μg gDNA with 100 Units of your chosen RE in
200 μL of respective 1× RE buffer, e.g., 1× CutSmart Buffer
(NEB), for 1.5 h at the temperature according to used RE, e.g.,
37 °C. Do NOT use the same RE as for chromatin digestion.
2. Stop the digest by addition of 50 μL 10× STOP-Buffer and
proteinase K to a final concentration of 0.5 μg/μL.
3. Incubate for 45 min at 37 °C. Store gDNA at 4 °C (see Note
12).
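
The volumes in this subheading determine the concentration of the spike-in that is later added in Subheading 3.4 (12 μL, stated there to amount to ca. 1 μg). The following sketch recalculates this as a plausibility check. It assumes no loss of DNA, the 20 mg/mL (= 20 μg/μL) proteinase K stock from Subheading 2.3, and a final volume that is simply the sum of the added volumes; the function names are hypothetical.

```python
def proteinase_k_volume_ul(sample_volume_ul, stock_ug_per_ul, final_ug_per_ul):
    """Stock volume to add so that the final mixture (sample + stock)
    reaches the desired proteinase K concentration."""
    return final_ug_per_ul * sample_volume_ul / (stock_ug_per_ul - final_ug_per_ul)


def spike_in_mass_ug(gdna_ug, digest_ul, stop_ul, pk_ul, aliquot_ul):
    """DNA mass contained in an aliquot of the stopped digest."""
    total_ul = digest_ul + stop_ul + pk_ul
    return gdna_ug / total_ul * aliquot_ul


# 200 uL digest + 50 uL 10x STOP-Buffer, proteinase K to 0.5 ug/uL
pk_ul = proteinase_k_volume_ul(200 + 50, stock_ug_per_ul=20, final_ug_per_ul=0.5)
mass = spike_in_mass_ug(gdna_ug=20, digest_ul=200, stop_ul=50,
                        pk_ul=pk_ul, aliquot_ul=12)
print(f"proteinase K: {pk_ul:.1f} uL; spike-in in 12 uL: {mass:.2f} ug")
```

Under these assumptions, 12 μL of the stopped digest contain slightly less than 1 μg of S. pombe gDNA, consistent with the amount quoted in Subheading 3.4.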

3.4 Chromatin Digest with Restriction Enzymes and DNA Purification

1. Per RE, thaw chromatin pellet corresponding to 0.3 g wet cell
pellet and keep on ice. Chromatin pellet corresponding to 0.1 g
wet cell pellet is used per individual RE-digest or mock sample,
e.g., per zero/low/high RE concentration (see Note 13).
2. Resuspend chromatin pellet corresponding to 0.3 g wet cell
pellet in 2 mL ice-cold 1× RE-Buffer (e.g., 1× CutSmart or
specific 1× RE-buffer) by vortexing. Make sure that the sample
does not get too warm and that no clumps remain.
3. Centrifuge for 5–8 min at ~750 × g (2000 rpm Eppendorf
5810R) and 4 °C.
4. Decant the supernatant, resuspend pellet in 0.6 mL 1×
RE-Buffer by vortexing, and aliquotize into three 200 μL ali-
quots in 1.5 mL tubes.
5. Add RE to desired concentration, e.g., 0, 5, and 20 μL
20 U/μL RE (NEB) (see Note 14). To each sample, add the same total
volume of either RE or RE storage buffer.
6. Incubate for 0.5 h (see Note 15) at the temperature according
to used RE, e.g., 37 °C for BamHI.
7. Stop reaction with 1/10 volume of 10× STOP-Buffer, vortex,
and add 1/20 volume proteinase K (see Note 16).
8. Incubate for 0.5–1 h or up to overnight at 37 °C.
9. Optional: For cut–all cut method, add 12 μL (amounts to
ca. 1 μg, i.e., ca. 10% of RE digested chromatin sample’s
DNA mass) of RE-digested S. pombe gDNA spike-in prepared
in Subheading 3.3.
10. Add 1/5 volume 5 M NaClO4 and vortex.
11. Add 1 volume phenol (see Note 17) and vortex for 5 s.
12. Add 1 volume chloroform/isoamyl alcohol and vortex for 5 s.
13. Centrifuge for 5 min at 21,230 × g (15,000 rpm Eppendorf
5424R) at RT.
14. Transfer the upper aqueous phase to a fresh 1.5 mL tube and
repeat steps 11–14 once.
15. Add 2.5 volumes 100% ethanol, mix by inverting, and incubate
on ice for 5 min.
16. Centrifuge for 20 min at ≥20,000 × g at 4 °C.
17. Discard the supernatant and add 700 μL 70% ethanol to pellet.
18. Centrifuge for 5 min at ≥20,000 × g at 4 °C.
19. Discard the supernatant, centrifuge again briefly, and remove
the remaining ethanol with a pipette tip. Avoid disturbing the
pellet.
20. Air-dry for 2 min or until ethanol is gone. Avoid overdrying as
this makes resuspension more difficult.
21. Resuspend the pellet in 200 μL TE-buffer.
22. Add 5 μL RNase A, mix, and incubate sample for 5 min at 65 °C
to increase digestion of some folded RNA.
23. Incubate on ice for 1 min and proceed with RNase-digest for
1 h at 37 °C.
24. Add 1/10 volume 1 M KOAc and vortex.
25. Add 0.8 volumes isopropanol, mix by inverting, incubate for
2 min at RT.
26. Centrifuge for 10 min at 21,230 × g (15,000 rpm Eppendorf
5424R) at RT.
27. Repeat once steps 17–20.
28. Resuspend the pellet in 85 μL ddH2O. DNA solution can be
stored at -20 °C.
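The glycerol limit mentioned in Note 14 can be checked against the volumes of step 5 with a short calculation. The following is a minimal Python sketch, not part of the published protocol. It assumes that the RE preparation and its storage buffer contain 50% (v/v) glycerol, which is common for commercial enzymes but should be verified on the product data sheet; the function name is hypothetical.

```python
def final_glycerol_percent(sample_ul, added_ul, stock_glycerol_percent=50.0):
    """Glycerol concentration (% v/v) after adding RE and/or RE storage buffer.

    stock_glycerol_percent = 50 is an assumption; check the data sheet.
    """
    total_ul = sample_ul + added_ul
    return added_ul * stock_glycerol_percent / total_ul


# Step 5 example: 200 uL chromatin aliquot plus 20 uL of RE and/or storage buffer.
pct = final_glycerol_percent(sample_ul=200, added_ul=20)
print(f"final glycerol: {pct:.1f}% (v/v)")
print("within the 5% limit of Note 14:", pct <= 5.0)
```

Under this assumption, adding 20 μL to a 200 μL aliquot stays just below 5% glycerol; larger added volumes would exceed it.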

3.5 Optional: Second RE Digest for Cut–All Cut Method

1. Take two 40 μL aliquots from the purified DNA sample after
step 28 in Subheading 3.4, add 5 μL 10× RE-Buffer (e.g., 10×
CutSmart Buffer (NEB)) to each aliquot and vortex.
2. Label one aliquot as “100% digested” and add the same RE as
used for chromatin digestion of this sample, e.g., 4 μL 20 U/μL
RE. Mix gently.
3. Label the other aliquot as “X% digested” and add same volume
of RE storage buffer as the volume of RE added in step 2.
4. Incubate both samples for 1.5 h at the temperature appropriate
for the RE, e.g., 37 °C for BamHI.
5. Stop reaction by adding 6 μL 0.2 M EDTA pH 8. Optional:
heat-inactivate the RE (see Note 18). Samples may be stored at
-20 °C.

3.6 DNA Shearing and Purification

1. Add 120 μL TE-buffer to 10 μL sample (corresponding to
about 0.8–1 μg DNA) of either step 28 in Subheading 3.4
(without second RE digest) or from step 5 in Subheading 3.5
(with second RE digest).
2. Transfer to a MicroTUBE AFA Fiber Pre-Slit 6 × 16 mm and
shear sample in a Covaris S220 sonicator. Settings: duty factor
10%, 175 W, 200 cycles per burst, 180 s, 4 °C.
3. Transfer the sheared sample to a 1.5 mL tube.
4. Add 300 μL AMPure XP Beads (2.5×), vortex, spin down
briefly to collect sample, and incubate for 10 min at RT.
5. Collect beads in a magnetic rack and remove the supernatant
with pipet.
6. Add 500 μL freshly prepared 80% ethanol (RT).
7. Incubate for 30 s at RT. Keep tubes in magnetic rack.
8. Decant the supernatant and repeat steps 4–7 once.
9. Decant the supernatant, spin down briefly at RT, and place
tubes back in the magnetic rack.
10. Remove the remaining ethanol with a 10 μL pipette.
11. Remove tubes from rack, add 100 μL 0.1× TE, vortex, and spin
down briefly.
12. Incubate for 2 min at RT, put tubes back to the magnetic rack,
and transfer 98 μL of the cleared solution to a fresh
1.5 mL tube.
13. Determine DNA concentration with Qubit. DNA can be
stored at -20 °C.

3.7 Sequencing Library Preparation

1. Use 100–200 ng (see Note 19) DNA of step 12 in Subheading
3.6 for library preparation with NEBNext Ultra II DNA
Library Prep Kit. Adjust volume to 50 μL with 1× TE-buffer.
2. Add 7 μL NEBNext Ultra II End Prep Reaction Buffer and
3 μL NEBNext Ultra II End Prep Enzyme Mix and mix
thoroughly by pipetting up and down.
3. Incubate in thermocycler with lid heated to at least 75 °C for
30 min at 20 °C, 30 min at 65 °C, and hold at 4 °C.
4. Add 2.5 μL NEBNext Adaptor for Illumina, 30 μL NEBNext
Ultra II Ligation Master Mix (mixed by pipetting prior to
addition, very viscous, ensure proper mixing), and 1 μL
NEBNext Ligation Enhancer. Mix well by pipetting up and
down and spin briefly to collect sample. Ligation Master Mix
and Ligation Enhancer may be combined as a master mix, but
not the adaptor.
5. Incubate for 15 min at 20 °C in thermocycler without lid
heating.
6. Add 3 μL USER Enzyme, mix thoroughly, and incubate for
15 min at 37 °C in thermocycler with lid heated to at least 47 °C.
7. Prewarm AMPure XP Beads for 30 min at RT, add 87 μL of
these beads to the sample, mix thoroughly by pipetting or
vortexing, and incubate for 5 min at RT.
8. Place in magnetic rack to collect beads for ca. 5 min or until the
solution is clear and discard the supernatant.
9. Add 200 μL of freshly prepared 80% ethanol to tube that is still
in the magnetic rack, incubate for 30 s at RT, and discard the
supernatant.
10. Repeat once step 9.
11. Remove traces of ethanol with pipet tip and air-dry for up to
5 min at RT.
12. Outside of the magnetic rack, elute DNA from beads by adding
17 μL of 5 mM Tris–HCl pH 8. Mix by pipetting, incubate
for more than 2 min at RT, briefly spin down, and put back into
magnetic stand.
13. Incubate for 5 min or until the solution is clear and transfer
15 μL of supernatant into a clean PCR tube. May be stored at
-20 °C.
14. Add 25 μL NEBNext Ultra II Q5 Master Mix, 5 μL chosen
Index Primer, and 5 μL Universal PCR Primer (or chosen i5
Primer). Mix thoroughly by pipetting. Briefly spin down.
15. Incubate in thermocycler with the following program: 30 s at
98 °C, 3–8 cycles (see Note 20) of 10 s 98 °C, 75 s 65 °C,
5 min at 65 °C, and hold at 4 °C.
16. Prewarm AMPure XP Beads for 30 min at RT, add 45 μL of
these beads to the sample, mix thoroughly by pipetting or
vortexing, and incubate for 5 min at RT.
17. Repeat once steps 8–13, but elute with 33 μL 0.1× TE-buffer
in step 12 and remove 30 μL in step 13. Determine DNA
fragment size distribution and concentration on an Agilent
Bioanalyzer High Sensitivity DNA chip.
18. Sequence the library by Illumina sequencing in 42 or 50 bp
paired-end mode such that >40-fold genome coverage is
ensured, e.g., 5–10 × 10⁶ reads for on average 100–200 bp
long yeast genome fragments.
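The coverage target in step 18 can be turned into a number of sequenced fragments with simple arithmetic. The sketch below is an illustration only: it assumes a S. cerevisiae genome size of roughly 12.1 Mb, counts each read pair as one fragment, and ignores the share of reads that map to the optional S. pombe spike-in. The function names are hypothetical.

```python
import math

# Assumption: approximate S. cerevisiae genome size in bp.
GENOME_BP = 12_100_000


def fold_coverage(n_fragments, mean_fragment_bp, genome_bp=GENOME_BP):
    """Average fragment (physical) coverage of the genome."""
    return n_fragments * mean_fragment_bp / genome_bp


def fragments_needed(target_coverage, mean_fragment_bp, genome_bp=GENOME_BP):
    """Smallest number of fragments that reaches the target coverage."""
    return math.ceil(target_coverage * genome_bp / mean_fragment_bp)


for length in (100, 200):
    n = fragments_needed(40, length)
    print(f"{length} bp fragments: at least {n:,} fragments for 40-fold coverage")
```

With these assumptions, 5 × 10⁶ fragments of 100 bp average length give slightly more than 40-fold coverage, in line with the range stated in step 18.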
Table 1
Mixing scheme for calibration curve samples

Percent cut                        0%    10%   30%   50%   70%   90%   100%   Total amount
Uncut gDNA (= mock digest) (μg)    1     0.9   0.7   0.5   0.3   0.1   0      3.5
RE-digested gDNA (μg)              0     0.1   0.3   0.5   0.7   0.9   1      3.5

3.8 Calibration Curve

1. Treat 25 μL of step 3 in Subheading 3.3 (corresponding to
2 μg RE-digested S. pombe gDNA spike-in) according to steps
15–20 in Subheading 3.4 and resuspend in 20 μL TE-buffer.
2. Mix 8 μg S. cerevisiae gDNA prepared in step 21 of Subheading
3.2 with 0.8 μg S. pombe gDNA spike-in that was ethanol
precipitated according to step 1 and is already digested with
an RE (= REspike-in), which is NOT any of the REs, for which
the calibration curve shall be generated (see Subheading 3.3).
3. Digest 4 μg of this gDNA mixture with the RE, for which the
calibration curve shall be generated, following steps 1–3 in
Subheading 3.3, but scaled down fivefold (see Note 21).
4. Mock digest as in step 3, but with RE-storage buffer instead
of RE.
5. Mix defined (see Note 22) percentages of cut gDNA according
to the scheme given in Table 1.
6. For each percentage mix (1 μg DNA each), top up volume to
130 μL with TE-buffer and follow Subheading 3.6 starting at
step 2.
7. Prepare sequencing libraries from these purified DNA samples
as in Subheading 3.7.
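The mixing scheme of Table 1 can be generated programmatically, which also makes it easy to convert masses into pipetting volumes. The following Python sketch is an illustration under stated assumptions: the cut and mock-digested gDNA are taken to have the same concentration (see Note 21), and the concentration of 0.08 μg/μL is a placeholder that must be replaced by the value of the actual digest. The function name is hypothetical.

```python
PERCENT_CUT = (0, 10, 30, 50, 70, 90, 100)


def mixing_scheme(total_ug=1.0, conc_ug_per_ul=0.08):
    """Masses and volumes of uncut and RE-digested gDNA per calibration sample.

    conc_ug_per_ul is a placeholder; both gDNA stocks are assumed to have
    the same concentration.
    """
    rows = []
    for pct in PERCENT_CUT:
        cut_ug = total_ug * pct / 100
        uncut_ug = total_ug - cut_ug
        rows.append({
            "percent_cut": pct,
            "uncut_ug": round(uncut_ug, 3),
            "cut_ug": round(cut_ug, 3),
            "uncut_ul": round(uncut_ug / conc_ug_per_ul, 2),
            "cut_ul": round(cut_ug / conc_ug_per_ul, 2),
        })
    return rows


for row in mixing_scheme():
    print(row)
```

Summing the columns reproduces the total of 3.5 μg of each gDNA type given in Table 1, which fits within the 4 μg digested in steps 3 and 4.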

3.9 Bioinformatics Analysis

Overviews of the bioinformatics steps for the cut–all cut and the
cut–uncut method are shown in Figs. 4 and 5.
1. Map fragments with bowtie2 using the combined S. cerevisiae
and S. pombe reference genome (see
‘reference_genome/ScerAndSpomWithMT.fsa’, with chromosomes
named as follows: chr01–chr16 for the 16 S. cerevisiae and chrI,
chrII, chrIII for the three S. pombe chromosomes). Use alignment
parameters: ‘-X 500 --no-discordant --no-mixed --no-unal’.
2. Remove multiply mapped reads using ‘samtools view -hf 0x2’.
3. Index BAM file using ‘samtools index’.
4. Download this repository
(https://github.com/gerland-group/ORE-seq_analysis) (see
Subheading 4.1).
5. Install R and the packages detailed in
‘restriction_enzyme/RE_Rprofile.R’.
Fig. 4 Flow chart for bioinformatic analysis steps for cut–all cut method

Fig. 5 Flow chart for bioinformatic analysis steps for cut–uncut method

6. Make new folder ‘<Example>’ in folder ‘restriction_enzyme’
for your analysis.
7. Make new folders ‘data/bam’ within ‘<Example>’.
8. Put bam and bai files into ‘<Example>/data/bam’. Name files
according to the following rules.
9. Start R and set working directory to ‘<Example>’.
10. ‘source("../RE_analysis.R")’ or run ‘RE_analysis.R’ step by
step in RStudio from within ‘<Example>’.
11. Find desired output files and plots:
a. The script creates a folder structure beginning with the
folder ‘analysis_results’, which contains different subfolders
for each type of plot and result files.
b. Depending on the parameters chosen in the script, the main
results path is
‘analysis_results/window_limit_times_1_max_length_500/close_distances_200_300/background_Michael’
(in the following called ‘MAIN’), with plot folders
for certain intermediate results along the way.
c. Genomic mean accessibilities are saved in
‘MAIN/acc_site_means_simple.txt’, with all_mean = cut–all cut
results, cut_uncut_all_3 = uncorrected cut–uncut result,
cut_uncut_4 = corrected cut–uncut result.
d. Histograms of site accessibilities are saved in
‘MAIN/accessibility_histograms/’ for plus/minus strand and
starting/ending fragments as well as combined results (last
column) with all_mean = cut–all cut results, cut_uncut_all_3 =
uncorrected cut–uncut result, cut_uncut_4 = corrected
cut–uncut result.
e. Individual site results are saved in an R dataframe in
‘MAIN/occs_df_list.RData’, with columns chr, enzyme,
pos, eff_coverage, eff_cuts, occ_X_1 (= cut–all cut),
occ_cut_uncut_2 (= uncorrected cut–uncut), and
occ_cut_uncut_4 (= corrected cut–uncut).
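Steps 1–3 above can be sketched as a small helper that assembles the command lines for one sample. This is a hedged illustration, not the authors' pipeline: the index prefix, FASTQ names, and sample name are hypothetical, a bowtie2 index built beforehand from the combined reference is assumed, and the `-b`, `-o`, and `samtools sort` parts are additions needed to obtain a coordinate-sorted BAM file that `samtools index` accepts. Only the bowtie2 alignment parameters and ‘samtools view -hf 0x2’ are taken from the text.

```python
def mapping_commands(index_prefix, fastq_r1, fastq_r2, sample):
    """Return the command lines (as argument lists) for mapping one sample.

    'sample' should already follow the naming rules, e.g. end in '_X' or '_1'.
    """
    sam = f"{sample}.sam"
    unsorted_bam = f"{sample}.unsorted.bam"
    final_bam = f"{sample}.bam"
    return [
        # Step 1: paired-end mapping with the alignment parameters from the text.
        ["bowtie2", "-X", "500", "--no-discordant", "--no-mixed", "--no-unal",
         "-x", index_prefix, "-1", fastq_r1, "-2", fastq_r2, "-S", sam],
        # Step 2: keep pairs with SAM flag 0x2 set, write BAM (-b, -o are additions).
        ["samtools", "view", "-h", "-f", "0x2", "-b", "-o", unsorted_bam, sam],
        # Addition: coordinate-sort, which indexing requires.
        ["samtools", "sort", "-o", final_bam, unsorted_bam],
        # Step 3: index the BAM file.
        ["samtools", "index", final_bam],
    ]


# Hypothetical file names for illustration only.
for cmd in mapping_commands("ScerAndSpom_index", "reads_R1.fastq.gz",
                            "reads_R2.fastq.gz", "WT_BamHI-HF_400_X"):
    print(" ".join(cmd))
```

The printed commands can be reviewed and run in a shell, or passed one by one to `subprocess.run(cmd, check=True)`.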

3.9.1 Sample Naming Rules

Bam files within ‘data/bam’ need to follow these naming
conventions:
1. The script needs bam files for both samples (X% cut and 100%
cut) with identical file name except the ending: Samples with
one RE digest end with ‘_X.bam’, while samples with second
RE digest end with ‘_1.bam’.
2. If there was no second digest and only cut–uncut analysis is
wanted, the ‘_1.bam’ file can be a copy/hard link of the
‘_X.bam’ file and the cut–all cut results should then be ignored.
3. File names must contain the RE name of the enzyme present in
the sample after an “_” sign, e.g.,
‘<Strain>_BamHI-HF_<RE units>_X.bam’, where the information in
‘<Strain>’ and ‘<RE units>’ is not used by the script and
‘<RE units>’ could be omitted.
4. If the spike-in (if present) used a different RE, then add
‘<spike-in RE>-norm’, e.g.,
‘<Strain>_AluI_EcoRI-norm_<RE units>_X.bam’.
5. Usable RE names can be checked and added in the
‘RE_info.txt’ file.
6. Multiple REs can be used on the main genome (not the
spike-in): ‘<Strain>_BamHI-HF_<RE units>_KpnI_<RE units>_X.bam’,
which will be analyzed accordingly.
7. To only get results of one RE if others were present in the same
sample, set parentheses to ignore REs:
‘<Strain>_BamHI-HF_<RE units>_(KpnI_400)_X.bam’. This will be
analyzed similarly to ‘<Strain>_BamHI-HF_<RE units>_X’, with the
differences given in items 8 and 9.
8. The sites of the ignored RE (and their neighborhoods) will still
be excluded when calculating the background.
9. The sites of the ignored RE might exclude sites of the “main”
enzyme when they are close to each other.
10. For calibration samples to be used for fitting the uncut correc-
tion factor, add the cut percentage with ‘X_pct_cut’ as in this
example: ‘<Strain>_AluI_10_pct_cut.bam’.
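Before running the analysis, it can help to check file names against the rules above. The following Python sketch is a hypothetical validator written for illustration; it is not the parser used in the repository, it only knows the RE names listed in this chapter (the authoritative list is ‘RE_info.txt’), and it assumes that the strain name is the first underscore-separated token.

```python
import re

# Assumption: RE names taken from this chapter; the script reads RE_info.txt instead.
KNOWN_RES = {"AluI", "BamHI", "HindIII", "EcoRI", "HhaI", "KpnI"}


def _re_name(token):
    """Return the RE name for a token such as 'BamHI-HF', or None."""
    base = token[:-3] if token.endswith("-HF") else token
    return base if base in KNOWN_RES else None


def parse_bam_name(file_name):
    """Interpret a bam file name according to the sample naming rules."""
    if not file_name.endswith(".bam"):
        raise ValueError("expected a .bam file name")
    stem = file_name[:-4]
    info = {"sample": None, "pct_cut": None, "main_res": [],
            "spike_in_re": None, "ignored_res": []}

    calib = re.search(r"_(\d+)_pct_cut$", stem)
    if calib:                                   # rule 10: calibration sample
        info["sample"] = "calibration"
        info["pct_cut"] = int(calib.group(1))
        stem = stem[:calib.start()]
    elif stem.endswith("_X") or stem.endswith("_1"):   # rule 1
        info["sample"] = stem[-1]
        stem = stem[:-2]
    else:
        raise ValueError("name must end in _X.bam, _1.bam or _<N>_pct_cut.bam")

    for group in re.findall(r"\(([^)]*)\)", stem):      # rule 7: ignored REs
        for token in group.split("_"):
            name = _re_name(token)
            if name:
                info["ignored_res"].append(name)
    stem = re.sub(r"\([^)]*\)", "", stem)

    for token in stem.split("_")[1:]:           # skip the strain token
        if token.endswith("-norm"):             # rule 4: spike-in RE
            info["spike_in_re"] = _re_name(token[:-5]) or info["spike_in_re"]
        elif _re_name(token):                   # rules 3 and 6: main RE(s)
            info["main_res"].append(_re_name(token))
    if not info["main_res"]:
        raise ValueError("no known RE name found after an underscore")
    return info


print(parse_bam_name("WT_AluI_EcoRI-norm_100_X.bam"))
print(parse_bam_name("WT_BamHI-HF_400_(KpnI_400)_X.bam"))
```

A check of this kind catches misspelled enzyme names or missing suffixes before the R script is started.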

3.9.2 How to Use Other REs

Add the required information to ‘RE_info.txt’, see
‘RE_info_README.txt’.

3.9.3 How to Use Other Genomes

Genomes other than S. cerevisiae (and S. pombe for spike-in) are not
supported by default, as there are unfortunately several references
to the chromosome names within the code. If these are adapted
accordingly, the script should run for other genomes as well.

3.9.4 How to Fit Uncut Correction Factors

Run section ‘3.1.1 Calc and plot deviation from calibration samples’
in the script, which is skipped by default.

4 Notes

1. β-Mercaptoethanol is toxic. Use appropriate personal safety
equipment (lab coat, safety glasses, gloves) and handle in
fume hood. β-Mercaptoethanol can probably be replaced by
DTT, likely at a lower concentration.
2. The yeast chromatin prepared by such methods is sometimes
called “nuclei,” although this protocol will not yield nuclei in
the clean sense, for example if checked by electron microscopy,
as obtained, for example, for HeLa cell nuclei. For comparison
with other protocols for the preparation of yeast chromatin, see
for example [10–15].
3. Do not crosslink cells with formaldehyde. This will impair
chromatin digestion by REs and lead to overestimated occupancy
values [6]. This amount of cells is sufficient for preparing
chromatin for digestion with one type of RE. Linear upscaling
is readily possible if enough chromatin is to be prepared for
analysis with several REs.
4. Cells are still alive until the cell lysis in step 12. So if you need
to monitor some special conditions, e.g., high temperature for
a temperature-sensitive mutant, you may have to keep up these
conditions during chromatin preparation [16]. The tempera-
tures given in the protocol are for our routine chromatin pre-
parations from wild-type or mutant cells, like an isw1 or gcn5
mutant, grown to log phase at 30 °C.
5. This is a somewhat shaky measure as it depends on the residual
amount of water in the pellet. Nonetheless, for most purposes
it works just fine. If a more exact measure is needed, you could
go by cell number via OD600 or cell counting.
6. If lysis is not efficient, as happens for example with stationary
cells or very sick strains, either use more zymolyase (more
effective, but may introduce more protease contamination
alongside with zymolyase) or incubate longer (may not help
much, but longer incubation usually does not harm as cells are
still closed at this point) or go to 35 °C (optimal temperature
for zymolyase).
7. From here on, pellets are no longer as tight and stable. Take
care not to lose the pellet! Consider reducing the vortex speed
to about half from now on. Resuspending is now more difficult
due to clumps. Start with a smaller volume and stir with an
inoculation loop for better resuspension efficiency.
8. Usually, some sticky material remains at the tube walls that
cannot be poured off and that is also not part of the pellet.
Remove this material, for example, with a cotton swab or
tissue paper wrapped around a spatula, without disturbing the
pellet. This also removes residual Ficoll solution.
9. Tube labeling may wash off in dry ice/ethanol bath.
10. S. pombe gDNA is only required for the cut–all cut method.
11. Refer to the manufacturer’s detailed instructions for using this
kit. This protocol states the basic steps only.
12. EDTA from the STOP-Buffer chelates all Mg2+ and other
divalent cations so that DNases that usually depend on divalent
cations are inhibited and the gDNA is stable at 4 °C. As the
proteinase K will inactivate the RE and as the S. pombe gDNA
spike-in is usually added to crude chromatin, purification of the
S. pombe gDNA is not necessary at this point. Omitting the
purification and storage at 4 °C instead of -20 °C avoids
unnecessary DNA breaks in the very long gDNA. Nonetheless,
the S. pombe gDNA is purified in the context of the calibration
curve (see step 1 in Subheading 3.8), as this is done also with
purified S. cerevisiae gDNA and not crude chromatin.
13. It is of critical importance to ensure saturated digestion. We
routinely use two sufficiently different, e.g., fourfold different,
RE concentrations. Only if both of them yield approximately
the same accessibility value within the same digestion time,
e.g., within 5 percent points or some other criterion depending
on the application and desired accuracy, then RE digestion was
not limiting. This logic only applies if the dynamics of the
chromatin substrate were sufficiently frozen, i.e., DNA accessi-
bility did not change over time. This can only be tested by
following digestion time courses. Sufficient chromatin stability
is usually the case in the absence of ATP and the presence of
>2 mM Mg2+, but may be an issue for S. cerevisiae chromatin at
low Mg2+ concentrations [6]. If several different REs are used,
the mock digestion sample does have to be prepared for all
of them.
14. RE preparations and RE storage buffer usually contain a high
glycerol concentration. Upon addition of RE or RE storage
buffer, avoid a final concentration of >5% glycerol if the RE is
prone to star activity.
15. The incubation time has to be long enough to reach saturation
for the chosen RE concentrations. However, this RE digestion
occurs in crude chromatin so that overlong incubation
increases side effects. For example, chromatin preparations
may contain endogenous exo- and endonucleases, which may
resect DNA ends introduced by the RE or lead to not
RE-caused DNA breaks, respectively. The former is detected
and controlled for in the bioinformatic analysis (Subheading
3.9) and the latter by comparison with the mock digestion. The
influence of endogenous nucleases can be dampened by lower-
ing RE digestion time and/or temperature and/or Mg2+ con-
centration. While all these measures will also compromise the
RE digestion, this can be compensated by increasing RE con-
centration. The RE digestion conditions given in the protocol
here were found to work well for wild-type S. cerevisiae chro-
matin [6], but may have to be modified for other applications.
16. If the 1× RE-buffer contains enough potassium to cause pre-
cipitation at low temperature in the presence of SDS from the
STOP-Buffer, as the case for 1× CutSmart buffer (NEB), keep
the sample always at 37 °C until phenol/chloroform
extraction.
17. Phenol is toxic and β-Mercaptoethanol is also toxic. Use per-
sonal safety equipment (lab coat, safety glasses, gloves) and
handle in fume hood.
18. Heat inactivation is not always possible depending on the
RE. However, just stopping by EDTA addition is also sufficient
as this will reliably stop the RE digest and the RE is later
on removed after shearing by AMPure bead purification.
19. We noted (see Fig. 2) a scoring bias that may stem from a bias
toward RE-generated versus shearing-generated DNA ends
and attribute this to incomplete DNA end repair and impaired
sequencing adapter ligation of the latter in comparison to the
former DNA ends. This may be due to incomplete removal of
3′ phosphate groups or other chemical modifications gener-
ated by shearing [7], but not by REs, that are not efficiently
processed by end repair enzymes in the library preparation kit.
Additional pretreatment with the DNA repair Pre-CR kit
(NEB, M0309) may help, but others still observed a similar
bias in their calibration curve despite using this kit [8]. To
increase the likelihood of sufficient DNA end repair, we
recommend increasing the ratio of repair enzymes relative to
DNA and therefore using only 100–200 ng, although the
kit’s manufacturer recommends using up to 1 μg of DNA.
20. Avoid using too many PCR cycles as this will increase the
occurrence of clonal PCR duplicates. A maximum of 8 cycles
is recommended, usually 6 cycles are sufficient and preferred.
21. Some REs, especially the HF (high fidelity) version sold by
NEB may stick to the DNA ends after cutting their site. This
may inhibit the adapter ligation reaction during sequencing
library preparation. Therefore, it is recommended that the RE
is removed by proteinase K digestion and further DNA
purification.
Treatment of mock-digested gDNA mix in parallel ensures
that the DNA concentration for RE-digested and undigested
gDNA stays the same, i.e., generating the defined percentages
in the following step 6 can be done via mixing corresponding
volumes without measuring DNA concentrations.
22. Your pipetting precision and accuracy will decide how well
defined these percentages are. Follow the recommendations
of the manufacturer of your pipets. Use the same pipet and
setting for each percentage if you prepare calibration curves for
several REs. The DNA masses given in Table 1 can be esti-
mated from steps 2 and 3. The exact DNA amount is less
important than the exact volume ratio of cut versus uncut
gDNA (see Note 21).
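The saturation criterion of Note 13 amounts to a simple comparison of the accessibility values obtained at the two RE concentrations. A minimal Python sketch follows; the function name, the example values, and the default threshold of 5 percent points are illustrative assumptions and should be adapted to the application.

```python
def digestion_saturated(acc_low_re, acc_high_re, max_diff_points=5.0):
    """True if accessibilities (in %) at low and high RE concentration agree.

    max_diff_points = 5 follows the example criterion in Note 13.
    """
    return abs(acc_high_re - acc_low_re) <= max_diff_points


# Hypothetical genomic mean accessibilities (%) for two RE concentrations.
print(digestion_saturated(acc_low_re=62.0, acc_high_re=64.5))
print(digestion_saturated(acc_low_re=48.0, acc_high_re=61.0))
```

As stated in Note 13, such a comparison is only meaningful if chromatin accessibility itself did not change during the digestion time.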

4.1 Bioinformatic Notes

The bioinformatics analysis is a modified version based on the
analysis in our first application [6] and in MRW’s PhD thesis
[17]. Our script deposited at GitHub
(https://github.com/gerland-group/ORE-seq_analysis) is written for
use with several REs (thoroughly tested with AluI, BamHI, and
HindIII) and the S. cerevisiae genome and with an S. pombe spike-in
for the cut–all cut method. For custom applications of other REs and
especially other genomes, the script has to be modified or newly
written by a bioinformatics expert. As we cannot foresee future
applications by other users, we state in the following in detail the
underlying rationale and mathematical background.
In the following, steps marked with * are only needed for the
cut–all cut method, which for example needs a normalization
between the cut and all cut sample using an S. pombe gDNA
spike-in (Fig. 4). Likewise, steps marked with ° are only needed
for the cut–uncut method, like the counts of uncut fragments at a
given cut site (Fig. 5). Note that this description and our script are
an all-in-one solution that calculates the outcome according to
both methods at the same time.
First follow the steps for mapping/indexing and download the
script files as described in Subheading 3.9 and have a look at the
readme file of the repository.
The script then performs the following actions:

4.1.1 Map/Filter Reads

1. Extract paired-end read information: chromosome, start, end,
and strand information, with end positions shifted by +1 bp.
2. Remove fragments that are longer than 500 bp.
3. Remove rDNA fragments by excluding the following loci:
S. cerevisiae chr. 12: 451500–495000
S. pombe chr. 3: 0–30000
S. pombe chr. 3: 2430000–2452883
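Steps 2 and 3 above can be sketched as a simple filter. The Python code below is an illustration and not the script's implementation: it assumes fragments are given as (chromosome, start, end) with the end already shifted by +1 bp, uses the chromosome names from the combined reference (‘chr12’, ‘chrIII’), and treats any overlap with an rDNA locus as grounds for exclusion, which is an assumption about how the loci are applied.

```python
MAX_FRAGMENT_BP = 500

# rDNA loci from step 3, named as in the combined reference genome.
RDNA_LOCI = [
    ("chr12", 451500, 495000),     # S. cerevisiae chr. 12
    ("chrIII", 0, 30000),          # S. pombe chr. 3
    ("chrIII", 2430000, 2452883),  # S. pombe chr. 3
]


def keep_fragment(chrom, start, end):
    """Return True if a fragment passes the length and rDNA filters.

    'end' is exclusive (already shifted by +1 bp), so length = end - start.
    """
    if end - start > MAX_FRAGMENT_BP:
        return False
    for locus_chrom, locus_start, locus_end in RDNA_LOCI:
        # Assumption: any overlap with an rDNA locus removes the fragment.
        if chrom == locus_chrom and start < locus_end and end > locus_start:
            return False
    return True


fragments = [("chr12", 100, 300), ("chr12", 460000, 460200),
             ("chr01", 0, 600), ("chrIII", 29990, 30100)]
print([f for f in fragments if keep_fragment(*f)])
```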

4.1.2 Count Cut and Uncut° Fragments

4. Count the starting/ending fragments on plus and minus strand
$c_\tau(x)$ for each genomic position $x$ with $\tau = 1, 2, 3, 4$,
denoting starts on plus, starts on minus, ends on plus, and ends on
minus strands, respectively. For starting reads, we count the
position of the first base pair, and for ending reads, we count the
position after the last base pair (i.e., end positions are shifted
by +1 bp). We use the notation $c^1_\tau(x)$ and $c^2_\tau(x)$ for
the sample without and with second RE digest, respectively. For later
modeling, we assume that one single given fragment with
RE-cut or sheared fragment start or end at $x$ will on average
yield $p^x_\tau$ counts after PCR and Illumina sequencing.
5. For the cut–uncut method, we need the uncut fragments for
fixed genomic positions $x$, i.e., fragments that start before
$x - d$ and end after $x + d$ (end positions are shifted by +1 bp)
in the sample without second RE digest. The extension by $d$ is
needed due to the fact that not all REs cut both strands at the same
position, as explained later. We denote this number of uncut
fragments with $u^1_\tau(x, d)$, also using the index $\tau$ as in
step 4 to differentiate between plus ($\tau$ = 1 or 3) and minus
strand ($\tau$ = 2 or 4). We assume that one such uncut fragment at
$x$ will on average result in $q^x_\tau$ counts after PCR and
Illumina sequencing.
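The counting in steps 4 and 5 can be illustrated with a few lines of Python. This is a minimal sketch and not the R implementation of the script: fragments are assumed to be tuples (chromosome, start, end, strand) with ‘start’ the leftmost coordinate and ‘end’ already shifted by +1 bp, and the function names are hypothetical.

```python
from collections import Counter


def count_fragment_ends(fragments):
    """Counts c_tau(x) for tau = 1..4, keyed by (chromosome, position).

    tau: 1 = starts on plus, 2 = starts on minus,
         3 = ends on plus,   4 = ends on minus.
    """
    counts = {tau: Counter() for tau in (1, 2, 3, 4)}
    for chrom, start, end, strand in fragments:
        start_tau, end_tau = (1, 3) if strand == "+" else (2, 4)
        counts[start_tau][(chrom, start)] += 1
        counts[end_tau][(chrom, end)] += 1
    return counts


def count_uncut(fragments, chrom, x, d):
    """Uncut fragments u(x, d): start before x - d and end after x + d."""
    uncut = {"+": 0, "-": 0}
    for frag_chrom, start, end, strand in fragments:
        if frag_chrom == chrom and start < x - d and end > x + d:
            uncut[strand] += 1
    return uncut


fragments = [("chr01", 100, 250, "+"), ("chr01", 100, 300, "+"),
             ("chr01", 50, 250, "-"), ("chr01", 90, 400, "+")]
counts = count_fragment_ends(fragments)
print(counts[1][("chr01", 100)], counts[4][("chr01", 250)])
print(count_uncut(fragments, "chr01", x=250, d=2))
```

For whole-genome data, per-position arrays or interval queries are more efficient than the simple loops shown here.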

4.1.3 Determine Cut Site Positions with RE Motif

6. Determine the cut site positions, i.e., the positions of the RE
recognition motif, on both* genomes, including generation of
the actual DNA ends by end polishing, in the following way. We
define $x_i$ as the position of the first base pair of the
recognition motif of cut site i plus half the length of the
recognition motif, which usually has an even length.
HindIII as an example, with ‘|’ denoting the cut in both
strands. Position $x_i$ is given by the base in square brackets:
+ strand: 5′-...A|A G [C] T T...-3′
- strand: 3′-...T T C [G] A|A...-5′
In case of a 5′ overhang, the 3′ end is elongated to match
the 5′ end during DNA end polishing by a DNA polymerase.
Conversely, a 3′ overhang is digested to match the recessed 5′
end during DNA end polishing by a 3′–5′ exonuclease. For
such end-polished HindIII ends, we get the following
double-stranded fragment ends, again with position $x_i$ given by
the base in square brackets:
+ strand: ending: 5′-...A A G [C] T-3′ and starting: 5′-A G [C] T T...-3′
- strand: ending: 3′-...T T C [G] A-5′ and starting: 3′-T C [G] A A...-5′
Let $\Delta s$ be the shift length from the pattern center to the
cut position of the + strand in upstream direction, which
corresponds to half the length of the 5′ overhang of the cleavage
product in bp. For HindIII, $\Delta s = +2$; $\Delta s = 0$ for
blunt end cutting REs, whereas in case of an RE with 3′ overhangs,
$\Delta s$ is negative.

RE        Recognition motif (vertical line indicates cut position)   Shift length Δs
AluI      AG|CT                                                       0
BamHI     G|GATCC                                                     2
HindIII   A|AGCTT                                                     2
EcoRI     G|AATTC                                                     2
HhaI      GCG|C                                                       -1
KpnI      GGTAC|C                                                     -2
Assuming proper end polishing of cut fragments as described
above, we have the following counts for site i:
Counts of starting reads on + strand: $c^1_1(x_i - \Delta s)$ and $c^2_1(x_i - \Delta s)$
Counts of starting reads on - strand: $c^1_2(x_i - \Delta s)$ and $c^2_2(x_i - \Delta s)$
Counts of ending reads on + strand: $c^1_3(x_i + \Delta s)$ and $c^2_3(x_i + \Delta s)$
Counts of ending reads on - strand: $c^1_4(x_i + \Delta s)$ and $c^2_4(x_i + \Delta s)$
Uncut reads on + strand°: $u^1_1(x_i, \Delta s)$
Uncut reads on - strand°: $u^1_2(x_i, \Delta s)$
To obtain the number of fragments not cut° by the RE at a
given site, we count all fragments that start before
$x_i - \Delta s$ and end after $x_i + \Delta s$, yielding
$u^1_\tau(x_i, \Delta s)$. For easier notation, we set
$x^i_1 = x^i_2 = x_i - \Delta s$ and $x^i_3 = x^i_4 = x_i + \Delta s$,
yielding the cuts at site i as $c^1_\tau(x^i_\tau)$,
$c^2_\tau(x^i_\tau)$, $\tau = 1, 2, 3, 4$.
If the cut–all cut method is used, this step needs to be done on
both the S. cerevisiae and S. pombe genomes. In the S. pombe gDNA
spike-in, a different RE can be used.
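The definitions of $x_i$ and $\Delta s$ can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the script's code: coordinates are 0-based, only the palindromic motifs listed in the table above are handled (so searching the plus strand suffices), and $\Delta s$ is computed as half the motif length minus the number of bases before the cut, which is one reading of the definition given in step 6. Function names are hypothetical.

```python
# Recognition motifs with '|' marking the cut on the + strand (from the table).
RE_SITES = {
    "AluI": "AG|CT", "BamHI": "G|GATCC", "HindIII": "A|AGCTT",
    "EcoRI": "G|AATTC", "HhaI": "GCG|C", "KpnI": "GGTAC|C",
}


def shift_length(site):
    """Shift from the motif center to the + strand cut, in upstream direction."""
    motif = site.replace("|", "")
    return len(motif) // 2 - site.index("|")


def cut_site_positions(sequence, site):
    """Return x_i and the expected start/end positions for each motif match.

    Positions are 0-based; end positions follow the +1 bp shifted convention.
    """
    motif = site.replace("|", "")
    delta_s = shift_length(site)
    sites = []
    pos = sequence.find(motif)
    while pos != -1:
        x_i = pos + len(motif) // 2
        sites.append({"x_i": x_i,
                      "start_pos": x_i - delta_s,   # x^i_1 = x^i_2
                      "end_pos": x_i + delta_s})    # x^i_3 = x^i_4
        pos = sequence.find(motif, pos + 1)
    return sites


print({name: shift_length(site) for name, site in RE_SITES.items()})
print(cut_site_positions("GGAAGCTTCC", RE_SITES["HindIII"]))
```

In the HindIII example, the fragment starting at ‘start_pos’ begins with AGCTT, matching the end-polished fragment ends described in step 6.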

4.1.4 Remove RE Sites with Close Neighbor RE Sites

7. Especially for X% samples derived from RE digestion of
chromatin, uncut fragment counts are increased at cut sites with any
neighboring cut site within approx. 150 bp. Thus, we ignore
RE sites completely if they have a neighbor within 200 bp in
either direction. This cutoff may be adjusted depending on
given samples. We denote the set of leftover sites with I and J,
for the S. cerevisiae and S. pombe* genomes, respectively.
8. As shown in Fig. 6, we often saw dependencies between the
fragment counts $C^i_\tau$ and $A^i_\tau$ (defined below) and the
distance to the next neighboring RE site, ranging up to 300–500 bp,
e.g., for starting reads and the downstream distance to the next
neighbor RE site (Fig. 6b). Thus, we ignore start or end cut
counts at and near an RE site (see RE site window
approach below) if the next RE site downstream or upstream,
respectively, is closer than 300 bp. Note that this
value can be further tuned to the experimental conditions,
although for our calibration samples shown in Fig. 2, there
was hardly any difference between this limit set to 300 bp or a
more conservative 500 bp. In general, the higher the degree of
shearing, i.e., the shorter the average fragment length, the
lower this limit can be. See also legend to Fig. 6.
9. In our protocol here, we modified the original protocol [6]
such that different REs are used for digesting S. cerevisiae
chromatin or S. pombe gDNA spike-in, for example BamHI
and EcoRI, respectively. In this case, the EcoRI sites in the
S. cerevisiae genome are not considered when determining
close RE sites. However, since the second RE digest is applied
after including the S. pombe gDNA spike-in, the BamHI sites
ORE-Seq: Genome-Wide Absolute Occupancy Measurement by Restriction Enzyme. . . 145

Fig. 6 Scoring bias due to close next neighbor RE sites. Exemplary selection from the “cut_counts_vs_nn_-
distance” plots for AluI 50% cut calibration sample (as in Fig. 2) are shown. Our script automatically generates
such plots for the “X%” (a, b) and “100%” (c, d) samples (X or 1 in y-axis label) that show the number of reads
146 Elisa Oberbeckmann et al.

need to be considered when removing the close EcoRI site on


the S. pombe genome, i.e., remove all EcoRI sites with a close
EcoRI neighbor or a close BamHI neighbor as described in this
subheading above.

4.1.5 Collect Cut and 10. Due to endogenous exonucleases that may be present in the
Uncut° Counts Within chromatin preparations and trim DNA ends after RE cleavage,
Window Near Cut Sites to some fragment ends do not match the RE cut site positions any
Correct for Resection more, even though they were generated by the RE. Thus we
ä

Fig. 6 (continued) (position of the marker along the y-axis) that map to the indicated combinations (y-axis
label) of the plus strand of the chromosome and start (a, b) or end (c, d) (y-axis label) at a given RE site (0 on x-
axis) that have a next neighbor (nn) site for the same RE at a given upstream (a, c) or downstream (b, d)
distance (x-axis label) in bp (position of the marker along the x-axis). Analogous plots are also generated for
minus end reads. We found that the strand identity does not matter, but rather the orientation (upstream
versus downstream) relative to whether a read starts or ends at the RE site. (e, f) Plots for sequencing reads
analogous to those in panels a and b, but for uncut fragments where the given RE site was not cut. The green
lines correspond to the average at a given x-axis position. Our interpretation of the observed curve shapes is
as follows. (a) If the sequencing reads stem from fragments starting with the RE site, then next neighbor RE
sites upstream are irrelevant for scoring efficiency measured via the obtained read number as they are not
contiguous with the sequenced fragment anymore due to the RE cut and therefore do not affect scoring by
adapter ligation and sequencing. (d) The same is true for next neighbor RE sites downstream of an RE site
where a read ends. In contrast, next neighbor RE sites downstream (b) or upstream (c) of an RE site where a
read starts or ends, respectively, may be cut (are indeed cut to 100% in our calibration samples shown here)
and therefore generate DNA fragments with two RE cut ends and a consistent length that may be shorter than
the average length generated by the combination of one RE and one shearing end. Such fragments with two
RE ends are scored more efficiently (= above averages shown in panels A and D) than fragments with one RE
and one sheared end. The paucity in fragments <100 bp reflects the DNA fragment length cutoff of the
AMPure bead purification during this particular sequencing library preparation. Note that fragments of
>500 bp length are excluded from the analysis as Illumina sequencing becomes biased against longer
fragments, which explains that the curves level off to the average level (similar to green line in panels a and d)
beyond 500 bp next neighbor distance. If shearing is more extensive, i.e., if the average fragment length is
much shorter than 500 bp, then the curve will approach the average level at a next neighbor distance close to
the average fragment length as next neighbor sites beyond the average fragment length will not be contiguous
anymore.
Note that especially for the “X%” samples generated from chromatin digestion, the x-axis need not reflect the
actual fragment length that gave rise to a certain sequencing read, but denotes a property of the genome
sequence (distance to the next neighbor RE site). Nonetheless, for calibration samples shown here, the x-axis
does mostly reflect actual fragment length as virtually all RE ends stem from the 100% cut S. cerevisiae gDNA
that was mixed with uncut S. cerevisiae gDNA. Fortuitous ends at RE sites due to shearing are negligible (e.g.,
less than 10 counts on y-axis here).
Finally, the bias due to next neighbor RE sites can also be apparent for uncut fragments where there is a
potentially cut RE site within approx. 150 bp of the view point RE site (0 on x-axis) in either the upstream (e) or
downstream (f) direction. While this is not very pronounced for the samples shown here, as the uncut
fragments stem from mock digests in these calibration samples, it may be considerable in chromatin samples
and also calls for excluding such next neighbor sites.
The bioinformatics procedure that corrects for the next neighbor site bias by excluding these RE sites is
detailed in Subheading 4.1.

need to count the starting and ending fragments not only at
the exact cut positions, but also at some distance from them. The
amount of strand resection varies between samples, so its correction
needs to be tailored to each pair of samples without and
with second RE digest.
We define count windows for each fragment type: for read
starts, $W_1 = W_2 = \{0, 1, 2, \ldots, w\}$ to apply a window in the
downstream direction, and for read ends, $W_3 = W_4 = \{0, -1, -2, \ldots, -w\}$
to apply a window in the upstream direction. The
algorithm to find the optimal value for w is described at the end
of this step. $C^i_\tau$ denotes the number of cut fragments in the
sample without second RE digest (“cut sample”) and $A^i_\tau$
denotes the number of cut fragments in the sample with second
RE digest (“all cut sample”):
$$C^i_\tau = \sum_{a \in W_\tau} c^1_\tau(x^i_\tau + a) \quad\text{and}\quad A^i_\tau = \sum_{a \in W_\tau} c^2_\tau(x^i_\tau + a)$$
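The window sums above can be illustrated with a minimal Python sketch. It assumes that per-position read-start or read-end counts are available as a NumPy array; the function name `windowed_count` and the toy arrays are hypothetical and not part of the published pipeline.

```python
import numpy as np

def windowed_count(counts, site, w, direction):
    """Sum per-position cut counts over the w + 1 positions of a count window.

    counts    : 1-D array of read-start (or read-end) counts per genomic position
    site      : 0-based index of the expected cut position
    w         : window parameter (the window holds w + 1 positions)
    direction : +1 for read starts (downstream window), -1 for read ends (upstream)
    """
    positions = site + direction * np.arange(w + 1)
    # Drop window positions that fall off either end of the sequence.
    positions = positions[(positions >= 0) & (positions < len(counts))]
    return int(counts[positions].sum())

# Toy data (assumed): a cut at index 2 for starts and at index 3 for ends.
starts = np.array([0, 0, 7, 2, 1, 0, 0])
ends = np.array([0, 1, 3, 9, 0, 0, 0])
start_count = windowed_count(starts, site=2, w=2, direction=+1)
end_count = windowed_count(ends, site=3, w=2, direction=-1)
```

With w = 0 the function reduces to the count at the exact cut position, which corresponds to a sample without resection.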

w is determined using the sample without second RE
digest and the data of the S. cerevisiae genome, and then
the same value is used for the sample with second RE digest
and for the S. pombe genome as well. For mock-digested samples,
we set w = 5 to average over fluctuations in the very low cut
counts at a single position.
In the case of ignored start counts of step 8, we set
$C^i_\tau$ = NA and $A^i_\tau$ = NA for τ = 1, 2, and the same for τ = 3,
4 in the case of ignored end counts.


For normal samples, we use the following algorithm, which
makes sure that increasing w by 1, 2, 3, 4, or 5 bp does not
increase the summed counts within w by more than 1%, after
correcting for cut counts from shearing.
Calculate the mean counts (averaged over all cut sites) at
each position from -200 bp to 200 bp relative to the average cut
site, for start and end counts and both strands. These cut
counts near the average cut site usually show a single peak at
0, but depending on the conditions there is also a decreasing
shoulder downstream/upstream for starts/ends, respectively.
Averaging the different types and strands (end counts need to
be mirrored at 0 first) yields m(d), d being the distance to the
average cut site. The cut counts need to be corrected by the
average shearing cut counts, which we obtain 100–200 bp
away from the cut site:
$$m_c(d) = m(d) - \langle m(d) \rangle_{d = 100, \ldots, 200}$$
($\langle \ldots \rangle$ indicating the average). We define the cumulative sum of
counts by
$$S(w) = \sum_{d=0}^{w} m_c(d).$$
Finally, we set w equal to the
first integer starting from 0 such that for all n ∈ {1, 2, 3, 4, 5},
the sum of the counts of the next n positions, S(w + n) - S(w),
is less than 1% of S(w). In our samples, typical values for
w ranged from 0 to 20, going up to 40 for samples with very
strong resection.

Uncut fragment counts at any RE cut site are not influenced
by endogenous exonucleases as they are still occupied by
a nucleosome or other protein that blocked the RE. For easier
notation, we define the uncut counts at site i by
$$U^i_\tau = u^1_\tau(x^i, \Delta s)$$
The mean resection length is defined as
$$\sum_{d=0}^{w} d \, m_c(d) \, / \, S(w).$$
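The window-size search and the mean resection length described in this step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `m` is taken to be the already strand-averaged and mirrored profile m(d) for d = 0, ..., 200, and the function names, argument names, and toy profile are hypothetical.

```python
import numpy as np

def find_window(m, bg_range=(100, 200), max_step=5, threshold=0.01):
    """Return (w, m_c, S) for a mean count profile m[d], d = 0..len(m)-1.

    m_c subtracts the mean shearing background measured over bg_range,
    S is the cumulative sum of m_c, and w is the first integer for which
    S(w + n) - S(w) < threshold * S(w) for every n in 1..max_step.
    """
    m = np.asarray(m, dtype=float)
    background = m[bg_range[0]:bg_range[1] + 1].mean()
    m_c = m - background
    S = np.cumsum(m_c)  # S[w] = sum of m_c[0..w]
    for w in range(len(S) - max_step):
        gains = S[w + 1:w + max_step + 1] - S[w]
        if np.all(gains < threshold * S[w]):
            return w, m_c, S
    raise ValueError("no window size satisfies the criterion")

def mean_resection_length(m_c, S, w):
    """Count-weighted mean distance from the cut site within the window."""
    d = np.arange(w + 1)
    return float((d * m_c[:w + 1]).sum() / S[w])

# Toy profile (assumed): background of 1 count, peak at 0 with a short shoulder.
profile = np.ones(201)
profile[0] += 1000
profile[1] += 100
profile[2] += 20
w_opt, m_c, S = find_window(profile)
resection = mean_resection_length(m_c, S, w_opt)
```

In this toy profile the shoulder ends at d = 2, so the search stops there; real profiles are noisier, which is why the criterion looks ahead over several positions rather than one.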

4.1.6 Occupancy Estimation by Cut–All Cut Method with Background Correction and Normalization

We seek to estimate the real accessibility $\alpha_i$ at cut site i using the cut
counts of the cut and all cut samples, taking into account a bias
toward RE versus sheared fragment ends and effective sequencing
probabilities. We begin by viewing $C^i_\tau$ and $A^i_\tau$ as random variables
with the expectation values
$$E[C^i_\tau] = N_C \, \mu_i \, p^i_\tau \quad\text{with}\quad \mu_i = \alpha_i + (1 - \alpha_i) s$$
$$E[A^i_\tau] = N_A \, p^i_\tau$$

where $N_C$ and $N_A$ are the numbers of nuclei in the samples
without and with second RE digest, respectively, and $p^i_\tau$ is a factor
that combines the sequencing probabilities and the PCR multiplication
of fragments of type τ in the window $W_\tau$ at cut site i and is an
effective average of the $p^x_\tau$ with $x \in x^i_\tau + W_\tau$ described earlier. The
probability that a given (longer) fragment will be cut by shearing
within a fixed region of length w + 1 within the fragment is denoted
by s (not to be confused with the site shift variable Δs).
Since the REs act before the shearing step, only the fraction
that has not been cut by the RE can be cut in the shearing step,
leading to
$$\mu_i = \alpha_i + (1 - \alpha_i) s.$$
We assume that in the sample with second RE digest, all counts
near a cut site came from a cut of the RE and all counts far away
from cut sites occurred due to shearing.
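We use these four estimators (one for each fragment type τ) for $\mu_i$ and $\alpha_i$:
$$\hat\mu^i_\tau := \frac{C^i_\tau}{A^i_\tau} \, \frac{N_A}{N_C} \quad\text{and}\quad \hat\alpha^i_\tau := \frac{\hat\mu^i_\tau - s}{1 - s}.$$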

The estimators for $\alpha_i$ are approximately unbiased, as
$$E[\hat\alpha^i_\tau] = \frac{E[\hat\mu^i_\tau] - s}{1 - s} = \frac{1}{1 - s} \left( E[C^i_\tau] \, E\!\left[\frac{1}{A^i_\tau}\right] \frac{N_A}{N_C} - s \right) \approx \alpha_i,$$
because the $C^i_\tau$ and $A^i_\tau$ are statistically independent as they
originate from different samples and $E[1/A^i_\tau] \approx 1/E[A^i_\tau]$. Note that the
two sets $\{C^i_\tau\}$ and $\{A^i_\tau\}$ within themselves, however, are statistically
dependent.
We set $\hat\alpha^i_\tau$ = NA if $A^i_\tau = 0$ or $A^i_\tau$ = NA (due to a close neighbor
in direction of τ).
To obtain $N_A/N_C$, we use the S. pombe gDNA spike-in cut
sites (index set J), which are completely cut in both samples:
$$\frac{N_A}{N_C} = \frac{\langle A^i_\tau \rangle_{i \in J, \tau}}{\langle C^i_\tau \rangle_{i \in J, \tau}}$$
with $\langle X_{i,\tau} \rangle_{i,\tau}$ denoting the mean of $X_{i,\tau}$ over i and τ, ignoring NA
values.
Alternatively, the ratio of the number of sequenced S. pombe
reads could be used, but this gave slightly worse results in our
calibration runs.
To estimate the probability s, we look at the set Z of all genomic
positions in S. cerevisiae that are further away than 300 bp from any
cut site (including the ones with a close neighbor). At these positions,
all counted starts and ends originate from shearing. Thus,
$\langle c^1_\tau(x) \rangle_{x \in Z, \tau}$ is an estimator for $N_C \, s_1 \, \langle p^x_\tau(x) \rangle_{x \in Z, \tau}$, with $s_1$ being the
probability of shearing a long fragment at one fixed position.
Since the $p^i_\tau$ are effectively averages of $p^x_\tau$ in the cut site window of
site $x^i$, their averages over large regions of the genome are (to a
good approximation) the same:
$$\langle p^x_\tau(x) \rangle_{x \in Z, \tau} \cong \langle p^i_\tau \rangle_{i \in I, \tau} = \frac{1}{N_A} \langle E[A^i_\tau] \rangle_{i \in I, \tau} \cong \frac{1}{N_A} \langle A^i_\tau \rangle_{i \in I, \tau}$$
The average of $E[A^i_\tau]$ over i and τ can be well approximated by
the average of $A^i_\tau$, leading to our value for $s_1$:
$$s_1 = \frac{N_A}{N_C} \, \frac{\langle c^1_\tau(z) \rangle_{z \in Z, \tau}}{\langle A^i_\tau \rangle_{i \in I, \tau}}$$

Thus, $s_1$ is the (correctly normalized) ratio of the average
fragment number at genomic positions where cuts can happen
only by shearing (counts in the sample without second RE cleavage
(“X%”) away from cut sites) and the average fragment number at
genomic positions where cuts have to happen by the RE (counts in
the sample with second RE cleavage (“all cut” or “100%”) at the cut
sites).
Using s1, we can calculate the probability that a fragment is
sheared at least once within a fixed window of length w + 1:
s = (w + 1)s1. If a fragment is sheared more than once within a
window of length w + 1, the new fragments within the window will
be too small and filtered out before PCR and sequencing.
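The normalization and background terms derived above can be summarized in a short Python sketch. It assumes that NA values are stored as NaN and that counts are held in NumPy arrays; the function names and the toy numbers are hypothetical.

```python
import numpy as np

def sample_ratio(A_spike, C_spike):
    """Estimate N_A / N_C as the ratio of mean counts at spike-in cut sites.

    A_spike, C_spike : arrays of window counts at the spike-in cut sites in
                       the all cut and cut samples; NaN marks NA entries.
    """
    return float(np.nanmean(A_spike) / np.nanmean(C_spike))

def shearing_probabilities(c_far, A_sites, na_over_nc, w):
    """Return (s1, s) with s1 the per-position shearing probability and
    s = (w + 1) * s1 the probability for a window of w + 1 positions.

    c_far   : per-position counts in the cut sample far from any cut site
    A_sites : window counts at cut sites in the all cut sample
    """
    s1 = na_over_nc * np.nanmean(c_far) / np.nanmean(A_sites)
    return float(s1), float((w + 1) * s1)

# Toy numbers (assumed) chosen for easy checking by hand.
A_spike = np.array([200.0, 180.0, np.nan, 220.0])
C_spike = np.array([100.0, 90.0, 110.0, 100.0])
ratio = sample_ratio(A_spike, C_spike)
s1, s = shearing_probabilities(
    c_far=np.array([0.5, 1.5, 1.0, 1.0]),
    A_sites=np.array([300.0, 500.0, 400.0]),
    na_over_nc=ratio,
    w=9,
)
```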
Due to the stochasticity in the values for $C^i_\tau$ and $A^i_\tau$ for fixed
i and τ, the estimators $\hat\alpha^i_\tau$ can be smaller than 0 or larger than
1, even though the values they estimate, i.e., $\alpha_i$, are between 0 and 1.
The lowest possible value of $\hat\alpha^i_\tau$ is $-\frac{s}{1 - s}$, with s ≤ 0.15 in most
samples.
It is not useful to restrict the estimators $\hat\alpha^i_\tau$ to [0; 1] before
averaging, because a 100% accessibility test sample has measured
accessibilities distributed around 1 in both directions. Capping the
estimators $\hat\alpha^i_\tau$ at 1 would then give a mean value lower than 1.
However, very large outliers influence the mean very strongly,
even though the real value cannot be greater than 1; thus we cap the
values for $\hat\alpha^i_\tau$ at 1.5 when averaging over τ,
$$\hat\alpha_i = \langle \min(\hat\alpha^i_\tau, 1.5) \rangle_\tau,$$
to obtain one accessibility estimate for each cut site i. If $\hat\alpha^i_\tau$ = NA, it
is ignored during the averaging step.
To obtain the global accessibility, we average over all sites:
$$\hat\alpha = \langle \hat\alpha_i \rangle_{i \in I}$$
When comparing the accessibility values of individual sites with
the measured values from other assays, it does make sense to restrict
the values of $\hat\alpha_i$ to [0; 1], since this gives the best estimate for each
individual site.
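As an illustration of the cut–all cut estimator, the following Python sketch computes the per-type estimates, applies the cap of 1.5, and averages over fragment types and sites. It assumes counts arranged as arrays of shape (sites, 4) with NaN for NA; the function name and the toy counts are hypothetical.

```python
import numpy as np

def accessibility_cut_all_cut(C, A, na_over_nc, s, cap=1.5):
    """Per-site and global accessibility from cut (C) and all cut (A) counts.

    C, A       : arrays of shape (n_sites, 4), one column per fragment type;
                 NaN marks NA entries
    na_over_nc : estimate of N_A / N_C
    s          : shearing probability within a window of w + 1 positions
    """
    C = np.asarray(C, dtype=float)
    A = np.asarray(A, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        mu = C / A * na_over_nc
    alpha_tau = (mu - s) / (1.0 - s)
    # Estimates with A = 0 or A = NA are set to NA.
    alpha_tau[~np.isfinite(A) | (A == 0)] = np.nan
    capped = np.minimum(alpha_tau, cap)      # NaN entries stay NaN
    alpha_site = np.nanmean(capped, axis=1)  # NA entries are ignored
    return alpha_site, float(np.nanmean(alpha_site))

# Toy counts (assumed): three sites, four fragment types each.
C = [[50, 50, 50, 50], [0, 0, 0, 0], [400, 100, 100, 100]]
A = [[100, 100, 100, 0], [100, 100, 100, 100], [100, 100, 100, 100]]
alpha_site, alpha_global = accessibility_cut_all_cut(C, A, na_over_nc=1.0, s=0.1)
```

The second toy site shows the lowest possible estimate, -s/(1 - s), and the third shows the effect of the cap on a single outlier. Note that `np.nanmean` emits a warning and returns NaN for a site whose four estimates are all NA.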

4.1.7 Occupancy Estimation by Cut–Uncut Method with Background Correction

In the following we only use data from the sample without second
RE digest to estimate the accessibility and use the ratio of the cut
counts $C^i_\tau$ and the counts of uncut fragments $U^i_\tau$. We choose to
only consider different PCR biases and sequencing biases between
cut and uncut fragments, giving all cut fragments the sequencing
probability p and all uncut fragments the sequencing probability q.
Summing up cut counts and uncut counts, we set
$$C^i := C^i_1 + C^i_2 + C^i_3 + C^i_4 \quad\text{and}\quad U^i := 2 \, (U^i_1 + U^i_2)$$
for sites without any neighbor within 300 bp and
$$C^i := C^i_1 + C^i_2 \;\text{ or }\; C^i := C^i_3 + C^i_4 \quad\text{and}\quad U^i := U^i_1 + U^i_2$$
for sites with one upstream/downstream neighbor within 300 bp,
respectively. Then define the ratio of cut and uncut fragments,
$$\hat\kappa^i := \frac{C^i}{U^i}$$
If the denominator is 0, we set $\hat\kappa^i = \infty$, which will lead to an
accessibility of 1.
Similar to the previous subheading, we have
$$E[C^i] = 4 N_C \, p \, (\alpha_i + (1 - \alpha_i) s_1 (w + 1))$$
with $s_1$ being the shearing probability per
base pair, but now calculated only using the cut sample, i.e., the
ratio of all cut counts away from sites and the sum of cut and uncut
fragment counts away from cut sites. For $U^i$, we assume that the
uncut fragment counts are given by fragments that have not been
cut by the RE at $x^i_\tau$ and after that also not been cut by shearing at $x^i_\tau$.
The generally very low sequencing probabilities justify the assumption
that $C^i_\tau$ and $U^i_\tau$ are “independent enough” to make the
following approximation:
$$E[\hat\kappa^i] \approx \frac{E[C^i]}{E[U^i]} = \frac{4 N_C \, p \, (\alpha_i + (1 - \alpha_i) s_1 (w + 1))}{4 N_C \, q \, (1 - (\alpha_i + (1 - \alpha_i) s_1))}$$

The ratio of sequencing probabilities of cut and uncut fragments,
the “uncut correction factor” $\gamma := p/q$, is fitted to the calibration
samples as described in the subheading below.
We then obtain the following estimator for $\alpha_i$:
$$\hat\alpha_i := 1 - \frac{1 + \sigma}{\hat\kappa^i / \gamma + 1 - \sigma w},$$
thus
$$\hat\alpha_i = \frac{C^i - \sigma (w + 1) U^i \gamma}{C^i - \sigma (w + 1) U^i \gamma + (1 + \sigma) U^i \gamma} = \frac{C^i_{\mathrm{eff}}}{C^i_{\mathrm{eff}} + U^i_{\mathrm{eff}}}$$
with
$$\sigma := \frac{s_1}{1 - s_1} = \frac{1}{\gamma} \, \frac{\langle c^1_\tau(z) \rangle_{z \in Z, \tau}}{\langle u^1_\tau(z) \rangle_{z \in Z, \tau}}$$
being the corrected ratio of all cut
counts away from all cut sites and all uncut fragment counts away
from all cut sites.
$$C^i_{\mathrm{eff}} = C^i - \sigma (w + 1) U^i \gamma \quad\text{and}\quad U^i_{\mathrm{eff}} = (1 + \sigma) U^i \gamma$$
are the effective counts of cut and uncut fragments, respectively,
both corrected for cuts in the shearing step and different sequencing
probabilities of cut and uncut fragments. $C^i_{\mathrm{eff}} + U^i_{\mathrm{eff}}$ gives an
“effective coverage” of cut and uncut fragments at the site i and we
ignore sites with an effective coverage below 40. This limit may be
adapted for different applications.
Finally, the genome-wide average accessibility is given by
$$\hat\alpha = \langle \hat\alpha_i \rangle_{i \in I}.$$
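A minimal Python sketch of the cut–uncut estimator follows. It assumes per-site summed counts as NumPy arrays and takes σ, w, and γ as given inputs; the function name and the toy values are hypothetical.

```python
import numpy as np

def accessibility_cut_uncut(C, U, sigma, w, gamma, min_coverage=40.0):
    """Per-site and genome-wide accessibility from cut (C) and uncut (U) counts.

    sigma        : corrected background ratio s1 / (1 - s1)
    w            : window parameter used for the cut counts
    gamma        : uncut correction factor p / q
    min_coverage : sites with lower effective coverage are set to NaN
    """
    C = np.asarray(C, dtype=float)
    U = np.asarray(U, dtype=float)
    c_eff = C - sigma * (w + 1) * U * gamma
    u_eff = (1.0 + sigma) * U * gamma
    coverage = c_eff + u_eff
    alpha = np.full(C.shape, np.nan)
    keep = coverage >= min_coverage
    alpha[keep] = c_eff[keep] / coverage[keep]
    return alpha, float(np.nanmean(alpha))

# Toy example (assumed): no shearing background and equal sequencing probabilities.
alpha_simple, mean_simple = accessibility_cut_uncut(
    C=[100, 0, 10], U=[100, 200, 10], sigma=0.0, w=0, gamma=1.0
)
# A single site with background and bias correction switched on.
alpha_corr, _ = accessibility_cut_uncut(
    C=[300], U=[100], sigma=0.01, w=4, gamma=1.3
)
```

Working with effective counts rather than with the ratio of cut to uncut counts has the practical advantage that a site with zero uncut counts needs no special case: its estimate is simply 1 as long as the coverage threshold is met.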
Fit of γ using prepared calibration samples for RE digests:
For each RE (AluI, BamHI, and HindIII) and each calibration
sample s with 0%, 10%, 30%, 50%, 70%, 90%, and 100% prepared
fraction of uncut DNA molecules, i.e., prepared occupancy $\omega_s = 1 - \alpha_s$,
we calculate the measured genome-wide average occupancy
$\hat\omega_s(\gamma) = 1 - \hat\alpha_s(\gamma)$ for varying γ. We then choose γ for each RE
such that $\langle (\omega_s - \hat\omega_s(\gamma))^2 \rangle_s$ is minimized. Additionally, we did a
combined fit over all calibration samples of the three REs to use
for REs for which no specific calibration samples were measured.
The following table shows the best values for γ:

RE | AluI | BamHI | HindIII | Combined
$\gamma_{\min}$ | 1.282 | 1.279 | 1.333 | 1.300
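The least-squares choice of γ can be sketched as a simple grid search in Python. This is only an illustration on synthetic, noise-free counts: the grid, the function names, and the variable `r` (the uncorrected background ratio of cut to uncut counts away from cut sites, so that σ = r/γ) are assumptions, and the actual fitting procedure used for the values in the table may differ.

```python
import numpy as np

def mean_occupancy(C, U, r, w, gamma, min_coverage=40.0):
    """Genome-wide average occupancy 1 - mean(alpha) for a given gamma."""
    sigma = r / gamma
    c_eff = C - sigma * (w + 1) * U * gamma
    u_eff = (1.0 + sigma) * U * gamma
    coverage = c_eff + u_eff
    keep = coverage >= min_coverage
    return 1.0 - float(np.mean(c_eff[keep] / coverage[keep]))

def fit_gamma(samples, gammas):
    """Return the gamma on the grid that minimizes the mean squared deviation
    between prepared and measured occupancy.

    samples : list of (prepared_occupancy, C, U, r, w) tuples
    """
    losses = [
        np.mean([(omega - mean_occupancy(C, U, r, w, g)) ** 2
                 for omega, C, U, r, w in samples])
        for g in gammas
    ]
    return float(gammas[int(np.argmin(losses))])

# Synthetic calibration series (assumed): true gamma of 1.3, no shearing.
true_gamma = 1.3
samples = []
for omega in [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0]:
    alpha = 1.0 - omega
    C = np.full(5, 1000.0 * alpha * true_gamma)
    U = np.full(5, 1000.0 * (1.0 - alpha))
    samples.append((omega, C, U, 0.0, 0))
grid = np.round(np.arange(1.0, 1.61, 0.01), 2)
best_gamma = fit_gamma(samples, grid)
```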

References
1. Wal M, Pugh BF (2012) Genome-wide resolution, ultrasensitive and quantitative
mapping of nucleosome positions in yeast DNA double-strand break labeling in eukary-
using high-resolution MNase ChIP-Seq. otic cells using i-BLESS. Nat Protoc 16(2):
Methods Enzymol 513:233–250. https://doi. 1034–1061. https://doi.org/10.1038/
org/10.1016/b978-0-12-391938-0.00010-0 s41596-020-00448-3
2. Buenrostro JD, Wu B, Chang HY, Greenleaf 10. Martinez-Campa C, Kent NA, Mellor J (1997)
WJ (2015) ATAC-seq: a method for assaying Rapid isolation of yeast plasmids as native chro-
chromatin accessibility genome-wide. Curr matin. Nucleic Acids Res 25(9):1872–1873
Protoc Mol Biol 109:21.29.21–21.29.29. 11. Aris JP, Blobel G (1991) Isolation of yeast
https://doi.org/10.1002/0471142727. nuclei. Methods Enzymol 194:735–749.
mb2129s109 https://doi.org/10.1016/0076-6879(91)
3. Buenrostro JD, Wu B, Litzenburger UM, 94056-i
Ruff D, Gonzales ML, Snyder MP, Chang 12. Kizer KO, Xiao T, Strahl BD (2006) Acceler-
HY, Greenleaf WJ (2015) Single-cell chroma- ated nuclei preparation and methods for analy-
tin accessibility reveals principles of regulatory sis of histone modifications in yeast. Methods
variation. Nature 523(7561):486–490. (San Diego, Calif) 40(4):296–302. https://
https://doi.org/10.1038/nature14590 doi.org/10.1016/j.ymeth.2006.06.022
4. Kim YC, Grable JC, Love R, Greene PJ, Rosen- 13. Reese JC, Zhang H, Zhang Z (2008) Isolation
berg JM (1990) Refinement of Eco RI endo- of highly purified yeast nuclei for nuclease
nuclease crystal structure: a revised protein mapping of chromatin structure. Methods
chain tracing. Science (New York, NY) Mol Biol (Clifton, NJ) 463:43–53. https://
249(4974):1307–1309. https://doi.org/10. doi.org/10.1007/978-1-59745-406-3_3
1126/science.2399465 14. Zhang Z, Reese JC (2006) Isolation of yeast
5. Gregory PD, Barbaric S, Horz W (1999) nuclei and micrococcal nuclease mapping of
Restriction nucleases as probes for chromatin nucleosome positioning. Methods Mol Biol
structure. Methods Mol Biol (Clifton, NJ) (Clifton, NJ) 313:245–255. https://doi.org/
119:417–425. https://doi.org/10.1385/1- 10.1385/1-59259-958-3:245
59259-681-9:417 15. Kiseleva E, Allen TD, Rutherford SA,
6. Oberbeckmann E, Wolff M, Krietenstein N, Murray S, Morozova K, Gardiner F, Goldberg
Heron M, Ellins JL, Schmid A, Krebs S, MW, Drummond SP (2007) A protocol for
Blum H, Gerland U, Korber P (2019) Abso- isolation and visualization of yeast nuclei by
lute nucleosome occupancy map for the Sac- scanning electron microscopy (SEM). Nat Pro-
charomyces cerevisiae genome. Genome Res toc 2(8):1943–1953. https://doi.org/10.
29(12):1996–2009. https://doi.org/10. 1038/nprot.2007.251
1101/gr.253419.119 16. Schmid A, Fascher KD, Horz W (1992) Nucle-
7. Ohtsubo Y, Sakai K, Nagata Y, Tsuda M osome disruption at the yeast PHO5 promoter
(2019) Properties and efficient scrap-and- upon PHO5 induction occurs in the absence of
build repairing of mechanically sheared 3’ DNA replication. Cell 71(5):853–864
DNA ends. Commun Biol 2:409. https://doi. 17. Wolff MR (2020) Nucleosome occupancy and
org/10.1038/s42003-019-0660-7 dynamics in yeast: genome-wide and
8. Chereji RV, Eriksson PR, Ocampo J, Prajapati promoter-level analyses and modeling. PhD,
HK, Clark DJ (2019) Accessibility of promoter LMU München, München
DNA is not the primary determinant of 18. Chereji RV, Ramachandran S, Bryson TD,
chromatin-mediated gene regulation. Genome Henikoff S (2018) Precise genome-wide
Res 29(12):1985–1995. https://doi.org/10. mapping of single nucleosomes and linkers
1101/gr.249326.119 in vivo. Genome Biol 19(1):19. https://doi.
9. Biernacka A, Skrzypczak M, Zhu Y, Pasero P, org/10.1186/s13059-018-1398-0
Rowicka M, Ginalski K (2021) High-
Part III

Methods for Profiling Chromatin Accessibility at the Single-Cell Level

Chapter 10

Single-Cell Joint Profiling of Open Chromatin and Transcriptome by Paired-Seq
Chenxu Zhu, Zhaoning Wang, and Bing Ren

Abstract
Simultaneous detection of chromatin accessibility and transcription from the same cells promises to greatly
facilitate the dissection of cell-type-specific gene regulatory programs in complex tissues. Paired-seq enables
joint analysis of open chromatin and nuclear transcriptome from up to a million cells in parallel. It achieves
ultra-high-throughput single-cell multiomics with the use of a combinatorial barcoding strategy involving
sequential ligation of multiplexed DNA barcodes to chromatin DNA fragments and reverse transcription
products, followed by high-throughput DNA sequencing of the resulting DNA libraries and deconvolution
of single-cell multiomic maps based on cell-specific barcodes.

Key words Paired-seq, Single-cell multiomics, Chromatin accessibility, Gene expression, Epigenome

1 Introduction

Cis-regulatory elements (CREs) play a fundamental role in gene
regulation. In eukaryotic cells, binding of transcription regulators
to CREs leads to depletion of nucleosomes and hypersensitivity to
nucleases (such as DNase I or micrococcal nuclease) and Tn5
transposase [1–3]. Methods exploiting the hypersensitivity of
active CREs have been developed to map these sequences in the
genome, including DNase I hypersensitive sites sequencing
(DNase-seq) [4], micrococcal nuclease digestion with deep
sequencing (MNase-seq) [5], formaldehyde-assisted isolation of
regulatory elements sequencing (FAIRE-seq) [6], and assay for
transposase-accessible chromatin using sequencing (ATAC-seq)
[7, 8]. The advancement of single-cell chromatin accessibility assays
using droplet-based or combinatorial barcoding strategies [9–14]
has enabled deconvolution of cell-type-specific transcriptional programs
from mixed cell populations and primary tissues [15]. However,
measuring individual molecular modalities one at a time in
single cells does not permit a full view of the gene regulatory

Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols,
Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_10,
© The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023


process in complex tissues and pathogenesis [16, 17]. Co-assay of
gene expression together with DNA methylation [18], histone
modification [19], chromatin accessibility [20], or high-order
chromatin conformation [21] can lead to a better understanding
of cell-type-specific gene regulatory programs and enable a better
assessment of the role of the epigenome in transcriptional regulation of
each gene. Several methods have now been reported to enable joint
analysis of nuclear transcriptome and accessible chromatin in individually
isolated cells [22], or thousands of single cells with plate-based
combinatorial indexing [20] and droplet-based barcoding [23].
Paired-seq is a scalable single-cell technology that can assay
gene expression and chromatin accessibility for up to a million
single cells in parallel [24] with the use of a ligation-based combinatorial
indexing strategy [25]. It begins with fragmentation of
open chromatin by the Tn5 transposase followed by reverse transcription
of nuclear mRNA by reverse transcriptase. DNA barcodes
are then ligated in situ to the chromatin fragments
and reverse transcription products (cDNA) in each nucleus through
a split-and-pooling scheme in 96-well plates. Following nuclei lysis,
the chromatin DNA and cDNA are amplified, and then split into
two separate libraries corresponding to each molecular modality for
next-generation DNA sequencing (Fig. 1). The entire Paired-seq
procedure, not including DNA sequencing, spans 2 days (see Note
1). With a reasonable sequencing depth (number of sequenced
reads per nucleus: 25,000 for DNA and 50,000 for RNA), Paired-seq
can generate single-cell multiomics profiles with ~5000 unique
tagmentation loci and ~10,000 unique transcripts per nucleus.
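The per-nucleus read depths quoted above translate directly into a sequencing budget. The following Python sketch is only a planning aid; the function name and the example of 10,000 nuclei are assumptions for illustration.

```python
# Reads per nucleus quoted in the text for each library type.
READS_PER_NUCLEUS = {"DNA": 25_000, "RNA": 50_000}

def reads_required(n_nuclei):
    """Total reads to sequence for each library at the quoted per-nucleus depth."""
    return {library: n_nuclei * depth for library, depth in READS_PER_NUCLEUS.items()}

# Example (assumed experiment size): 10,000 nuclei.
budget = reads_required(10_000)
```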

[Fig. 1 graphic. Panel a: protocol timeline over Day 1 and Day 2 with pause points marked; the steps shown are nuclei preparation, tagmentation, reverse transcription, nuclei barcoding, pre-amplification, library dedicating, and library amplification. Panel b: schematic of DNA and RNA library construction, showing the cellular barcodes, TdT tailing, pre-amplification, the NotI, SbfI, and FokI sites, and addition of the second adaptor.]

Fig. 1 Overview of Paired-seq protocol. (a) The Paired-seq protocol can be finished in 2 days; pause points are
indicated. (b) Schematics for library preparation strategy of Paired-seq. Both DNA fragments from Tn5
tagmentation and cDNA were pre-amplified with a TdT-based strategy and then split into two portions. For the
DNA library, the 2nd adaptor was added by ligation; for the RNA library, the 2nd adaptor was added by Tn5
tagmentation

2 Materials

2.1 Reagents Preparation

1. Tn5 protein, purified according to ref. [26], and Paired-seq
primers (Table 1; see Note 2).
2. RT primers (Table 2).
3. Tn5 barcodes (Table 2).
4. Barcode oligos (Tables 3 and 4).
5. Tris–HCl, pH 7.5 (Invitrogen, Cat# 15567027).
6. NaCl (Sigma, Cat# S7653).
7. Glycerol (Sigma, Cat# G5516).
8. DTT (Sigma, Cat# D9779).
9. 200 μL thin-wall PCR tubes (USA Scientific, Cat# 1402-
3900).
10. 1.5 mL low-bind tubes (Eppendorf, Cat# 022431021).
11. 15 mL tubes (Corning Costar, Cat# 430790).
12. 96-well low-bind PCR plate (Eppendorf, Cat# 0030129512).
13. Sterile Reagent reservoir (Corning Costar, Cat# 07200127).
14. Thermocycler (Bio-Rad, T100).

2.2 Nuclei Isolation

1. Douncing buffer (DB) (1.5 mL per sample).

Reagents | Stock concentration | Volume | Final concentration
Sucrose (Sigma, Cat# S7903) | 1 M | 0.375 mL | 250 mM
KCl (Sigma, Cat# P9333) | 2 M | 18.8 μL | 25 mM
MgCl2 (Sigma, Cat# 63069) | 1 M | 7.5 μL | 5 mM
Tris–HCl, pH 7.5 (Invitrogen, Cat# 15567027) | 1 M | 15 μL | 10 mM
DTT | 1 M | 1.5 μL | 1 mM
Protease Inhibitor (Sigma, Cat# 04693159001) | 50X | 30 μL | 1X
SUPERase IN (Invitrogen, Cat# AM2696) | 20 U/μL | 37.5 μL | 0.5 U/μL
RNase OUT (Invitrogen, Cat# 10777019) | 40 U/μL | 18.8 μL | 0.5 U/μL
H2O | NA | 996 μL | NA
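The volumes in the douncing buffer recipe follow from the dilution relation (stock volume = final concentration / stock concentration × total volume). A short Python sketch, assuming each stock and final concentration is expressed in the same unit per row, recomputes them for 1.5 mL; the names used here are hypothetical.

```python
def stock_volume_ul(final_conc, stock_conc, total_volume_ul):
    """Volume of stock needed so that the final concentration is reached."""
    return final_conc / stock_conc * total_volume_ul

TOTAL_UL = 1500.0  # 1.5 mL of douncing buffer per sample

# (reagent, stock concentration, final concentration); units match within a row.
DB_RECIPE = [
    ("Sucrose (mM)", 1000.0, 250.0),
    ("KCl (mM)", 2000.0, 25.0),
    ("MgCl2 (mM)", 1000.0, 5.0),
    ("Tris-HCl pH 7.5 (mM)", 1000.0, 10.0),
    ("DTT (mM)", 1000.0, 1.0),
    ("Protease Inhibitor (X)", 50.0, 1.0),
    ("SUPERase IN (U/uL)", 20.0, 0.5),
    ("RNase OUT (U/uL)", 40.0, 0.5),
]

volumes = {name: stock_volume_ul(final, stock, TOTAL_UL)
           for name, stock, final in DB_RECIPE}
# Water makes up the remainder of the total volume.
water_ul = TOTAL_UL - sum(volumes.values())
```

The same helper can be used to scale the recipe to several samples by changing `TOTAL_UL`.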

Table 1
Primer sequences

Name Sequence (5′-3′)


pMENTs 5Phos/CTGTCTCTTATACACATCTddC
AdaptorA TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG

Linker-R02 CGAATGCTCTGGCCTCTCAAGCACGTGGAT
Blocker-R02 ATCCACGTGCTTGAGAGGCCAGAGCATTCG
Linker-R03 GGTCTGAGTTCGCACCGAAACATCGGCCAC
Quencher-R03 GTGGCCGATGTTTCGGTGCGAACTCAGACC
Anchor-FokI-GH AAGCAGTGGTATCAACGCAGAGTGAAGGATGTGGGGGGGGG*H
P5-FokI ACACTCTTTCCCTACACGACGCTCTTCCGATCT
P5c-NNDC-FokI 5Phos/NNDCAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTG
P5H-FokI ACACTCTTTCCCTACACGACGCTCTTCCGATCTH
P5Hc-NNDC-FokI 5Phos/NNDCDAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTG
PA-F CAGACGTGTGCTCTTCCGATCT
PA-R AAGCAGTGGTATCAACGCAGAGT
N5XX AATGATACGGCGACCACCGAGATCTACACXXXXXXXXTCGTCGGCAGCGTC
P7XX CAAGCAGAAGACGGCATACGAGATXXXXXXGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC
P5 Universal AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T
* denotes Phosphorothioate Bonds modification
N denotes random bases
X denotes Illumina Index sequences
Table 2
Barcode plate #01

Well position Name Sequence


A1 DNA_#01_RE /5Phos/AGGCCAGAGCATTCGACATCGCGGCCGCAGATGTGTATAAGAGACAG
A2 DNA_#02_RE /5Phos/AGGCCAGAGCATTCGAATGAGCGGCCGCAGATGTGTATAAGAGACAG
A3 DNA_#03_RE /5Phos/AGGCCAGAGCATTCGAAGCTGCGGCCGCAGATGTGTATAAGAGACAG
A4 DNA_#04_RE /5Phos/AGGCCAGAGCATTCGAACAGGCGGCCGCAGATGTGTATAAGAGACAG
A5 DNA_#05_RE /5Phos/AGGCCAGAGCATTCGAGAATGCGGCCGCAGATGTGTATAAGAGACAG
A6 DNA_#06_RE /5Phos/AGGCCAGAGCATTCGATACGGCGGCCGCAGATGTGTATAAGAGACAG
A7 DNA_#07_RE /5Phos/AGGCCAGAGCATTCGATTACGCGGCCGCAGATGTGTATAAGAGACAG
A8 DNA_#08_RE /5Phos/AGGCCAGAGCATTCGAGTTGGCGGCCGCAGATGTGTATAAGAGACAG
A9 DNA_#09_RE /5Phos/AGGCCAGAGCATTCGACCGTGCGGCCGCAGATGTGTATAAGAGACAG
A10 DNA_#10_RE /5Phos/AGGCCAGAGCATTCGACGAAGCGGCCGCAGATGTGTATAAGAGACAG
A11 DNA_#11_RE /5Phos/AGGCCAGAGCATTCGATCTAGCGGCCGCAGATGTGTATAAGAGACAG
A12 DNA_#12_RE /5Phos/AGGCCAGAGCATTCGAGGGCGCGGCCGCAGATGTGTATAAGAGACAG
B1 RNA_#01_RE /5Phos/AGGCCAGAGCATTCGTCATCCCTGCAGGTTTTTTTTTTTTTTTTVN
B2 RNA_#02_RE /5Phos/AGGCCAGAGCATTCGTATGACCTGCAGGTTTTTTTTTTTTTTTTVN
B3 RNA_#03_RE /5Phos/AGGCCAGAGCATTCGTAGCTCCTGCAGGTTTTTTTTTTTTTTTTVN
B4 RNA_#04_RE /5Phos/AGGCCAGAGCATTCGTACAGCCTGCAGGTTTTTTTTTTTTTTTTVN
B5 RNA_#05_RE /5Phos/AGGCCAGAGCATTCGTGAATCCTGCAGGTTTTTTTTTTTTTTTTVN
B6 RNA_#06_RE /5Phos/AGGCCAGAGCATTCGTTACGCCTGCAGGTTTTTTTTTTTTTTTTVN

B7 RNA_#07_RE /5Phos/AGGCCAGAGCATTCGTTTACCCTGCAGGTTTTTTTTTTTTTTTTVN
B8 RNA_#08_RE /5Phos/AGGCCAGAGCATTCGTGTTGCCTGCAGGTTTTTTTTTTTTTTTTVN


B9 RNA_#09_RE /5Phos/AGGCCAGAGCATTCGTCCGTCCTGCAGGTTTTTTTTTTTTTTTTVN

B10 RNA_#10_RE /5Phos/AGGCCAGAGCATTCGTCGAACCTGCAGGTTTTTTTTTTTTTTTTVN


B11 RNA_#11_RE /5Phos/AGGCCAGAGCATTCGTTCTACCTGCAGGTTTTTTTTTTTTTTTTVN
B12 RNA_#12_RE /5Phos/AGGCCAGAGCATTCGTGGGCCCTGCAGGTTTTTTTTTTTTTTTTVN
C1 RNA_#01_NRE /5Phos/AGGCCAGAGCATTCGTCATCCCTGCAGGNNNNNN
C2 RNA_#02_NRE /5Phos/AGGCCAGAGCATTCGTATGACCTGCAGGNNNNNN
C3 RNA_#03_NRE /5Phos/AGGCCAGAGCATTCGTAGCTCCTGCAGGNNNNNN
C4 RNA_#04_NRE /5Phos/AGGCCAGAGCATTCGTACAGCCTGCAGGNNNNNN
C5 RNA_#05_NRE /5Phos/AGGCCAGAGCATTCGTGAATCCTGCAGGNNNNNN
C6 RNA_#06_NRE /5Phos/AGGCCAGAGCATTCGTTACGCCTGCAGGNNNNNN
C7 RNA_#07_NRE /5Phos/AGGCCAGAGCATTCGTTTACCCTGCAGGNNNNNN
C8 RNA_#08_NRE /5Phos/AGGCCAGAGCATTCGTGTTGCCTGCAGGNNNNNN
C9 RNA_#09_NRE /5Phos/AGGCCAGAGCATTCGTCCGTCCTGCAGGNNNNNN
C10 RNA_#10_NRE /5Phos/AGGCCAGAGCATTCGTCGAACCTGCAGGNNNNNN
C11 RNA_#11_NRE /5Phos/AGGCCAGAGCATTCGTTCTACCTGCAGGNNNNNN
C12 RNA_#12_NRE /5Phos/AGGCCAGAGCATTCGTGGGCCCTGCAGGNNNNNN

Table 3
Barcode plate #02

Well position Name Sequence


A1 R02_#01 /5Phos/GTGCGAACTCAGACCAAACCGGATCCACGTGCTTGAG
A2 R02_#02 /5Phos/GTGCGAACTCAGACCAAACGTCATCCACGTGCTTGAG
A3 R02_#03 /5Phos/GTGCGAACTCAGACCAAAGATGATCCACGTGCTTGAG
A4 R02_#04 /5Phos/GTGCGAACTCAGACCAAATCCAATCCACGTGCTTGAG
A5 R02_#05 /5Phos/GTGCGAACTCAGACCAAATGAGATCCACGTGCTTGAG
A6 R02_#06 /5Phos/GTGCGAACTCAGACCAACACTGATCCACGTGCTTGAG
A7 R02_#07 /5Phos/GTGCGAACTCAGACCAACGTTTATCCACGTGCTTGAG
A8 R02_#08 /5Phos/GTGCGAACTCAGACCAAGAAGCATCCACGTGCTTGAG
A9 R02_#09 /5Phos/GTGCGAACTCAGACCAAGCCCTATCCACGTGCTTGAG
A10 R02_#10 /5Phos/GTGCGAACTCAGACCAAGCTACATCCACGTGCTTGAG
A11 R02_#11 /5Phos/GTGCGAACTCAGACCAATCTTGATCCACGTGCTTGAG
A12 R02_#12 /5Phos/GTGCGAACTCAGACCACAACACATCCACGTGCTTGAG
B1 R02_#13 /5Phos/GTGCGAACTCAGACCACAGTATATCCACGTGCTTGAG
B2 R02_#14 /5Phos/GTGCGAACTCAGACCACCAAGTATCCACGTGCTTGAG
B3 R02_#15 /5Phos/GTGCGAACTCAGACCACCCTAAATCCACGTGCTTGAG
B4 R02_#16 /5Phos/GTGCGAACTCAGACCACCCTTTATCCACGTGCTTGAG
B5 R02_#17 /5Phos/GTGCGAACTCAGACCACCTCTCATCCACGTGCTTGAG
B6 R02_#18 /5Phos/GTGCGAACTCAGACCACGATTGATCCACGTGCTTGAG
B7 R02_#19 /5Phos/GTGCGAACTCAGACCACGCAGAATCCACGTGCTTGAG
B8 R02_#20 /5Phos/GTGCGAACTCAGACCACGTAAAATCCACGTGCTTGAG
B9 R02_#21 /5Phos/GTGCGAACTCAGACCACTACCTATCCACGTGCTTGAG
B10 R02_#22 /5Phos/GTGCGAACTCAGACCACTCGGTATCCACGTGCTTGAG
B11 R02_#23 /5Phos/GTGCGAACTCAGACCACTGTCGATCCACGTGCTTGAG
B12 R02_#24 /5Phos/GTGCGAACTCAGACCACTTATGATCCACGTGCTTGAG
C1 R02_#25 /5Phos/GTGCGAACTCAGACCAGAAAGGATCCACGTGCTTGAG
C2 R02_#26 /5Phos/GTGCGAACTCAGACCAGAATCTATCCACGTGCTTGAG
C3 R02_#27 /5Phos/GTGCGAACTCAGACCAGACATAATCCACGTGCTTGAG
C4 R02_#28 /5Phos/GTGCGAACTCAGACCAGAGACCATCCACGTGCTTGAG
C5 R02_#29 /5Phos/GTGCGAACTCAGACCAGCCCAAATCCACGTGCTTGAG
C6 R02_#30 /5Phos/GTGCGAACTCAGACCAGCTATTATCCACGTGCTTGAG
C7 R02_#31 /5Phos/GTGCGAACTCAGACCAGGAGGTATCCACGTGCTTGAG
C8 R02_#32 /5Phos/GTGCGAACTCAGACCAGGGCTTATCCACGTGCTTGAG



C9 R02_#33 /5Phos/GTGCGAACTCAGACCAGGTGTAATCCACGTGCTTGAG
C10 R02_#34 /5Phos/GTGCGAACTCAGACCAGTGCTCATCCACGTGCTTGAG
C11 R02_#35 /5Phos/GTGCGAACTCAGACCAGTGGGAATCCACGTGCTTGAG
C12 R02_#36 /5Phos/GTGCGAACTCAGACCAGTTACGATCCACGTGCTTGAG
D1 R02_#37 /5Phos/GTGCGAACTCAGACCATAAGGGATCCACGTGCTTGAG
D2 R02_#38 /5Phos/GTGCGAACTCAGACCATCATTCATCCACGTGCTTGAG
D3 R02_#39 /5Phos/GTGCGAACTCAGACCATGGAACATCCACGTGCTTGAG
D4 R02_#40 /5Phos/GTGCGAACTCAGACCATGTGCCATCCACGTGCTTGAG
D5 R02_#41 /5Phos/GTGCGAACTCAGACCATTCACCATCCACGTGCTTGAG
D6 R02_#42 /5Phos/GTGCGAACTCAGACCATTCGAGATCCACGTGCTTGAG
D7 R02_#43 /5Phos/GTGCGAACTCAGACCCAAGCCTATCCACGTGCTTGAG
D8 R02_#44 /5Phos/GTGCGAACTCAGACCCACAAGGATCCACGTGCTTGAG
D9 R02_#45 /5Phos/GTGCGAACTCAGACCCACCTTAATCCACGTGCTTGAG
D10 R02_#46 /5Phos/GTGCGAACTCAGACCCAGAGTGATCCACGTGCTTGAG
D11 R02_#47 /5Phos/GTGCGAACTCAGACCCAGCGAAATCCACGTGCTTGAG
D12 R02_#48 /5Phos/GTGCGAACTCAGACCCAGGTCAATCCACGTGCTTGAG
E1 R02_#49 /5Phos/GTGCGAACTCAGACCCATAACTATCCACGTGCTTGAG
E2 R02_#50 /5Phos/GTGCGAACTCAGACCCATATCGATCCACGTGCTTGAG
E3 R02_#51 /5Phos/GTGCGAACTCAGACCCATCGATATCCACGTGCTTGAG
E4 R02_#52 /5Phos/GTGCGAACTCAGACCCATTACAATCCACGTGCTTGAG
E5 R02_#53 /5Phos/GTGCGAACTCAGACCCATTTCCATCCACGTGCTTGAG
E6 R02_#54 /5Phos/GTGCGAACTCAGACCCCAAATGATCCACGTGCTTGAG
E7 R02_#55 /5Phos/GTGCGAACTCAGACCCCACTTGATCCACGTGCTTGAG
E8 R02_#56 /5Phos/GTGCGAACTCAGACCCCGGATAATCCACGTGCTTGAG
E9 R02_#57 /5Phos/GTGCGAACTCAGACCCCGGTTTATCCACGTGCTTGAG
E10 R02_#58 /5Phos/GTGCGAACTCAGACCCCTAAGAATCCACGTGCTTGAG
E11 R02_#59 /5Phos/GTGCGAACTCAGACCCCTAGTCATCCACGTGCTTGAG
E12 R02_#60 /5Phos/GTGCGAACTCAGACCCCTGCAAATCCACGTGCTTGAG
F1 R02_#61 /5Phos/GTGCGAACTCAGACCCGACGTTATCCACGTGCTTGAG
F2 R02_#62 /5Phos/GTGCGAACTCAGACCCGAGTAAATCCACGTGCTTGAG
F3 R02_#63 /5Phos/GTGCGAACTCAGACCCGATTATATCCACGTGCTTGAG
F4 R02_#64 /5Phos/GTGCGAACTCAGACCCGTAGCAATCCACGTGCTTGAG



F5 R02_#65 /5Phos/GTGCGAACTCAGACCCGTCTGAATCCACGTGCTTGAG
F6 R02_#66 /5Phos/GTGCGAACTCAGACCCTACAGCATCCACGTGCTTGAG
F7 R02_#67 /5Phos/GTGCGAACTCAGACCCTCAATAATCCACGTGCTTGAG
F8 R02_#68 /5Phos/GTGCGAACTCAGACCCTCGTTGATCCACGTGCTTGAG
F9 R02_#69 /5Phos/GTGCGAACTCAGACCCTCTACGATCCACGTGCTTGAG
F10 R02_#70 /5Phos/GTGCGAACTCAGACCCTTGGGTATCCACGTGCTTGAG
F11 R02_#71 /5Phos/GTGCGAACTCAGACCGAAACTCATCCACGTGCTTGAG
F12 R02_#72 /5Phos/GTGCGAACTCAGACCGACTGTCATCCACGTGCTTGAG
G1 R02_#73 /5Phos/GTGCGAACTCAGACCGATACAGATCCACGTGCTTGAG
G2 R02_#74 /5Phos/GTGCGAACTCAGACCGCGATCAATCCACGTGCTTGAG
G3 R02_#75 /5Phos/GTGCGAACTCAGACCGCGTACTATCCACGTGCTTGAG
G4 R02_#76 /5Phos/GTGCGAACTCAGACCGCTCGAAATCCACGTGCTTGAG
G5 R02_#77 /5Phos/GTGCGAACTCAGACCGGAAGAAATCCACGTGCTTGAG
G6 R02_#78 /5Phos/GTGCGAACTCAGACCGGAGATTATCCACGTGCTTGAG
G7 R02_#79 /5Phos/GTGCGAACTCAGACCGGGCTAAATCCACGTGCTTGAG
G8 R02_#80 /5Phos/GTGCGAACTCAGACCGGGTATGATCCACGTGCTTGAG
G9 R02_#81 /5Phos/GTGCGAACTCAGACCGGTAACCATCCACGTGCTTGAG
G10 R02_#82 /5Phos/GTGCGAACTCAGACCGGTAGTGATCCACGTGCTTGAG
G11 R02_#83 /5Phos/GTGCGAACTCAGACCGGTGAAAATCCACGTGCTTGAG
G12 R02_#84 /5Phos/GTGCGAACTCAGACCGTAATCGATCCACGTGCTTGAG
H1 R02_#85 /5Phos/GTGCGAACTCAGACCGTATAAGATCCACGTGCTTGAG
H2 R02_#86 /5Phos/GTGCGAACTCAGACCGTCAGACATCCACGTGCTTGAG
H3 R02_#87 /5Phos/GTGCGAACTCAGACCGTCCCTTATCCACGTGCTTGAG
H4 R02_#88 /5Phos/GTGCGAACTCAGACCGTGCCATATCCACGTGCTTGAG
H5 R02_#89 /5Phos/GTGCGAACTCAGACCGTGGTCTATCCACGTGCTTGAG
H6 R02_#90 /5Phos/GTGCGAACTCAGACCGTTCTCCATCCACGTGCTTGAG
H7 R02_#91 /5Phos/GTGCGAACTCAGACCGTTGCTTATCCACGTGCTTGAG
H8 R02_#92 /5Phos/GTGCGAACTCAGACCTACCCGAATCCACGTGCTTGAG
H9 R02_#93 /5Phos/GTGCGAACTCAGACCTAGACGAATCCACGTGCTTGAG
H10 R02_#94 /5Phos/GTGCGAACTCAGACCTAGTCACATCCACGTGCTTGAG
H11 R02_#95 /5Phos/GTGCGAACTCAGACCTCACATCATCCACGTGCTTGAG
H12 R02_#96 /5Phos/GTGCGAACTCAGACCTCAGCTGATCCACGTGCTTGAG

2. Nuclei isolation buffer (NIB) (1 mL per sample).

Reagents                             Stock concentration   Volume     Final concentration
IGEPAL CA-630 (Sigma, Cat# I8896)    10%                   20 μL      0.2%
BSA in DPBS (Sigma, Cat# A1595)      10%                   0.5 mL     5%
Protease Inhibitor                   50X                   20 μL      1X
SUPERase IN                          20 U/μL               25 μL      0.5 U/μL
RNase OUT                            40 U/μL               12.5 μL    0.5 U/μL
DPBS (Gibco, Cat# 14190136)          1X                    422.5 μL   NA

3. 5% Triton X-100 (diluted from Sigma, Cat# T9284).


4. Dounce tissue grinder set (1.0 mL) (KIMBLE, Cat#
DWK885300-0001).
5. Celltrics filters (30 μm) (Sysmex, Cat# 04-0042-2316).
6. Axygen Maximum Recovery tube (Corning, Cat# MCT-150-
L-C).
7. TC20 Cell Counter (Bio-Rad).
8. 1.5 mL low-bind tubes (Eppendorf, Cat# 022431021).
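
The volumes in the NIB table (item 2 above) follow from C1 × V1 = C2 × V2. The following is a minimal sketch for rescaling the recipe to another total volume; the function and dictionary names are hypothetical and only the concentrations are taken from the table.

```python
# Hypothetical helper names; concentrations are copied from the NIB table.
def stock_volume_ul(stock_conc, final_conc, total_volume_ul):
    """Stock volume (uL) from C1*V1 = C2*V2; both concentrations must share units."""
    return final_conc * total_volume_ul / stock_conc

# reagent -> (stock concentration, final concentration), same units within a pair
NIB = {
    "IGEPAL CA-630 (%)": (10, 0.2),
    "BSA in DPBS (%)": (10, 5),
    "Protease Inhibitor (X)": (50, 1),
    "SUPERase IN (U/uL)": (20, 0.5),
    "RNase OUT (U/uL)": (40, 0.5),
}

def nib_volumes(total_volume_ul=1000.0):
    """Return reagent volumes in uL; DPBS fills the remaining volume."""
    vols = {name: stock_volume_ul(s, f, total_volume_ul) for name, (s, f) in NIB.items()}
    vols["DPBS"] = total_volume_ul - sum(vols.values())
    return vols

if __name__ == "__main__":
    for name, vol in nib_volumes().items():
        print(f"{name}: {vol:g} uL")
```

For the default 1 mL, the computed volumes agree with the table, including 422.5 μL of DPBS.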

2.3 Chromatin Tagmentation

1. 10 mM PitStop2 (Millipore, Cat# SML1169).
2. 2X Tagmentation Buffer (10 mL, store at 4 °C).

Reagents                               Stock concentration   Volume     Final concentration
Tris–Ac, pH 7.5 (Sigma, Cat# 93337)    1 M                   660 μL     66 mM
KAc (Sigma, Cat# P5708)                3 M                   440 μL     132 mM
MgAc2 (Sigma, Cat# M2545)              1 M                   200 μL     20 mM
DMF (Millipore, Cat# DX1730)           NA                    3200 μL    32%
Ultrapure H2O                          NA                    5500 μL    NA

3. Tagmentation Mix.

Reagents Stock concentration Volume


2X Tagmentation Buffer 2X 66 μL
RNase OUT 40 U/μL 3.3 μL
SUPERase IN 20 U/μL 6.6 μL
Protease Inhibitor cocktail 50X 2.7 μL
PitStop2 (Sigma, Cat# SML1169) 10 mM 1 μL
Ultrapure H2O NA 36.6 μL

4. 40 mM EDTA (diluted from Invitrogen, Cat# AM9261).


5. Loaded Tn5 (see step 3 of Subheading 3.1).
6. ThermoMixer (Eppendorf ThermoMixer R).

2.4 Reverse Transcription

1. NEBuffer 3.1 (NEB, Cat# B7203S).
2. Maxima H minus reverse transcriptase (Invitrogen, Cat# EP0751).
3. 5% Triton X-100 (diluted from Sigma, Cat# T9284).
4. RT Mix.

Reagents                                                      Stock concentration   Volume
5X RT Buffer (with Maxima H minus reverse transcriptase)      5X                    52.8 μL
PBS                                                           1X                    52.8 μL
dNTP                                                          10 mM                 13.2 μL
RNase OUT                                                     40 U/μL               1.65 μL
SUPERase IN                                                   20 U/μL               3.3 μL
Ultrapure H2O                                                 NA                    61 μL

5. Thermocycler (Bio-Rad, Cat# T100).

2.5 Adding DNA Barcodes

1. Ligation Mix.

Reagents                                   Stock concentration   Volume
T4 DNA Ligase Buffer (NEB, Cat# B0202S)    10X                   500 μL
BSA (NEB, Cat# B9000S)                     20 mg/mL              50 μL
NEBuffer 3.1 (NEB, Cat# B7203S)            10X                   100 μL
Ultrapure H2O                              NA                    2250 μL

2. R02 Blocking Solution (see step 6 of Subheading 3.1).


3. R03 Termination Solution (see step 6 of Subheading 3.1).
4. T4 DNA Ligase (NEB, Cat# M0202L).
5. R02 Barcoding Working Plate (see step 4 of Subheading 3.1).
6. R03 Barcoding Working Plate (see step 4 of Subheading 3.1).
7. Proteinase K (NEB, Cat# P8107S).
8. SPRI beads (Beckman Coulter, Cat# B23319).
9. 80% EtOH.
10. 200 μL thin-wall PCR tubes or 96-well PCR plate.
11. Eppendorf ThermoMixer.
12. PCR plate film (Bio-Rad, Microseal B, Cat# MSB1001).

2.6 Library Pre-amplification

1. Terminal Transferase (NEB, Cat# M0315S).
2. 1 mM dCTP (NEB, Cat# N0446S).
3. Anchor Mix (15 μL per sample).

Reagents Stock concentration Volume


5X KAPA reaction buffer 5X 6 μL
dNTP 10 mM 0.6 μL
Anchor-FokI-GH (Table 1) 10 μM 0.6 μL
Ultrapure H2O NA 7.2 μL
KAPA HiFi HS (KAPA, Cat# KK2502) NA 0.6 μL

4. Preamp Mix (20 μL per sample).

Reagents                            Stock concentration   Volume
5X KAPA reaction buffer             5X                    4 μL
dNTP                                10 mM                 0.5 μL
PA-F (Table 1)                      10 μM                 2 μL
PA-R (Table 1)                      10 μM                 2 μL
Ultrapure H2O                       NA                    11 μL
KAPA HiFi HS (KAPA, Cat# KK2502)    NA                    0.5 μL

5. Qubit dsDNA HS Assay Kit (Invitrogen, Cat# Q32854).


6. SPRI beads (Beckman).
7. Qubit (ThermoFisher Scientific, Cat# Q33239).
8. 200 μL thin-wall PCR tubes.
9. Thermocycler.

2.7 Library Splitting

1. FokI (NEB, Cat# R0109S).
2. NotI-HF (NEB, Cat# R3189).
3. SbfI-HF (NEB, Cat# R3642).
4. Adaptor Mix (see step 5 of Subheading 3.1).
5. Nextera XT DNA library preparation kit (Illumina, Cat#
FC-131-1024).
6. SPRI beads (Beckman).
7. 200 μL thin-wall PCR tubes.
8. Thermocycler.
9. Magnetic separation rack (Bel-Art, Cat# F19900-0003).

2.8 Library Amplification

1. Illumina TruSeq i7 index primers (NEB, Cat# E7600S).
2. Illumina TruSeq i5 index primers (NEB, Cat# E7600S).
3. Illumina Nextera i5 index primers (Illumina, Cat#
FC-131-2001).
4. NEBNext 2X HiFi PCR master mix (NEB, Cat# M0541S).
5. KAPA qPCR quantification kit for Illumina (KAPA, Cat#
KK4923/4933/4943/4953/4973).
6. SPRI beads.
7. 200 μL thin-wall PCR tubes.
8. Thermocycler.
9. Magnetic separation rack.
10. Agilent Tapestation (Agilent 4200).

2.9 Sequencing and Data Preprocessing

1. Illumina Sequencer: HiSeq 2500/4000, NextSeq 550/2000, and
NovaSeq 6000 have been tested and are compatible with Paired-seq
libraries.
Table 4

Barcode plate #02

Well position Name Sequence


A1 R04_#01 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAAACCGGGTGGCCGATGTTTCG
A2 R04_#02 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAAACGTCGTGGCCGATGTTTCG
A3 R04_#03 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAAAGATGGTGGCCGATGTTTCG
A4 R04_#04 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAAATCCANGTGGCCGATGTTTCG
A5 R04_#05 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAAATGAGNGTGGCCGATGTTTCG
A6 R04_#06 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAACACTGNGTGGCCGATGTTTCG
A7 R04_#07 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAACGTTTNNGTGGCCGATGTTTCG
A8 R04_#08 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAAGAAGCNNGTGGCCGATGTTTCG
A9 R04_#09 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAAGCCCTNNGTGGCCGATGTTTCG
A10 R04_#10 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAAGCTACNNNGTGGCCGATGTTTCG
A11 R04_#11 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAATCTTGNNNGTGGCCGATGTTTCG
A12 R04_#12 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACAACACNNNGTGGCCGATGTTTCG
B1 R04_#13 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACAGTATGTGGCCGATGTTTCG
B2 R04_#14 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACCAAGTGTGGCCGATGTTTCG
B3 R04_#15 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACCCTAAGTGGCCGATGTTTCG
B4 R04_#16 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACCCTTTNGTGGCCGATGTTTCG
B5 R04_#17 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACCTCTCNGTGGCCGATGTTTCG
B6 R04_#18 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACGATTGNGTGGCCGATGTTTCG
B7 R04_#19 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACGCAGANNGTGGCCGATGTTTCG
B8 R04_#20 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACGTAAANNGTGGCCGATGTTTCG
B9 R04_#21 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACTACCTNNGTGGCCGATGTTTCG
B10 R04_#22 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACTCGGTNNNGTGGCCGATGTTTCG
B11 R04_#23 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACTGTCGNNNGTGGCCGATGTTTCG
B12 R04_#24 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACTTATGNNNGTGGCCGATGTTTCG
C1 R04_#25 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAGAAAGGGTGGCCGATGTTTCG
C2 R04_#26 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAGAATCTGTGGCCGATGTTTCG
C3 R04_#27 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAGACATAGTGGCCGATGTTTCG
C4 R04_#28 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAGAGACCNGTGGCCGATGTTTCG
C5 R04_#29 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAGCCCAANGTGGCCGATGTTTCG
C6 R04_#30 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAGCTATTNGTGGCCGATGTTTCG
C7 R04_#31 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAGGAGGTNNGTGGCCGATGTTTCG
C8 R04_#32 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAGGGCTTNNGTGGCCGATGTTTCG
C9 R04_#33 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAGGTGTANNGTGGCCGATGTTTCG
C10 R04_#34 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAGTGCTCNNNGTGGCCGATGTTTCG
C11 R04_#35 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAGTGGGANNNGTGGCCGATGTTTCG
C12 R04_#36 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAGTTACGNNNGTGGCCGATGTTTCG
D1 R04_#37 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNATAAGGGGTGGCCGATGTTTCG
D2 R04_#38 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNATCATTCGTGGCCGATGTTTCG
D3 R04_#39 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNATGGAACGTGGCCGATGTTTCG
D4 R04_#40 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNATGTGCCNGTGGCCGATGTTTCG
D5 R04_#41 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNATTCACCNGTGGCCGATGTTTCG
D6 R04_#42 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNATTCGAGNGTGGCCGATGTTTCG
D7 R04_#43 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCAAGCCTNNGTGGCCGATGTTTCG


D8 R04_#44 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCACAAGGNNGTGGCCGATGTTTCG
D9 R04_#45 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCACCTTANNGTGGCCGATGTTTCG
D10 R04_#46 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCAGAGTGNNNGTGGCCGATGTTTCG
D11 R04_#47 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCAGCGAANNNGTGGCCGATGTTTCG


D12 R04_#48 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCAGGTCANNNGTGGCCGATGTTTCG
E1 R04_#49 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCATAACTGTGGCCGATGTTTCG
E2 R04_#50 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCATATCGGTGGCCGATGTTTCG
E3 R04_#51 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCATCGATGTGGCCGATGTTTCG
E4 R04_#52 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCATTACANGTGGCCGATGTTTCG
E5 R04_#53 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCATTTCCNGTGGCCGATGTTTCG
E6 R04_#54 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCCAAATGNGTGGCCGATGTTTCG
E7 R04_#55 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCCACTTGNNGTGGCCGATGTTTCG
E8 R04_#56 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCCGGATANNGTGGCCGATGTTTCG
E9 R04_#57 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCCGGTTTNNGTGGCCGATGTTTCG
E10 R04_#58 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCCTAAGANNNGTGGCCGATGTTTCG
E11 R04_#59 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCCTAGTCNNNGTGGCCGATGTTTCG
E12 R04_#60 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCCTGCAANNNGTGGCCGATGTTTCG
F1 R04_#61 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCGACGTTGTGGCCGATGTTTCG
F2 R04_#62 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCGAGTAAGTGGCCGATGTTTCG
F3 R04_#63 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCGATTATGTGGCCGATGTTTCG
F4 R04_#64 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCGTAGCANGTGGCCGATGTTTCG
F5 R04_#65 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCGTCTGANGTGGCCGATGTTTCG
F6 R04_#66 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCTACAGCNGTGGCCGATGTTTCG
F7 R04_#67 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCTCAATANNGTGGCCGATGTTTCG
F8 R04_#68 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCTCGTTGNNGTGGCCGATGTTTCG
F9 R04_#69 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCTCTACGNNGTGGCCGATGTTTCG
F10 R04_#70 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCTTGGGTNNNGTGGCCGATGTTTCG
F11 R04_#71 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGAAACTCNNNGTGGCCGATGTTTCG
F12 R04_#72 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGACTGTCNNNGTGGCCGATGTTTCG
G1 R04_#73 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGATACAGGTGGCCGATGTTTCG
G2 R04_#74 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGCGATCAGTGGCCGATGTTTCG
G3 R04_#75 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGCGTACTGTGGCCGATGTTTCG
G4 R04_#76 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGCTCGAANGTGGCCGATGTTTCG
G5 R04_#77 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGGAAGAANGTGGCCGATGTTTCG
G6 R04_#78 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGGAGATTNGTGGCCGATGTTTCG
G7 R04_#79 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGGGCTAANNGTGGCCGATGTTTCG
G8 R04_#80 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGGGTATGNNGTGGCCGATGTTTCG
G9 R04_#81 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGGTAACCNNGTGGCCGATGTTTCG
G10 R04_#82 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGGTAGTGNNNGTGGCCGATGTTTCG
G11 R04_#83 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGGTGAAANNNGTGGCCGATGTTTCG
G12 R04_#84 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGTAATCGNNNGTGGCCGATGTTTCG
H1 R04_#85 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGTATAAGGTGGCCGATGTTTCG
H2 R04_#86 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGTCAGACGTGGCCGATGTTTCG


H3 R04_#87 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGTCCCTTGTGGCCGATGTTTCG
H4 R04_#88 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGTGCCATNGTGGCCGATGTTTCG
H5 R04_#89 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGTGGTCTNGTGGCCGATGTTTCG
H6 R04_#90 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGTTCTCCNGTGGCCGATGTTTCG
H7 R04_#91 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGTTGCTTNNGTGGCCGATGTTTCG
H8 R04_#92 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNTACCCGANNGTGGCCGATGTTTCG
H9 R04_#93 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNTAGACGANNGTGGCCGATGTTTCG
H10 R04_#94 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNTAGTCACNNNGTGGCCGATGTTTCG
H11 R04_#95 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNTCACATCNNNGTGGCCGATGTTTCG
H12 R04_#96 CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNTCAGCTGNNNGTGGCCGATGTTTCG

2. Computation resources: A server with 16 cores and 128 GB
RAM or above is recommended; storage space depends on
the number of cells analyzed. Typically, 1 TB of storage is
needed to analyze 100 k cells.
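
The R04 oligos in Table 4 appear to share one layout: a constant 5' part, ten random bases, a 7-nt well barcode, zero to three spacer N bases, and a constant 3' part. This layout is inferred only from the rows of the table, not from a stated design rule, so the sketch below should be read as an assumption. It uses a hypothetical `parse_r04` helper to split one sequence into its barcode and spacer length, which can help when checking an oligo order sheet.

```python
import re

# Constant parts copied from Table 4; the overall layout is an inferred assumption.
FIVE_PRIME = "CAGACGTGTGCTCTTCCGATCT"
THREE_PRIME = "GTGGCCGATGTTTCG"

R04_PATTERN = re.compile(
    f"^{FIVE_PRIME}"            # constant 5' part
    r"N{10}"                    # ten random bases
    r"(?P<barcode>[ACGT]{7})"   # 7-nt well barcode
    r"(?P<spacer>N{0,3})"       # 0-3 spacer bases
    f"{THREE_PRIME}$"           # constant 3' part
)

def parse_r04(seq):
    """Return (barcode, spacer_length) for one R04 oligo, or raise ValueError."""
    match = R04_PATTERN.match(seq)
    if match is None:
        raise ValueError(f"sequence does not match the assumed R04 layout: {seq}")
    return match.group("barcode"), len(match.group("spacer"))

if __name__ == "__main__":
    # Rows A1, A4, and A10 of Table 4, rebuilt from their parts.
    rows = {
        "A1": FIVE_PRIME + "N" * 10 + "AAACCGG" + THREE_PRIME,
        "A4": FIVE_PRIME + "N" * 10 + "AAATCCA" + "N" + THREE_PRIME,
        "A10": FIVE_PRIME + "N" * 10 + "AAGCTAC" + "NNN" + THREE_PRIME,
    }
    for well, seq in rows.items():
        print(well, parse_r04(seq))
```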

3 Methods

3.1 Reagents Preparation

1. All oligo DNA sequences in this subheading are listed in
Tables 1 and 2. To prepare the RT primer mix, mix 12.5 μL
of barcoded T15VN primer (RNA_#XX_RE, 100 μM),
12.5 μL barcoded N6 primer (RNA_#XX_NRE, 100 μM),
and 75 μL ultrapure nuclease-free water in PCR tubes. Vortex
to mix and store at -20 °C.
2. To prepare barcoded Tn5 adaptors, mix 10 μL barcoded Tn5
adaptor (DNA_#XX_RE, 100 μM) and 10 μL pMENTs
(100 μM) in PCR tubes. Using a thermocycler, heat the mix
at 95 °C for 5 min and slowly cool down to 20 °C (0.1 °C/s).
Store the annealed adaptors at -20 °C or immediately use for
step 3.
3. To prepare barcoded Tn5 complex, add 5 μL of barcoded
annealed Tn5 adaptors (from step 2) to 1.5 mL low-bind
tubes. Add 35 μL 0.5 mg/mL unloaded Tn5 protein to each
tube and pipette to mix 5 times. Then vortex to mix for 3–5 s
and spin down quickly. Incubate at room temperature for
30 min, then transfer to ice and sit for 5 min. Store at -20 °C.
4. To prepare R02 and R03 barcode plates, add 6 μL of R02 or
R03 barcoded oligo (BC Plate#02 or BC Plate#03, 100 μM),
5.5 μL of Linker-R02 or Linker-R03 oligo (100 μM), and
38.5 μL ultrapure nuclease-free water to each well of a
low-bind 96-well PCR plate and seal the plate (annealing
plate). Heat at 95 °C for 5 min and slowly cool down to 20 °
C (0.1 °C/s). Aliquot 10 μL of annealed barcoded oligos from
each well of annealing plate to four low-bind 96-well PCR
plates (working plates). Store the working plates at -20 °C.
5. To prepare Adaptor Mix: (a) prepare P5-complex (25 μL
100 μM P5-FokI and 25 μL 100 μM P5c-NNDC-FokI) and
P5H-complex (25 μL 100 μM P5H-FokI and 25 μL 100 μM
P5Hc-NNDC-FokI) in two different tubes; (b) in a thermo-
cycler, heat the mixtures for 5 min at 95 °C and slowly cool
down to 20 °C (-0.1 °C/s); (c) mix 15 μL of P5-complex with
45 μL of P5H-complex on ice and pipette to mix, then add
240 μL cold ultrapure water (to dilute from 50 to 10 μM) and
store at -20 °C.
6. To prepare R02 Blocking Solution, add 264 μL 100 μM
Blocker-R02, 250 μL 10X T4 DNA Ligase Buffer, and
486 μL ultrapure water to a 1.5 mL tube and mix. To prepare
R03 Termination Solution, add 264 μL 100 μM Quencher-
R02, 500 μL 0.5 M EDTA, and 236 μL ultrapure water to a
1.5 mL tube and mix. Both R02 Blocking Solution and R03
Termination Solution should be kept on ice for later use.
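
The dilutions in steps 5 and 6 can be checked with a volume-weighted average. The sketch below is illustrative only; `final_concentration_um` is a hypothetical helper, and it assumes that volumes are additive and that each annealed complex is at 50 μM after mixing two 100 μM oligos in equal volumes.

```python
def final_concentration_um(parts, diluent_ul=0.0):
    """Concentration (uM) after pooling parts, given as (volume_ul, conc_um) pairs,
    with diluent_ul of solute-free liquid. Assumes additive volumes."""
    total_ul = sum(vol for vol, _ in parts) + diluent_ul
    if total_ul <= 0:
        raise ValueError("total volume must be positive")
    return sum(vol * conc for vol, conc in parts) / total_ul

if __name__ == "__main__":
    # Step 5: 15 uL P5-complex + 45 uL P5H-complex (50 uM each) + 240 uL water
    print(final_concentration_um([(15, 50), (45, 50)], diluent_ul=240))
    # Step 6: 264 uL of 100 uM Blocker-R02 in a 1000 uL final volume
    print(final_concentration_um([(264, 100)], diluent_ul=250 + 486))
```

Under these assumptions the Adaptor Mix comes out at 10 μM, in line with the "50 to 10 μM" remark in step 5, and the blocker at 26.4 μM.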

3.2 Nuclei Isolation

1. A single-cell suspension is required for nuclei isolation, and
the preferred preparation protocol varies [27]. Here
we take nuclei preparation from frozen mouse brain as an
example (see Note 3).
2. For each sample, prepare 1.5 mL douncing buffer (DB) and
1 mL nuclei isolation buffer (NIB) freshly each time before
performing the experiments. Prechill any tubes or tools. Set the
centrifuge to 4 °C (see Note 4).
3. Wash the douncer with 1 mL of ultrapure water. Prechill the
dounce and pestle (1 mL) on ice (avoid contamination by
placing them on a parafilm or in a tube).
4. Add 0.5 mL of DB into douncer, and then add 10 μL 5% Triton
X-100.
5. Transfer ~20–50 mg dissected frozen mouse brain tissue
directly to the douncer with DB.
6. Apply the loose pestle gently 5–10 times on ice, and avoid
introducing bubbles.
7. Apply the tight pestle gently 15–30 times on ice, and avoid
introducing bubbles.
8. Filter the single-cell suspension with a 30 μm Celltrics filter
into a 1.5 mL Axygen Maximum Recovery tube. Spin down at
1000 × g for 10 min at 4 °C and carefully discard the
supernatant.
9. Gently resuspend the cell pellet in 0.5 mL of NIB. Spin down
again at 1000 × g for 10 min at 4 °C, and discard the
supernatant.
10. Gently resuspend the cell pellet in 0.5 mL of NIB and incubate
on ice for 5–10 min. Take out 10 μL to measure the nuclei
concentration with the cell counter.

3.3 Chromatin Tagmentation

1. Freshly prepare the Tagmentation Mix and keep on ice.
2. Label 12 tubes for tagmentation. Aliquot a total of
1200–2400 k nuclei into the 12 tubes on ice, with
100–200 k nuclei per tube. Different samples or replicates can be
multiplexed here and are distinguished by their first-round
barcode (sample barcode) (see Note 5).
3. Spin down the 12 tubes at 1000 × g for 10 min at 4 °C, and
carefully discard the supernatant. Samples should be kept
on ice.
4. For each tube, resuspend the nuclei pellet in 9 μL Tagmentation
Mix. Add 1 μL of barcoded Tn5 into the corresponding tube.
5. Incubate in a ThermoMixer set at 37 °C, 550 rpm for 30 min.
6. Immediately add 5 μL of 40 mM EDTA and gently pipette to
mix. Spin down at 1000 × g for 10 min at 4 °C, and carefully
remove all the supernatant. Keep the nuclei on ice and proceed
to Subheading 3.4 immediately.
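
Step 2 asks for 100–200 k nuclei in each of 12 tubes. A minimal sketch of the aliquoting arithmetic follows, assuming the cell counter reports nuclei per μL and that the suspension is split evenly; `aliquot_plan` and its parameter names are hypothetical.

```python
def aliquot_plan(conc_per_ul, available_ul, n_tubes=12,
                 min_per_tube=100_000, max_per_tube=200_000):
    """Split a nuclei suspension evenly across n_tubes, capped at max_per_tube.

    conc_per_ul: counted concentration (nuclei per uL)
    available_ul: suspension volume available for aliquoting (uL)
    """
    if conc_per_ul <= 0 or available_ul <= 0:
        raise ValueError("concentration and volume must be positive")
    total_nuclei = conc_per_ul * available_ul
    per_tube = min(max_per_tube, total_nuclei / n_tubes)
    return {
        "nuclei_per_tube": per_tube,
        "volume_per_tube_ul": per_tube / conc_per_ul,
        "meets_minimum": per_tube >= min_per_tube,
    }

if __name__ == "__main__":
    # Example numbers are made up: 4000 nuclei/uL with 490 uL left after counting.
    print(aliquot_plan(conc_per_ul=4000, available_ul=490))
```

When `meets_minimum` is false the experiment can still proceed, but Note 5 indicates that the recovery rate will be lower.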

3.4 Reverse Transcription

1. Freshly prepare the RT mix and keep on ice.
2. Add 4 μL of each barcoded RT primer mix to the corresponding
one of twelve 200 μL PCR tubes.
3. Resuspend each of the 12 nuclei pellets in 14 μL RT mix and
transfer it to the matching 200 μL PCR tube containing the
barcoded RT primers from the previous step.
4. Add 2 μL Maxima H minus reverse transcriptase to each tube.
Tap to mix and briefly spin down.
5. Perform the reverse transcription program in a thermocycler
using the program set up as below:

Step no.   Temperature (°C)   Time
1          50                 10 min
2          8                  12 s
           15                 45 s
           20                 45 s
           30                 30 s
           42                 2 min
           50                 5 min; repeat step 2 for additional 2 cycles
3          50                 10 min
4          12                 Hold

6. Transfer the 12 tubes to ice. Keep on ice and pool all nuclei into
a 1.5 mL Axygen Maximum Recovery tube, add 4.8 μL 5%
Triton X-100, tap to mix, and quickly spin down.
7. Centrifuge to pellet the nuclei at 1000 × g for 10 min at 4 °C,
and carefully discard the supernatant.
8. Resuspend the nuclei in 1 mL 1X NEBuffer 3.1 and proceed to
Subheading 3.5 immediately.
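
For planning the day (see Note 4), the hold time of the reverse transcription program in step 5 can be summed as below. This is a rough sketch: the tuple layout is a hypothetical encoding of the table, "repeat step 2 for additional 2 cycles" is read as three passes in total, and ramp times between temperatures are ignored.

```python
# Each entry: (number of passes, [(temperature_C, seconds), ...])
RT_PROGRAM = [
    (1, [(50, 600)]),
    (3, [(8, 12), (15, 45), (20, 45), (30, 30), (42, 120), (50, 300)]),
    (1, [(50, 600)]),
]

def program_seconds(program):
    """Total hold time in seconds, excluding ramping and the final hold."""
    return sum(passes * sum(sec for _, sec in segments) for passes, segments in program)

if __name__ == "__main__":
    total = program_seconds(RT_PROGRAM)
    print(f"{total} s ({total / 60:.1f} min)")
```

The same function can be reused for the other thermocycler programs in this chapter by encoding their tables in the same way.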

3.5 Adding DNA Barcodes

1. Prepare R02 Blocking Solution, R03 Termination Solution,
and two tubes of Ligation Mix freshly before the experiment.
2. Prewash two 15 mL Corning tubes by rinsing each tube with
0.5 mL 0.1% BSA in PBS, and discard the liquid (see Note 6).
3. Add the nuclei suspension to the 1st Ligation Mix, add
100 μL T4 DNA Ligase, and gently mix by pipetting up
and down.
4. Transfer the nuclei-Ligation Mix to a reagent reservoir, and
distribute 40 μL of the mixture to each of the 96 wells of the R02
Barcoding Plate with a multichannel pipette. Seal the plate
with film.
5. Incubate the nuclei–barcode ligation mixture in a Thermo-
Mixer set to 37 °C, 300 rpm for 30 min.
6. Open the seal, add 10 μL of R02 Blocking Solution to each of
the 96 wells with a multichannel pipette, and reseal the plate.
7. Continue incubating the nuclei–barcode ligation mixture in a
ThermoMixer set to 37 °C, 300 rpm for another 30 min.
8. Pool all nuclei in a reagent reservoir, and transfer the mixture
containing the nuclei from the reagent reservoir to a 15 mL
tube (prewashed with 0.1% BSA in PBS in step 2).
9. Wash the reagent reservoir with 1 mL of PBS and combine the
wash with the nuclei mixture.
10. Spin down the nuclei with a swing bucket centrifuge at
1000 × g for 10 min at 4 °C, and carefully discard the superna-
tant (see Note 7).
11. Resuspend the nuclei in 1 mL 1X NEBuffer 3.1.
12. Transfer the nuclei suspension to the 2nd Ligation Mix, add
100 μL T4 DNA Ligase, and gently mix by pipetting up
and down.
13. Transfer the nuclei-Ligation Mix to a reagent reservoir, and
distribute 40 μL of the mixture to each of the 96 wells of the R03
Barcoding Plate with a multichannel pipette. Seal the plate.
14. Incubate the nuclei–barcode ligation mixture in a Thermo-
Mixer set to 37 °C, 300 rpm for 30 min.
15. Open the seal and add 10 μL of R03 Termination Solution to
each of the 96 wells with a multichannel pipette.
16. Immediately pool all nuclei in a reagent reservoir, and transfer
the mixture containing the nuclei from the reagent reservoir to
a 15 mL tube (prewashed with 0.1% BSA in PBS in step 2).
17. Wash the reagent reservoir with 1 mL of PBS and combine the
wash with the nuclei mixture.
18. Spin down the nuclei with a swing bucket centrifuge at
1000 × g for 10 min at 4 °C, and carefully discard the
supernatant (see Note 7).
19. Resuspend the nuclei in 50 μL PBS (nuclei stock suspension).
Dilute 1 μL of nuclei with 9 μL of PBS and count the concen-
tration of nuclei.
20. Dilute the nuclei stock suspension to 1 k/μL. Aliquot 3 μL of
nuclei (total 3 k) into 200 μL PCR tubes or 96-well low-bind
PCR plates (as sub-libraries) (see Note 9).
21. Prepare the lysis mix as follows: (a) calculate the number of
sub-libraries that need to be lysed; (b) for each sub-library, the
lysis mix contains 18 μL PBS, 3 μL 4 M NaCl, 3 μL 10% SDS,
and 3 μL 20 mg/mL Proteinase K; (c) add and mix the
reagents in the order of PBS, NaCl, SDS, and Proteinase K.
22. Add 27 μL of lysis mix to each sub-library. Incubate in a
ThermoMixer set to 55 °C, 550 rpm for 2 h.
23. Cool the lysis mixture to room temperature. Add 30 μL of
SPRI beads (1X) into each well and mix. Incubate at room
temperature for 5 min. Prepare 80% EtOH (see Note 8).
24. Place the tubes or plate on a magnetic stand, sit for 5 min until
the liquid becomes clear, and carefully discard the supernatant.
25. Add 150 μL of 80% EtOH into each tube/well, sit for 30 s, and
discard the supernatant.
26. Repeat step 25 for a total of two washes.
27. Elute the DNA/cDNA with 12.5 μL ultrapure H2O. The
purified DNA/cDNA can be stored at -20 °C or can be
directly used in Subheading 3.6.
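
The lysis mix in step 21 is defined per sub-library, so it scales linearly with the number of sub-libraries. A minimal sketch is shown below; the 10% pipetting overage is an assumption and not part of the protocol, and `lysis_master_mix` is a hypothetical name. The dictionary keeps the reagents in the mixing order given in step 21(c).

```python
# Per-sub-library volumes (uL) from step 21, listed in the order of addition.
LYSIS_PER_SUBLIBRARY_UL = {
    "PBS": 18.0,
    "4 M NaCl": 3.0,
    "10% SDS": 3.0,
    "20 mg/mL Proteinase K": 3.0,
}

def lysis_master_mix(n_sublibraries, overage=0.10):
    """Return master-mix volumes (uL) for n sub-libraries plus a fractional overage."""
    if n_sublibraries < 1:
        raise ValueError("need at least one sub-library")
    factor = n_sublibraries * (1.0 + overage)
    return {name: round(vol * factor, 1) for name, vol in LYSIS_PER_SUBLIBRARY_UL.items()}

if __name__ == "__main__":
    for name, vol in lysis_master_mix(8).items():
        print(f"{name}: {vol} uL")
```

Each sub-library then receives 27 μL of this mix, as in step 22.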

3.6 Library Pre-amplification

1. Add 1.5 μL of 10X Terminal Transferase Buffer and 0.5 μL of
1 mM dCTP into each sub-library. Close the lid, tap to mix,
and briefly spin down.
2. Incubate at 95 °C for 5 min and immediately chill on ice and sit
for another 5 min.
3. Add 0.5 μL of Terminal Transferase into each tube. Close the
lid, tap to mix, and briefly spin down.
4. Incubate at 37 °C for 30 min, followed by heat inactivating the
reaction at 65 °C for 10 min.
5. Prepare the Anchor Mix freshly. Add 15 μL Anchor Mix into
each tube. Close the lid, tap to mix, and briefly spin down.
6. Carry out the reaction in a thermocycler with the program
below:

Step no. Temperature (°C) Time


1 95 3 min
2 95 15 s
47 1 min
68 2 min
47 1 min
68 2 min; repeat step 2 for additional
15 cycles
3 72 10 min
4 12 Hold

7. Prepare the Preamp Mix freshly. Add 20 μL Preamp Mix to
each tube and gently mix by pipetting up and down.
8. Carry out the reaction in a thermocycler with the program
below:

Step no. Temperature (°C) Time


1 98 3 min
2 98 20 s
65 20 s
72 2.5 min; repeat step 2 for additional
10 cycles
3 72 2 min
4 12 Hold

9. Add 10 μL of SPRI beads (0.2X) into each tube and mix.
Incubate at room temperature for 5 min. Prepare 80% EtOH.
10. Place the tubes on a magnetic stand, and let them sit for 5 min
until the liquid becomes clear.
11. Transfer the supernatant into new tubes, add 32.5 μL SPRI
beads (0.65X + 0.2X = 0.85X) to each tube, and mix. Incu-
bate at room temperature for 5 min.
12. Place the tubes on a magnetic stand, and let them sit for 5 min
until the liquid becomes clear. Carefully discard the
supernatant.
13. Add 150 μL of 80% EtOH into each tube/well, sit for 30 s,
and discard the supernatant.
14. Repeat step 13 for a total of two washes.
15. Elute the amplification product with 40 μL ultrapure
H2O. The purified product can be stored at -20 °C.
16. Quantify the concentration of the amplification product with
Qubit.
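
Steps 9–11 describe a double-sided SPRI selection: beads are first added to 0.2X, the supernatant is kept, and more beads are added to reach 0.85X relative to the original reaction volume. The sketch below reproduces that arithmetic with a hypothetical helper; the 50 μL reaction volume is computed here from the volumes added in steps 1–7 (12.5 + 1.5 + 0.5 + 0.5 + 15 + 20 μL) and assumes no volume loss.

```python
def double_sided_spri(sample_ul, first_ratio, final_ratio):
    """Bead volumes (uL) for a two-step SPRI selection.

    The first addition brings the beads to first_ratio x sample volume; the second
    addition to the transferred supernatant raises the cumulative ratio to final_ratio.
    """
    if not 0 < first_ratio < final_ratio:
        raise ValueError("ratios must satisfy 0 < first_ratio < final_ratio")
    first_ul = sample_ul * first_ratio
    second_ul = sample_ul * (final_ratio - first_ratio)
    return round(first_ul, 2), round(second_ul, 2)

if __name__ == "__main__":
    reaction_ul = 12.5 + 1.5 + 0.5 + 0.5 + 15 + 20  # assumed pre-amplification volume
    print(double_sided_spri(reaction_ul, 0.2, 0.85))
```

For a 50 μL reaction this gives 10 μL and 32.5 μL of beads, matching steps 9 and 11.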

3.7 Library Splitting

1. Divide the purified amplification product into two tubes:
20.5 μL for Tn5 tagmentation-derived DNA library preparation
and 17 μL for RNA-derived library preparation.
2. Steps 2–11 are for DNA library preparation. Add 2.5 μL 10X
CutSmart Buffer, 1 μL SbfI-HF, and 1 μL FokI to each 20.5 μL
aliquot of the purified amplification product. Incubate at 37 °C
for 1 h.
3. Add 31.3 μL SPRI beads (1.25X) to each tube and mix. Incu-
bate at room temperature for 5 min.
4. Place the tubes on a magnetic stand, and let them sit for 5 min
until the liquid becomes clear. Carefully discard the
supernatant.
5. Add 150 μL 80% EtOH into each tube/well, sit for 30 s, and
discard the supernatant.
6. Repeat step 5 for a total of two washes.
7. Elute the amplification product with 15 μL ultrapure
H2O. The purified product can be stored at -20 °C.
8. Add 2 μL 10X T4 DNA Ligase Buffer, 1.5 μL Adaptor Mix,
and 1.5 μL T4 DNA Ligase to each tube from the previous
step. Close the lid, tap to mix, and briefly spin down.
9. Carry out the ligation reaction in a thermocycler with the
program given below. Place the tubes in the thermocycler
immediately after the block temperature reaches 4 °C:

Step no. Temperature (°C) Time


1 4 10 min
2 10 5 min
3 16 15 min
4 25 45 min
5 12 Hold

10. Add 25 μL SPRI beads (1.25X) directly to the reaction mixture
and mix. Incubate at room temperature for 5 min and
repeat the wash steps as described in steps 4–6.
11. Elute the adaptor-ligated DNA with 21 μL ultrapure
H2O. The purified product can be stored at -20 °C.

12. Steps 12–18 are for RNA library preparation. Add 2 μL 10X
CutSmart Buffer and 1 μL NotI-HF into the 17 μL amplifi-
cation product. Incubate at 37 °C for 1 h.
13. Add 25 μL SPRI beads (1.25X) to each tube and mix. Incu-
bate at room temperature for 5 min and repeat the wash steps
as described in steps 4–6.
14. Elute with 10 μL ultrapure H2O. The purified product can be
stored at -20 °C.
15. Use 5 μL of the purified product for tagmentation with
Illumina Nextera XT. Add 10 μL Buffer TD and pipette up
and down to mix.
16. Add 5 μL of Amplicon Tagmentation Mix (ATM) to each
tube, pipette 10 times to mix and close the lid, and quickly
spin down.
17. Incubate the mixture in a thermocycler at 55 °C for 5 min,
cool down to 10 °C, and immediately place the tubes
on ice.
18. Add 5 μL Neutralize Tagment Buffer (NT) to each well,
pipette 10 times to mix, close the lid, and incubate at room
temperature for 5 min. Proceed to step 8 of Subheading 3.8.

3.8 Library Amplification

1. Steps 1–7 are for DNA library amplification. To each tube of
adaptor-ligated DNA (21 μL, from step 11 of Subheading 3.7), add
2 μL of Illumina TruSeq i7 index primers, 2 μL of Illumina TruSeq i5
index primers, and 25 μL NEBNext 2X HiFi PCR master mix. Pipette
to mix, close the lid, and quickly spin down.
2. Carry out the PCR reaction in a thermocycler with the pro-
gram as follows (see Note 10):

Temperature
Step no. (°C) Time
1 98 3 min
2 98 10 s
63 30 s
72 1 min; repeat step 2 for additional 11–13 cycles
3 72 5 min
4 12 Hold

3. Add 42.5 μL SPRI beads (0.85X) to each tube and mix. Incu-
bate at room temperature for 5 min.

4. Place the tubes on a magnetic stand, and let them sit for 5 min
until the liquid becomes clear. Carefully discard the
supernatant.
5. Add 150 μL 80% EtOH into each tube/well, sit for 30 s, and
discard the supernatant.
6. Repeat step 5 for a total of two washes.
7. Elute the DNA library with 25 μL ultrapure H2O. The purified
library can be stored at -20 °C.
8. Steps 8–11 are for RNA library amplification. Add 6 μL
ultrapure H2O, 2 μL of Illumina TruSeq i7 index primers,
2 μL of Illumina Nextera i5 index primers, and 15 μL
Nextera PCR Mix (NPM) to each tube. Pipette to mix,
close the lid, and briefly spin down.
9. Carry out the PCR reaction in a thermocycler with the pro-
gram as follows (see Note 10):

Step no.   Temperature (°C)   Time
1          72                 3 min
2          95                 30 s
3          95                 10 s
           55                 30 s
           72                 1 min; repeat step 3 for additional 11–13 cycles
4          72                 5 min
5          12                 Hold

10. Purify the RNA library as described in steps 3–6.
11. Elute the RNA library with 25 μL ultrapure H2O. The
purified library can be stored at -20 °C.
12. Quantify the concentration of library with KAPA qPCR quan-
tification kit for Illumina. Check the size distribution of
libraries with Tapestation (see Notes 11–13).

3.9 Sequencing and Data Preprocessing

1. DNA and RNA libraries with different combinations of indices
can be multiplexed for sequencing.
2. Paired-seq requires at least 50 cycles for Read1 (insert genomic
sequences), 8 cycles for Index Read1, 8 cycles for Index Read2,
and 100 cycles for Read2 (cellular barcodes) (50 + 8 + 8 + 100).
3. Paired-seq data preprocessing includes: (a) extracting the
three-round barcode sequences from Read2, (b) assigning barcode
sequences to individual tubes/wells, and (c) mapping reads to the
reference genome and generating cell-counts matrices.
4. All scripts required for Paired-seq data preprocessing and the
analysis steps are available from GitHub (https://github.com/
cxzhu/Paired-Tag).
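
Step 3(b), assigning an extracted barcode to a well, can be illustrated with a simple whitelist lookup that tolerates one mismatch. This is only a sketch of the general idea and is not taken from the Paired-Tag scripts; the function names, the one-mismatch threshold, and the rule of discarding ambiguous matches are all assumptions. The three example barcodes are the 7-nt barcodes of wells E1–E3 in the barcode tables.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def assign_well(observed, whitelist, max_mismatches=1):
    """Return the well whose barcode is the unique closest match, else None."""
    scored = sorted(
        (hamming(observed, barcode), well)
        for well, barcode in whitelist.items()
        if len(barcode) == len(observed)
    )
    if not scored or scored[0][0] > max_mismatches:
        return None  # nothing close enough
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: two wells are equally close
    return scored[0][1]

if __name__ == "__main__":
    whitelist = {"E1": "CATAACT", "E2": "CATATCG", "E3": "CATCGAT"}
    for read_barcode in ("CATAACT", "CATAACA", "CATAACG", "GGGGGGG"):
        print(read_barcode, assign_well(read_barcode, whitelist))
```

Discarding ambiguous matches trades some read recovery for fewer misassigned reads; a production pipeline may make a different choice.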

4 Notes

1. All the safe pause points in the protocol are indicated in Fig. 1a.
2. For 96-well barcode plates, standard desalting purification can
be used. For Index PCR primers, HPLC purification is
required.
3. Native nuclei isolated from snap-frozen or fresh tissue
are preferred. Crosslinking reduces the complexity
of Paired-seq DNA libraries.
4. Nuclei preparation, tagmentation, reverse transcription, and
combinatorial DNA barcoding must be carried out in a single
day, which will take ~8 h.
5. The optimal input is 1.2 million nuclei
(100,000 × 12 tubes). Fewer nuclei are acceptable but
will result in a lower recovery rate. We typically recover
200,000–300,000 nuclei from 1.2 million input nuclei (17–25%) and
30,000–50,000 nuclei from 500,000 input nuclei
(41,700 × 12 tubes; 6–10%).
6. Prewash the 15 mL tubes with 0.1% BSA in PBS, which can
reduce nuclei sticking to the tube and increase nuclei
recovery rate.
7. When removing supernatants after spin-down steps, remove
as much liquid as possible. Residual buffers, salts, or oligos
from the previous step may interfere with the downstream
reaction (e.g., EDTA after tagmentation in step 6 of
Subheading 3.3, and adaptor oligos after nuclei barcoding in
Subheading 3.5).
8. During purification of nucleic acids from lysis mixture, make
sure to wash out SDS as the residual SDS may inhibit the
subsequent reactions.
9. The optimal number of nuclei in each sub-library is ~3500,
which gives 6–10% potential barcode collision rate. A higher
number of nuclei in each sub-library will result in higher bar-
code collision. Using nuclei sorting instead of dilution to ali-
quot sub-libraries can reduce the potential barcode collision,
but will also reduce the recovery rate.

[Figure 2 image: (a) Tapestation gel image with EL, DNA, and RNA lanes and a size ladder in bp; (b) "Paired-seq DNA Library" and (c) "Paired-seq RNA Library" traces of normalized intensity versus fragment size]
Fig. 2 Representative fragment analysis results of Paired-seq libraries. (a) Tapestation analysis results of a
representative Paired-seq library. EL electronic ladder. (b, c) Fragment size distributions of representative DNA
(b) and RNA (c) libraries of Paired-seq

10. To determine the optimal number of PCR cycles for step 9 of
Subheading 3.8, the PCR can be carried out in two stages:
(a) perform PCR amplification for 10 cycles and place the tubes
on ice, take out 0.5 μL of the PCR mixture and dilute it 1000X,
quantify by qPCR, and calculate the additional cycles needed to
reach a concentration of 10 nM; (b) perform PCR amplification
for the additional cycles needed, purify the amplified products,
and store at -20 °C.
11. After preamplification, typically ~40–1200 ng amplified pro-
ducts can be recovered (~1–30 ng/μL as measured by Qubit).
The yields of parallel processed sub-libraries should be compa-
rable with each other.
12. Paired-seq libraries should have a fragment size distribution of
300–1000 bp (Fig. 2). If fragments of ~245 bp (adaptors)
appear as a significant fraction, try to remove them with an
additional round of 0.75X SPRI beads size selection.
13. Quantification of libraries must be carried out by qPCR. Tapes-
tation (or Fragment Analyzer) and Qubit analysis tend to
overestimate Paired-seq library concentrations.
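
The cycle calculation described in Note 10 can be sketched as follows. This is only an illustration under stated assumptions: the qPCR readout is taken to be an absolute concentration in nM (for example, from a standard curve), and each additional cycle is assumed to multiply the product by a constant factor (2.0 for ideal doubling). The function name and its parameters are hypothetical and are not part of the protocol.

```python
import math


def additional_cycles(measured_nM, dilution_factor=1000.0,
                      target_nM=10.0, efficiency=2.0):
    """Estimate how many more PCR cycles are needed to reach target_nM.

    measured_nM     -- qPCR concentration of the DILUTED aliquot (assumed nM)
    dilution_factor -- dilution applied before qPCR (1000x in Note 10)
    efficiency      -- assumed fold increase per cycle (2.0 = ideal doubling)
    """
    if measured_nM <= 0:
        raise ValueError("measured concentration must be positive")
    if efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 1")
    # Back-calculate the concentration of the undiluted PCR mixture.
    current_nM = measured_nM * dilution_factor
    if current_nM >= target_nM:
        return 0
    # Smallest whole number of cycles that reaches or exceeds the target.
    return math.ceil(math.log(target_nM / current_nM) / math.log(efficiency))


if __name__ == "__main__":
    # Hypothetical reading: 0.0003 nM in the 1000x-diluted aliquot.
    print(additional_cycles(0.0003))
```

Real amplification efficiency is usually below 2.0, so rounding up as done here errs toward slight over-amplification rather than an under-amplified library.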

Acknowledgments

We thank QB3 MacroLab for the Tn5 enzyme. This study was
funded by grant nos. 1 U19 MH114831-02, U01MH121282,
and R01AG066018 and the Ludwig Institute for Cancer Research
(to B.R.) and grant no. 1K99HG011483-01 (to C.Z.).

References
1. Lee CK, Shibata Y, Rao B et al (2004) Evidence 10. Lai B, Gao W, Cui K et al (2018) Principles of
for nucleosome depletion at active regulatory nucleosome organization revealed by single-
regions genome-wide. Nat Genet 36(8): cell micrococcal nuclease sequencing. Nature
900–905. https://doi.org/10.1038/ng1400 562(7726):281–285. https://doi.org/10.
2. Thurman RE, Rynes E, Humbert R et al 1038/s41586-018-0567-3
(2012) The accessible chromatin landscape of 11. Cusanovich DA, Daza R, Adey A et al (2015)
the human genome. Nature 489(7414): Multiplex single cell profiling of chromatin
7 5 – 8 2 . h t t p s : // d o i . o r g / 1 0 . 1 0 3 8 / accessibility by combinatorial cellular indexing.
nature11232 Science 348(6237):910–914. https://doi.
3. Yue F, Cheng Y, Breschi A et al (2014) A org/10.1126/science.aab1601
comparative encyclopedia of DNA elements in 12. Buenrostro JD, Wu B, Litzenburger UM et al
the mouse genome. Nature 515(7527): (2015) Single-cell chromatin accessibility
3 5 5 – 3 6 4 . h t t p s : // d o i . o r g / 1 0 . 1 0 3 8 / reveals principles of regulatory variation.
nature13992 Nature 523(7561):486–490. https://doi.
4. Boyle AP, Davis S, Shulha HP et al (2008) org/10.1038/nature14590
High-resolution mapping and characterization 13. Lareau CA, Duarte FM, Chew JG et al (2019)
of open chromatin across the genome. Cell Droplet-based combinatorial indexing for
132(2):311–322. https://doi.org/10.1016/j. massive-scale single-cell chromatin accessibility.
cell.2007.12.014 Nat Biotechnol 37(8):916–924. https://doi.
5. Schones DE, Cui K, Cuddapah S et al (2008) org/10.1038/s41587-019-0147-6
Dynamic regulation of nucleosome positioning 14. Preissl S, Fang R, Huang H et al (2018) Single-
in the human genome. Cell 132(5):887–898. nucleus analysis of accessible chromatin in
https://doi.org/10.1016/j.cell.2008.02.022 developing mouse forebrain reveals cell-type-
6. Giresi PG, Kim J, McDaniell RM et al (2007) specific transcriptional regulation. Nat Neu-
FAIRE (Formaldehyde-Assisted Isolation of rosci 21(3):432–439. https://doi.org/10.
Regulatory Elements) isolates active regulatory 1038/s41593-018-0079-3
elements from human chromatin. Genome Res 15. Kelsey G, Stegle O, Reik W (2017) Single-cell
17(6):877–885. https://doi.org/10.1101/gr. epigenomics: recording the past and predicting
5533506 the future. Science 358(6359):69–75. https://
7. Buenrostro JD, Giresi PG, Zaba LC et al doi.org/10.1126/science.aan6826
(2013) Transposition of native chromatin for 16. Stuart T, Satija R (2019) Integrative single-cell
fast and sensitive epigenomic profiling of open analysis. Nat Rev Genet 20(5):257–272.
chromatin, DNA-binding proteins and nucleo- https://doi.org/10.1038/s41576-019-
some position. Nat Methods 10(12): 0093-7
1213–1218. https://doi.org/10.1038/ 17. Zhu C, Preissl S, Ren B (2020) Single-cell
nmeth.2688 multimodal omics: the power of many. Nat
8. Minnoye L, Marinov GK, Krausgruber T et al Methods 17(1):11–14. https://doi.org/10.
(2021) Chromatin accessibility profiling meth- 1038/s41592-019-0691-5
ods. Nat Rev Methods Prim 1(1):10. https:// 18. Angermueller C, Clark SJ, Lee HJ et al (2016)
doi.org/10.1038/s43586-020-00008-9 Parallel single-cell sequencing links transcrip-
9. Jin W, Tang Q, Wan M et al (2015) Genome- tional and epigenetic heterogeneity. Nat Meth-
wide detection of DNase I hypersensitive sites ods 13(3):229–232. https://doi.org/10.
in single cells and FFPE tissue samples. Nature 1038/nmeth.3728
528(7580):142–146. https://doi.org/10. 19. Zhu C, Zhang Y, Li YE et al (2021) Joint
1038/nature15740 profiling of histone modifications and tran-
scriptome in single cells from mouse brain.
Single-Cell Co-Assay of Open Chromatin and RNA 185

Nat Methods 18(3):283–292. https://doi. 24. Zhu C, Yu M, Huang H et al (2019) An ultra


org/10.1038/s41592-021-01060-3 high-throughput method for single-cell joint
20. Cao J, Cusanovich DA, Ramani V et al (2018) analysis of open chromatin and transcriptome.
Joint profiling of chromatin accessibility and Nat Struct Mol Biol 26(11):1063–1070.
gene expression in thousands of single cells. https://doi.org/10.1038/s41594-019-
Science 361(6409):1380–1385. https://doi. 0323-x
org/10.1126/science.aau0730 25. Rosenberg AB, Roco CM, Muscat RA et al
21. Wei X, Xiang Y, Peters D et al (2022) HiCAR is (2018) Single-cell profiling of the developing
a robust and sensitive method to analyze open- mouse brain and spinal cord with split-pool
chromatin-associated genome organization. 82 barcoding. Science 360(6385):176–182.
(6):1225–1238.e6. https://doi.org/10. https://doi.org/10.1126/science.aam8999
1016/j.molcel.2022.01.023 26. Adey A, Morrison HG, Asan et al (2010)
22. Liu L, Liu C, Quintero A et al (2019) Decon- Rapid, low-input, low-bias construction of
volution of single-cell multi-omics layers shotgun fragment libraries by high-density
reveals regulatory heterogeneity. Nat Commun in vitro transposition. Genome Biol 11(12):
10(1):470. https://doi.org/10.1038/ R119. https://doi.org/10.1186/gb-2010-
s41467-018-08205-7 11-12-r119
23. Chen S, Lake BB, Zhang K (2019) High- 27. Zhang K, Hocker JD, Miller M et al (2021) A
throughput sequencing of the transcriptome single-cell atlas of chromatin accessibility in the
and chromatin accessibility in the same cell. human genome. Cell:184(24):5985–6001.
Nat Biotechnol 37:1452. https://doi.org/10. e19. https://doi.org/10.1016/j.cell.2021.
1038/s41587-019-0290-0 10.024
Chapter 11

Simultaneous Single-Cell Profiling of the Transcriptome and Accessible Chromatin Using SHARE-seq
Samuel H. Kim, Georgi K. Marinov, S. Tansu Bagdatli, Soon Il Higashino,
Zohar Shipony, Anshul Kundaje, and William J. Greenleaf

Abstract
The ability to analyze the transcriptomic and epigenomic states of individual single cells has in recent years
transformed our ability to measure and understand biological processes. Recent advancements have focused
on increasing sensitivity and throughput to provide richer and deeper biological insights at the cellular level.
The next frontier is the development of multiomic methods capable of analyzing multiple features from the
same cell, such as the simultaneous measurement of the transcriptome and the chromatin accessibility of
candidate regulatory elements. In this chapter, we discuss and describe SHARE-seq (Simultaneous High-
throughput ATAC and RNA Expression with sequencing) for carrying out simultaneous chromatin
accessibility and transcriptome measurements in single cells, together with the experimental and analytical
considerations for achieving optimal results.

Key words scRNA-seq, scATAC-seq, Multiomics, Chromatin accessibility, Transcriptomics, Split–pool

1 Introduction

The basic unit of biological organization is the individual cell. In
combination with the surrounding cellular microenvironments
within the context of a multicellular organism, each cell integrates
internal and external stimuli to maintain or alter its state for
biological function. Understanding the cellular state at
single-cell resolution, therefore, is critical to defining the regulatory
processes driving health and disease. A key advancement toward
understanding cellular states has been the development of
transcriptomic methods. With the advent of high-throughput
sequencing methods in the late 2000s, RNA-seq was developed
to profile transcriptomes at base-pair resolution [1–4].
Subsequently, the molecular biology approaches that enabled ever
improved RNA-seq sensitivity have led to the development of

Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols,
Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_11,
© The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023


single-cell RNA-seq (scRNA-seq) to measure transcriptomes at the
single-cell level. The first scRNA-seq methods [5–8] were very low
throughput, only able to measure a few cells at a time. Further
technical advancements utilized microfluidics- and plate-based
approaches to increase throughput to the 10²–10³ range [9, 10],
while droplet- and bead-based methods later boosted it to the
10⁴–10⁵ range [11–14]. However, the approach that holds the most
promise for ultra-high-throughput single-cell measurements is
combinatorial indexing. The core concept of these approaches is
to dynamically assign barcodes through multiple rounds of splitting
and pooling cells to create a combinatorial set of barcodes that can
be used to uniquely identify each cell. Specifically, a set of cells can
be split into 96- or 384-well plates, with each well given a specific
barcode, and then pooled back together to be randomly split into
another set of plates. By iteratively performing these split–pool rounds
with an optimal number of input cells, barcodes, and
rounds of barcoding, one can create a sufficient diversity of
barcodes to uniquely assign each cell to a combination of barcodes. In
comparison to physical isolation of each cell in a droplet or a well,
combinatorial indexing provides a scalable platform for single-cell
measurements. This is the basis of all “sci” (single-cell combinato-
rial indexing) methods, such as sci-RNA-seq [15] and SPLiT-
seq [16].
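
The split–pool idea described above can be illustrated with a small Monte Carlo sketch. It is a toy model, not the authors' simulation: every cell is assumed to land in a uniformly random well in each round, no cells are lost, and the function name and parameters are hypothetical.

```python
import random
from collections import Counter


def simulate_split_pool(n_cells, wells_per_round=(96, 96, 96), seed=0):
    """Randomly assign each cell one well (barcode) per round.

    Returns (number of distinct barcode combinations observed,
             fraction of those combinations shared by two or more cells).
    """
    rng = random.Random(seed)
    # One tuple of well indices per cell, e.g. (12, 87, 3) for three rounds.
    combos = Counter(
        tuple(rng.randrange(wells) for wells in wells_per_round)
        for _ in range(n_cells)
    )
    n_unique = len(combos)
    n_shared = sum(1 for count in combos.values() if count > 1)
    return n_unique, n_shared / n_unique


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        unique, shared = simulate_split_pool(n)
        print(n, unique, round(shared, 4))
```

Three rounds of 96 wells give 96 × 96 × 96 = 884,736 possible combinations, so the shared fraction stays small only while the number of cells is well below that figure.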
While scRNA-seq measures the current amount of transcripts
in a given cell, it does not provide insight into how that transcrip-
tional state is achieved and maintained through regulation.
Mapping active cis-regulatory elements (cREs) provides key insight
to address this need. A common property of active cREs, originally
recognized more than four decades ago [17–19], is that they are
depleted of nucleosomes and exhibit an open, “accessible” confor-
mation. This property has been the basis for numerous methods
that have been developed over the years to profile these elements
[20], which rely on the preferential enzymatic cleavage or labeling
of open chromatin regions. ATAC-seq [21, 22] (Assay for Trans-
posase-Accessible Chromatin using sequencing) has emerged as
the most versatile instance of such assays. ATAC-seq takes advan-
tage of the preferential insertion of a hyperactive Tn5 [23] trans-
posase, preloaded with sequencing adapters, into open chromatin.
Tn5 had been previously adapted and successfully used for the
generation of high-throughput sequencing libraries from
low-input DNA samples [24]. The realization that it can also be
used to tag open chromatin regions with ready-for-amplification
sequencing adapters in a single reaction allowed for chromatin
accessibility profiling to be carried out in bulk on very low-input
samples (typically 50,000 cells, but also down to just a few thousand
[21]), and eventually in single cells, in the form of scATAC-seq,
in the mid-2010s [25]. As with scRNA-seq, the throughput of
scATAC-seq has also been dramatically increased over the years,
using combinatorial indexing (sciATAC-seq [26–28]), microwell
plates (μATAC-seq [29]), droplet-based methods [30], and com-
binations of combinatorial indexing and droplets (dsciATAC-seq
[31]).
Techniques such as scRNA-seq and scATAC-seq have provided
unprecedented insights into the diversity of cell types, their
developmental dynamics, and cellular responses to external stimuli in a
wide variety of contexts. However, the ideal measurements would
provide information about all relevant aspects of the state from the
same cell. To this end, a variety of single-cell multiomic methods,
measuring multiple such modalities in the same individual cells,
have been under active development in recent years. These include
methods for sequencing the genomes and transcriptomes of single
cells (G & T-seq [32], PRDD-seq [33], DNTR-seq [34], sci-L3-
RNA/DNA [35], TARGET-seq [36], and others), for sequencing
methylomes and transcriptomes (scTrio-seq [37], scMT-seq [38],
and scM & T-seq [39]), for mapping accessible chromatin and
methylomes (e.g., scNOMe-seq [40]), for measuring proteins and
transcripts (REAP-seq [41], CITE-seq [42], QBC [43], inCITE-
seq [44], iNS-seq [45]), for using methylation-based labeling of open
chromatin to map accessible DNA and transcripts (COOL-seq
[46], scNMT-seq [47], scNOMeRe-seq [48], snmC2T-seq [49]),
for mapping protein occupancy and transcriptomes (CoTECH [50],
Paired-Tag [51], scDam & T-seq [52]), for quantifying protein
levels and mapping open chromatin (PHAGE-ATAC [53], ASAP-
seq [54]), for quantifying protein and transcript levels and
mapping open chromatin (DOGMA-seq [54], TEA-seq [55]), and
others [56].
As regulatory elements and RNA levels are perhaps the two
most informative modalities, joint scATAC-seq + scRNA-seq methods
are the most sought-after multiomic assays. A number of these
have been developed in recent years—sci-CAR-seq [57], Paired-seq
[58], ASTAR-seq [59], SNARE-seq [60], SHARE-seq [61], and
others. The ideal such assay should capture as many of the tran-
scripts present in each cell as possible and also as many of the open
chromatin regions in the nucleus, with high specificity and little
noise. The SHARE-seq assay, which is based on the combinatorial
indexing described above, provides high-quality and high-
throughput transcriptome and accessible chromatin measurements
in the same single cells.
In this chapter, we describe in detail the SHARE-seq procedure
and discuss the key optimization points and considerations for the
generation of high-quality scATAC+scRNA-seq datasets.

2 Materials

2.1 DNA Oligos and Primers
All oligonucleotides can be obtained through IDT. The exact scale
and purification methods are listed below:
1. Round 1 linker (1 μmol scale, standard desalting):
CCGAGCCCACGAGACTCGGACGATCATGGG
2. Round 2 linker (1 μmol scale, standard desalting):
CAAGTATGCAGCGCGCTCAAGCACGTGGAT
3. Round 3 linker (1 μmol scale, standard desalting):
AGTCGTACGCCGATGCGAAACATCGGCCAC

4. Round 1 blocking (1 μmol scale, standard desalting):
CCCATGATCGTCCGAGTCTCGTGGGCTCGG
5. Round 2 blocking (1 μmol scale, standard desalting):
ATCCACGTGCTTGAGCGCGCTGCATACTTG
6. Round 3 blocking (1 μmol scale, standard desalting):
GTGGCCGATGTTTCGCATCGGCGTACGACT
7. Read 1 (100 nmol scale, HPLC purified):
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
8. Template Switching Oligo (TSO) (100 nmol scale, HPLC
purified):
AAGCAGTGGTATCAACGCAGAGTGAATrGrG+G
9. RNA PCR primer (100 nmol scale, standard desalting):
AAGCAGTGGTATCAACGCAGAGT
10. P7 primer (100 nmol scale, standard desalting):
CAAGCAGAAGACGGCATACGAGAT
11. Phosphorylated Read2 (100 nmol scale, HPLC purified):
/5Phos/GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
12. Reverse transcription primer (RT primer) (100 nmol scale,
HPLC purified):
/5Phos/GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGNNNNNNNNNN/iBiodT/TTTTTTTTTTTTTTVN
13. Blocked_ME_Comp (100 nmol scale, HPLC purified):
/5Phos/CTG TCT CTT ATA CA/3ddC/
14. Pool–split ligation Plate R1 (see Note 4):
/5Phos/CGCGCTGCATACTTG[8-bp-barcode]CCCATGATCGTCCGA
15. Pool–split ligation Plate R2 (see Note 4):
/5Phos/CATCGGCGTACGACT[8-bp-barcode]ATCCACGTGCTTGAG
16. Pool–split ligation Plate R3 (see Note 4):
CAAGCAGAAGACGGCATACGAGAT[8-bp-barcode]GTGGCCGATGTTTCG

17. PCR Library indexing primers plate:
AATGATACGGCGACCACCGAGATCTACAC[8bp-index]TCGTCGGCAGCGTCAGATGTGTAT

An example set of 96 barcodes is listed below:

AACGTGAT AAGGTACA CACTTCGA GATAGACA TGGAACAA ATCATTCC
AAACATCG ACACAGAA CAGCGTTA GCCACATA TGGCTTCA ATTGGCTC
ATGCCTAA ACAGCAGA CATACCAA GCGAGTAA TGGTGGTA CAAGGAGC
AGTGGTCA ACCTCCAA CCAGTTCA GCTAACGA TTCACGCA CACCTTAC
ACCACTGT ACGCTCGA CCGAAGTA GCTCGGTA AACTCACC CCATCCTC
ACATTGGC ACGTATCA CCGTGAGA GGAGAACA AAGAGATC CCGACAAC
CAGATCTG ACTATGCA CCTCCTGA GGTGCGAA AAGGACAC CCTAATCC
CATCAAGT AGAGTCAA CGAACTTA GTACGCAA AATCCGTC CCTCTATC
CGCTGATC AGATCGCA CGACTGGA GTCGTAGA AATGTTGC CGACACAC
ACAAGCTA AGCAGGAA CGCATACA GTCTGTCA ACACGACC CGGATTGC
CTGTAGCC AGTCACTA CTCAATGA GTGTTCTA ACAGATTC CTAAGGTC
AGTACAAG ATCCTGTA CTGAGCCA TAGGATGA AGATGTAC GAACAGGC
AACAACCA ATTGAGGA CTGGCATA TATCAGCA AGCACCTC GACAGTGC
AACCGAGA CAACCACA GAATCTGA TCCGTCTA AGCCATGC GAGTTAGC
AACGCTTA GACTAGTA CAAGACTA TCTTCACA AGGCTAAC GATGAATC
AAGACGGA CAATGGAA GAGCTGAA TGAAGAGA ATAGCGAC GCCAAGAC
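
Before ordering barcode plates, it can be useful to sanity-check a barcode set for length, alphabet, duplicates, and pairwise distance. The sketch below is a hypothetical helper, not part of the published scripts; the six barcodes used are the first row of the table above, and the same function can be applied to the full list of 96.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def check_barcodes(barcodes, length=8):
    """Report malformed entries, duplicate count and minimum pairwise distance."""
    malformed = [bc for bc in barcodes
                 if len(bc) != length or set(bc) - set("ACGT")]
    valid = sorted(set(barcodes) - set(malformed))
    duplicates = len(barcodes) - len(set(barcodes))
    # Minimum distance is only defined when at least two valid barcodes exist.
    min_dist = (min(hamming(a, b) for a, b in combinations(valid, 2))
                if len(valid) > 1 else None)
    return {"malformed": malformed,
            "duplicates": duplicates,
            "min_hamming": min_dist}


# First row of the example barcode table.
example = ["AACGTGAT", "AAGGTACA", "CACTTCGA",
           "GATAGACA", "TGGAACAA", "ATCATTCC"]
report = check_barcodes(example)

if __name__ == "__main__":
    print(report)
```

A larger minimum Hamming distance leaves more room for correcting sequencing errors when reads are later matched against the barcode list.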

2.2 General Equipment
1. Eppendorf ThermoMixer C (96-well plate adapter)
2. Tabletop centrifuge
3. Swing bucket centrifuge with temperature control
4. Thermal cycler
5. Cold room
6. qPCR machine (QuantStudio 3)
7. Qubit fluorometer or equivalent
8. E-gel electrophoresis system (Thermo Fisher Scientific)
9. TapeStation (Agilent) or equivalent, e.g., BioAnalyzer
(Agilent).
10. Multichannel pipettes or liquid handling instruments
11. gentleMACS Dissociator (Miltenyi Biotec)
12. Automated cell counter, e.g., Countess 3 (Thermo Fisher
Scientific) or equivalent.

2.3 General Reagents
1. 1× PBS buffer solution (Thermo Fisher Scientific, Cat
#10010049)
2. Bovine Albumin Fraction V (7.5% solution) (Thermo Fisher
Scientific, Cat #15260037)
3. Trypan Blue Stain (0.4%) (Thermo Fisher Scientific, Cat
#T10282)
4. Enzymatic RI (Qiagen, Cat #Y9240L)
5. SUPERase RI (Thermo Fisher Scientific, Cat #AM2696)
6. Lucigen RI (Lucigen Cat # 30281-2)
7. Protector RI (Sigma Aldrich Cat # 3335399001)
8. 16% FA (Thermo Fisher Scientific, Cat # 28906)
9. Glycine (Sigma Aldrich, Cat #50049)
10. 1 M Tris HCl pH 7.5 (Thermo Fisher Scientific, Cat
#15567027)
11. 1 M Tris HCl pH 8.0 (Thermo Fisher Scientific, Cat
#15568025)
12. 5 M NaCl (Thermo Fisher Scientific, Cat #AM9760G)
13. 1 M MgCl2 (Sigma Aldrich, Cat #63069)
14. 1 M CaCl2 (Sigma Aldrich, Cat #21115-100ML)
15. DMF (Dimethyl Formamide) (Sigma, Cat #227056)
16. 0.2 M Tris-acetate pH 7.8 (Bioworld, Cat #40120265-2)
17. 5 M Potassium acetate (Sigma Aldrich, Cat #95843-100ML-F)
18. 1 M Magnesium acetate (Sigma Aldrich, Cat #63052-100ML)
19. 10% NP-40 (Thermo Fisher Scientific, Cat #28324)
20. Buffer EB (Qiagen, Cat #19086)
21. PEG 6000 (Sigma Aldrich, Cat #528877)
22. Maxima H Minus Reverse Transcriptase with buffer (Thermo
Fisher Scientific, Cat #EP075)
23. 10 mM dNTPs (NEB, Cat #N0447L)
24. T4 DNA Ligase (NEB, Cat #M0202L)
25. Additional 10× T4 Ligase buffer (NEB, Cat #B0202S)
26. Proteinase K (20 mg/mL) (NEB, Cat #P8107S)
27. 20% SDS (VWR, Cat #97062-440)
28. 100 mM PMSF/IPA (Sigma Aldrich, Cat # P7626)
29. cOmplete Protease Inhibitor Cocktail (Sigma Aldrich, Cat #
11697498001)
30. 0.5 M EDTA (Sigma Aldrich, Cat #AM9260G)
31. Tween-20 (Sigma Aldrich, Cat #P9416-100ML)
32. Digitonin (Promega, Cat #G9441)
33. MyOne C1 Dynabeads (Thermo Fisher Scientific, Cat
#65001)
34. Ficoll PM-400 (20%) (Sigma Aldrich, Cat #F5415-25ML)
35. KAPA HiFi 2× mix (Fisher Scientific, Cat #NC0295239)
36. SPRIselect beads (Beckman Coulter, Cat #B23318)
37. 100% EtOH
38. 100 mM DTT (Thermo Fisher Scientific, Cat #707265ML)
39. NEBnext 2× Mix (NEB, Cat #M0541L)
40. Glycerol (Thermo Fisher Scientific, Cat #15514011)
41. TD buffer from Nextera kit
42. SYBR Green I Nucleic Acid Gel Stain (Thermo Fisher Scien-
tific, Cat #S7563)
43. EVAGreen Dye, 20× in water (Biotium, Cat #31000)
44. Nuclease-free H2O
45. 96-well plates (Eppendorf, Cat #0030129300) (preferably low
protein and DNA binding; see Note 5)
46. 1.5-mL microcentrifuge tubes, preferably low protein and
DNA binding (see Note 5)
47. 2-mL, 15-mL, and 50-mL tubes
48. gentleMACS M-Tubes (Miltenyi Biotec, Cat #130-093-236)
49. 30 μm Sterile single-pack CellTrics filters (Sysmex, Cat #04-
004-2326)
50. 200-μL PCR tubes
51. Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Cat #
Q32851)
52. TapeStation D1000 and D5000 tape and reagents (Agilent)
53. Tn5 transposase (see Note 1)
54. MinElute PCR Purification Kit (Qiagen Cat# 28004/28006),
Zymo DNA Clean and Concentrator Kit (Zymo Cat# D4013/
D4014), or equivalent

2.4 Buffers and Reagents
Make all buffers using ultrapure molecular biology-grade ddH2O:
1. 2.5 M Glycine (50 mL)
9.375 g Glycine (powder)
1× PBS up to 50 mL
Filter through a 0.22 μm filter. Store at room temperature.
2. Tissue Dissociation (MACS) buffer
10 mM Tris-HCl pH 8.0
5 mM CaCl2
5 mM EDTA
3 mM MgAc
0.6 mM DTT
cOmplete Protease Inhibitor
Make fresh every time.
3. Nuclei Isolation Buffer (NIB)
10 mM Tris-HCl pH 7.4
10 mM NaCl
3 mM MgCl2
0.1% IGEPAL CA-630
Store at 4 °C.
4. 2× TD buffer
20 mM Tris-HCl pH 7.6
10 mM MgCl2
20% Dimethyl Formamide
Store at -20 °C.
5. PEG 6000 50%
Mix equal masses of PEG 6000 and H2O, heat to 65 °C for
4 min, and then cool down to room temperature.
6. 2× RCB buffer
100 mM Tris pH 8.0
100 mM NaCl
0.40% SDS
Store at room temperature.
7. 2× B & W buffer
10 mM Tris pH 8.0
2 M NaCl
1 mM EDTA
Store at 4 °C.
8. 1× B & W-T buffer
5 mM Tris pH 8.0
1 M NaCl
0.5 mM EDTA
0.05% Tween-20
Store at 4 °C.
9. Oligo resuspension buffer (IDTE)
10 mM Tris pH 8.0
0.1 mM EDTA
Store at room temperature.
10. Oligo annealing buffer (STE)
10 mM Tris pH 8.0
50 mM NaCl
1 mM EDTA
Store at room temperature.
11. Dilution buffer
50% glycerol
50 mM Tris pH 7.5
100 mM NaCl
0.1 mM EDTA
0.1% NP-40
Store at -20 °C.
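
The recipes above give final concentrations only. The stock volumes for a given batch follow from C1 × V1 = C2 × V2, as in the sketch below. The function name is hypothetical, and the example assumes a 1 M Tris-HCl pH 7.4 stock and a 10% IGEPAL CA-630 stock, neither of which appears in the reagent list, alongside the listed 5 M NaCl and 1 M MgCl2 stocks.

```python
def stock_volumes(final_volume_ml, components):
    """Compute stock volumes (mL) for a buffer via C1*V1 = C2*V2.

    components maps a name to (stock_conc, final_conc); both values must be
    in the same unit (e.g., mM for salts, % for detergents).
    """
    volumes = {}
    for name, (stock, final) in components.items():
        if stock <= 0 or final < 0 or final > stock:
            raise ValueError("invalid concentrations for " + name)
        volumes[name] = final_volume_ml * final / stock
    water = final_volume_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for this final volume")
    volumes["water"] = water
    return volumes


# 50 mL of Nuclei Isolation Buffer (NIB); stock concentrations are assumptions.
nib = stock_volumes(50.0, {
    "Tris-HCl pH 7.4": (1000.0, 10.0),   # 1 M stock -> 10 mM
    "NaCl": (5000.0, 10.0),              # 5 M stock -> 10 mM
    "MgCl2": (1000.0, 3.0),              # 1 M stock -> 3 mM
    "IGEPAL CA-630": (10.0, 0.1),        # 10% stock -> 0.1%
})

if __name__ == "__main__":
    for name, volume in nib.items():
        print(name, round(volume, 3), "mL")
```

The same arithmetic confirms the glycine recipe: 9.375 g in 50 mL corresponds to about 2.5 M for a molar mass of roughly 75 g/mol.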

2.5 Software Packages
1. Bowtie [62]: http://bowtie-bio.sourceforge.net/index.shtml
2. SAMtools [63]: http://www.htslib.org/
3. PicardTools: https://broadinstitute.github.io/picard/
4. UCSC Genome Browser [64, 65] utilities: http://hgdownload.cse.ucsc.edu/admin/exe/
5. STAR [66]: https://github.com/alexdobin/STAR
6. R: https://www.r-project.org/
7. Python (version 2.7 or higher): https://www.python.org/
8. ArchR [67]: https://www.archrproject.com/
9. Seurat [68]: https://satijalab.org/seurat/
10. Additional scripts: https://github.com/georgimarinov/GeorgiScripts.
Contains Python scripts used in the examples shown
below; some of the scripts depend on having pysam
(https://pysam.readthedocs.io/en/latest/index.html) and pyBigWig
(https://github.com/deeptools/pyBigWig) installed.

3 Methods

The general outline of the SHARE-seq assay is shown in Fig. 1. The
first of the two basic ideas behind SHARE-seq and other pool–split-based
assays is to label molecules originating from each cell with a
unique combination of barcodes that are added serially and ran-
domly by pooling cells and then randomly redistributing them
across subsequent sets of barcodes, thus ensuring that statistically
each cell can be identified through a unique combination of bar-
codes. The second is the separation of chromatin and transcriptome
molecules through the use of a biotinylated reverse transcription
(RT) primer, which can then be used for a streptavidin pulldown of
the transcriptome.
In brief, before the beginning of a SHARE-seq experiment, the
needed barcode plates and transposases are prepared and stored.
The experiment itself begins with the isolation of nuclei from cells
in culture or from tissues (see Note 2). Nuclei are then crosslinked,
usually lightly (see Note 3). Transposition is then carried out,
followed by reverse transcription using a biotinylated RT primer
containing a random unique molecular identifier (UMI). Three
rounds of pool–split hybridization and blocking are then carried
out, after which the hybridized oligos are ligated to each other
and to the transposed chromatin fragments and reverse-transcribed
mRNA, forming single molecules. Crosslinks are then reversed, and
streptavidin pulldown is used to separate the chromatin from the
transcriptome. ATAC libraries are directly amplified from the
supernatant. The transcriptome is first amplified on-beads into
cDNAs, which are then tagmented into sequenceable fragments
and PCR-amplified into final libraries.

[Figure 1: workflow diagram. Cells; nuclei isolation and crosslinking; tagmentation with Tn5 transposase; reverse transcription with a biotinylated poly-T primer; 3 rounds of pool/split and hybridization; ligation and reverse crosslinking; biotin pulldown. Supernatant: PCR amplification, ATAC library. Beads: cDNA amplification, tagmentation, PCR amplification, RNA library]

Fig. 1 Outline of the SHARE-seq assay. Nuclei are isolated from cells or tissues and crosslinked. Transposition
is then carried out on chromatin, followed by reverse transcription with a biotinylated RT primer. Three pool–
split rounds of hybridization of barcode oligos are then performed. Hybridized barcodes are then ligated, and
crosslinks are reversed. The ATAC and RNA portions are separated by streptavidin pulldown. The ATAC is
directly amplified, and the RNA is subjected to cDNA amplification, tagmentation, and final library amplification
The resulting library structures for ATAC and RNA are shown
in Fig. 2. ATAC libraries contain three barcodes, while RNA
libraries also include the UMI. Note that with many Illumina-
based sequencing readouts, the first barcode to be read is actually
the third one added during the pool–split procedure.

3.1 Determining the Optimal Cell Number
It is important to carefully track the number of cells going into the
SHARE-seq assay and being retained at each key step of the
procedure. Pool–split assays rely on the statistical uniqueness of
the barcode combinations through which cells pass, which in turn
means that having too many cells entering the pool–split procedure
will lead to an unacceptably high rate of doublets (two or more cells
with the same barcode combination). At the same time, some of the
reactions have an efficiency-imposed limit on the number of cells that can
enter them, and cells need to be distributed into parallel reactions for
optimal results. This applies to the initial transposition and reverse
transcription reactions, as well as to the final amplification, where
the existing protocol is optimized for libraries of 20,000 cells,
which means that after the final pooling, cells are split into separate
subpools of that size and processed into individual sublibraries.
Figure 3 shows the theoretical number of detected cells and
doublet rate for different pool–split setups with three rounds,
accounting for a certain level of cell loss during repeated handling.
Based on these calculations and empirical experience, we usually
start the pool–split rounds with 5 × 10⁵ cells for a 96 × 96 ×
96 pool–split experiment.
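
The trade-off between cell input and doublet rate can be approximated analytically. The sketch below is a simplified reading of the Poisson loading model mentioned in the caption of Fig. 3, not the authors' simulation code: it assumes that a fixed fraction of cells is lost in every round, that only cells surviving all rounds are detected, and that surviving cells are spread over barcode combinations as a Poisson process. The function name and parameters are hypothetical.

```python
import math


def doublet_fraction(n_cells_loaded, wells_per_round=(96, 96, 96),
                     loss_per_round=0.5):
    """Return (cells recovered, fraction of occupied barcode combinations
    that contain two or more cells) under Poisson loading."""
    if n_cells_loaded <= 0:
        raise ValueError("n_cells_loaded must be positive")
    n_combos = math.prod(wells_per_round)
    # Cells remaining after losing a fixed fraction in every round.
    n_recovered = n_cells_loaded * (1.0 - loss_per_round) ** len(wells_per_round)
    lam = n_recovered / n_combos          # mean cells per barcode combination
    p_occupied = -math.expm1(-lam)        # P(at least one cell)
    p_single = lam * math.exp(-lam)       # P(exactly one cell)
    return n_recovered, (p_occupied - p_single) / p_occupied


if __name__ == "__main__":
    for wells in ((96, 96, 96), (384, 96, 96), (384, 384, 384)):
        recovered, doublets = doublet_fraction(5e5, wells_per_round=wells)
        print(wells, int(recovered), round(doublets, 4))
```

For small loading the doublet fraction is close to half the mean occupancy, which is why increasing the number of wells in even one round lowers the doublet rate substantially at the same input.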

ATAC
P5 Read 1 R1 linker R2 linker R3 linker P7
5’ AATGATACGGCGACCACCGAGATCTACACTAGATCGCTCGTCGTCGGCAGCGTCAGATGTGTATAAGAGACAG...CTGTCTCTTATACA CCGAGCCCACGAGACTCGGACGATCATGGG CAAGTATGCAGCGCGCTCAAGCACGTGGAT AGTCGTACGCCGATGCGAAACATCGGCCAC
ACATATTCTCTGTC...GACAGAGAATATGTGTAGAGGCTCGGGTGCTCTGAGCCTGCTAGTACCCTAGTGCAAGTTCATACGTCGCGCGAGTTCGTGCACCTATAGTGCAATCAGCATGCGGCTACGCTTTGTAGCCGGTGTAGTGCAATAGAGCATACGGCAGAAGACGAAC 5’

Read 2 R1 BC R2 BC R3 BC
RNA
P5 Read 1 UMI R1 linker R2 linker R3 linker P7
5’ AATGATACGGCGACCACCGAGATCTACACTAGATCGCTCGTCGTCGGCAGCGTCAGATGTGTATAAGAGACAG... AAAAAAAAAAAAAAA CCGAGCCCACGAGACTCGGACGATCATGGG CAAGTATGCAGCGCGCTCAAGCACGTGGAT AGTCGTACGCCGATGCGAAACATCGGCCAC
ACATATTCTCTGTC...NVTTTTTTTTTTTTTTTNNNNNNNNNNGACAGAGAATATGTGTAGAGGCTCGGGTGCTCTGAGCCTGCTAGTACCCTAGTGCAAGTTCATACGTCGCGCGAGTTCGTGCACCTATAGTGCAATCAGCATGCGGCTACGCTTTGTAGCCGGTGTAGTGCAATAGAGCATACGGCAGAAGACGAAC 5’
Read 2 R1 BC R2 BC R3 BC

Fig. 2 Structure of final SHARE-seq libraries. ATAC (top) and RNA (bottom). Dots represent the actual library
insert

[Figure 3: doublet fraction (y-axis, 0.00–0.20) versus number of detected cells (x-axis, 10⁴–10⁸) for 96 × 96 × 96, 384 × 96 × 96, and 384 × 384 × 384 pool–split setups]


Fig. 3 Combinatorial indexing and SHARE-seq’s throughput. Shown is the number of cells that can be detected
at a given doublet rate; the pool–split process was simulated as random Poisson loading with a 50% loss of
cells during each pool–split round

3.2 Annealing of Oligo Plates
In this step, barcode-containing oligonucleotides for each round of
split–pool are annealed and distributed into 96-well plates prior to
the actual assay. These plates can be stored at -20 °C indefinitely. To
save time, it is advisable to prepare enough
such plates in advance to support multiple experiments. It is
critical to thaw these plates to room temperature prior to use.
See Note 4.
1. Dilute Round 1 linker oligos (120 μL at 1 mM concentration)
with 11,880 μL STE buffer.
2. Mix 90 μL diluted Round 1 linker oligo with 10 μL Round
1 oligo (at 100 μM) in the wells of a multiwell plate.
3. Dilute Round 2 linker oligos (120 μL at 1 mM concentration)
with 9480 μL STE buffer.
4. Mix 88 μL diluted Round 2 linker oligo with 12 μL Round
2 oligo (at 100 μM) in the wells of a multiwell plate.
5. Dilute Round 3 linker oligos (144 μL at 1 mM concentration)
with 9360 μL STE buffer.
6. Mix 86 μL diluted Round 3 linker oligo with 14 μL Round
3 oligo (at 100 μM) in the wells of a multiwell plate.
7. Anneal the Round 1, Round 2, and Round 3 plates as follows in
a thermocycler:
2 min at 95 °C
Slow ramp at -1 °C per minute to 20 °C
2 min at 20 °C
Indefinitely at 4 °C
8. Check whether there has been significant water evaporation from wells
situated at the corners. If so, add water to equalize volumes.
9. Aliquot 10 μL of the annealed oligos to new plates. This should
be enough for 9 experiments. Store these plates at -20 °C.
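
The final linker and barcode oligo concentrations in each well follow directly from the volumes in steps 1–6. The sketch below works them out, assuming the 1 mM linker and 100 μM barcode oligo stocks stated above; the function and variable names are hypothetical.

```python
def well_concentrations(linker_stock_ul, ste_ul, diluted_linker_ul,
                        barcode_ul, linker_stock_uM=1000.0,
                        barcode_stock_uM=100.0):
    """Return (diluted linker, final linker, final barcode) in uM per well."""
    # Step 1/3/5: linker stock diluted into STE buffer.
    diluted_uM = linker_stock_uM * linker_stock_ul / (linker_stock_ul + ste_ul)
    # Step 2/4/6: diluted linker mixed with barcode oligo in the well.
    total_ul = diluted_linker_ul + barcode_ul
    final_linker_uM = diluted_uM * diluted_linker_ul / total_ul
    final_barcode_uM = barcode_stock_uM * barcode_ul / total_ul
    return diluted_uM, final_linker_uM, final_barcode_uM


# Volumes (uL) taken from steps 1-6 of this subheading.
round1 = well_concentrations(120, 11880, 90, 10)
round2 = well_concentrations(120, 9480, 88, 12)
round3 = well_concentrations(144, 9360, 86, 14)

if __name__ == "__main__":
    for name, values in (("Round 1", round1), ("Round 2", round2),
                         ("Round 3", round3)):
        print(name, [round(v, 2) for v in values])
```

Under these assumptions the barcode oligo is in slight excess over the linker in every round, and both concentrations increase from Round 1 to Round 3.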

3.3 Anneal Adapter Oligos
In this step, Tn5 adapters are prepared for both transposition of
chromatin and tagmentation during cDNA library preparation:
1. Dilute the Phosphorylated Read2, Read1, and Blocked ME
Comp oligos to a 100 μM concentration with the IDTE buffer.
2. Prepare the transposition adapter mix in a PCR tube as follows:
6.5 μL 100 μM Phosphorylated Read2 oligo
6.5 μL 100 μM Read1 oligo
13 μL 100 μM Blocked ME Comp oligo
0.26 μL 1 M Tris pH 8.0
0.26 μL 5 M NaCl
3. Prepare the tagmentation adapter mix in a PCR tube as follows:
13 μL 100 μM Read1 oligo
13 μL 100 μM Blocked ME Comp oligo
0.26 μL 1 M Tris pH 8.0
0.26 μL 5 M NaCl
4. Anneal oligos as follows in a thermocycler:
2 min at 85 °C
Slow ramp at -1 °C per minute to 20 °C
2 min at 20 °C
Indefinitely at 4 °C
5. Heat glycerol to 65 °C, and then equilibrate to room
temperature.
6. Mix 25 μL glycerol with 25 μL of annealed oligo.
The annealed adapters can be immediately used or stored at
-20 °C.

3.4 Transposome Assembly
In this step, Tn5 transposomes are assembled together with the
annealed adapter oligos:
1. Assemble Tn5 transposomes by mixing the following
components:
0.625× N 1× home-made Tn5
0.625× N dilution buffer
1.25× N annealed transposition adapter with glycerol
Total volume: 2.5× N
2. Incubate at room temperature for 30 min.
The assembled transposome can be stored at -20 °C for up
to 2 weeks.

3.5 Tissue Dissociation
Here, we describe an example tissue dissociation protocol that has
worked successfully in our hands for several human embryonic
tissues. However, users should be aware that generally each tissue
requires separate optimization of dissociation conditions, and it is
likely that a different protocol will have to be adapted in most
situations.
1. Set swing bucket centrifuge to 4 °C Fast Temp and thaw
1 M DTT.
2. Transfer tissue samples onto dry ice.
3. Prepare MACS buffer (2 mL for each sample) as described
above. Make sure the buffer is cold on ice.
4. Add 10 μL Protector RNase Inhibitor for each 1 mL in Gen-
tleMACS M-tubes. Add 1 mL of MACS buffer to each Gentle-
MACS M-tube and chill on ice.
5. Transfer 30–50 mg of tissue into each GentleMACS M-tube
containing 1 mL MACS buffer.
6. Allow the tissue to thaw in buffer. Transition to a cold room.
7. Homogenize using a Protein_01_01 dissociation protocol
on a GentleMACS Tissue Dissociator instrument.
8. Filter the homogenate through 30 μm CellTrics filter into a
2 mL DNA LoBind tube by pipetting directly onto the top of
the filter and gently tapping to allow flow.
9. Wash the GentleMACS M-tube with 1 mL MACS buffer and
filter the wash again through the 30 μm CellTrics filter.
10. Spin down the homogenate in a swing bucket centrifuge at
500 g for 5 min at 4 ∘C (ramp up and down both at 3/9).
11. Remove and discard supernatant.
12. Resuspend in 1 mL PBS-2RI.
13. Count cells/nuclei and proceed with a desired number of
cells/nuclei.

3.6 Fixation of Cells in Culture and of Dissociated Nuclei from Tissue

The next step, if starting with a dissociated tissue, is to fix the
nuclei. This is also the first step if starting with cells in culture.
The procedure used is generally the same, with the difference that
with nuclei the procedure begins directly with the fixation:

1. Prepare PBS-2RI Buffer (4 mL) by mixing the following:


4 mL 1× PBS
21.4 μL 7.5% BSA
10 μL Enzymatic RI
5 μL SUPERase RI
Keep on ice.
2. Prepare NIB-RI Buffer (8 mL) by mixing the following:
8 mL NIB
20 μL Enzymatic RI
20 μL SUPERase RI
Keep on ice.
3. Spin down cells at 500 g.
4. Wash cells with 0.5 mL PBS-2RI.
5. Count cells with Trypan blue.
6. Resuspend cells in cold PBS-2RI at a concentration of
1 × 10^6 cells/mL.
7. For each 1 mL of cells in PBS-2RI, add 66.7 μL of 1.6% FA
(final concentration 0.1% FA) for cells or 66.7 μL of 3.2% FA
(final concentration 0.2% FA) for tissues. Mix and incubate at room
temperature for 5 min.
8. Quench the reaction by adding to each 1 mL of cells in
PBS-2RI the following:
56.1 μL 2.5 M Glycine
50 μL 1 M Tris pH 8.0
13.3 μL of 7.5% BSA
Mix well and incubate on ice for 5 min.
9. Spin down at 500 g. Remove supernatant, and add 0.5 mL
PBS-2RI without disturbing the cell pellet.
10. Prepare RSB-RI by mixing the following:
2.5 μL 1 M Tris-HCl pH 7.5
0.5 μL 5 M NaCl
0.75 μL 1 M MgCl2
2.5 μL 10% Tween-20
2.5 μL 10% NP-40
2.5 μL 1% Digitonin
33.3 μL 7.5% BSA
0.25 μL 1 M DTT
204 μL Ultrapure water
1.25 μL Enzymatic RI
11. Spin down again at 500 g. Remove supernatant, and resuspend
cells in 100 μL RSB-RI and incubate on ice for 3 min.
12. Prepare RSB-T by mixing the following:
25 μL 1 M Tris-HCl pH 7.5
5 μL 5 M NaCl
7.5 μL 1 M MgCl2
25 μL 10% Tween-20
333.3 μL 7.5% BSA
2.5 μL 1 M DTT
2089.5 μL Ultrapure water
12.5 μL Enzymatic RI
13. Pipette 1 mL of RSB-T to cells and mix. Spin down at 500 g for
5 min.
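The formaldehyde volumes in step 7 follow from a simple dilution calculation. The short sketch below checks them; the function name is ours and is used for illustration only.

```python
def final_concentration(stock_pct, stock_ul, sample_ul):
    """Final concentration (%) after adding stock_ul of a stock solution
    at stock_pct (%) to sample_ul of sample."""
    return stock_pct * stock_ul / (stock_ul + sample_ul)


# 66.7 uL of 1.6% FA added to 1 mL of cultured cells
fa_cells = final_concentration(1.6, 66.7, 1000.0)
# 66.7 uL of 3.2% FA added to 1 mL of nuclei from tissue
fa_tissue = final_concentration(3.2, 66.7, 1000.0)

print(f"cells: {fa_cells:.3f}% FA, tissue: {fa_tissue:.3f}% FA")
```

The same function can be reused if the fixation strength is changed during optimization.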

3.7 ATAC Reaction

In this step, transposition of the entire sample is performed by
splitting it into 50-μL reactions of 10,000–20,000 cells each in a
96-well plate. The smaller volume and number of cells per
reaction improve the quality of transposition.
The cell lysis conditions described here are adapted from the
omniATAC bulk ATAC protocol [22] (see Note 7):
1. Prepare PBS-RI by mixing the following:
800 μL PBS
2 μL Enzymatic RI
2. After the last centrifugation, remove supernatant and resuspend
the cells with PBS-RI to 2 × 10^6 cells/mL.
3. Prepare 2× TB buffer (sufficient for 96 reactions) by mixing the
following:
874.5 μL 0.2 M Tris-acetate
70 μL 5 M Potassium acetate
53 μL 1 M Magnesium acetate
53 μL 10% Tween-20
53 μL 1% Digitonin
848 μL 100% DMF
698.5 μL H2O
4. Prepare 1× TB buffer according to the number of reactions
N to be carried out. N = 1 corresponds to 10^4 input cells.
25 × N μL 2× TB
16.45 × N μL H2O
0.2 × N μL PIC
0.85 × N μL Enzymatic RI
Total volume: 42.5 × N μL.
5. Aliquot 5 × N μL of the diluted cells to a new tube, e.g., for
1 × 10^5 cells, N = 10, so aliquot 50 μL cells to a new tube.
6. Add 42.5 × N μL 1× TB to sample.
7. Add 2.5 × N μL of assembled Tn5 to sample. Mix well.
8. Aliquot 50 μL of sample in the wells of a 96- or 384-well plate.
9. Seal the plate and incubate with shaking at 500 rpm for 30 min
at 37 ∘C.
10. Pool the reactions and spin down at 500 g.
11. Add 0.5 mL NIB-RI without disturbing the pellet and spin
down again at 500 g.
12. Resuspend the cells in 60 μL EB.
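The per-reaction volumes in steps 4 to 7 all scale linearly with N. A minimal sketch of that scaling is given below, assuming N is the number of input cells divided by 10^4; the function and dictionary keys are illustrative names, not part of the protocol.

```python
def atac_reaction_volumes(n_cells, cells_per_reaction=10_000):
    """Scale the 1x TB recipe, cell aliquot, and Tn5 volume (all in uL)
    to the number of input cells; N = 1 is one 50-uL reaction."""
    n = n_cells / cells_per_reaction
    tb_mix = {
        "2x TB": 25.0 * n,
        "H2O": 16.45 * n,
        "PIC": 0.2 * n,
        "Enzymatic RI": 0.85 * n,
    }
    return {
        "N": n,
        "1x TB mix": tb_mix,
        "1x TB total": sum(tb_mix.values()),
        "diluted cells": 5.0 * n,
        "assembled Tn5": 2.5 * n,
    }


volumes = atac_reaction_volumes(100_000)
total_ul = (volumes["diluted cells"] + volumes["1x TB total"]
            + volumes["assembled Tn5"])
print(f"N = {volumes['N']:.0f}, total = {total_ul:.1f} uL")
```

The total volume divided by 50 μL gives the number of wells needed in step 8.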

3.8 Reverse Transcription

In this step, reverse transcription is performed in situ. The
conditions are optimized for 1 × 10^5 cells entering each 50-μL reaction:
1. Prepare the reverse transcription (RT) mix (sufficient for 6 reac-
tions) as follows:
70 μL 5× RT buffer
2.19 μL Enzymatics RNase Inhibitor
4.38 μL SUPERase RI
17.5 μL dNTPs
35 μL RT Primer
10.94 μL H2O
105 μL 50% PEG
35 μL Maxima H Minus Reverse Transcriptase (add right
before RT reaction)
Total volume: 280 μL.
2. Add 240 μL RT mix to 60 μL cells in EB.
3. Aliquot 50 μL to 6 PCR wells.
4. Start thawing the oligo plates, while the RT is ongoing.
5. Run the reverse transcription reaction in a thermocycler as
follows:
50 ∘C for 10 min
3 cycles of:
8 ∘C for 12 s
15 ∘C for 45 s
20 ∘C for 45 s
30 ∘C for 30 s
42 ∘C for 2 min
50 ∘C for 3 min
50 ∘C for 5 min
6. Pool samples and mix with 500 μL NIB-RI.
7. Spin down at 500 g.
8. Wash with 1000 μL NIB.
9. Spin down at 500 g.
10. Resuspend with 1152 μL NIB-RI.

3.9 Hybridization–Ligation and Pool–Split

In this step, cells/nuclei are iteratively split into individual wells to
dynamically create a combinatorial index statistically unique to each
cell. All handling is performed at room temperature, so make
absolutely sure that oligo plates have been fully thawed before
proceeding.
If different samples are multiplexed in a single run, they can be
individually identified based on the first-round barcodes. If such a
strategy is deployed, each sample needs to be processed through
transposition and reverse transcription separately and then loaded
into specified positions in the first-round plate(s).
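To get a feel for what "statistically unique" means, the size of the barcode space and the expected rate of barcode collisions can be estimated. The sketch below assumes 96 wells in each of three rounds and uniform random distribution of cells across wells; both are our assumptions for illustration, and the function names are hypothetical.

```python
def barcode_space(wells_per_round=96, rounds=3):
    """Number of distinct barcode combinations (assumed plate layout)."""
    return wells_per_round ** rounds


def expected_collision_rate(n_cells, n_barcodes):
    """Probability that a given cell shares its barcode combination with
    at least one other cell, assuming uniform random assignment."""
    return 1.0 - (1.0 - 1.0 / n_barcodes) ** (n_cells - 1)


space = barcode_space()
for n_cells in (10_000, 50_000, 100_000):
    rate = expected_collision_rate(n_cells, space)
    print(f"{n_cells} cells: {100 * rate:.2f}% expected collisions")
```

Under these assumptions, the collision rate grows roughly linearly with the number of cells carried through the pool–split rounds.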
1. Prepare 3456 μL hybridization buffer as follows:
2761.9 μL H2O
576 μL 10× T4 ligase buffer
14.4 μL SUPERase RI 20 U/μL
46.08 μL Enzymatics RI 40 U/μL
57.60 μL 10% NP40
2. Mix 1152 μL of sample with 3456 μL hybridization buffer.
Keep the sample at RT.
3. Aliquot 40 μL of mixture to a Round 1 plate.
4. Mix and shake at 300 rpm for 30 min at RT.
5. Prepare 1152 μL Blocking Oligo 1 mix as follows:
253.4 μL 100 μM Round 1 blocking oligo
211.2 μL 10× T4 DNA Ligase buffer
687.4 μL H2O
6. Add 10 μL Blocking Oligo 1 mix to each well.
7. Mix and shake at 300 rpm for 30 min at RT.
8. Pool samples from all wells.
9. Aliquot 50 μL of mixture to a Round 2 plate.
10. Mix and shake at 300 rpm for 30 min at RT.
11. Prepare 1152 μL Blocking Oligo 2 mix as follows:
304.1 μL 100 μM Round 2 blocking oligo
211.2 μL 10× T4 DNA Ligase buffer
636.7 μL H2O
12. Add 10 μL Blocking Oligo 2 mix to each well.
13. Mix and shake at 300 rpm for 30 min at RT.
14. Pool samples from all wells.
15. Aliquot 60 μL of mixture to a Round 3 plate.
16. Mix and shake at 300 rpm for 30 min at RT.
17. Prepare 1152 μL Blocking Oligo 3 mix as follows:
265.0 μL 100 μM Round 3 blocking oligo
11.5 μL 10% NP-40
875.5 μL H2O
18. Add 10 μL Blocking Oligo 3 mix to each well.
19. Mix and shake at 300 rpm for 30 min at RT.
20. Pool samples from all wells.
21. Spin down at 500 g 5 min.
22. Wash with 1 mL NIB-RI.
23. Spin down at 500 g 5 min.
24. Resuspend in 80 μL NIB-RI.
25. Prepare 320 μL Ligation mix as follows:
3.2 μL Enzymatics RI
1.00 μL SUPERase RI
40 μL 10× T4 DNA Ligase buffer
20 μL T4 DNA Ligase 400 U/μL
251.8 μL H2O
4 μL 10% NP40
26. Mix sample with the 320 μL Ligation mix.
27. Aliquot 8× 50 μL in PCR tubes.
28. Shake at 300 rpm for 30 min at RT.
29. Pool samples from all tubes.
30. Spin down at 500 g 5 min.
31. Wash with 1 mL NIB-RI.
32. Spin down at 500 g 5 min.
33. Resuspend in 400 μL NIB-RI.
34. Count the number of nuclei.
Note: If fewer cells are preferred per sub-library, count the cells,
take the desired number, and add NIB to bring the volume up to
50 μL per sub-library.

3.10 Reverse Crosslinking

In this step, cells are reverse crosslinked to release DNA from the
bound proteins so that the ATAC libraries can be amplified. As the
crosslinking is relatively gentle (at 0.1 or 0.2% FA), a milder reverse
crosslinking condition of 1 h incubation at 55 °C is generally
sufficient.
Further reverse crosslinking optimization might be needed if
the crosslinking protocol has been modified:
1. For each of the N 50-μL sub-libraries, add the following:
50 μL 2× RCB
2 μL Proteinase K
1 μL SUPERase RI
2. Incubate at 55 ∘C for 1 h.
3. Add 5 μL 100 mM PMSF/IPA.
4. Incubate at room temperature for 10 min.
Note: This is an optional stopping point. The reverse crosslinked
product can be stored at −80 °C for a few days.

3.11 Pulldown

In this step, the cDNA is separated from the transposition products
by pulling down on the biotin that is part of the reverse transcription
primer. The supernatant constitutes the transposition products
and is processed separately from the cDNA:
1. Prepare 1× B & W-T/RI buffer by mixing the following:
400 × (N + 1) μL 1× B & W-T buffer
4 × (N + 1) μL SUPERase RI
2. Prepare 1× B & W/RI buffer by mixing the following:
100 × (N + 1) μL 1× B & W buffer
2 × (N + 1) μL SUPERase RI
3. Prepare 1× STE/RI buffer by mixing the following:
200 × (N + 1) μL 1× STE buffer
(N + 1) μL SUPERase RI
4. In a fresh tube, mix 10 × N μL MyOne C1 Dynabeads with
100 × N μL 1× B & W-T.
5. Separate on a magnetic rack and remove supernatant.
6. Wash twice with 100 × N μL B & W-T without RI.
7. Wash once with 100 × N μL B & W-T/RI.
8. Resuspend beads in 100 × N μL 2× B & W/RI.
9. Add 100 μL beads to each sample.
10. Incubate at room temperature on a rotator for 60 min.
11. Place the tube on a magnetic rack.
12. Transfer the supernatant (which contains chromatin frag-
ments) to a new tube for ATAC library preparation. The
ATAC fragments are stable for a few hours at room tempera-
ture and can be processed concurrently or after cDNA library
construction is complete.
13. Wash cDNA/RNA-bound beads three times with 100 μL 1× B
& W-T/RI.
14. Wash with 100 μL 1× STE/RI without resuspending beads.

3.12 ATAC Library Preparation

In this step, ATAC fragments are purified and amplified into a final
library ready for sequencing:
1. Clean up the ATAC part of the sample using Zymo DNA Clean
and Concentrate. Elute in 11 μL EB buffer, and then elute
again with additional 11 μL EB buffer (a total of 22 μL EB
buffer).
2. Prepare ATAC PCR Master Mix by mixing the following:
225 μL 2× NEBnext Master Mix
9 μL P7 primer 25 μM
27 μL H2O
3. Mix the following:
20 μL sample
29 μL ATAC PCR Master Mix
1 μL of 25 μM Adapter 1 Primer (from the PCR Library
indexing primers plate)
4. Run PCR for 5 cycles as follows:
72 ∘C for 5 min
98 ∘C for 30 s
5 cycles of:
98 ∘C for 10 s
65 ∘C for 30 s
72 ∘C for 30 s
5. Determine additional cycles using qPCR. Add 5 μL of the
pre-amplified reaction to 10 μL qPCR Master Mix for a total
qPCR reaction of 15 μL as follows:
5 μL NEBnext Master Mix
0.2 μL 25 μM Adapter 1.1
0.2 μL 25 μM P7
0.9 μL 10× SYBR Green
3.7 μL H2O
6. Assess the amplification profiles and determine the required
number of additional cycles to amplify. Please refer to Figure 2
in Buenrostro et al. [25].
7. Carry out final amplification by placing the remaining 45 μL in
a thermocycler and running the following program:
N_add cycles of:
98 °C for 10 s
65 °C for 30 s
72 °C for 30 s
where N_add is the number of additional cycles.
8. Clean up the final library using Zymo DNA Clean & Concen-
trate, eluting in 15 μL.
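One way to turn the qPCR amplification profile from steps 5 and 6 into a number of additional cycles is to find the cycle at which fluorescence first reaches a fixed fraction of its plateau. The sketch below uses one third of the maximum as the threshold; this fraction is an assumption on our part, and the criterion in the cited reference should take precedence. The function name and the example trace are hypothetical.

```python
def additional_cycles(fluorescence, fraction=1 / 3):
    """Return the first cycle (1-based) at which the baseline-subtracted
    fluorescence reaches `fraction` of its maximum value.

    fluorescence: one value per qPCR cycle, starting at cycle 1.
    fraction: assumed threshold; adjust to the criterion in use.
    """
    if not fluorescence:
        raise ValueError("empty amplification profile")
    threshold = fraction * max(fluorescence)
    for cycle, value in enumerate(fluorescence, start=1):
        if value >= threshold:
            return cycle
    raise ValueError("threshold never reached")


# Made-up amplification trace for illustration only
trace = [0, 1, 2, 5, 10, 30, 60, 90, 100, 100]
n_add = additional_cycles(trace)
print(n_add)
```

Rounding down rather than up is the more conservative choice, since over-amplification reduces library complexity.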

3.13 RNA Library Preparation Step 1. Template Switching

In this step, RNA library generation is initiated by carrying out
template switching on the pulled down cDNA:
1. Prepare the Template switch mix by mixing the following:
11.25 μL H2O
125 μL 50% PEG 6000
90 μL 5× Maxima RT buffer
90 μL Ficoll PM-400 (20%)
45 μL 10 mM dNTPs
45 μL RNase inhibitor (Lucigen)
11.25 μL 100 μM TSO oligo
22.5 μL Maxima RT RNase H Minus (add last, right before
reaction)
2. Remove all supernatant. Be careful to avoid drying the beads.
3. Resuspend beads in 50 μL Template switch mix.
4. Incubate samples for 30 min at room temperature with
rotation.
5. Incubate samples for 90 min at 42 ∘C at 300 rpm. Resuspend
every 30 min by pipetting up and down.

3.14 RNA Library Preparation Step 2. Amplification of cDNA

The next step is to amplify the individual cDNA molecules.
1. Prepare cDNA PCR Mix by mixing the following:
247.5 μL KAPA HiFi 2× mix
7.92 μL 25 μM RNA PCR primer
7.92 μL 25 μM P7 primer
231.7 μL H2O
2. Mix samples with 100 μL H2O.
3. Separate beads on magnet. Wash with 200 μL STE without
resuspending the beads.
4. Mix beads with 55 μL cDNA PCR Mix and transfer to PCR
tubes/plates.
5. Run PCR as follows:
95 ∘C for 3 min
5 cycles of:
98 ∘C for 20 s
65 ∘C for 45 s
72 ∘C for 3 min
6. Determine additional cycles using qPCR. Add 2.5 μL of the
pre-amplified reaction to 7.5 μL qPCR Master Mix in a total
qPCR reaction of 10 μL as follows:
3.75 μL KAPA HiFi 2× mix
0.12 μL 25 μM RNA PCR primer
0.12 μL 25 μM P7 primer
0.5 μL 20× EvaGreen
3.01 μL H2O
7. Determine additional cycles as described above for ATAC
libraries.
5 cycles of:
98 ∘C for 20 s
65 ∘C for 45 s
72 ∘C for 3 min
8. Purify using SPRI beads. Mix the reaction with 0.8× volume of
SPRI beads and incubate at room temperature for 10 min.
Separate the beads on magnet and wash twice with 200 μL
freshly prepared 70% EtOH. Make sure to remove all liquid,
and elute in 20 μL.
9. Optional: check size of the cDNA using the D5000
TapeStation.

3.15 RNA Library Preparation Step 3. Tagmentation

The next step is to tagment the amplified cDNA, which will prepare
it for the final library amplification step:
1. Quantify cDNA concentration using Qubit.
2. Dilute cDNA to a concentration of 5 ng/μL for tagmentation.
Note: Expect more than 50 ng cDNA. If the cDNA amount is
low, as little as 20 ng cDNA can be tagmented; in this case,
adjust the volumes of H2O and cDNA accordingly.
3. Prepare tagmentation transposome by mixing the following:
11.25 μL 1× Tn5
11.25 μL Dilution Buffer
22.5 μL annealed tagmentation adapter with glycerol
4. Mix the following:
10 μL 5 ng/μL cDNA
10 μL H2O
25 μL 2× TD buffer
5 μL assembled Tn5
5. Incubate for 5 min at 55 ∘C.
6. Purify tagmented library using the Zymo kit (use 250 μL bind-
ing buffer). Elute twice with 11 μL EB (a total of 22 μL).

3.16 RNA Library Preparation Step 4. Final Amplification

Final libraries are generated by PCR.
1. Prepare post-tagmentation PCR mix by mixing the following:
20 μL sample
25 μL 2× NEB Next Master Mix
1 μL 25 μM P7 primer
1 μL 25 μM Adapter 1 Primer (from the PCR Library indexing
primers plate)
3 μL H2O
2. Run PCR as follows:
72 ∘C for 5 min
9 cycles of:
98 ∘C for 10 s
65 ∘C for 30 s
72 ∘C for 60 s

3.17 Library Quantification and Evaluation of Library Quality

Before libraries can be sequenced, they need to be properly quantified
and subjected to quality evaluation. This is done by first evaluating
the insert size distribution and then quantifying the library:
1. Examination of library size distribution. This step can be carried
out using several different instruments, such as a TapeStation
or a BioAnalyzer. We prefer to use a TapeStation (with the
D1000 or HS D1000 kits) due to flexibility, ease of use, and
rapid turnaround time.
2. Quantification of library concentration. For most high-throughput
sequencing applications, this step is standardly
carried out using a Qubit fluorometer. While this works well
for libraries with a unimodal fragment-length distribution,
ATAC libraries typically exhibit a multimodal fragment
distribution and also often contain fragments longer
than what can be sequenced on standard Illumina instruments.
As a result, effective library concentrations often differ from
apparent library concentrations measured using Qubit, and the
optimal way of estimating effective library concentration
is qPCR.
3. Estimation of effective library concentration using qPCR.
Standard Illumina library quantification kits can be used to
quantify the concentration of the library that will be able to
be sequenced. Products from NEB or KAPA are appropriate for
this use.

3.18 Sequencing

The protocol described here generates libraries designed to be
sequenced on Illumina sequencers, the most widely available of
which is the NextSeq. On a NextSeq, SHARE-seq libraries are
sequenced as follows using a 150-cycle kit:
For the RNA libraries, use a 50 bp × 10 bp × 99 bp × 8 bp
configuration (Read 1 × Read 2 × Index1 × Index2, respectively).
For the ATAC libraries, use a 30 bp × 30 bp × 99 bp × 8 bp
configuration (Read 1 × Read 2 × Index1 × Index2, respectively).
For RNA, the 10 bp of Read 2 captures the UMI, and the 50 bp
of Read 1 captures the actual RNA sequence.
For ATAC, fragments are sequenced in a 2 × 30 bp format.
The 8 bp of Index 2 captures the library barcode (if more than
one library is sequenced in a single run). The 99 bp of Index
1 captures the pool–split barcodes.
For other Illumina instruments, different configurations can be
used. For example, using a 200-cycle kit on NovaSeq, run ATAC
libraries in 55 bp × 55 bp × 99 bp × 8 bp configuration and RNA
libraries in a 100 bp × 10 bp × 99 bp × 8 bp configuration.
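When planning a run, it can help to add up the cycles that each read configuration consumes. The sketch below simply sums the four read lengths quoted above; the dictionary labels are ours, and how many cycles a given kit actually provides should be checked against the instrument documentation.

```python
# (Read 1, Read 2, Index 1, Index 2) lengths in bp, as listed in the text
CONFIGURATIONS = {
    "NextSeq RNA": (50, 10, 99, 8),
    "NextSeq ATAC": (30, 30, 99, 8),
    "NovaSeq ATAC": (55, 55, 99, 8),
    "NovaSeq RNA": (100, 10, 99, 8),
}


def total_cycles(configuration):
    """Total number of sequencing cycles used by a read configuration."""
    return sum(configuration)


for name, configuration in CONFIGURATIONS.items():
    print(f"{name}: {total_cycles(configuration)} cycles")
```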
An important consideration to take into account before
sequencing is that the standard Illumina run recipes do not allow
for the 99-bp index read configuration that is necessary for
SHARE-seq libraries. This necessitates the creation of custom
recipes in which the limits on the length of the index reads are
increased accordingly. However, different methods for creating
these custom recipes are necessary depending on the Illumina
instrument used and the versions of the control software that the
machine is equipped with; resolving this issue may occasionally
require seeking help from Illumina’s customer support service.

4 Computational Processing

At present there is no standard tool for analyzing pool–split-based
multiomics datasets. The pipeline presented here is the one we have
been using in our practice. Its objective is to take the raw SHARE-seq
reads and produce objects that can be used for further analysis
with established tools for scRNA-seq/scATAC-seq processing such
as Seurat and ArchR (e.g., sparse matrices and BAM files). The
outline of the processing is shown in Fig. 4. For both ATAC and
RNA, reads are first assigned their cellular barcodes. RNA reads are
additionally annotated with the sequenced UMIs. RNA reads are
aligned against the genome, a quantification is carried out for each
gene in each cell, and a final sparse matrix is created. For ATAC,
reads are mapped against the genome, then filtered and deduplicated
within each cell, and a final BAM file with cellular barcodes
appended to each alignment is created.

[Figure 4 flowchart. ATAC branch: ATAC FASTQ, barcode assignment, ATAC alignment, filtering and deduplication, cell assignment, downstream analysis (ArchR). RNA branch: RNA FASTQ, barcode assignment, UMI assignment, RNA alignment, gene quantification, downstream analysis (Seurat). Both branches feed into joint scATAC/RNA analyses.]

Fig. 4 Outline of the SHARE-seq computational processing procedures. As a first step, cell barcodes are
annotated for all reads in both ATAC and RNA FASTQ files. Subsequently, UMIs are consolidated and assigned
to reads in the RNA set. RNA reads are then aligned against the genome, and gene expression is quantified in
single cells, resulting in a final data matrix that can be analyzed in Seurat (or other scRNA-seq tools). ATAC
reads are aligned against the genome, filtered (removing mitochondria-mapping reads), and deduplicated
within each barcode. Alignments are then annotated with their cell barcodes and can be used as input for
further analysis in ArchR. Further joint analysis of the ATAC and RNA can be carried out downstream

4.1 RNA

1. As a first step in the RNA processing, annotate barcodes for
each read pair using the SHARE-seq-barcode-annotate.py script.

python SHARE-seq-barcode-annotate.py
BC1file fieldID pos1 lenBC1 BC2file
fieldID2 pos2 lenBC2 BC3file fieldID3
pos3 lenBC3 [-BCedit N] [-revcompBC]
The script is flexible and can be used to assign barcodes to
almost any kind of pool–split experiment in which the indexes
are in Index Read 1. It takes as input files containing the
barcodes for each round of pool–split and the column positions
of the barcode sequences in each file (0-based), their position in
Index Read 1 (0-based), their length, and their orientation (use
the [-revcompBC] option if the sequences are reverse com-
plement, depending on the exact format of the sequencing).
Use the [-BCedit] option to increase/decrease the strin-
gency of matching barcode sequences to the master list (the
default value is 1). In this case, the barcode files are in the
following format:

#WellPosition Name Sequence


A1 Round1_01 AACGTGAT
B1 Round1_02 AAACATCG
C1 Round1_03 ATGCCTAA
[...]

And barcodes are assigned in a single step as follows:

python PEFastqToTabDelimited.py RNA.end1.fastq.gz
RNA.end2.fastq.gz | python SHARE-seq-barcode-annotate.py
Plate_R1.tsv 2 15 8 Plate_R2.tsv 2 53 8 Plate_R3.tsv
2 91 8 -revcompBC
| python PEFastqToTabDelimited-reverse.py -
RNA.barcodes_annotated

This will produce FASTQ files with headers looking as


follows:

@[readID]:::[GTTAGCCT+TAGTCTTG+TACCGAGC] 1:N:0:
TGGGGNCACAGAGCCAAACCATATCAGCTG
+
AAAAA#EEEEEAEEEEEEEEEEEEEEEEEE

In which barcode combinations have been appended to the


read headers, with nan if no matching barcode was found due
to sequencing errors or other issues, e.g.:

@[readID]:::[GACGGATT+GATAGAGG+nan] 1:N:0:
ACCAANCTGTGCACAAGCGTGAATCAACCT
+
6AAAA#E/EEEEEEEEEAEEEEEEEEEEEE

Note that it is considerably faster to split the FASTQ files


into smaller pieces and process them in parallel.
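The logic of the barcode annotation can be illustrated with a small stand-alone sketch. This is not the SHARE-seq-barcode-annotate.py script itself: the function names are hypothetical, and matching here uses Hamming distance (substitutions only) as a simplification of the edit-distance option described above.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def match_barcode(observed, whitelist, max_mismatch=1):
    """Return the whitelist barcode matching `observed`, or 'nan'."""
    if observed in whitelist:
        return observed
    hits = [bc for bc in whitelist
            if len(bc) == len(observed)
            and hamming(observed, bc) <= max_mismatch]
    # Accept only unambiguous corrections
    return hits[0] if len(hits) == 1 else "nan"


def annotate(index_read, rounds, revcomp_bc=False):
    """rounds: list of (whitelist, 0-based start, length), one per round."""
    calls = []
    for whitelist, start, length in rounds:
        observed = index_read[start:start + length]
        if revcomp_bc:
            observed = revcomp(observed)
        calls.append(match_barcode(observed, whitelist))
    return "+".join(calls)


whitelist = {"AACGTGAT", "AAACATCG", "ATGCCTAA"}
rounds = [(whitelist, 15, 8), (whitelist, 53, 8), (whitelist, 91, 8)]
# Synthetic 99-bp index read with one sequencing error in the third barcode
read = "A" * 15 + "AACGTGAT" + "C" * 30 + "AAACATCG" + "C" * 30 + "ATGCCTAT"
print(annotate(read, rounds))
```

The positions 15, 53, and 91 and the barcode length of 8 are taken from the command above; the example sequences come from the barcode file excerpt.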

2. Compress the output files:

gzip RNA.barcodes_annotated.end1.fastq
gzip RNA.barcodes_annotated.end2.fastq

3. Annotate UMIs using the SHARE-seq-RNA-UMI-Add.py
script, which is also flexible and can read UMIs of different
lengths from either read in the pair:

python SHARE-seq-RNA-UMI-Add.py UMIlen read1|read2

As follows:

python PEFastqToTabDelimited.py
RNA.barcodes_annotated.end1.fastq.gz
RNA.barcodes_annotated.end2.fastq.gz |
python SHARE-seq-RNA-UMI-Add.py 10 read2 |
python PEFastqToTabDelimited-reverse.py -
RNA.barcodes_annotated.RNA_UMI

This step will append the UMI sequence to the cell bar-
codes in the read ID:

@[readID]:::[TGACCACT+GGTCGTGT+TGCTGATA+TTTATGATAG]
CCTCTNGCTCAGCCTATATACCGCCATCTTCAGCAAACCCTGATGAAGGC
+
AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEE

4. Compress the output files:

gzip RNA.barcodes_annotated.RNA_UMI.end1.fastq
gzip RNA.barcodes_annotated.RNA_UMI.end2.fastq

5. Merge the individual files:

cat RNA_.barcodes_annotated.RNA_UMI.end1.fastq >


RNA.barcodes_annotated.RNA_UMI.end1.fastq.gz
cat RNA_.barcodes_annotated.RNA_UMI.end2.fastq >
RNA.barcodes_annotated.RNA_UMI.end2.fastq.gz

6. Align the Read 1 FASTQ file against the genome using STAR
as follows (the commands given here use the standard
ENCODE Project Consortium[69] STAR settings):

STAR --limitSjdbInsertNsj 10000000 --genomeDir genome/STAR
--outFileNamePrefix RNA.end1.STAR/
--readFilesIn RNA.barcodes_annotated.RNA_UMI.end1.fastq.gz
--runThreadN 20 --outSAMunmapped Within --outFilterType
BySJout --outSAMattributes NH HI AS NM MD
--outFilterMultimapNmax 50 --outSAMstrandField intronMotif
--outFilterMismatchNmax 999 --outFilterMismatchNoverReadLmax
0.04 --alignIntronMin 10 --alignIntronMax 1000000
--alignMatesGapMax 1000000 --alignSJoverhangMin 8
--alignSJDBoverhangMin 1 --sjdbScore 1 --readFilesCommand
zcat --outSAMtype BAM SortedByCoordinate --outWigStrand
Stranded --twopassMode Basic --twopass1readsN -1
--limitBAMsortRAM 500000000000

7. Index the output BAM file:

samtools index
RNA.end1.STAR/Aligned.sortedByCoord.out.bam

8. Calculate global mapping statistics:

python SAMstats.py
RNA.end1.STAR/Aligned.sortedByCoord.out.bam
SAMstats-RNA.end1.STAR.hg38
-bam genome.chrom.sizes samtools

This script will output the number of mapped reads in


various categories (uniquely mapping, spliced, etc.) as well as
the molecular complexity of the alignment.
9. Calculate read distribution relative to the genome annotation:

python SAM_reads_in_genes3_BAM.py annotation.gtf


RNA.end1.STAR/Aligned.sortedByCoord.out.bam
genome.chrom.sizes
sam_reads_genes-RNA.end1.STAR -nomulti

This script will output the fraction of exonic, intronic, and


intergenic reads. This is important information for single-cell
assays for evaluating to what extent the cytoplasm (which is
enriched for exonic reads relative to the nucleus) is captured in
the final libraries.
10. Make an RPM-normalized (Reads Per Million mapped reads)
global coverage track:

python makewigglefromBAM-NH.py title


RNA.end1.STAR/Aligned.sortedByCoord.out.bam
genome.chrom.sizes
RNA.end1.STAR/Aligned.sortedByCoord.out.wig -RPM

11. Evaluate read coverage along transcripts:

python gene_coverage_wig_gtf.py annotation.gtf


RNA.end1.STAR/Aligned.sortedByCoord.out.wig
1000 coverage-RNA -normalize -singlemodelgenes

This script run with these settings will output the average
read profile over all genes with only a single annotated transcript
(in order to avoid confounding by the presence of multiple
isoforms) and ≥1000 bp in length. Use a simple annotation
with few isoforms, such as RefSeq, to get as many genes meeting
these requirements as possible.
12. Calculate UMI counts per gene and per cell barcode combination
using the SHARE-seq_RNA_counts.py script. For faster processing,
run this on each chromosome in parallel, as follows
(shown is chr1):

python SHARE-seq_RNA_counts.py
RNA.end1.STAR/Aligned.sortedByCoord.out.bam
annotation.gtf.chr1 genome.chrom.sizes
RNA.SHARE-seq_RNA_counts.chr1 -UMIedit 1

The [-UMIedit] option can be used to tweak the level of


UMI collapsing (in this case UMIs within an edit distance of
1 from each other will be collapsed into a single UMI).
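To illustrate what UMI collapsing does, the sketch below merges UMIs observed for one gene in one cell when they differ by at most one substitution, keeping the most abundant sequence as the representative. This is one common greedy approach and not necessarily the algorithm implemented in SHARE-seq_RNA_counts.py; the function names are hypothetical, and Hamming distance is used as a stand-in for edit distance.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(umis, max_dist=1):
    """Greedily collapse UMIs within `max_dist` substitutions.

    UMIs are visited from most to least abundant (ties broken
    alphabetically); a UMI is kept only if it is not within `max_dist`
    of a UMI that has already been kept.
    """
    counts = Counter(umis)
    kept = []
    for umi, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if not any(len(umi) == len(k) and hamming(umi, k) <= max_dist
                   for k in kept):
            kept.append(umi)
    return kept


observed = ["TTTATGATAG", "TTTATGATAG", "TTTATGATAC", "GGCATCCTAA"]
print(len(collapse_umis(observed)))
```

Setting `max_dist=0` in this sketch corresponds to counting every distinct UMI sequence separately.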
13. Calculate per-cell statistics by merging the individual outputs
using the SHARE-seq-RNA-BC-sum-across-files.py
script as follows:

python SHARE-seq-RNA-BC-sum-across-files.py
list_of_per_chromosome_outputs
RNA.SHARE-seq_RNA_counts.UMIs_per_cell

This will output a file in the following format:

#BC1+BC2+BC3 rank3 UMIs3 Aligned Positions genes


GCCAATGT+CAGATCTG+TAACGCTG 1 64660 171969 8369
GTTGTCGG+TAAGCGTT+GATCAGCG 2 47079 123008 7864
TGACCACT+GGTCGTGT+TGCTGATA 3 45034 109960 7652

which shows the number of UMIs and the number of


detected genes for each cell barcode combination.
14. Extract cell barcode combinations above a desired threshold,
e.g., ≥500 UMIs into a separate file.

15. Create final sparse matrix format files that can be used as input
to Seurat for further analysis with the SHARE-seq-RNA-
UMIs-sum-across-files.py script:

python SHARE-seq-RNA-UMIs-sum-across-files.py
list_of_per_chromosome_outputs
RNA.SHARE-seq_RNA_counts.UMIs_per_cell.min500 0
RNA.SHARE-seq_RNA_counts.UMIs_per_cell.min500.sparse
-sparse
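The thresholding in step 14 can be done with any text-processing tool. A minimal Python sketch is shown below; it assumes the whitespace-separated layout of the example output in step 13, with the UMI count in the third column, and the function name is hypothetical.

```python
def filter_barcodes(lines, min_umis=500, umi_column=2):
    """Keep lines whose UMI count (assumed 0-based column 2) is at
    least `min_umis`; header lines starting with '#' are passed through."""
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("#"):
            kept.append(line)
            continue
        fields = line.split()
        if int(fields[umi_column]) >= min_umis:
            kept.append(line)
    return kept


example = [
    "#BC1+BC2+BC3 rank UMIs Aligned genes",
    "GCCAATGT+CAGATCTG+TAACGCTG 1 64660 171969 8369",
    "GTTGTCGG+TAAGCGTT+GATCAGCG 2 47079 123008 7864",
    "AAACATCG+AACGTGAT+ATGCCTAA 3 120 310 95",
]
for row in filter_barcodes(example):
    print(row)
```

In practice the input lines would be read from the per-cell summary file and the kept lines written to a new file for use in step 15.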

4.2 ATAC

The first steps of the ATAC processing are analogous to those of the
RNA pipeline:
1. First, annotate cellular barcodes:

python PEFastqToTabDelimited.py
ATAC.end1.fastq.gz ATAC.end2.fastq.gz |
python SHARE-seq-barcode-annotate.py
Plate_R1.tsv 2 15 8 Plate_R2.tsv 2 53 8 Plate_R3.tsv 2
91 8 -revcompBC |
python PEFastqToTabDelimited-reverse.py -
ATAC.barcodes_annotated

Note as before that it is considerably faster to split the


FASTQ files into smaller pieces and process them in parallel.
2. Compress the output files:

gzip ATAC.barcodes_annotated.end1.fastq
gzip ATAC.barcodes_annotated.end2.fastq

3. Merge the individual files:

cat ATAC_.barcodes_annotated.end1.fastq >


ATAC.barcodes_annotated.end1.fastq.gz
cat ATAC_.barcodes_annotated.end2.fastq >
ATAC.barcodes_annotated.end2.fastq.gz

4. Align reads against the mitochondrial genome with Bowtie as


follows:

python PEFastqToTabDelimited.py
ATAC.barcodes_annotated.end1.fastq.gz
ATAC.barcodes_annotated.end2.fastq.gz -trim 30 30 |
bowtie bowtie-indexes/chrM -p 20 -v 2 -a -t --best
--strata -q -X 1000 --sam --12 - |
samtools view -F4 -bT genome.fa - |
samtools sort - ATAC.2x30mers.chrM

This step is for the purpose of evaluating the extent of


mitochondrial contamination in the overall library.
5. Align reads against the full genome with Bowtie and filter out
mitochondrial reads as follows:

python PEFastqToTabDelimited.py
ATAC.barcodes_annotated.end1.fastq.gz
ATAC.barcodes_annotated.end2.fastq.gz
-trim 30 30 | bowtie bowtie-indexes/genome
-p 20 -v 2 -k 2 -m 1 -t --best --strata -q
-X 1000 --sam --12 - | egrep -v chrM |
samtools view -F4 -bT genome.fa - | samtools sort -
ATAC.2x30mers.unique.nochrM

Adjust accordingly if working with a genome in which the
mitochondrial chromosome/contigs are named differently or there
are multiple contigs to be filtered out (e.g., in plants where
there is also a plastid in addition to the mitochondrion).
6. Index the resulting BAM files.

samtools index ATAC.2x30mers.unique.nochrM.bam


samtools index ATAC.2x30mers.chrM.bam

7. Calculate mapping statistics for the two sets of alignments.

python SAMstats.py ATAC.2x30mers.chrM.bam


SAMstats-ATAC.2x30mers.chrM
-bam genome.chrom.sizes samtools
-paired -noNHinfo
python SAMstats.py ATAC.2x30mers.unique.nochrM.bam
SAMstats-ATAC.2x30mers.unique.nochrM
-bam genome.chrom.sizes samtools
-paired -uniqueBAM

8. Calculate the mitochondrial reads fraction MRF as follows:

\[ \mathrm{MRF} = \frac{|R_M|}{|R_M| + |R_N|} \tag{1} \]

where R_M is the total number of reads that map to the
mitochondrial genome and R_N is the number of reads that map
to the nuclear genome after filtering out mito-mapping reads.
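Given the two read counts reported by the mapping statistics in step 7, Eq. 1 is a one-line calculation. The sketch below is illustrative; the function name and the example counts are made up.

```python
def mitochondrial_read_fraction(n_mito, n_nuclear):
    """Eq. 1: reads mapping to chrM divided by all mapped reads."""
    total = n_mito + n_nuclear
    if total == 0:
        raise ValueError("no mapped reads")
    return n_mito / total


# Made-up read counts for illustration
mrf = mitochondrial_read_fraction(2_500_000, 7_500_000)
print(f"MRF = {mrf:.2f}")
```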
9. Evaluate the fragment size distribution over the nuclear
genome:

python PEInsertDistFromBAM.py
ATAC.2x30mers.unique.nochrM.bam
genome.chrom.sizes
ATAC.2x30mers.unique.nochrM.InsLen
-uniqueBAM -normalize

10. Create a normalized genome coverage track:

python makewigglefromBAM-NH.py title


ATAC.2x30mers.unique.nochrM.bam
genome.chrom.sizes ATAC.2x30mers.unique.nochrM.wig
-notitle -RPM -uniqueBAM

11. Create a BigWig file using the wigToBigWig program from


the UCSC Genome Browser utilities suite.

wigToBigWig ATAC.2x30mers.unique.nochrM.wig
genome.chrom.sizes
ATAC.2x30mers.unique.nochrM.bigWig

12. Calculate the global TSS enrichment. The TSS enrichment (TSSE)
is the most informative ATAC-seq quality metric. It is based on
generating an average read distribution profile around annotated
transcription start sites for protein-coding genes and then
calculating the ratio between the number of reads in the immediate
neighborhood of the TSS and the number of reads falling in the
regions on the flanks of the TSS peak. The advantage of the TSSE
metric is that it is internal to the dataset and independent of
peak calling. We use a TSS window of ±100 bp and a TSS flank
distance of 2000 bp, i.e., TSSE is calculated as follows:

$$\mathrm{TSSE} = \frac{|R \in [\mathrm{TSS} \pm 100]|}{|R \in [\mathrm{TSS} - 2050, \mathrm{TSS} - 1950]| + |R \in [\mathrm{TSS} + 1950, \mathrm{TSS} + 2050]|} \qquad (2)$$

First, generate the TSS metaprofile:

python signalAroundCoordinate-BW.py
annotation.TSS-0bp.bed 0 1 3 4000
ATAC.2x30mers.unique.nochrM.bigWig
ATAC.2x30mers.unique.nochrM.TSS_profile -normalize

Note that you need a BED file containing the start positions
and the strands of annotated TSSs in the genome, e.g.,

#chr TSS TSS strand geneName
chr1 1000 1000 + GENE1

Second, calculate the TSS score:

python ATACTSSscore.py
ATAC.2x30mers.unique.nochrM.TSS_profile
100 2000 >> ATACTSSscore.txt

13. Deduplicate the BAM file. Note that this step is different from
the typical deduplication carried out in most high-throughput
sequencing pipelines, based on tools such as MarkDuplicates in
Picard. Here, we perform deduplication of fragments only
within the same cell barcode, i.e., for two fragments to be
collapsed, they need to have the same coordinates, orientation,
and cell barcode.

python SHARE-seq_ATAC_dedup.py
ATAC.2x30mers.unique.nochrM.bam
genome.chrom.sizes
ATAC.2x30mers.unique.nochrM.BC_dedup.bam
-addBC

Use the [-addBC] option to append the cell barcodes to each
alignment as a BC tag, making these final files ready to use with
ArchR.
14. Index the deduplicated BAM file:

samtools index ATAC.2x30mers.unique.nochrM.BC_dedup.bam

15. Calculate alignment stats for the deduplicated BAM file:

python SAMstats.py ATAC.2x30mers.unique.nochrM.BC_dedup.bam
SAMstats-ATAC.2x30mers.unique.nochrM.BC_dedup
-bam genome.chrom.sizes samtools -paired -uniqueBAM

16. Calculate fragment count and TSS enrichment statistics for
each cell barcode.

python SHARE-seq_ATAC_stats_per_cell.py
ATAC.2x30mers.unique.nochrM.BC_dedup.bam
genome.chrom.sizes annotation.TSS-0bp.bed 0 1 2000 200
ATAC.2x30mers.unique.nochrM.BC_dedup.per_cell_stats

This script will output a file containing information about
the number of fragments and TSS enrichment for each barcode
that can be used to filter barcodes for downstream analysis.
More sophisticated filtering in addition to these simple
metrics, e.g., of doublet cells, can be performed in ArchR [67].
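
The logic of steps 13 and 16 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in SHARE-seq_ATAC_dedup.py or SHARE-seq_ATAC_stats_per_cell.py: the Fragment record, the function names, and the cutoffs are hypothetical, and fragments are assumed to have been parsed from the BAM file already.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple


class Fragment(NamedTuple):
    """Hypothetical record for one aligned fragment (not a real file format)."""
    chrom: str
    start: int
    end: int
    strand: str
    barcode: str


def dedup_within_barcode(fragments: Iterable[Fragment]) -> List[Fragment]:
    """Keep one copy of fragments sharing coordinates, orientation, AND barcode."""
    seen = set()
    unique = []
    for frag in fragments:
        if frag not in seen:  # tuple equality compares all five fields
            seen.add(frag)
            unique.append(frag)
    return unique


def fragments_per_barcode(fragments: Iterable[Fragment]) -> Counter:
    """Count (deduplicated) fragments for each cell barcode."""
    return Counter(frag.barcode for frag in fragments)


def passing_barcodes(counts: Dict[str, int], tss_scores: Dict[str, float],
                     min_fragments: int, min_tss: float) -> List[str]:
    """Barcodes meeting both cutoffs; the cutoffs are assumed, dataset-specific values."""
    return sorted(
        bc for bc, n in counts.items()
        if n >= min_fragments and tss_scores.get(bc, 0.0) >= min_tss
    )


frags = [
    Fragment("chr1", 100, 250, "+", "BC_A"),
    Fragment("chr1", 100, 250, "+", "BC_A"),  # duplicate within BC_A: collapsed
    Fragment("chr1", 100, 250, "+", "BC_B"),  # same coordinates, other cell: kept
    Fragment("chr2", 500, 620, "-", "BC_A"),
]
unique = dedup_within_barcode(frags)
counts = fragments_per_barcode(unique)
kept = passing_barcodes(counts, {"BC_A": 12.0, "BC_B": 3.0},
                        min_fragments=2, min_tss=5.0)
```

The point of keying on the barcode as well as the coordinates is that identical fragments from different cells are independent observations and should not be collapsed.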

5 Expected Results

5.1 Sequencing Libraries

Figure 5 shows the typical fragment profiles for ATAC and RNA
SHARE-seq libraries. ATAC libraries are expected to show a
nucleosomal signature, with prominent subnucleosomal,
mononucleosomal, and perhaps dinucleosomal peaks, shifted to the right
by the length of the adapters and barcodes added to the original
fragments. In contrast, RNA libraries are primarily unimodal in
length.

5.2 Species Mixing Experiments

A customary experiment to be carried out when testing, adopting,
or developing any new single-cell protocol is the species mixing
experiment, in which cells from two different species, usually mouse
and human, are mixed together, and the extent of
crosstalk/contamination of individual barcodes or of doublet formation
(in which two cells are processed together with the same barcode)

Fig. 5 Typical fragment-length profiles of SHARE-seq libraries. (a) BioAnalyzer profile of a SHARE-seq ATAC
library. (b) BioAnalyzer profile of a SHARE-seq RNA library

[Figure 6 panels: (a) ATAC, hg38 fragments per barcode (x-axis) vs. mm10 fragments per barcode (y-axis); (b) RNA, hg38 UMIs per barcode (x-axis) vs. mm10 UMIs per barcode (y-axis).]

Fig. 6 Typical results from a species mixing SHARE-seq experiment. Human HEK293 and mouse embryonic
fibroblast (MEF) cells were mixed in equal proportions and carried through the SHARE-seq workflow. (a) ATAC
fragments per cell. (b) RNA UMIs per cell

is assessed based on how many reads in each barcode map to each
species. Ideally, all barcodes should feature reads coming from only
one of the two species. Doublets arise from loading of multiple cells
in the same droplets/wells (depending on the method used) or
from physical clumping of cells early in the protocol that then are
processed together throughout the rest of the procedure.
Figure 6 shows typical species mixing results for a SHARE-seq
experiment. We note that in our hands ATAC experiments usually
show virtually no crosstalk between barcodes and very few doub-
lets. On the other hand, pool–split RNA experiments in general
often exhibit a small fraction of reads resulting from “leakage,”
likely because of some cells opening up during cell handling and
releasing their content into the general reaction pool. This issue
does not significantly affect most analyses, but it should be kept in
mind in the cases in which it could be a confounding factor.
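
The barcode-level assessment described above can be sketched as follows. This is a hedged illustration only: the function name and the 0.9 purity cutoff are assumptions made for the example, not values from this protocol, and the per-barcode read counts for each species are assumed to be available already.

```python
from collections import Counter
from typing import Dict, Tuple


def classify_barcode(human: int, mouse: int, purity: float = 0.9) -> str:
    """Label a barcode by the species its reads come from.

    `purity` is an assumed cutoff: the minimum fraction of reads that must map
    to one species for the barcode to be called a single-species barcode.
    """
    total = human + mouse
    if total == 0:
        return "empty"
    if human / total >= purity:
        return "human"
    if mouse / total >= purity:
        return "mouse"
    return "mixed"  # candidate doublet or heavily contaminated barcode


def summarize(counts: Dict[str, Tuple[int, int]], purity: float = 0.9) -> Counter:
    """Tally labels over a {barcode: (human_reads, mouse_reads)} mapping."""
    return Counter(classify_barcode(h, m, purity) for h, m in counts.values())


example = {
    "BC_1": (950, 50),   # mostly human
    "BC_2": (20, 980),   # mostly mouse
    "BC_3": (500, 500),  # both species: likely doublet
    "BC_4": (0, 0),      # no reads
}
summary = summarize(example)
```

In practice the cutoff would be chosen after inspecting a plot such as Fig. 6, since the acceptable level of cross-species reads differs between the ATAC and RNA modalities.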

5.3 ATAC Post-sequencing Quality Evaluation

Figure 7 shows the key ATAC-seq bulk-level metrics. The fragment-
length distribution (Fig. 7a) usually shows strong subnucleosomal
and nucleosomal peaks as well as a weaker dinucleosomal one. High
TSS enrichment is desirable; in this case (Fig. 7b), it is very high (TSSE
≥25). See Note 8 for more details. Figure 7c shows the fraction of
mitochondrial reads in the human and mouse cells in the species
mixing experiment. Note that the fraction can vary greatly depending
on the properties of the cell type (cancer cell lines and highly
metabolically active cells tend to have more mitochondria [70]) and not
just on the experimental variation (which in this case is completely
minimized as the cells were processed together).
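
For readers who want to check the bulk-level numbers by hand, Eqs. 1 and 2 translate directly into code. The sketch below is an assumption-laden illustration and not the logic of ATACTSSscore.py: the function names are hypothetical, the input to the TSSE function is assumed to be a list of read positions expressed as offsets from the nearest TSS, and the 50-bp flank half-width is read off the intervals in Eq. 2.

```python
from typing import Iterable


def mito_read_fraction(mito_reads: int, nuclear_reads: int) -> float:
    """Eq. 1: MRF = |R_M| / (|R_M| + |R_N|)."""
    total = mito_reads + nuclear_reads
    if total == 0:
        raise ValueError("no reads in either set of alignments")
    return mito_reads / total


def tss_enrichment(offsets: Iterable[int], window: int = 100,
                   flank: int = 2000, flank_half_width: int = 50) -> float:
    """Eq. 2: reads within +/-window of the TSS over reads in the two flanks.

    `offsets` are read positions relative to the TSS (negative = upstream).
    Taking the absolute offset covers the upstream and downstream flanks at once.
    """
    center = 0
    flanks = 0
    for offset in offsets:
        dist = abs(offset)
        if dist <= window:
            center += 1
        elif flank - flank_half_width <= dist <= flank + flank_half_width:
            flanks += 1
    if flanks == 0:
        raise ValueError("no reads in the flanking regions")
    return center / flanks


mrf = mito_read_fraction(mito_reads=25, nuclear_reads=75)
tsse = tss_enrichment([0, 50, -100, 2000, -2000, 500])  # 3 central, 2 flank reads
```

Note that, as written in Eq. 2, the ratio compares raw read counts in windows of different total width; it is not a per-base normalized value.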

[Figure 7 panels: (a) Number fragments vs. fragment length; (b) Average RPM vs. position relative to TSS, TSSE = 27.51; (c) Fraction of reads (nuclear, chrM) for HEK293 (hg38) and MEF (mm10).]

Fig. 7 Basic evaluation of bulk-level ATAC quality and enrichment. (a) Fragment-length distribution. (b) TSS
enrichment. Shown are the same experiments as those featured in Fig. 6. (c) Mitochondrial read fraction for
each species in this experiment

[Figure 8 panels: (a) TSS ratio (y-axis) vs. Number fragments (x-axis); (b) see caption.]

Fig. 8 Basic evaluation of scATAC-seq-level quality and enrichment. (a) Number fragments per cell
barcode vs. TSS enrichment. (b) Cell barcode rank (by fragment counts) vs. fragment counts per cell barcode

Figure 8 shows the key scATAC metrics. One such metric is the
relationship between the number of fragments per cell barcode and
the TSS enrichment within each cell barcode (Fig. 8a). Another is
the curve of the number of fragments per cell barcode plotted
against the rank (by the number of fragments per cell barcode) of
the cell barcodes (Fig. 8b). Ideally, there should be a clear inflection
point between the cell barcodes with high fragment counts and the
cell barcodes with low fragment counts, indicating that a set of
high-quality cells have been captured and preserved intact through
the full pool–split procedure. A flatter, diagonal-like shape of that
curve can be indicative of loss of cell integrity during handling and
is potentially concerning regarding the biological interpretability of
the experiment if the lack of inflection is too extreme.
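
One simple way to locate the inflection point on the barcode rank curve is to find the point on the log-log curve that lies farthest above the straight line joining its first and last points. The sketch below illustrates that heuristic only; it is an assumption of this example rather than a method prescribed by the protocol, the function name is hypothetical, and dedicated tools use more robust approaches.

```python
import math
from typing import Sequence, Tuple


def knee_rank(counts: Sequence[int]) -> Tuple[int, int]:
    """Return (rank, fragment count) at the knee of the log-log rank curve."""
    ordered = sorted((c for c in counts if c > 0), reverse=True)
    if len(ordered) < 3 or ordered[0] == ordered[-1]:
        raise ValueError("need at least 3 barcodes with differing counts")

    xs = [math.log10(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log10(c) for c in ordered]
    x_span = xs[-1] - xs[0]
    y_span = ys[0] - ys[-1]

    best_index, best_score = 0, float("-inf")
    for i, (x, y) in enumerate(zip(xs, ys)):
        x_norm = (x - xs[0]) / x_span   # 0 at the top-ranked barcode, 1 at the last
        y_norm = (y - ys[-1]) / y_span  # 1 at the highest count, 0 at the lowest
        score = x_norm + y_norm - 1.0   # height above the chord y = 1 - x
        if score > best_score:
            best_index, best_score = i, score
    return best_index + 1, ordered[best_index]


# Toy data: 50 "cells" with 1000 fragments each, 950 background barcodes with 10.
rank, threshold = knee_rank([1000] * 50 + [10] * 950)
```

A flat, diagonal-like curve gives a small maximum height above the chord, so the score at the knee can itself serve as a rough indicator of how well separated cells and background are.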

[Figure 9 panels: (a) Coverage (arbitrary units) vs. position along mRNA 5' -> 3' (for mRNAs >= 1,000 bp); (b) Fraction of reads (exonic, intronic, intergenic) for HEK293 (hg38) and MEF (mm10).]

Fig. 9 Basic evaluation of the bulk-level RNA-seq properties. (a) Read distribution along transcript lengths. (b)
Read distribution relative to the exonic, intronic, and intergenic genomic spaces

5.4 RNA Post-sequencing Quality Evaluation

Figure 9 shows the typical parameters to be evaluated for a bulk-
level RNA-seq dataset. One is the distribution of reads along
transcripts (Fig. 9a). SHARE-seq is not a 3′-tagging experiment in the
way some scRNA-seq approaches are: although it attaches UMIs to the 3′
end of transcripts, cDNAs are tagmented at random after
cDNA amplification; thus the first reads of the RNA part of a
SHARE-seq dataset can be some distance away from the 3′ end.
Another is the distribution of reads relative to the annotation
(Fig. 9b). As is often observed in scRNA-seq datasets, SHARE-seq
RNA libraries contain a significant portion of reads originating
from introns, presumably from unspliced transcripts present in
the nucleus. This is likely due to the fact that the ATAC reaction
has to happen first in the workflow, and thus a substantial portion of
the cytoplasm is lost and the final libraries are enriched for nuclear
material.
Figure 10 shows the key metric for evaluating the success of the
RNA portion of a SHARE-seq experiment. As with ATAC above,
the curve of the number of UMIs per cell barcode plotted against
the rank (by the number of UMIs per cell barcode) of the cell
barcodes should ideally feature a clear inflection point between
the cell barcodes with high UMI counts and the cell barcodes
with low UMI counts (Fig. 10a). There should also be a concor-
dance between the cell barcodes with high ATAC fragment counts
and those with high UMI counts, i.e., the same cells are of high
quality in both modalities, and are thus usable for joint analysis
(Fig. 10b).
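
The concordance check described above can be expressed as a set comparison between the barcodes that pass in each modality. The following is a minimal sketch under assumptions: the function name and the cutoffs are hypothetical, and the per-barcode counts are assumed to have been computed already.

```python
from typing import Dict, Set, Tuple


def joint_quality_barcodes(atac_fragments: Dict[str, int], rna_umis: Dict[str, int],
                           min_fragments: int, min_umis: int) -> Tuple[Set[str], float]:
    """Barcodes passing both modalities, plus the Jaccard index of the two passing sets."""
    atac_pass = {bc for bc, n in atac_fragments.items() if n >= min_fragments}
    rna_pass = {bc for bc, n in rna_umis.items() if n >= min_umis}
    both = atac_pass & rna_pass
    either = atac_pass | rna_pass
    jaccard = len(both) / len(either) if either else 0.0
    return both, jaccard


atac = {"A": 5000, "B": 4000, "C": 20}
rna = {"A": 3000, "B": 15, "C": 10, "D": 2500}
usable, jaccard = joint_quality_barcodes(atac, rna, min_fragments=1000, min_umis=1000)
```

A Jaccard index close to 1 indicates that largely the same barcodes are of high quality in both modalities; a low value suggests that one of the two libraries underperformed.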

[Figure 10 panels: (a) Number UMIs vs. cell barcode rank; (b) Number_fragments + 1 vs. UMIs + 1, colored by count.]

Fig. 10 Basic evaluation of SHARE-seq RNA single-cell-level quality and enrichment. (a) Cell barcode rank
(by UMI counts) vs. UMI counts per cell barcode. (b) UMI counts per barcode vs. ATAC fragment counts per
barcode.

[Figure 11 panels: (a) ATAC, UMAP_1 vs. UMAP_2; (b) RNA, UMAP_1 vs. UMAP_2.]

Fig. 11 Example SHARE-seq output on human embryonic lung samples. (a) ArchR iterative LSI UMAP on the
ATAC-seq dataset. (b) Seurat UMAP on the RNA dataset. Individual ArchR- and Seurat-defined clusters are
colored separately

5.5 Dimensionality Reduction and Cell Type/Cluster Identification

Following initial data processing, clusters and cell types can be
identified using standard tools for that purpose such as Seurat
[68] and/or ArchR [67]. Figure 11 shows typical output of this kind in
UMAP space for both the ATAC and RNA sides of a SHARE-seq
experiment from a human embryonic lung tissue sample.

6 Notes

1. The details of the production of hyperactive transposase are
beyond the scope of this chapter. However, detailed
instructions for how to carry it out can be found in Picelli et al.
2014 [71].
2. In this chapter, we presented one of many available protocols
for tissue dissociation and nuclei isolation that has worked in
our hands in some contexts. However, the variety of tissues and
their properties that can be encountered in different organisms
is vast, making it practically impossible to have one common
such protocol for all situations. Thus novel optimal procedures
for tissue dissociation often have to be empirically devised or
adapted.
3. The protocol we described here used light 0.1% FA crosslink-
ing. This does not mean that optimal results will be obtained in
all contexts with the same conditions, and crosslinking may
have to be optimized depending on the specifics of the experi-
mental system being studied.
4. The protocol described here is for 96 × 96 × 96 indexing.
However, it can be expanded to more cycles and/or more
barcodes, e.g., to a 3-round 384 × 384 × 384 indexing, or
4-round or 5-round 96/384 × 96/384 × 96/384. Pick the
optimal design based on the availability of robotic liquid
handlers (it is generally not practical to carry out pipetting of
384-well plates by hand), the desired throughput, and other
considerations. Note that additional barcodes and linkers would
have to be designed so that they are compatible with each other
and with further rounds of barcoding. Aim for as much
distance in sequence space as possible between the 8-bp barcodes
(or increase their length, if the sequencing format allows for it).
The set of 8-bp barcodes can be identical throughout all rounds of
indexing.
5. Low-binding tubes are preferable for all reactions in order to
ensure maximum yields.
6. It is optimal in terms of effort to anneal a sufficient amount of
oligos for multiple experiments on many separate plates. These
can then be used immediately when cells/tissues become avail-
able, saving a considerable amount of experimental time.
7. The TB buffer described here is modified from the original
omniATAC protocol with the addition of acetate. In our expe-
rience, this provides superior results compared to the tradi-
tional buffer formulation.

8. In our experience (and that of others), experiments in cell lines
always produce much higher quality ATAC datasets than those
obtained from tissues, especially frozen tissues. This is not
limited to SHARE-seq but is what has been observed by
numerous previous studies mapping chromatin accessibility in
tissue samples in contexts such as cancer, development, and
adult tissues [27, 28, 72, 73]. This is likely due to the extensive
handling and freezing and thawing of tissues leading to the
breaking up of nuclei and the release of unprotected free DNA
that is tagmented by Tn5, increasing the background
fragments and decreasing the signal-to-noise ratio. Whether future
protocol optimizations can resolve these issues or they are
fundamentally insurmountable is not known at present.
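
The barcode design considerations in Note 4 (distance in sequence space and the size of the combinatorial barcode space) can be checked with a short script. This is a sketch for illustration; the function names are hypothetical and the example barcodes are made up, not taken from the SHARE-seq barcode set.

```python
from itertools import combinations
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes: Sequence[str]) -> int:
    """Smallest Hamming distance over all barcode pairs (higher is safer)."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def barcode_combinations(barcodes_per_round: int, rounds: int) -> int:
    """Number of distinct barcode combinations after pool-split indexing."""
    return barcodes_per_round ** rounds


toy_set = ["AAAAAAAA", "AAAACCCC", "CCCCCCCC"]  # made-up 8-bp barcodes
closest = min_pairwise_distance(toy_set)
space_96 = barcode_combinations(96, 3)    # 96 x 96 x 96 design
space_384 = barcode_combinations(384, 3)  # 384 x 384 x 384 design
```

A minimum pairwise distance of at least 3 allows a single sequencing error to be corrected unambiguously, which is one common reason to maximize the distance between barcodes.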

Acknowledgements

The authors thank Sai Ma and Jason Buenrostro for helpful discus-
sion regarding the SHARE-seq protocol. This work was supported
by NIH grants (P50HG007735, RO1 HG008140, U19AI057266
and UM1HG009442 to W.J.G., 1UM1HG009436 to W.J.G. and
A.K., 1DP2OD022870-01 and 1U01HG009431 to A.K., and
HG006827 to C.H.), the Rita Allen Foundation (to W.J.G.), the
Baxter Foundation Faculty Scholar Grant, and the Human Fron-
tiers Science Program grant RGY006S (to W.J.G). W.J.G is a Chan
Zuckerberg Biohub investigator and acknowledges grants 2017-
174468 and 2018-182817 from the Chan Zuckerberg Initiative.
S.K. is supported by MSTP training grant T32GM007365 and the
Paul and Daisy Soros Fellowship. Fellowship support also provided
by the Stanford School of Medicine Dean’s Fellowship (G.K.M.),
by the EMBO Long-Term Fellowship EMBO ALTF 1119-2016,
and by the Human Frontier Science Program Long-Term Fellow-
ship HFSP LT 000835/2017-L (Z.S.).

References
1. Mortazavi A, Williams BA, McCue K et al. (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5(7):621–628
2. Nagalakshmi U, Wang Z, Waern K et al. (2008) The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320(5881):1344–1349
3. Sultan M, Schulz MH, Richard H et al. (2008) A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science 321(5891):956–960
4. Wilhelm BT, Marguerat S, Watt S et al. (2008) Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature 453(7199):1239–1243
5. Tang F, Barbacioru C, Wang Y et al. (2009) mRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods 6(5):377–382
6. Islam S, Kjällquist U, Moliner A et al. (2011) Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res 21(7):1160–1167
7. Ramsköld D, Luo S, Wang YC et al. (2012) Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nat Biotechnol 30(8):777–782
8. Hashimshony T, Wagner F, Sher N, Yanai I (2012) CEL-seq: single-cell RNA-Seq by multiplexed linear amplification. Cell Rep 2(3):666–673
9. Shalek AK, Satija R, Adiconis X, Gertner RS, Gaublomme JT, Raychowdhury R, Schwartz S, Yosef N, Malboeuf C, Lu D, Trombetta JJ, Gennert D, Gnirke A, Goren A, Hacohen N, Levin JZ, Park H, Regev A (2013) Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells. Nature 498(7453):236–240
10. Jaitin DA, Kenigsberg E, Keren-Shaul H, Elefant N, Paul F, Zaretsky I, Mildner A, Cohen N, Jung S, Tanay A, Amit I (2014) Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types. Science 343(6172):776–779
11. Klein AM, Mazutis L, Akartuna I, Tallapragada N, Veres A, Li V, Peshkin L, Weitz DA, Kirschner MW (2015) Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161(5):1187–1201
12. Macosko EZ, Basu A, Satija R, Nemesh J, Shekhar K, Goldman M, Tirosh I, Bialas AR, Kamitaki N, Martersteck EM, Trombetta JJ, Weitz DA, Sanes JR, Shalek AK, Regev A, McCarroll SA (2015) Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161(5):1202–1214
13. Zheng GX, Terry JM, Belgrader P et al. (2017) Massively parallel digital transcriptional profiling of single cells. Nat Commun 8:14049
14. Han X, Wang R, Zhou Y et al. (2018) Mapping the Mouse Cell Atlas by Microwell-Seq. Cell 172(5):1091–1107.e17
15. Cao J, Packer JS, Ramani V et al. (2017) Comprehensive single-cell transcriptional profiling of a multicellular organism. Science 357:661–667
16. Rosenberg AB, Roco CM, Muscat RA et al. (2018) Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding. Science 360:176–182
17. McGhee JD, Wood WI, Dolan M et al. (1981) A 200 base pair region at the 5′ end of the chicken adult β-globin gene is accessible to nuclease digestion. Cell 27:45–55
18. Keene MA, Corces V, Lowenhaupt K et al. (1981) DNase I hypersensitive sites in Drosophila chromatin occur at the 5′ ends of regions of transcription. Proc Natl Acad Sci U S A 78:143–146
19. Wu C (1980) The 5′ ends of Drosophila heat shock genes in chromatin are hypersensitive to DNase I. Nature 286(5776):854–860
20. Minnoye L, Marinov GK, Krausgruber T et al. (2021) Chromatin accessibility profiling methods. Nat Rev Meth Primers 1:10
21. Buenrostro JD, Giresi PG, Zaba LC et al. (2013) Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods 10:1213–1218
22. Corces MR, Trevino AE, Hamilton EG et al. (2017) An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat Methods 14:959–962
23. Reznikoff WS (2008) Transposon Tn5. Annu Rev Genet 42:269–286
24. Adey A, Morrison HG, Asan et al. (2010) Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition. Genome Biol 11(12):R119
25. Buenrostro JD, Wu B, Litzenburger UM et al. (2015) Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523:486–490
26. Cusanovich DA, Daza R, Adey A et al. (2015) Multiplex single cell profiling of chromatin accessibility by combinatorial cellular indexing. Science 348:910–914
27. Cusanovich DA, Reddington JP, Garfield DA et al. (2018) The cis-regulatory dynamics of embryonic development at single-cell resolution. Nature 555:538–542
28. Preissl S, Fang R, Huang H et al. (2018) Single-nucleus analysis of accessible chromatin in developing mouse forebrain reveals cell-type-specific transcriptional regulation. Nat Neurosci 21(3):432–439
29. Mezger A, Klemm S, Mann I et al. (2018) High-throughput chromatin accessibility profiling at single-cell resolution. Nat Commun 9(1):3647
30. Satpathy AT, Granja JM, Yost KE et al. (2019) Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion. Nat Biotechnol 37:925–936
31. Lareau CA, Duarte FM, Chew JG et al. (2019) Droplet-based combinatorial indexing for massive-scale single-cell chromatin accessibility. Nat Biotechnol 37:916–924
32. Macaulay IC, Haerty W, Kumar P et al. (2015) G&T-seq: parallel sequencing of single-cell genomes and transcriptomes. Nat Methods 12(6):519–522
33. Huang AY, Li P, Rodin RE et al. (2020) Parallel RNA and DNA analysis after deep sequencing (PRDD-seq) reveals cell type-specific lineage patterns in human brain. Proc Natl Acad Sci U S A 117(25):13886–13895
34. Zachariadis V, Cheng H, Andrews N, Enge M (2020) A highly scalable method for joint whole-genome sequencing and gene-expression profiling of single cells. Mol Cell 80(3):541–553.e5
35. Yin Y, Jiang Y, Lam KG et al. (2019) High-throughput single-cell sequencing with linear amplification. Mol Cell 76(4):676–690.e10
36. Rodriguez-Meira A, Buck G, Clark SA et al. (2019) Unravelling intratumoral heterogeneity through high-sensitivity single-cell mutational analysis and parallel RNA sequencing. Mol Cell 73(6):1292–1305.e8
37. Hou Y, Guo H, Cao C et al. (2016) Single-cell triple omics sequencing reveals genetic, epigenetic, and transcriptomic heterogeneity in hepatocellular carcinomas. Cell Res 26(3):304–319
38. Hu Y, Huang K, An Q, Du G, Hu G, Xue J, Zhu X, Wang CY, Xue Z, Fan G (2016) Simultaneous profiling of transcriptome and DNA methylome from a single cell. Genome Biol 17:88
39. Angermueller C, Clark SJ, Lee HJ et al. (2016) Parallel single-cell sequencing links transcriptional and epigenetic heterogeneity. Nat Methods 13(3):229–232
40. Pott S (2017) Simultaneous measurement of chromatin accessibility, DNA methylation, and nucleosome phasing in single cells. Elife 6:e23203
41. Peterson VM, Zhang KX, Kumar N et al. (2017) Multiplexed quantification of proteins and transcripts in single cells. Nat Biotechnol 35(10):936–939
42. Stoeckius M, Hafemeister C, Stephenson W et al. (2017) Simultaneous epitope and transcriptome measurement in single cells. Nat Methods 14(9):865–868
43. O’Huallachain M, Bava FA et al. (2020) Ultra-high throughput single-cell analysis of proteins and RNAs by split-pool synthesis. Commun Biol 3(1):213
44. Chung H, Parkhurst CN, Magee EM et al. (2021) Simultaneous single cell measurements of intranuclear proteins and gene expression. https://doi.org/10.1101/2021.01.18.427139
45. Katzenelenbogen Y, Sheban F, Yalin A et al. (2020) Coupled scRNA-seq and intracellular protein activity reveal an immunosuppressive role of TREM2 in cancer. Cell 182(4):872–885.e19
46. Guo F, Li L, Li J et al. (2017) Single-cell multi-omics sequencing of mouse early embryos and embryonic stem cells. Cell Res 27(8):967–988
47. Clark SJ, Argelaguet R, Kapourani CA et al. (2018) scNMT-seq enables joint profiling of chromatin accessibility DNA methylation and transcription in single cells. Nat Commun 9(1):781
48. Wang Y, Yuan P, Yan Z et al. (2021) Single-cell multiomics sequencing reveals the functional regulatory landscape of early embryos. Nat Commun 12(1):1247
49. Luo C, Liu H, Xie F et al. (2019) Single nucleus multi-omics links human cortical cell regulatory genome diversity to disease risk variants. bioRxiv 2019.12.11.873398
50. Xiong H, Luo Y, Wang Q et al. (2021) Single-cell joint detection of chromatin occupancy and transcriptome enables higher-dimensional epigenomic reconstructions. Nat Methods 18(6):652–660
51. Zhu C, Zhang Y, Li YE et al. (2021) Joint profiling of histone modifications and transcriptome in single cells from mouse brain. Nat Methods 18(3):283–292
52. Markodimitraki CM, Rang FJ, Rooijers K et al. (2020) Simultaneous quantification of protein-DNA interactions and transcriptomes in single cells with scDam&T-seq. Nat Protoc 15(6):1922–1953
53. Fiskin E, Lareau CA, Eraslan G et al. (2020) Single-cell multimodal profiling of proteins and chromatin accessibility using PHAGE-ATAC. bioRxiv 2020.10.01.322420
54. Mimitou EP, Lareau CA, Chen KY et al. (2021) Scalable, multimodal profiling of chromatin accessibility, gene expression and protein levels in single cells. Nat Biotechnol. https://doi.org/10.1038/s41587-021-00927-2
55. Swanson E, Lord C, Reading J et al. (2021) Simultaneous trimodal single-cell measurement of transcripts, epitopes, and chromatin accessibility using TEA-seq. eLife 10:e63632
56. Kearney CJ, Vervoort SJ, Ramsbottom KM et al. (2021) SUGAR-seq enables simultaneous detection of glycans, epitopes, and the transcriptome in single cells. Sci Adv 7(8):eabe3610
57. Cao J, Cusanovich DA, Ramani V et al. (2018) Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361:1380–1385
58. Zhu C, Yu M, Huang H et al. (2019) An ultra high-throughput method for single-cell joint analysis of open chromatin and transcriptome. Nat Struct Mol Biol 26:1063–1070
59. Xing QR, Farran CAE, Zeng YY et al. (2020) Parallel bimodal single-cell sequencing of transcriptome and chromatin accessibility. Genome Res 30(7):1027–1039
60. Chen S, Lake BB, Zhang K (2019) High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nat Biotechnol 37(12):1452–1457
61. Ma S, Zhang B, LaFave LM et al. (2020) Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell 183:1103–1116.e20
62. Langmead B, Trapnell C, Pop M et al. (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25
63. Li H, Handsaker B, Wysoker A et al. (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079
64. Kuhn RM, Haussler D, Kent WJ (2013) The UCSC Genome Browser and associated tools. Brief Bioinform 14:144–161
65. Kent WJ, Zweig AS, Barber G et al. (2010) BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics 26:2204–2207
66. Dobin A, Davis CA, Schlesinger F et al. (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29(1):15–21
67. Granja JM, Corces MR, Pierce SE et al. (2021) ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis. Nat Genet 53(3):403–411
68. Hao Y, Hao S, Andersen-Nissen E et al. (2021) Integrated analysis of multimodal single-cell data. Cell 184(13):3573–3587.e29
69. ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489:57–74
70. Marinov GK, Wang YE, Chan DC, Wold BJ (2014) Evidence for site-specific occupancy of the mitochondrial genome by nuclear transcription factors. PLoS ONE 9(1):e84713
71. Picelli S, Björklund AK, Reinius B et al. (2014) Tn5 transposase and tagmentation procedures for massively scaled sequencing projects. Genome Res 24:2033–2040
72. Domcke S, Hill AJ, Daza RM et al. (2020) A human cell atlas of fetal chromatin accessibility. Science 370(6518):eaba7612
73. Corces MR, Granja JM, Shams S et al. (2018) The chromatin accessibility landscape of primary human cancers. Science 362(6413):eaav1898

Chapter 12

Simultaneous Measurement of DNA Methylation
and Nucleosome Occupancy in Single Cells Using
scNOMe-Seq
Michael Wasney and Sebastian Pott

Abstract
Single-cell Nucleosome Occupancy and Methylome sequencing (scNOMe-seq) is a multimodal assay that
simultaneously measures endogenous DNA methylation and nucleosome occupancy (i.e., chromatin
accessibility) in single cells. scNOMe-seq combines the activity of a GpC Methyltransferase, an enzyme
which methylates cytosines in GpC dinucleotides, with bisulfite conversion, whereby unmethylated cyto-
sines are converted into thymines. Because GpC Methyltransferase acts only on cytosines present in
non-nucleosomal regions of the genome, the subsequent bisulfite conversion step not only detects the
endogenous DNA methylation, but also reveals the genome-wide pattern of chromatin accessibility.
Implementing this technology at the single-cell level helps to capture how the dynamics governing methylation
and accessibility vary across individual cells and cell types. Here, we provide a scalable plate-based protocol
for preparing scNOMe-seq libraries from single nucleus suspensions.

Key words scNOMe-seq, Single cell, DNA methylation, Nucleosome occupancy, Chromatin accessi-
bility, GpC Methyltransferase, Bisulfite sequencing, Epigenetic modification, Fluorescence-activated
cell sorting

1 Introduction

Multimodal technologies allow for the simultaneous
characterization of multiple biological features in the same samples, making it
possible to directly observe relationships between these features
[1–8]. Increasingly, these assays are being implemented at the
level of single cells. Single-cell data reveal cell-type-specific features
and regulatory dynamics that are not apparent in data from bulk
assays.
Nucleosome Occupancy and Methylome sequencing (NOMe-
seq) is a multimodal assay that quantifies both endogenous DNA
methylation and nucleosome occupancy [9–11] – two genomic
features associated with modulation of transcription – and has

Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols,
Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_12,
© The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023


[Figure 1 axes: average methylation (%) for CpG and GpC (y-axis) vs. distance from DHSs center in bp (x-axis).]

Fig. 1 scNOMe-seq data measure chromatin accessibility and DNA methylation.
Aggregated CpG (yellow) and GpC (blue) methylation data from multiple single
cells at DNase Hypersensitive sites in GM12878 cells. Figure reproduced from
Pott, 2017 [1], published under a Creative Commons Attribution license

been adapted for application on single cells (scNOMe-seq)
[1–3]. DNA methylation, which in mammalian cells almost always
occurs in CpG dinucleotides, is strongly associated with transcrip-
tional repression [12]. Nucleosomes block most DNA-dependent
processes by hindering access of transcription factors and other
cellular machinery to the underlying genetic sequence [13]. Con-
versely, nucleosome-depleted regions (NDRs) identify gene regu-
latory elements, such as promoters and enhancers. In surveying
both features at the single-cell level, scNOMe-seq can help to
elucidate the functional relationship between DNA methylation
and nucleosome occupancy within cell-type-specific contexts [14]
(Fig. 1).
scNOMe-seq utilizes the M.CviPI GpC Methyltransferase to
methylate all accessible cytosine residues in GpC dinucleotides.
Importantly, these residues are normally unmethylated in mamma-
lian cells [15]. By following GpC Methyltransferase treatment with
bisulfite conversion, all unmodified cytosine residues are converted,
facilitating the detection of both accessible GpC residues and
endogenous DNA methylation at CpG residues [1, 11].
To detect cytosine methylation, this protocol adapts a plate-
based library strategy developed for single-cell bisulfite sequencing
[16, 17]. Our original scNOMe-seq publication used the Pico
Methyl-seq library prep kit (Zymo Research D5456) to generate
single cell libraries [1]. The Ecker lab established a multiplexed
single-nucleus methylcytosine sequencing protocol [16, 17]
which significantly improved sample throughput and genome
coverage. We have used this protocol for scNOMe-seq since
then. Luo et al. [16] was published under a Creative Commons
Attribution 4.0 International License and we reproduced the pro-
tocol with small adaptations as part of our workflow outlined
[Figure 2. Panel A: t-SNE plot (axes TSNE 1 and TSNE 2) with clusters labeled Cardiomyocytes (CM), Hemat., Endothelial, Fibroblasts, and Pericytes/Sm. mus. Panel B: aggregate GmC and mCG tracks per cluster at the MYH7 locus (scale bar: 100 kb).]

Fig. 2 Multimodal profiling of the heart captures cell-type-specific epigenetic configurations in the MYH7
locus. scNOMe-seq data from an adult human heart sample comprising 1229 cells. (a) TSNE plot with clusters
corresponding to major cell types (left). (b) Pseudobulk data tracks for the corresponding clusters for both data
modalities capturing chromatin accessibility (GmC, green) and DNA methylation (mCG, blue), respectively

below. Sequenced scNOMe-seq libraries are aligned to a human
genome, allowing for the methylation statuses of all captured
cytosines in CpG and GpC contexts to be retrieved. Because CpG and
GpC dinucleotides occur relatively frequently, a single read can
simultaneously capture the accessibility and endogenous methyla-
tion of the genomic locus to which the read maps.
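
How a single bisulfite read reports both modalities can be sketched in a few lines of Python. This is an illustration only, not the pipeline used by the authors. It assumes a read aligned without gaps to the forward strand, treats a retained C as methylated and a C read as T as unmethylated, and flags cytosines in a GCG context as ambiguous (a common NOMe-seq convention that this chapter does not discuss). The function name and the example sequences are hypothetical.

```python
def call_cytosines(ref, read):
    """Classify each reference cytosine covered by a forward-strand read.

    Assumes ref and read have equal length and are aligned without gaps.
    Returns a list of (position, context, methylated) tuples.
    """
    if len(ref) != len(read):
        raise ValueError("ref and read must have the same length")
    calls = []
    for i, base in enumerate(ref):
        if base != "C" or read[i] not in "CT":
            continue  # not a cytosine, or an uninformative mismatch
        prev_g = i > 0 and ref[i - 1] == "G"             # GpC: accessibility
        next_g = i + 1 < len(ref) and ref[i + 1] == "G"  # CpG: endogenous
        if prev_g and next_g:
            context = "GCG"  # ambiguous between the two signals
        elif prev_g:
            context = "GpC"
        elif next_g:
            context = "CpG"
        else:
            context = "other"
        calls.append((i, context, read[i] == "C"))
    return calls


ref = "AGCATTCGTACGCA"
read = "AGCATTTGTATGCA"  # C retained at GpC sites, converted at CpG sites
for pos, context, methylated in call_cytosines(ref, read):
    print(pos, context, "methylated" if methylated else "unmethylated")
```

In this toy example the two GpC cytosines remain C (accessible to the methyltransferase), while the two CpG cytosines are read as T (no endogenous methylation), so one read carries both measurements.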
The protocol below starts from single cell suspensions and
combines initial preparation and GpC treatment of nuclei [18]
with preparation of single-cell bisulfite libraries [16]. However, in
our experience this protocol can be performed following most
nuclei isolations. For example, we successfully used nuclei isolated
from frozen human heart samples (Fig. 2).

2 Materials

While performing this protocol, use ultrapure water and analytical-grade
reagents to prepare solutions and follow institutional and
material-related safety guidelines when disposing of waste.

2.1 Reagents/Consumables

2.1.1 Nuclei Isolation and GpC Methyltransferase Treatment

1. Digestion mix: Prepare 1.9% Proteinase K solution by adding
1040 μL of Proteinase K Storage buffer to one tube with 20 mg
of Proteinase K (Zymo D3001-2-20) and allow Proteinase K to
dissolve completely. Store Proteinase K solution at -20 °C
between uses. To isolate nuclei, mix 883 μL of M-Digestion
Buffer (Zymo D5020–9), 88 μL Proteinase K solution, and
795 μL water. Digestion mix can be prepared the day before an
experiment and stored at 4 °C (see Note 2).
2. 1X RSB Buffer [18]: Prepare a stock of 10X RSB with 100 mM
Tris–HCl, pH 7.4, 100 mM NaCl, and 30 mM MgCl2. Make
at least 3 mL of 1X RSB (1:10 dilution of 10X RSB) per sample
being processed.
3. 1X PBS.
4. 1% NP-40: Dilute 10% NP-40 1:10 in water, for example by
mixing 10 μL of 10% NP-40 with 90 μL of water (100 μL total), or
100 μL of 10% NP-40 with 900 μL of water for 1 mL.
5. GpC Methylase reaction mix: Mix 7.5 μL 10X GpC Methyl-
transferase buffer, 1.5 μL 32 mM SAM, and 50 μL 4 U/μL
GpC Methyltransferase (NEB M0277L). Nuclei will be added
directly to this mix for the methylase reaction. Reserve 25 μL
4 U/μL GpC Methyltransferase and 0.75 μL 32 mM SAM to
boost the reaction.

2.1.2 Fluorescence-Activated Cell Sorting

1. NucBlue™ Live Cell Stain ReadyProbes (Invitrogen R37605).

2.1.3 Bisulfite Conversion

1. CT Conversion reagent: Add 7.9 mL M-Solubilization Buffer
and 3 mL M-Dilution Buffer to one bottle of CT Conversion
reagent. Vortex vigorously for at least 10 min to fully dissolve
the reagent, and then add 1.6 mL M-Reaction buffer (Zymo
D5022). Prepared CT Conversion reagent can be stored overnight
at room temperature, for a week at 4 °C, or for up to
1 month at -20 °C. If stored at 4 °C or -20 °C, warm the
solution at 37 °C prior to use.
2. EZ-96 DNA Methylation-Direct Kit (shallow-well) (Zymo
D5022).
• M-Binding buffer.
• M-Wash buffer: To prepare one bottle of M-Wash buffer,
add 144 mL of 100% ethanol to the 36 mL of concentrate in
the bottle provided by the kit.
• M-Desulphonation buffer.
• M-Elution buffer.
3. Random Primer Solution: Prepare Random Primer Solution
prior to the purification by adding 64 μL 100 μM random
primer stock to 728 μL M-Elution buffer. Keep on ice.

2.1.4 Random-Primed DNA Synthesis

1. Random Priming Master Mix: Prior to denaturing the samples
as part of the random-primed DNA synthesis step, mix 922 μL
10X Blue Buffer, 231 μL of 50 U/μL Klenow fragment (Qiagen
Beverly P7010HCL), 461 μL of dNTP solution with each
nucleotide at a concentration of 10 mM (Thermo Fisher
R0191), and 2995 μL of water. Keep on ice.
2.1.5 Inactivation of Free Primers and dNTPs

1. Exo/rSAP Master Mix: Prior to beginning the inactivation step,
mix 922 μL of 20 U/μL Exonuclease I and 461 μL of 1 U/μL
rSAP (Qiagen Beverly X8010L). Keep on ice.

2.1.6 Sample Clean-Up

1. SPRI Beads: Apportion 280 μL of Sera-Mag SpeedBeads into
an Eppendorf tube and place on a magnetic stand. Allow the
solution to clear of beads before carefully removing the supernatant.
Wash the beads twice with 1 mL TE. Between washes,
remove the tube from the magnet and mix by inversion before
replacing the tube on the stand and allowing the beads to clear.
After the second wash, resuspend beads in 280 μL of TE. Meanwhile,
transfer 2.52 g PEG 8000 to a 50 mL conical tube. Add
2.8 mL of 5 M NaCl, 140 μL 1 M Tris–HCl pH 8, and 28 μL of
0.5 M EDTA pH 8. Add 7 to 8 mL of water and vortex the
solution until the PEG 8000 has dissolved. Add the washed
Sera-Mag SpeedBeads and bring the solution up to 14 mL with
water. Store at 4 °C (see Notes 3 and 12).
2. 80% Ethanol: To make 50 mL of 80% ethanol, mix 40 mL
200 proof ethanol and 10 mL water. Vortex before use.
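
The final composition of the SPRI bead buffer follows from the amounts listed above and the 14 mL final volume. The short Python sketch below works through that arithmetic with C1 × V1 = C2 × V2; the variable and function names are ours, and the calculation assumes the solution is brought to exactly 14 mL.

```python
FINAL_VOLUME_ML = 14.0


def final_molar(stock_molar, added_ml, final_ml=FINAL_VOLUME_ML):
    """Solve C1 * V1 = C2 * V2 for the final concentration C2."""
    return stock_molar * added_ml / final_ml


peg_percent_wv = 2.52 / FINAL_VOLUME_ML * 100   # grams per 100 mL
nacl_molar = final_molar(5.0, 2.8)              # 2.8 mL of 5 M NaCl
tris_mm = final_molar(1.0, 0.140) * 1000        # 140 uL of 1 M Tris-HCl
edta_mm = final_molar(0.5, 0.028) * 1000        # 28 uL of 0.5 M EDTA
bead_dilution = FINAL_VOLUME_ML / 0.280         # 280 uL of beads in 14 mL

print(f"PEG 8000: {peg_percent_wv:.1f}% (w/v)")
print(f"NaCl:     {nacl_molar:.2f} M")
print(f"Tris-HCl: {tris_mm:.1f} mM")
print(f"EDTA:     {edta_mm:.1f} mM")
print(f"Beads:    1:{bead_dilution:.0f} dilution")
```

Under these assumptions the buffer works out to 18% (w/v) PEG 8000, 1 M NaCl, 10 mM Tris–HCl, and 1 mM EDTA, with the beads diluted 1:50.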

2.1.7 Adaptase Reaction

1. Adaptase Master Mix: Mix 450.5 μL of Elution Buffer (Qiagen
19086), 212 μL Buffer G1, 212 μL Reagent G2, 132 μL
Reagent G3, 53 μL Enzyme G4, and 53 μL Enzyme G5.
Pipette to mix and keep on ice.
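
The master mix recipes in Subheadings 2.1.1–2.1.7 can be checked against the per-well volumes used in Subheading 3.1. The Python sketch below is a bookkeeping aid under stated assumptions, not part of the published protocol: it assumes two 384-well plates (768 wells), eight random primer solutions each used in one eighth of the wells, and 96 pooled samples for the Adaptase step. The dictionary layout and function name are ours.

```python
WELLS_384 = 2 * 384   # two 384-well plates
WELLS_96 = 96         # pooled samples after the first clean-up

mixes = {
    # name: (component volumes in uL, uL dispensed per well, number of wells)
    "Digestion mix": ([883, 88, 795], 2, WELLS_384),
    "Random Primer Solution (per barcode)": ([64, 728], 7, WELLS_384 // 8),
    "Random Priming Master Mix": ([922, 231, 461, 2995], 5, WELLS_384),
    "Exo/rSAP Master Mix": ([922, 461], 1.5, WELLS_384),
    "Adaptase Master Mix": ([450.5, 212, 212, 132, 53, 53], 10.5, WELLS_96),
}


def overage(components, per_well, wells):
    """Return (prepared volume, required volume, prepared / required)."""
    prepared = sum(components)
    required = per_well * wells
    return prepared, required, prepared / required


for name, spec in mixes.items():
    prepared, required, ratio = overage(*spec)
    print(f"{name}: {prepared:g} uL prepared, {required:g} uL needed "
          f"({(ratio - 1) * 100:.0f}% excess)")
```

Each recipe yields roughly 10–20% more than the dispensed total, which is the margin to preserve when scaling the recipes to a different number of plates.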

2.1.8 Library Amplification

1. P5L PCR Primer Mix: 1.2 μM P5L primer (working concentration
of 600 nM when combined with P7L primer). Mix
1.2 μL of 100 μM P5L stock with 98.8 μL water. Keep on ice
before use.
2. P7L PCR Primer Mix: 2 μM P7L primer (working concentra-
tion of 1 μM after being combined with P5L primer). Mix 2 μL
of 100 μM P7L primer with 98 μL water. Keep on ice
before use.
3. 2X Kapa Hifi Mix (Roche 07958935001).

2.1.9 Library Clean-Up

1. SPRI Beads.
2. 80% Ethanol.
3. Elution Buffer (Qiagen 19086).
4. Qubit 4 Fluorometer (Invitrogen) or Qubit Flex Fluorometer
(Invitrogen Q33327).

2.1.10 Primers and Barcodes

1. HPLC-purified random primers (added after bisulfite
conversion). H: A, C, or T, as specified by the mixed-base ratio (H1:33340033), i.e., 33% A, 34% C, 0% G, and 33% T.
Barcode 1: /5SpC3/ TTCCCTACACGACGCTCTTCCGATCTATCACG(H1:33340033)(H1)(H1)(H1)(H1)(H1)(H1)(H1)(H1).
Barcode 2: /5SpC3/ TTCCCTACACGACGCTCTTCCGATCTCGATGT(H1:33340033)(H1)(H1)(H1)(H1)(H1)(H1)(H1)(H1).
Barcode 3: /5SpC3/ TTCCCTACACGACGCTCTTCCGATCTTGACCA(H1:33340033)(H1)(H1)(H1)(H1)(H1)(H1)(H1)(H1).
Barcode 4: /5SpC3/ TTCCCTACACGACGCTCTTCCGATCTGCCAAT(H1:33340033)(H1)(H1)(H1)(H1)(H1)(H1)(H1)(H1).
Barcode 5: /5SpC3/ TTCCCTACACGACGCTCTTCCGATCTCAGATC(H1:33340033)(H1)(H1)(H1)(H1)(H1)(H1)(H1)(H1).
Barcode 6: /5SpC3/ TTCCCTACACGACGCTCTTCCGATCTACTTGA(H1:33340033)(H1)(H1)(H1)(H1)(H1)(H1)(H1)(H1).
Barcode 7: /5SpC3/ TTCCCTACACGACGCTCTTCCGATCTTAGCTT(H1:33340033)(H1)(H1)(H1)(H1)(H1)(H1)(H1)(H1).
Barcode 8: /5SpC3/ TTCCCTACACGACGCTCTTCCGATCTCTTGTA(H1:33340033)(H1)(H1)(H1)(H1)(H1)(H1)(H1)(H1).
These random priming oligos differ from the ones
provided by Swift Biosciences; the sequences were provided
in Luo et al. [17].
2. P5 primers (added during library amplification).
P501: AATGATACGGCGACCACCGAGATCTACACACGATCAGACACTCTTTCCCTACACGACGCTCT.
P502: AATGATACGGCGACCACCGAGATCTACACTCGAGAGTACACTCTTTCCCTACACGACGCTCT.
P503: AATGATACGGCGACCACCGAGATCTACACCTAGCTCAACACTCTTTCCCTACACGACGCTCT.
P504: AATGATACGGCGACCACCGAGATCTACACATCGTCTCACACTCTTTCCCTACACGACGCTCT.
P505: AATGATACGGCGACCACCGAGATCTACACTCGACAAGACACTCTTTCCCTACACGACGCTCT.
P506: AATGATACGGCGACCACCGAGATCTACACCCTTGGAAACACTCTTTCCCTACACGACGCTCT.
P507: AATGATACGGCGACCACCGAGATCTACACATCATGCGACACTCTTTCCCTACACGACGCTCT.
P508: AATGATACGGCGACCACCGAGATCTACACTGTTCCGTACACTCTTTCCCTACACGACGCTCT.
P509: AATGATACGGCGACCACCGAGATCTACACATTAGCCGACACTCTTTCCCTACACGACGCTCT.
P510: AATGATACGGCGACCACCGAGATCTACACCGATCGATACACTCTTTCCCTACACGACGCTCT.
P511: AATGATACGGCGACCACCGAGATCTACACGATCTTGCACACTCTTTCCCTACACGACGCTCT.
P512: AATGATACGGCGACCACCGAGATCTACACAGGATAGCACACTCTTTCCCTACACGACGCTCT.
3. P7 primers (added during library amplification).
P701: CAAGCAGAAGACGGCATACGAGATAGGCAATGGTGACTGGAGTTCAGACGTGTGCTCTT.
P702: CAAGCAGAAGACGGCATACGAGATTCACCTAGGTGACTGGAGTTCAGACGTGTGCTCTT.
P703: CAAGCAGAAGACGGCATACGAGATCATACGGAGTGACTGGAGTTCAGACGTGTGCTCTT.
P704: CAAGCAGAAGACGGCATACGAGATGTCATCGTGTGACTGGAGTTCAGACGTGTGCTCTT.
P705: CAAGCAGAAGACGGCATACGAGATTTACCGACGTGACTGGAGTTCAGACGTGTGCTCTT.
P706: CAAGCAGAAGACGGCATACGAGATACCTTCGAGTGACTGGAGTTCAGACGTGTGCTCTT.
P707: CAAGCAGAAGACGGCATACGAGATACGCTTCTGTGACTGGAGTTCAGACGTGTGCTCTT.
P708: CAAGCAGAAGACGGCATACGAGATGAGTAGAGGTGACTGGAGTTCAGACGTGTGCTCTT.
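
Because the eight random primers carry distinct 6-nt barcodes, reads from a pooled library can be assigned back to individual cells. The Python sketch below illustrates the idea only; it is not the authors' pipeline. It assumes the barcode occupies the first six bases of the read derived from the random primer, and the function names and mismatch threshold are ours.

```python
from itertools import combinations

# 6-nt inline barcodes taken from the random primer sequences listed above.
BARCODES = {
    "Barcode 1": "ATCACG", "Barcode 2": "CGATGT",
    "Barcode 3": "TGACCA", "Barcode 4": "GCCAAT",
    "Barcode 5": "CAGATC", "Barcode 6": "ACTTGA",
    "Barcode 7": "TAGCTT", "Barcode 8": "CTTGTA",
}


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


# Smallest pairwise distance determines how many errors can be tolerated.
min_distance = min(hamming(a, b)
                   for a, b in combinations(BARCODES.values(), 2))


def assign_barcode(read_seq, max_mismatches=1):
    """Return the unique barcode within max_mismatches of the read start."""
    prefix = read_seq[:6]
    if len(prefix) < 6:
        return None
    hits = [name for name, seq in BARCODES.items()
            if hamming(prefix, seq) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


print("minimum pairwise Hamming distance:", min_distance)
print(assign_barcode("ATCACCTTAGGATT"))  # one mismatch to Barcode 1
```

The minimum pairwise distance among these eight barcodes is 3, so tolerating a single mismatch cannot make a read match two barcodes at once.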

2.2 Equipment

2.2.1 Assay

1. KIMBLE® Dounce tissue grinder set (DWK 885300–0001).
2. 384-well clear reaction plates (Applied Biosystems 4483285).
3. Adhesive PCR sealing foil sheets (Thermo Fisher 00139148).
4. Zymo-Spin 384-Well DNA Binding Plates (Zymo C2012).
5. Thermo Scientific™ Nunc™ 96-Well Polypropylene DeepWell™
Storage Plates (Thermo Fisher Scientific 95040462)
(see Note 5).
6. DynaMag™-96 Side Skirted Magnet (Thermo Fisher Scientific
12027).
7. 96-well PCR plate (Genesee Scientific 24-302).
8. DynaMag™-2 Magnet (Thermo Fisher Scientific 12321D).
9. 1.7 mL microtube, clear (Genesee Scientific 24-282LR).
10. 0.2 mL 8-well PCR strip tubes (Genesee Scientific 21-125).
11. Flow cytometer (e.g., BD FACSAria™ Fusion).
12. Thermocycler(s) with 96- and 384-well blocks.
13. Centrifuge outfitted with a swinging bucket rotor capable of
spinning at 5000 × g, maintaining a temperature of 4 °C, and
accommodating both plates and microtubes.
14. 12-channel pipette set capable of handling volumes ranging
from 0.5 to 300 μL.
15. Standard wet laboratory equipment (e.g., pipette set, serological
pipette, 4 °C refrigerator, and -20 °C freezer).

2.2.2 Sequencing

1. Access to a next-generation sequencing platform: Illumina-based
sequencing platform (e.g., NovaSeq 6000).

3 Methods

3.1 Assay

3.1.1 Nuclei Isolation and GpC Methyltransferase Treatment

1. Prepare digestion mix on ice. Deliver 2 μL of mix to every well
of two 384-well plates. Plates with digestion mix can be
prepared the day before the experiment and stored at 4 °C
(see Notes 1 and 2).
2. Obtain a suspension of single cells. This protocol was opti-
mized for use with a total of 5–10 million cells. Centrifuge
single cell suspension at 500 × g for 5 min at 4 °C, remove the
supernatant, suspend in 1 mL ice-cold PBS, and centrifuge the
sample again at the same settings. Discard the supernatant and
suspend in 1 mL 1X RSB buffer. Incubate for 10 min at room
temperature.
3. Add 15 μL of 1% NP-40 to the cell suspension (NP-40 con-
centration may need to be adjusted depending on the cell
type). Transfer the cell suspension to a 2 mL dounce tissue
grinder and add 1 mL of 1X RSB. Homogenize cell suspension
using 15 strokes of both pestle A and B (number of strokes may
be adjusted to accommodate the particular cell-/tissue-type
being handled). Transfer lysed cells to a new 1.5 mL Eppendorf
tube and centrifuge at 800 × g for 5 min at 4 °C. Discard the
supernatant and wash with 1 mL 1X RSB. Incubate for 30 s to
1 min at room temperature. Centrifuge at 800 × g for 5 min at 4 °C.
4. Resuspend the nuclei in 1X GpC Methyltransferase buffer such
that there are one million nuclei per 75 μL of buffer. If there are
fewer than one million nuclei, suspend in 75 μL of buffer. Meanwhile,
prepare GpC Methylase Reaction Mix. Add the 75 μL of
nuclei to the reaction mix and incubate at 37 °C for 7.5 min.
Add a boost of 25 μL GpC Methyltransferase and 0.75 μL
32 mM SAM and incubate for another 7.5 min at 37 °C (see
Note 4).
5. Quench the reaction by adding 500 μL 1X PBS and spin at
800 × g for 5 min at 4 °C. Resuspend in 0.5–1 mL of 1X PBS
and add 2 drops of Hoechst stain (NucBlue™ Live Cell Stain) per mL
of sample (1 drop for 0.5 mL, 2 drops for 1 mL). Keep the sample
on ice for 15 min before commencing fluorescence-activated cell
sorting (FACS).
[Figure 3: three FACS scatter plots with axes FSC-A, FSC-H, SSC-A, and BV421-A. Gate 1: 44.2%; Gate 2: 84% (37.1%); Gate 3 (single nuclei): 18.7% (6.9%).]

Fig. 3 Example of a gating strategy during FACS sorting. Individual nuclei were selected based on size and
DNA content. Percentages provide proportion of events within a particular gate for each scatter plot; the
proportion of total events is indicated in parenthesis
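
The two sets of percentages in Fig. 3 are related by simple multiplication: the proportion of total events in a gate is the product of the within-gate proportions along the gating hierarchy. The Python sketch below reproduces the values reported in the figure, assuming that Gate 1, Gate 2, and Gate 3 are nested in that order. The estimate of events needed per plate is our own illustration and ignores sorter aborts and other losses.

```python
import math

# (gate name, fraction of the parent population), in gating order
gates = [("Gate 1", 0.442), ("Gate 2", 0.84), ("Gate 3", 0.187)]


def cumulative_fractions(gates):
    """Return [(gate name, fraction of all events)] for nested gates."""
    total = 1.0
    result = []
    for name, fraction_of_parent in gates:
        total *= fraction_of_parent
        result.append((name, total))
    return result


fractions = cumulative_fractions(gates)
for name, fraction in fractions:
    print(f"{name}: {fraction * 100:.1f}% of total events")

# Lower bound on events to acquire to fill two 384-well plates.
wells = 2 * 384
events_needed = math.ceil(wells / fractions[-1][1])
print("events needed for", wells, "nuclei:", events_needed)
```

With these gates, about 7% of all events are sortable single nuclei, so filling two plates requires on the order of 11,000 events at a minimum.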

3.1.2 Fluorescence-Activated Cell Sorting

1. We use the BD FACSAria™ Fusion system to sort individual
nuclei into 384-well plates. However, any system with this
capability should suffice.
2. Our gating strategy is focused on recovering intact single nuclei
and excluding cellular debris (Fig. 3). This needs to be adjusted
based on the input material.
3. Sort a single nucleus into each well of the 384-well plates being
processed. Place the plates on ice when sorting is complete.

3.1.3 Bisulfite Conversion

1. Prepare CT Conversion reagent and add 15 μL of it to each
well of two 384-well reaction plates. Pipette up and down
8 times to mix the sample. Seal the plates and quick spin at
2000 × g for 10 s at room temperature. Place plates into a
thermocycler able to accommodate 384-well plates, and run
the following program:
(a) 98 °C for 8 min
(b) 64 °C for 3.5 h
(c) Hold at 4 °C.
(see Notes 1, 5, and 11)
2. Prior to purifying bisulfite-converted samples, prepare Ran-
dom Primer Solutions (eight separate solutions for primers
1–8). Keep primer solutions on ice until use.
3. Place two Zymo-Spin 384-Well DNA Binding Plates on two
96-Well Polypropylene DeepWell™ Storage Plates and add
80 μL of M-Binding buffer to each well. Transfer bisulfite-
converted samples to the 384-Well DNA Binding Plates and
pipette up and down 8 times to mix the samples. Centrifuge at
5000 × g for 5 min at room temperature (see Notes 1, 6,
and 10).
[Figure 4: schematic of wells A–D, columns 1–4, of Plate 1 (Barcodes 1–4) and Plate 2 (Barcodes 5–8).]

Fig. 4 Loading schema for primers in the two 384-well plates used for random
priming step. Pattern shown for Wells A 1–2 and B 1–2 in plates 1 and
2, respectively, is repeated across the entire plate

4. Discard the flow-through in the 96-Well Storage Plates and
add 100 μL M-Wash buffer to each well of the 384-Well DNA
Binding Plates. Centrifuge at 5000 × g for 5 min at room
temperature (see Note 1).
5. Discard the flow-through in the 96-Well Storage Plates and
add 50 μL M-Desulphonation buffer to each well of the
384-Well DNA Binding Plates. Incubate at room temperature
for 15 min and then centrifuge at 5000 × g for 5 min at room
temperature (see Note 1).
6. Discard the flow-through in the 96-Well Storage Plates. Add
100 μL M-Wash buffer and centrifuge at 5000 × g for 5 min at
room temperature. Repeat this wash step once more (see
Note 1).
7. Place the 384-Well DNA Binding Plates on 384-Well Reaction
Plates (the two 96-Well Storage Plates can be discarded). Add
7 μL of one of the eight random primer solutions to each well,
delivering four of the primers to one plate and the other four to
the second plate. Within each plate, arrange the primers so that
every other well, along both rows and columns, contains the same
primer (Fig. 4). Once primers have been added to every well,
incubate the plates for 5 min and then centrifuge at 5000 × g for
5 min at room temperature. Seal the 384-Well Reaction Plates and
store at -20 °C for up to 1 week (see Notes 1 and 10).
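
The alternating primer layout is what later allows eight cells to share one well during pooling (Subheading 3.1.6). The Python sketch below models it under two assumptions that the text does not fix: the position of each barcode within the repeating 2 × 2 block (here A1 = first, A2 = second, B1 = third, B2 = fourth barcode of a plate), and pooling of each 2 × 2 block from both plates into the matching well of a 96-well plate. Fig. 4 defines the layout actually used; the function names are ours.

```python
from collections import defaultdict

N_ROWS, N_COLS = 16, 24  # dimensions of a 384-well plate


def barcode_for_well(plate, row, col):
    """Barcode number (1-8) for a 0-based (row, col) on plate 1 or 2."""
    return 4 * (plate - 1) + 2 * (row % 2) + (col % 2) + 1


def pool_well(row, col):
    """0-based (row, col) of the 96-well destination for a 384-well well."""
    return row // 2, col // 2


pools = defaultdict(list)
for plate in (1, 2):
    for row in range(N_ROWS):
        for col in range(N_COLS):
            pools[pool_well(row, col)].append(
                barcode_for_well(plate, row, col))

# Every 96-well position receives exactly one sample per barcode.
assert len(pools) == 96
assert all(sorted(b) == list(range(1, 9)) for b in pools.values())
print(pools[(0, 0)])  # barcodes pooled into well A1 of the 96-well plate
```

Any assignment with the same alternating structure gives the same guarantee: no pooled well ever contains two cells with the same barcode.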
3.1.4 Random-Primed DNA Synthesis

1. Prepare Random Priming Master Mix prior to denaturing
samples and keep the mix on ice (see Note 7).
2. Denature the samples by placing 384-well plates in a thermo-
cycler and run the following program:
(a) 95 °C for 3 min.
3. Place the plates on ice for 2 min.
4. Add 5 μL Random Priming Master Mix to each well of the
384-well reaction plates. Vortex to mix and quick spin at
2000 × g for 10 s at room temperature (see Notes 1 and 10).
5. Place the plates into a thermocycler and run the following
program:
(a) 4 °C for 5 min
(b) 25 °C for 5 min
(c) 37 °C for 60 min
(d) Hold at 4 °C (see Note 11).

3.1.5 Inactivation of Free Primers and dNTPs

1. Prepare Exo/rSAP Master Mix and keep on ice. Add 1.5 μL to
each well of the 384-well reaction plates. Vortex to mix and
quick spin at 2000 × g for 10 s at room temperature (see Notes
1, 8, and 10).
2. Place the plates into a thermocycler and run the following
program:
(a) 37 °C for 30 min
(b) Hold at 4 °C (see Note 11).

3.1.6 Sample Clean-Up

1. Prepare 14 mL of SPRI beads (see Note 12).
2. Add 73.6 μL (0.8×) of SPRI beads to each well of a clean
96-well plate. Pool the samples from the two 384-well plates
in the wells of the 96-well plate such that each well of the
96-well plate holds a pool of eight samples, each with a distinct
random primer barcode. Vortex the plate briefly and incubate for
5 min at room temperature (see Notes 1 and 10).
3. Quick spin at 2000 × g for 10 s at room temperature and then
place the 96-well plate on a DynaMag™-96 Side Skirted Mag-
net and let stand until the solution is clear of beads.
4. Remove the supernatant and wash beads 3 times with 150 μL
fresh 80% ethanol. After the third wash, remove the ethanol
and allow the beads to dry at room temperature. Take care to
not overdry beads (see Notes 1 and 10).
5. Remove the plate from the magnet, add 10 μL of Elution
Buffer (Qiagen), and suspend beads by pipetting. Vortex the
plates briefly and incubate for 5 min at room temperature (see
Notes 1 and 10).
6. Quick spin at 2000 × g for 10 s at room temperature and then
place the 96-well plate on a DynaMag™-96 Side Skirted Magnet
net and let stand until the solution is clear of beads.
7. Transfer 10 μL of the supernatant from each well to a clean
96-well plate. Store at -20 °C or move on to the next step of
the protocol (see Notes 1 and 10).

3.1.7 Adaptase Reaction

1. Prepare Adaptase Master Mix prior to denaturing the sample
and keep mix on ice.
2. Denature the samples by placing the 96-well plate in a thermo-
cycler and run the following program:
(a) 95 °C for 3 min.
3. Place the plate on ice for 2 min.
4. Add 10.5 μL Adaptase Master Mix to each well of the 96-well
plate. Vortex to mix and quick spin at 2000 × g for 10 s at room
temperature (see Notes 1 and 10).
5. Place the plate in the thermocycler and run the following
program:
(a) 37 °C for 30 min
(b) 95 °C for 2 min
(c) Hold at 4 °C (see Note 11).

3.1.8 Library Amplification

1. Prepare P5L and P7L primer mixes. Add 5 μL of the appropriate
primers to each well of a clean 96-well plate such that
each well has a unique P5L–P7L combination (keep note of
each combination’s location in the plate). Transfer 5 μL of each
P5L–P7L combination to the corresponding well in the
96-well plate containing the pooled samples (see Notes 1 and
10).
2. Add 25 μL 2X Kapa Hifi Mix to each well of the 96-well plate
containing the samples. Vortex to mix and quick spin at
2000 × g for 10 s at room temperature (see Notes 1 and 10).
3. Place the plate in a thermocycler and run the following
program:
(a) 95 °C for 2 min
(b) 98 °C for 30 s
(c) 98 °C for 15 s
(d) 64 °C for 30 s
(e) 72 °C for 2 min
Return to step c 14 times for a total of 15 cycles.
(f) 72 °C for 5 min
(g) Hold at 4 °C (see Notes 11 and 13).
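
The amplification program can be written out as data, which makes the cycling explicit (steps c to e are run 15 times in total) and gives a quick estimate of run time. This Python sketch is only a convenience representation; it excludes ramping time and the final 4 °C hold, and the function name is ours.

```python
def expand_program(initial, cycle, n_cycles, final):
    """Return the full list of (temperature in C, seconds) steps."""
    return list(initial) + list(cycle) * n_cycles + list(final)


initial = [(95, 120), (98, 30)]          # steps a and b
cycle = [(98, 15), (64, 30), (72, 120)]  # steps c to e
final = [(72, 300)]                      # step f

steps = expand_program(initial, cycle, n_cycles=15, final=final)
hold_seconds = sum(seconds for _, seconds in steps)

print(len(steps), "steps")
print(f"{hold_seconds / 60:.2f} min of programmed hold time")
```

Changing `n_cycles` to 16 or 17, as suggested in Note 13 for low-yield libraries, adds 2.75 min of programmed time per extra cycle.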
3.1.9 Library Clean-Up

1. Add 40 μL SPRI beads to each well of the 96-well plate. Vortex
the plate briefly and incubate for 5 min at room temperature
and then quick spin at 2000 × g for 10 s at room temperature.
Place the plate on a DynaMag™-96 Side Skirted Magnet and
allow the solution to clear of beads (see Notes 1, 10, and 12).
2. Remove the supernatant and wash beads twice with 150 μL of
freshly made 80% ethanol. After the final wash, remove the
plate from the magnet and allow beads to dry at room temper-
ature. Take care to not overdry beads (see Note 1).
3. Add 25 μL Elution Buffer (Qiagen) and suspend beads by
pipette. Place back on the DynaMag™-96 Side Skirted Magnet
and allow the solution to clear of beads. Combine the superna-
tant from each column into 12 Eppendorf tubes such that there
is one Eppendorf tube per 96-well plate column. Add 160 μL
(0.8×) SPRI beads to each of the 12 Eppendorf tubes. Pipette
to mix and incubate for 5 min at room temperature (see Notes
1 and 10).
4. Place the Eppendorf tubes on a DynaMag™-2 Magnet and
allow the solution to clear of beads. Discard the supernatant
and wash the beads 2 times with 500 μL of 80% ethanol. After
the second wash, remove all ethanol and allow the beads to dry
at room temperature. Take care to not overdry beads (see Note
10).
5. Add 40 μL Elution Buffer (Qiagen) and suspend beads by
pipetting. Incubate for 5 min at room temperature. After incubation,
return the tubes to the DynaMag™-2 Magnet, allow the solution to
clear of beads, and transfer 40 μL of the supernatant to 12 new
Eppendorf tubes (see Note 10).
6. Measure concentration of the libraries using a Qubit Fluorom-
eter and assess the fragment size distribution with an Agilent
2100 Bioanalyzer. Fragment sizes should fall between 300 and
1500 bp (Fig. 5). On the fluorometer, libraries with a concen-
tration of 2–15 ng/μL are to be expected. If concentration and
size distributions are as expected, proceed with sequencing of
the libraries.
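
The bead volumes used in the clean-up steps follow from the bead-to-sample ratio. The Python sketch below works through the numbers given in Subheadings 3.1.6 and 3.1.9. The helper names are ours. The 50.5 μL reaction volume for the first library clean-up is our sum of the volumes added in Subheadings 3.1.6 to 3.1.8 (10 μL eluate, 10.5 μL Adaptase Master Mix, 5 μL primers, and 25 μL Kapa mix), not a value stated in the protocol.

```python
def bead_volume(sample_ul, ratio=0.8):
    """Bead volume that gives the requested bead-to-sample ratio."""
    return sample_ul * ratio


def implied_sample_volume(bead_ul, ratio=0.8):
    """Sample volume implied by a bead volume at a given ratio."""
    return bead_ul / ratio


# Sample clean-up: 73.6 uL of beads at 0.8x implies the pooled volume.
pool_ul = implied_sample_volume(73.6)
print(f"pooled sample: {pool_ul:.1f} uL ({pool_ul / 8:.1f} uL per cell)")

# Second library clean-up: one column of 8 wells, 25 uL eluate each.
column_ul = 8 * 25
print(f"beads for one column: {bead_volume(column_ul):.1f} uL")

# First library clean-up: 40 uL of beads added to the PCR reaction.
pcr_ul = 10 + 10.5 + 5 + 25
print(f"first library clean-up ratio: {40 / pcr_ul:.2f}x")
```

All three bead additions therefore sit at or close to 0.8×, which is the ratio to keep if reaction or pooling volumes are changed.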

3.2 Sequencing

1. Using an Illumina-based next-generation sequencing platform
(e.g., NovaSeq 6000), sequence the libraries using paired-end
sequencing with a 200 bp cassette. We generally aim to obtain
500,000 to 1 million reads per cell (see Note 9).
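
The per-cell target translates directly into a total read requirement for a run. The Python sketch below does this calculation for the two-plate experiment described here (768 cells in 96 pools of eight cells). The function name is ours, and whether "reads" means single reads or read pairs is left as in the text.

```python
def reads_required(n_cells, reads_per_cell):
    """Total number of reads needed for a given per-cell target."""
    return n_cells * reads_per_cell


n_cells = 2 * 384       # two 384-well plates
cells_per_pool = 8      # one cell per random primer barcode

low = reads_required(n_cells, 500_000)
high = reads_required(n_cells, 1_000_000)

print(f"total: {low / 1e6:.0f}-{high / 1e6:.0f} million reads")
print(f"per pool: {reads_required(cells_per_pool, 500_000) / 1e6:.0f}-"
      f"{reads_required(cells_per_pool, 1_000_000) / 1e6:.0f} million reads")
```

Comparing the total with the output of a given flow cell indicates how many pools can be multiplexed on one run, as discussed in Note 9.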

3.3 Analysis

A full description of the analysis is outside the scope of this protocol,
which describes the steps to generate scNOMe-seq libraries for sequencing.
We provide an example of a processing pipeline for raw scNOMe-seq
data at https://github.com/sebpott/scNOMe_smk.

Fig. 5 Expected size distribution of scNOMe-seq library pools. Bioanalyzer profile shows size distribution of a
representative pool of 64 individual scNOMe-seq libraries after final amplification

4 Notes

1. All steps involving 96- and 384-well plates should be performed
with a 12-tip multichannel pipette. Reagents can be
split between 12-tube rows of strip tubes using a single channel
and then transferred to their final destinations in the 96- and
384-well plates using a multichannel pipette.
2. M-Digestion buffer can form a white precipitate, which can be
dissolved by keeping the buffer at 37 °C for 30 min prior
to use.
3. Make SPRI fresh for each experiment. Before each use, allow
beads to warm to room temperature for 30 min and vortex
vigorously.
4. In practice we have observed relatively small changes in global
GpC levels between a single 7.5 min incubation period and
double that time. However, in order to reach saturation, we
continue to use ~15 min total incubation time.
5. It takes roughly 10 min of vigorous vortexing for the powdered
CT Conversion reagent to go into the solution, and per the
manufacturer’s instruction manual, it is normal to see trace
amounts of undissolved reagent even after extensive mixing.
6. It is best to use the two 384-well plates as balances for each
other for all centrifugations during the sample purification
portion of the bisulfite conversion workflow. During steps
that require a bench rest, begin the timer only after the reagent
has been delivered to the final row of the second plate.
7. Prepare all master mixes in advance of their respective steps to
avoid prolonged waiting times of the sample on ice or in the
thermocycler.
8. The Exo/rSAP Master Mix is very viscous, which can make it
difficult to aspirate equal amounts of fluid in all channels of a
multichannel pipette. During this step, it is crucial to visually
check the amount of fluid in each channel prior to delivering
the solution to each well of the 384-well plates.
9. Sequencing parameters may change depending on the characteristics
of your libraries and how many final pools you choose
to submit. For example, a cassette with more base pairs may be
desirable to obtain additional sequencing from each fragment
for libraries with longer average fragment length. Choice of
flow cell should be determined by the number of pools that are
being multiplexed to achieve the optimal number of reads
per cell.
10. Pipette tips should be replaced whenever they come in contact
with the plates or with fluid in the plates’ wells (this includes
most, but not all, multichannel pipetting steps). When pooling
barcoded samples (i.e., Subheading 3.1.6, step 2, and Subheading 3.1.9, step 3), the same pipette
tips can be used for samples that end up in the same
pooled well.
11. If working with a thermocycler that can only handle a single
reaction plate at a time, process the plates sequentially; that
is, perform a step for the first of two plates and store that plate
at 4 °C while the same step is performed on the second plate. After
both plates are complete, move on to the next step.
12. SPRI beads should be prepared fresh for each experiment.
Allow the beads to warm to room temperature and mix vigor-
ously before use.
13. If the concentration of your libraries is lower than expected,
consider the following:
(a) Increase the cycle number (e.g., 16 or 17) to achieve more
highly concentrated libraries. We have not observed a
huge increase in redundant reads when increasing the
amplification cycle number by one or two.
(b) Ensure good pipetting technique, particularly because
multichannel pipettes can be more difficult to use than
normal ones. If not used properly, not all channels will
aspirate the same volume of liquid. Review the directions
provided by your multichannel pipette’s manufacturer
prior to use.
14. Manual processing of four 384-well plates works well for
us. Scaling the assay up beyond that might be difficult without
the right reagents and equipment, however. We routinely use
eight barcodes, which allows for two 384-well plates to be
pooled together (768 cells in total) as presented above. We
successfully used 16 barcodes as well, which allows for four
384-well plates to be pooled together (1536 cells in total).
Because each 384-well plate takes a significant amount of
time and labor to process, some level of automation (e.g., a
liquid handler and a thermocycler capable of accommodating
multiple 384-well plates at once) would likely be advisable for
scaling the assay beyond four plates.

References

1. Pott S (2017) Simultaneous measurement of chromatin accessibility, DNA methylation, and nucleosome phasing in single cells. eLife 6:1127. https://doi.org/10.7554/elife.23203
2. Clark SJ, Argelaguet R, Kapourani C-A et al (2018) scNMT-seq enables joint profiling of chromatin accessibility DNA methylation and transcription in single cells. Nat Commun 9(1):9. https://doi.org/10.1038/s41467-018-03149-4
3. Li L, Guo F, Gao Y et al (2018) Single-cell multi-omics sequencing of human early embryos. Nat Cell Biol 15(1):18. https://doi.org/10.1038/s41556-018-0123-2
4. Kaya-Okur HS, Wu SJ, Codomo CA et al (2019) CUT & Tag for efficient epigenomic profiling of small samples and single cells. Nat Commun 10:1930. https://doi.org/10.1038/s41467-019-09982-5
5. Cao J, Cusanovich DA, Ramani V et al (2018) Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361(1380):1385. https://doi.org/10.1126/science.aau0730
6. Chen S, Lake BB, Zhang K (2019) High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nat Biotechnol 37:1–6. https://doi.org/10.1038/s41587-019-0290-0
7. Chen AF, Parks B, Kathiria AS et al (2021) NEAT-seq: simultaneous profiling of intra-nuclear proteins, chromatin accessibility, and gene expression in single cells. Biorxiv 2021.07.29.454078. https://doi.org/10.1101/2021.07.29.454078
8. Wang Y, Yuan P, Yan Z et al (2021) Single-cell multiomics sequencing reveals the functional regulatory landscape of early embryos. Nat Commun 12:1247. https://doi.org/10.1038/s41467-021-21409-8
9. Nabilsi NH, Deleyrolle LP, Darst RP et al (2013) Multiplex mapping of chromatin accessibility and DNA methylation within targeted single molecules identifies epigenetic heterogeneity in neural stem cells and glioblastoma. Genome Res 24. https://doi.org/10.1101/gr.161737.113
10. Kilgore JA, Hoose SA, Gustafson TL et al (2007) Single-molecule and population probing of chromatin structure using DNA methyltransferases. Methods 41(320):332. https://doi.org/10.1016/j.ymeth.2006.08.008
11. Kelly TK, Liu Y, Lay FD et al (2012) Genome-wide mapping of nucleosome positioning and DNA methylation within individual DNA molecules. Genome Res 22(2497):2506. https://doi.org/10.1101/gr.143008.112
12. Schübeler D (2015) Function and information content of DNA methylation. Nature 517:321–326. https://doi.org/10.1038/nature14192
13. Lai WKM, Pugh BF (2017) Understanding nucleosome dynamics and their links to gene expression and DNA replication. Nat Rev Mol Cell Biol 18:548–562. https://doi.org/10.1038/nrm.2017.47
14. Argelaguet R, Clark SJ, Mohammed H et al (2019) Multi-omics profiling of mouse gastrulation at single-cell resolution. Nature 1–5. https://doi.org/10.1038/s41586-019-1825-8
15. Li E, Zhang Y (2014) DNA methylation in mammals. CSH Perspect Biol 6:a019133. https://doi.org/10.1101/cshperspect.a019133
16. Luo C, Rivkin A, Zhou J et al (2018) Robust single-cell DNA methylome profiling with snmC-seq2. Nat Commun 9:3824. https://doi.org/10.1038/s41467-018-06355-2
17. Luo C, Keown CL, Kurihara L et al (2017) Single-cell methylomes identify neuronal subtypes and regulatory elements in mammalian cortex. Sci New York NY 357:600–604. https://doi.org/10.1126/science.aan3351
18. Miranda TB, Kelly TK, Bouazoune K, Jones PA (2010) Methylation-sensitive single-molecule analysis of chromatin structure. Curr Protoc Mol Biology Ed Frederick M Ausubel Et Al Chapter 21:Unit 21.17.1 16. https://doi.org/10.1002/0471142727.mb2117s89
Chapter 13

Massively Parallel Profiling of Accessible Chromatin and Proteins with ASAP-Seq
Eleni P. Mimitou, Peter Smibert, and Caleb A. Lareau

Abstract
While methods such as the Assay for Transposase Accessible Chromatin by sequencing (ATAC-seq) enable a
comprehensive characterization of regulatory DNA, additional measurements are required to characterize
the multifaceted nature of eukaryotic cells. Here, we delineate the ATAC with Select Antigen Profiling by
sequencing (ASAP-seq) protocol, a scalable approach to quantifying proteins via oligo-tagged antibodies
alongside accessible DNA in thousands of single cells. Critically, our method uses a custom bridge oligo
that makes it compatible with a variety of oligo-conjugated antibodies, allowing other commercial
products to be used and repurposed. The ASAP-seq method can be completed with straightforward
experimental and computational modifications to existing single-cell ATAC-seq workflows but yields distinct
modalities underlying complex cellular states, including estimation of protein abundance on the cell surface
as well as intracellular and intranuclear factors.

Key words Multimodal, Single-cell, Protein, Accessible chromatin, ATAC, Intracellular, Gene
regulation

1 Introduction

The massively parallel measurement of chromatin accessibility and
transcriptomes within single cells has catalyzed a rapidly increasing
number of studies that characterize cellular heterogeneity in
biological systems. Specifically, droplet-based single-cell ATAC-
seq (scATAC-seq) and scRNA-seq enable the comprehensive char-
acterization of genome-wide chromatin accessibility and cellular
polyadenylated transcripts in thousands of individual cells within a
single experiment. Though these methods have enabled many fun-
damental insights of the biology underlying complex tissues, both
scATAC-seq and scRNA-seq suffer from data sparsity, wherein
most accessible loci and most genes are not measured in most
cells. This sparse sampling of the underlying cellular features can
complicate downstream analyses and inferences, particularly in
identifying cell state features that delineate closely related cell
types. Hence, the development and application of single-cell
multi-omic technologies that measure multiple modalities from
individual cells have become increasingly important.

Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols,
Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_13,
© The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023

To this end, several recent methods have combined targeted
detection of select protein markers with scRNA-seq [1–4]. Concep-
tually, these methodologies synthesize decades of knowledge of
specific cell types and states obtained by cytometry-based
approaches with the mostly unbiased readout from the transcriptome.
Furthermore, pairing a sparse genome-wide RNA
measurement with a sensitive protein-based quantification for a
smaller number of targets enables systematic discovery of genes
associated with cellular phenotypes while retaining
high-confidence inference for a selected subset of proteins.
Importantly, recent work has established sophisticated computa-
tional algorithms that demonstrate that combining multiplexed
protein detection with scRNA-seq resolves cell types better than
either modality alone [5], reinforcing the utility of this single-cell
multi-omic approach.
Chromatin accessibility is an additional modality that is now
routinely used to characterize single cells in high-throughput by a
variety of different single-cell Assay for Transposase Accessible
Chromatin (ATAC-seq) approaches [6–8]. In many circumstances,
such as development, scATAC may provide a more sensitive mea-
sure of the continuum of cell states, particularly in differentiation
settings where epigenetic reprogramming may be the first mover
[9]. However, many complications arise from the accessible chro-
matin measurements derived from scATAC-seq. First, per-cell spar-
sity tends to be more extreme due in part to ~5–10× more features
(accessible chromatin peaks) than scRNA-seq (genes). Additionally,
inferences of gene activity scores rely on the (weighted) summation
of accessibility fragments overlapping or near gene bodies. How-
ever, as many regulatory elements overlap gene bodies but control
the expression of other loci, this method provides an imperfect
determination of the genes that are actively transcribed, much less
translated, in any given cell. Thus, fine-grained cell-type identification
from scATAC-seq data alone is challenging, often requiring
complementary scRNA-seq data for high-quality annotations.
To remedy these issues with scATAC-seq data, we recently
introduced ATAC with Select Antigen Profiling by sequencing
(ASAP-seq) to combine robust detection of proteins with chroma-
tin accessibility [10]. In practice, the ASAP-seq workflow builds on
the mitochondrial scATAC-seq (mtscATAC-seq; see other chap-
ters) method that enables scATAC-seq to be performed on whole
cells [11] (Fig. 1). As a consequence of the modified protocol, the
cell remains intact for high-quality estimation of protein abun-
dances and accessible chromatin with minimal modifications to
commercial products. Notably, ASAP-seq enables a number of
tunable options, including (1) the use of “hashing” to multiplex
samples; (2) the use of different types of antibody reagents for
protein detection; (3) the detection of intracellular proteins
through minor modifications; and (4) the ability to either enrich
or deplete reads derived from mitochondrial DNA, which can be
used for inferring clonal relationships between cells in a
sample [11].

Fig. 1 Schematic of the experimental assay. ASAP-seq allows whole cell input into the scATAC-seq workflow,
maintaining the connection between nuclear content and cell surface marker information. Cells are stained
with oligo-conjugated antibodies followed by fixation, permeabilization, and Tn5 transposition. Bridge oligos
are spiked into the barcoding mix prior to droplet formation to allow simultaneous barcoding of ATAC fragments
and antibody-derived oligos

We note one key feature that is conceptually distinct for the
proteo-genomic capture in ASAP. Specifically, in contrast to other
methods that use exogenous oligonucleotides to either report on
protein abundance or enable sample multiplexing [1–4, 12, 13],
the oligonucleotide sequences that read out protein levels in ASAP-
seq do not directly interact with the barcoding reagents from the
parent ATAC-seq assay. Instead, ASAP-seq employs a bridging
oligo to convert existing labeling oligonucleotides into a format
that is compatible with the ATAC-seq kit, providing enhanced
flexibility for reagent use in ASAP-seq (Fig. 2). Here, the specifica-
tion of this part of the protocol has important implications for the
accessibility and usability of the assay as the bridging oligo enables
the immediate use of a large catalog of existing and available
reagents as well as combinations of different specifications of
reagents.
In this method description, we outline the foundational steps
requisite for enabling single-cell multi-omic profiling with ASAP-
seq technology [10]. We outline a synthesized experimental and
computational workflow that provides flexibility to quantify pro-
teins for downstream integrative analyses and identifies critical steps
associated with quality control of libraries. Taken together, ASAP-
seq enables the high-confidence quantification of selected intracel-
lular and surface antigens while retaining the comprehensive dis-
covery of accessible chromatin loci and clonality underlying cells.

Fig. 2 Barcoding scheme of the protein tags using the bridge oligo strategy. Bridge oligo A (BOA) and bridge
oligo B (BOB) function as templates to extend the protein-derived oligos in droplets. While TSB tags (right)
contain UMIs, UBIs (N9V) are introduced to TSA tags via the bridge oligo (left) to allow molecule counting

2 Materials

2.1 Cell Processing, Staining, Fixation, and Lysis

1. Phosphate buffered saline (PBS) (any provider).
2. CITE-seq staining buffer: 2% BSA, 0.01% Tween in PBS.
3. Human TruStain FcX™ (BioLegend 422301).
4. TotalSeq™-A or TotalSeq™-B oligo-labeled antibody reagents
(individually or as panels) (BioLegend – see Note 1).
5. FACS buffer: PBS with 1% FBS. Filter at 0.45 μm, store at 4 °C.
6. DAPI (any provider, for example BioLegend 422801).
7. Formaldehyde, 16% (any provider, for example Thermo Fisher 28906).
8. Glycine solution, 2.5 M (any provider, for example Ricca
Chemical RMB19103-50C2).
9. Tris–HCl pH 7.5, 1 M (any provider, for example Sigma-Aldrich T2194).
10. NaCl, 5 M (any provider, for example Sigma-Aldrich 59222C).
11. MgCl2, 1 M (any provider, for example Sigma-Aldrich M1028).
12. NP40, 10% (Sigma-Aldrich, 74385).
13. Tween 20, 10% (Bio-Rad, 1662404).
14. Digitonin, 5% (Thermo Fisher, BN2006).
15. BSA, 10% (any provider, for example Miltenyi Biotec 130-091-376).

Table 1
Oligo sequences

Oligo | Sequence (shown 5′ > 3′) | Notes
BOA | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNNNNNNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTT/3InvdT/ | Used to bridge TSA tags. 3′ modification to block extension. Brings a 10-nt UBI ending in V (non-T)
BOB | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTGCTAGGACCGGCCTTAAAGC/3InvdT/ | Used to bridge TSB tags. 3′ modification to block extension
P5 | AATGATACGGCGACCACCGA | Forward primer to amplify TSA and TSB tags
P7 | CAAGCAGAAGACGGCATACGAGAT | Reverse primer to re-amplify already indexed tag libraries (optional)
RPxx | CAAGCAGAAGACGGCATACGAGATxxxxxxxxGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA | TruSeq Small RNA indexing primer, used to index TSA tags
D7xx | CAAGCAGAAGACGGCATACGAGATxxxxxxxxGTGACTGGAGTTCAGACGTGTGC | TruSeq DNA indexing primer, used to index TSB tags or TSA hashtags

16. Intracellular staining buffer (BioLegend, custom part number
900002577). Supplement with fresh DTT before use to 1 mM
final concentration.
17. True-stain monocyte blocker (BioLegend, 426101).
18. DTT, 1 M (any provider, for example Sigma-Aldrich 646563).
19. Flowmi Cell Strainer 40 μm (Bel-Art, H13680-0040).
20. Bridge oligo A (BOA) or bridge oligo B (BOB) (IDT, or other
provider, see Table 1, Note 2).
21. Indexing primers (IDT or other provider, see Table 1).

2.2 ASAP-Seq Library Preparation

1. 10x Genomics Chromium Next GEM Single Cell ATAC
Library & Gel Bead Kit, 16 or 4 rxns.
2. 10x Genomics Chromium Next GEM Chip H Single Cell Kit,
48 or 16 rxns.
3. 10x Genomics Single Index Kit N, Set A, 96 rxns.
4. 2x Kapa Hifi PCR mastermix.
5. SPRI beads (AMPure XP beads or KAPA Pure beads).
6. Custom oligonucleotides for library prep (see Table 1).

2.3 Quality Control and Sequencing

1. Qubit dsDNA HS Assay Kit (Thermo Fisher Q32851 or Q33230).
2. Agilent Bioanalyzer High Sensitivity DNA Analysis Kit
(or Tapestation or similar).
3. KAPA Library Quantification Kit for Illumina® Platforms
(KAPA biosystems KK4835).
4. Illumina NovaSeq or NextSeq reagent kits.

2.4 Software and References Needed for Computational Analysis

1. Download cellranger-atac and relevant reference files
(https://support.10xgenomics.com/single-cell-atac/software/pipelines/latest/what-is-cell-ranger-atac).
The most up-to-date reference files and versions of the software are available
online. This software will be used to demultiplex sequencing
libraries from an Illumina sequencing run and can be executed
to process the accessible chromatin libraries (see Note 3).
2. Install an up-to-date version of Python 3 either for
the system, the user, or through a conda environment (see
Note 4).
3. Download the kite antibody tag preprocessing toolkit. The
most up-to-date version of the software is available online at
https://github.com/pachterlab/kite. This software is used to
build a reference map of the oligonucleotide barcodes to the
respective antibody clones.
4. Download the kallisto and bustools software binaries. Current
versions of these tools are available at
https://github.com/pachterlab/kallisto and
https://github.com/BUStools/bustools, respectively. These utilities
count reads assigned to each antibody barcode for every cell
while efficiently correcting for sequencing errors.
5. Download the ASAP to kite script toolkit available online:
https://github.com/caleblareau/asap_to_kite. This code is
required to convert the ASAP-seq sequencing data into a format
that is compatible with the existing kite | kallisto | bustools
workflows (see Note 5).
6. mgatk package and dependencies
(https://github.com/caleblareau/mgatk).
7. 10x scATAC barcode whitelist:
$ wget https://teichlab.github.io/scg_lib_structs/data/737K-cratac-v1.txt
This file is available in the distribution of CellRanger-ATAC but is
more accessible from the indicated GitHub link.

3 Methods

3.1 Cell Preparation, Fixation, and Permeabilization

This section outlines the steps required to stain the cells with the
conjugated antibodies, followed by fixation and permeabilization.
The fixation steps are based on the mtscATAC-seq workflow (see the
separate chapter describing mtscATAC-seq in this volume).
Permeabilization can be performed using two alternative lysis buffers:
LLL (low loss lysis) and OMNI (based on the OMNI-ATAC protocol
[14]), which is the default lysis buffer in the 10x Genomics scATAC
kit. LLL is the lysis buffer described for mtscATAC-seq, which, due to
the lack of Tween 20 in its formulation, retains mtDNA fragments in
the ATAC library that can be used for mtDNA variant tracing. In
benchmarking experiments, LLL and OMNI buffers yielded
comparable ATAC and protein data, so they can be used interchangeably
if mtDNA retention is not desired.

3.1.1 Cell Staining

1. Obtain single cell suspensions (filter if needed) and measure
viability and density. If viability is lower than 80%, proceed with
live cell enrichment and/or use best judgement depending on
sample source/importance/cell numbers.
2. Resuspend 1–2 million cells in 100 μL CITE-seq staining buffer.
3. Add 10 μL Fc Blocking reagent.
4. Incubate for 10 min at 4 °C.
5. While cells are incubated in Fc Block, prepare the antibody
pool (panel or titrated amounts).
6. Add antibody-oligo pool to cells.
7. Incubate for 30 min at 4 °C.
8. Wash cells 3 times with 1 mL CITE-seq staining buffer, spin at
300 × g for 5 min at 4 °C for every wash to harvest cells.
9. Resuspend cells in 450 μL room temperature PBS.

3.1.2 Cell Fixation and Permeabilization

1. Use about 0.5–1 million cells in 450 μL PBS for the fixation
reaction.
2. Add 30 μL 16% formaldehyde (1% final concentration), mix by
pipetting, and incubate at room temperature for 10 min with
occasional inversion.
3. Quench by adding glycine to a final concentration of 0.125 M.
4. Wash with 1× ice-cold PBS by filling up the tube, invert
5 times.
5. Spin at 400 × g for 5 min at 4 °C.
6. Discard the supernatant and repeat the wash with 1 mL 1×
ice-cold PBS.
7. Spin at 400 × g for 5 min at 4 °C, discard the supernatant.

Table 2
Permeabilization buffers

Prepare fresh, keep on ice until use (see Note 3)

Materials | LLL lysis buffer | OMNI lysis buffer | Wash buffer
1 M Tris–HCl pH 7.5 | 10 mM | 10 mM | 10 mM
5 M NaCl | 10 mM | 10 mM | 10 mM
1 M MgCl2 | 3 mM | 3 mM | 3 mM
10% NP40 | 0.1% | 0.1% | –
10% Tween 20 | – | 0.1% | –
5% Digitonin | – | 0.01% | –

8. Resuspend the cell pellet in 100 μL chilled lysis buffer (LLL or
OMNI buffer, Table 2, see Note 6), mix by pipetting.
9. Incubate on ice, 3 min for primary cells, 5 min for cell lines.
10. Add 1 mL chilled wash buffer to the lysed cells, mix by
pipetting.
11. Spin at 500 × g for 5 min at 4 °C. If intracellular staining is
desired, go to Subheading 3.1.3.
12. Remove the supernatant, resuspend in 150 μL 1× nuclei buffer
(10x Genomics).
13. Filter through 40 μm strainers. If the cell number is low, skip
this step.
14. Count cells and adjust density according to 10x loading
instructions.
15. Proceed to Subheading 3.2.
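
The volumes in the fixation steps and the buffer concentrations in Table 2 follow from simple dilution arithmetic (C1 × V1 = C2 × V2). The Python sketch below is only an illustration of that arithmetic, not part of the published protocol: the 1 mL batch size and the use of water as the diluent are assumptions, and the function names are hypothetical.

```python
def added_stock_volume(start_ul, stock_conc, final_conc):
    """Stock volume to ADD to start_ul so that the mixture reaches final_conc.

    Solves stock_conc * v = final_conc * (start_ul + v) for v.
    """
    if not 0 < final_conc < stock_conc:
        raise ValueError("final_conc must be positive and below stock_conc")
    return start_ul * final_conc / (stock_conc - final_conc)


def recipe(final_ul, components):
    """Stock volumes for a buffer of fixed final volume (C1*V1 = C2*V2).

    components maps a label to (stock_conc, final_conc) in matching units.
    """
    volumes = {name: final_ul * final / stock
               for name, (stock, final) in components.items()}
    # Assumption: the balance of the buffer is water.
    volumes["water (assumed diluent)"] = final_ul - sum(volumes.values())
    return volumes


# Step 2: 30 uL of 16% formaldehyde added to 450 uL of cells in PBS.
formaldehyde_pct = 16.0 * 30.0 / (450.0 + 30.0)

# Step 3: 2.5 M glycine stock added to the 480 uL fixation reaction.
glycine_ul = added_stock_volume(480.0, 2.5, 0.125)

# Table 2, LLL lysis buffer, scaled to a hypothetical 1 mL batch.
lll = recipe(1000.0, {
    "1 M Tris-HCl pH 7.5 (mM)": (1000.0, 10.0),
    "5 M NaCl (mM)": (5000.0, 10.0),
    "1 M MgCl2 (mM)": (1000.0, 3.0),
    "10% NP40 (%)": (10.0, 0.1),
})

print(f"formaldehyde: {formaldehyde_pct:.2f}% final")
print(f"glycine: add {glycine_ul:.1f} uL of 2.5 M stock")
for name, volume in lll.items():
    print(f"{name}: {volume:.1f} uL")
```

The same `recipe` call can be reused for the OMNI and wash buffers by changing the component dictionary to match the corresponding column of Table 2.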

3.1.3 Intracellular Staining

1. Resuspend the cell pellet from step 11 of Subheading 3.1.2 in 40 μL
intracellular wash buffer.
2. Add 5 μL of FcX and 5 μL of monocyte block solution.
3. Incubate on ice for 15 min.
4. Add 50 μL of intracellular wash buffer, containing titrated
amounts of conjugated intracellular markers (see Note 7),
incubate on ice for 30 min.
5. Wash 3 times with the intracellular wash buffer, spin at 500 × g
for 5 min at 4 °C.
6. Remove the supernatant, resuspend in 150 μL 1× nuclei buffer
(10x Genomics).
7. Filter through 40 μm strainers. If the cell number is low, skip
this step.

8. Count cells and adjust the density according to 10x loading
instructions.
9. Proceed to Subheading 3.2.

3.2 Transposition and Barcoding

For this step, proceed according to the 10x Genomics Single Cell
ATAC protocol (CG000168 Rev. D for v1 and CG000209 Rev.
D for v1.1; hereafter, ‘10x Protocol’) with the modifications below:
1. During the barcoding reaction (see step 2.1 of the 10x Protocol),
spike in 0.5 μL of 1 μM bridge oligo. There is no dead
volume in the reaction, so the final volume will be 65.5 μL for v1
and 60.5 μL for v1.1.
2. During GEM incubation (see step 2.5 of the 10x Protocol),
add a 5 min incubation at 40 °C at the beginning of the
protocol (see Note 8). Incubation protocol: 40 °C 5 min,
72 °C 5 min, 98 °C 30 s; 12 cycles of 98 °C 10 s, 59 °C 30 s,
72 °C 1 min; hold at 15 °C.
3. During silane bead elution (see step 3.1o of the 10x Protocol),
add 43.5 μL of Elution Solution I and subsequently recover
43 μL. Keep 3 μL aside to use as input (see Note 9) in the tag
library PCR, and with the remaining 40 μL, proceed to SPRI
cleanup as per the 10x Protocol.
4. During SPRI cleanup (see step 3.2d of the 10x Protocol), save
the supernatant. For the bead-bound fraction, proceed as per
the 10x Protocol. For the supernatant fraction, add 32 μL SPRI, let
bind for 5 min. Collect beads on a magnet, wash twice with 80%
EtOH, remove the remaining ethanol, and elute beads in
42 μL EB (see Note 9). This can be combined with the 3 μL
left aside after the silane purification, as input in the TSA/TSB
indexing reaction:
   50 μL 2x KAPA mix
   2.5 μL primer P5 10 μM
   2.5 μL indexing primer 10 μM (RPxx or D7xx, see Table 1)
   3–45 μL input fragments (see Note 9)
   100 μL total.
Incubation protocol: 95 °C 3 min; 14–16 cycles of 95 °C 20 s,
60 °C 30 s, 72 °C 20 s; 72 °C 5 min; hold at 4 °C.
5. Proceed with indexing the ATAC library as described in
step 4.2 of the 10x Protocol. Usually 10 cycles provide
sufficient material to perform library QC and sequencing. If
native nuclei are run in parallel, a noticeable reduction in PCR
yield can be observed with the fixed sample compared to native
nuclei (presumably due to fixation).
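
Because the input volume for the tag-library indexing PCR can range from 3 to 45 μL while the total stays at 100 μL, the make-up volume changes from reaction to reaction. The sketch below tabulates the mix for a given input volume. It assumes that the balance is nuclease-free water, which the reaction list above does not state explicitly, and the function name is hypothetical.

```python
def tag_pcr_mix(input_ul, total_ul=100.0):
    """Return component volumes (uL) for one tag-library indexing PCR."""
    if not 3.0 <= input_ul <= 45.0:
        raise ValueError("input volume is expected to be 3-45 uL")
    mix = {
        "2x KAPA mix": 50.0,
        "P5 primer, 10 uM": 2.5,
        "indexing primer, 10 uM": 2.5,
        "input fragments": float(input_ul),
    }
    # Assumption: whatever is left to reach total_ul is nuclease-free water.
    mix["water (assumed)"] = total_ul - sum(mix.values())
    return mix


# Only the 3 uL kept aside after silane elution is used as input.
small_input = tag_pcr_mix(3.0)

# The 3 uL aliquot is combined with the 42 uL SPRI supernatant eluate.
full_input = tag_pcr_mix(45.0)

for label, mix in (("3 uL input", small_input), ("45 uL input", full_input)):
    print(label)
    for component, volume in mix.items():
        print(f"  {component}: {volume:.1f} uL")
```

In both cases each primer ends up at 0.25 μM (2.5 μL of a 10 μM stock in 100 μL).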

Fig. 3 Representative fragment analyzer traces of the sequencing libraries. ATAC (top) and protein tag (bottom)
libraries of fixed human PBMCs permeabilized with OMNI lysis buffer (a) or LLL lysis buffer (b). Note the
increased abundance of the nucleosome-free region (size <300 bp) in the LLL library that corresponds to the
increased capture of mtDNA fragments (arrow)

3.3 Library QC, Pooling and Sequencing

3.3.1 Library QC

We recommend quantifying all libraries in three sequential steps:
1. Qubit: use 1 μL undiluted library for total nucleic acid mass.
2. Fragment analyzer (see examples in Fig. 3): preferably Agilent
BioA (if not available, Tapestation or PerkinElmer LabChip GX
can be used). Run ~1–3 ng of each library based on the Qubit
reading. The fragment analyzer will provide the size distribution
of the library and a more accurate quantification of the
fragment population at the size expected for each library.
3. KAPA qPCR: prepare 4 nM dilutions of each library based on
quantification by BioA and record the dilution. Follow the KAPA
manual instructions for quantification of the “clusterable” fragments
(fragments containing P5/P7 sequences). This will be
the most accurate concentration reading for sequencing purposes
(see Note 10).
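
Preparing the 4 nM dilutions requires converting a mass concentration and an average fragment size into molarity. A minimal sketch of that conversion follows, using the common approximation of 660 g/mol per base pair of double-stranded DNA. The example concentration and fragment size are hypothetical, and the function names are illustrative only.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation)


def library_molarity_nm(conc_ng_per_ul, mean_fragment_bp):
    """Convert ng/uL and mean fragment length (bp) to nM."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


def diluent_volume_ul(stock_nm, target_nm, library_ul):
    """Diluent to add to library_ul of stock to reach target_nm."""
    if stock_nm < target_nm:
        raise ValueError("library is already below the target molarity")
    return library_ul * (stock_nm / target_nm - 1.0)


# Hypothetical library: 10 ng/uL with a 400 bp mean fragment size.
stock_nm = library_molarity_nm(10.0, 400.0)
diluent_ul = diluent_volume_ul(stock_nm, target_nm=4.0, library_ul=2.0)

print(f"stock: {stock_nm:.1f} nM")
print(f"add {diluent_ul:.1f} uL diluent to 2 uL library for 4 nM")
```

The resulting dilution is only a starting point; the qPCR reading in step 3 remains the value to use for pooling.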

3.3.2 Pooling and Sequencing

Prepare equimolar concentrations of each library (concentration
requirement will depend on instructions from the sequencing facility)
and pool together in ratios that will satisfy sequencing depth
requirements.
We recommend a minimum depth of:
• 30 k reads/cell for the ATAC library
• 3 k reads/cell for TSA/TSB hashtag libraries
• 150 reads per antibody/cell for the surface panel. As a rule of
thumb, we allocate ~5–10 k reads for panels up to 50 antibodies,
10–20 k reads for panels up to 150 antibodies, and >25 k reads
for panels >200 antibodies.

ATAC and TSA/TSB libraries can be sequenced on the same
flow cell. We recommend that protein tag libraries should not
occupy more than ~50% of the flow cell because beyond the first
10–15 nt, both Rd1 and Rd2 enter a low-diversity region (polyT/A
or 10x capture sequence), resulting in a decreased data quality that
can negatively impact ATAC fragment mapping. However, we note
that we have not systematically evaluated relative loading abun-
dances for the ATAC and protein tag library. We have used the
Illumina NextSeq and NovaSeq reagents kits and respective
sequencing platforms. A minimum of 75-cycle kit with recipe [34,
8, 16, 34] is sufficient if you are not intending to retain mtDNA
reads. For experiments that plan to retain mtDNA for genotyping,
we recommend using longer reads to obtain high coverage of the
mitochondrial genome for variant calling. In this setting, we typi-
cally utilize a 150-cycle kit with a [72, 8, 16, 72] recipe. We note
that the full length expected molecules per modality are shown in
Box 1.
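
The minimum depths listed above translate directly into a read budget and pooling proportions for a given experiment. The sketch below is one way to compute them. The cell number and panel size are hypothetical, the function names are illustrative, and the check against the ~50% guideline simply restates the recommendation in the text.

```python
def read_budget(n_cells, n_antibodies=0, hashtags=False):
    """Minimum reads per library from the per-cell depths given in the text."""
    budget = {"ATAC": 30_000 * n_cells}
    if hashtags:
        budget["hashtag"] = 3_000 * n_cells
    if n_antibodies:
        budget["protein tag"] = 150 * n_antibodies * n_cells
    return budget


def pool_fractions(budget):
    """Fraction of the total read budget assigned to each library."""
    total = sum(budget.values())
    return {name: reads / total for name, reads in budget.items()}


# Hypothetical experiment: 5,000 cells, 50-antibody panel, plus hashtags.
budget = read_budget(n_cells=5_000, n_antibodies=50, hashtags=True)
fractions = pool_fractions(budget)
tag_share = 1.0 - fractions["ATAC"]

for name, reads in budget.items():
    print(f"{name}: {reads / 1e6:.1f} M reads ({fractions[name]:.1%} of pool)")
if tag_share > 0.5:
    print("warning: tag libraries exceed ~50% of the flow cell")
```

For this example the panel receives 7,500 reads per cell, which falls within the ~5–10 k rule of thumb for panels of up to 50 antibodies.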

Box 1: ASAP-Seq Tag Libraries Structure and Sequencing Schemes

ASAP-seq ADT in TotalSeq™-A format: Final library

5’AATGATACGGCGACCACCGAGATCTACACNNNNNNNNNNNNNNNNTCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNNNNNNNNNVTTTTTTTT
TTTTTTTTTTTTTTTTTTTTTTVxxxxxxxxxxxxxxxTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACxxxxxxxxATCTCGTATGCCGTCTTCT
GCTTG
3’TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNNNNNNNNNAGCAGCCGTCGCAGTCTACACATATTCTCTGTCNNNNNNNNNBAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAABxxxxxxxxxxxxxxxACCTTAAGAGCCCACGGTTCCTTGAGGTCAGTGxxxxxxxxTAGAGCATACGGCAGAAGA
CGAAC

Read positions: read 1 starts at the UBI (10 nt); the i7 index read covers the 8-nt sample index; the i5
index read covers the 16-nt cell barcode; read 2 starts at the antibody barcode (15 nt).

Sequencing for ASAP-seq with TotalSeq-A protein detection (spiked into ATAC run)
Read | Length | ATAC | Protein tag
Read 1 | 50 | Genomic fragment | 1-10 = UBI
i7 | 8 | sample index | sample index
i5 | 16 | cell barcode | cell barcode
Read 2 | 50 | Genomic fragment | 1-15 = antibody tag


ASAP-seq Hashtag in TotalSeq™-A format: Final library

5’AATGATACGGCGACCACCGAGATCTACACNNNNNNNNNNNNNNNNTCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNNNNNNNNNVTTTTTTTT
TTTTTTTTTTTTTTTTTTTTTTVxxxxxxxxxxxxxxxAGATCGGAAGAGCACACGTCTGAACTCCAGTCACxxxxxxxxATCTCGTATGCCGTCTTC
TGCTTG
3’TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNNNNNNNNNAGCAGCCGTCGCAGTCTACACATATTCTCTGTCNNNNNNNNNBAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAABxxxxxxxxxxxxxxxTCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGxxxxxxxxTAGAGCATACGGCAGAAG
ACGAAC

Read positions: read 1 starts at the UBI (10 nt); the i7 index read covers the 8-nt sample index; the i5
index read covers the 16-nt cell barcode; read 2 starts at the hashtag barcode (15 nt).

Sequencing for ASAP-seq with TotalSeq-A Hashtag detection (spiked into ATAC run)
Read | Length | ATAC | Protein tag
Read 1 | 50 | Genomic fragment | 1-10 = UBI
i7 | 8 | sample index | sample index
i5 | 16 | cell barcode | cell barcode
Read 2 | 50 | Genomic fragment | 1-15 = hashtag

ASAP-seq ADT or Hashtag in TotalSeq™-B format: Final library

5’AATGATACGGCGACCACCGAGATCTACACNNNNNNNNNNNNNNNNTCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTGCTAGGACCGGCCTTA
AAGCNNNNNNNNNxxxxxxxxxxxxxxxNNNNNNNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCACxxxxxxxxATCTCGTATGCCGTCTTC
TGCTTG
3’TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNNNNNNNNNAGCAGCCGTCGCAGTCTACACATATTCTCTGTCAACGATCCTGGCCGGAAT
TTCGNNNNNNNNNxxxxxxxxxxxxxxxNNNNNNNNNNTCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGxxxxxxxxTAGAGCATACGGCAGAAG
ACGAAC

Read positions: read 1 (50 nt) starts at the bridge oligo sequence; the i7 index read covers the 8-nt
sample index; the i5 index read covers the 16-nt cell barcode; read 2 covers the 34 nt that contain the
UMIs and the antibody barcode.

Sequencing for ASAP-seq with TotalSeq-B detection (spiked into ATAC run)
Read | Length | ATAC | Protein tag
Read 1 | 50 | Genomic fragment | (discard)
i7 | 8 | sample index | sample index
i5 | 16 | cell barcode | cell barcode
Read 2 | 50 | Genomic fragment | 1-10 = UMI1, 11-25 = Antibody tag, 26-34 = UMI2

Sequencing this library alone will cause problems due to lack of sequence diversity in read 1. We highly recommend spiking this
into the ATAC libraries generated together in the same assay. UMI and tag barcode can be recovered from either read 1 or read
2. asap_to_kite uses read 2 by default. Please note the orientation of the i5 index read will be different depending on the Illumina
chemistry used. Refer to the 10x scATAC manual for guidance.
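
The read coordinates in Box 1 can be expressed as simple string slices. The sketch below illustrates where the UBI, UMIs, and antibody tag sit in each read for the two formats. It is not the implementation used by asap_to_kite; the function names and example reads are hypothetical.

```python
def split_tsa(read1, read2):
    """TotalSeq-A: read 1 bases 1-10 are the UBI, read 2 bases 1-15 the tag."""
    if len(read1) < 10 or len(read2) < 15:
        raise ValueError("reads are too short for the TotalSeq-A layout")
    return {"ubi": read1[:10], "tag": read2[:15]}


def split_tsb(read2):
    """TotalSeq-B: read 2 holds UMI1 (1-10), tag (11-25), and UMI2 (26-34)."""
    if len(read2) < 34:
        raise ValueError("read 2 is too short for the TotalSeq-B layout")
    return {"umi1": read2[:10], "tag": read2[10:25], "umi2": read2[25:34]}


# Made-up 50-nt reads, used only to show the coordinates.
tsa = split_tsa("ACGTACGTAG" + "T" * 40, "CCCCCGGGGGTTTTT" + "A" * 35)
tsb = split_tsb("ACGTACGTAC" + "GGGGGCCCCCAAAAA" + "TGCATGCAT" + "A" * 16)

print(tsa)
print(tsb)
```

Python slices are zero-based and end-exclusive, so bases 11-25 of read 2 correspond to `read2[10:25]`.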

3.4 Demultiplex Sequencing Data

This section briefly summarizes the steps needed to demultiplex
sequencing data to generate paired-end sequencing data associated
with all libraries on the flow cell. In the ASAP-seq multimodal
workflow, chromatin accessibility (and optionally mtDNA) is captured
in a single library, whereas each protein modality (hashtags, protein
abundance, different oligonucleotide backgrounds) is present in a
distinct sequencing library.

Table 3
Interpretation of quality control metrics from ASAP-seq protein mapping

Processing step: Step 5, Subheading 3.5. Pseudo-alignment (a)
Metric: Antibody barcode pseudo-alignment rate
Target value: 85%
Debugging workflow:
1. Verify indicated reference matches the experimental input.
2. Verify the correct specification of the antibody library (e.g., TSA or TSB).
3. Run FastQC to look for overrepresented sequences that may correspond to known barcodes.
4. If the FastQC quality-control report from the demultiplexed flow cell indicates poor base
quality at bases associated with the antibody barcode (see Fig. 4), consider using a
two-mismatch barcode dictionary in step 2 of Subheading 3.5.
Calculation: # pseudo-aligned/# reads × 100%
Example: 8,498,079/9,809,110 × 100% = 86.6%

Processing step: Step 6, Subheading 3.5. Correct
Metric: Bead barcode alignment rate
Target value: 90%
Debugging workflow:
1. If very low (<5%), check to see if the R2 was correctly handled for a reverse-complement.
2. Check corresponding scATAC-seq library for barcode alignment rate.
3. Examine top sequences for contamination or other repetitive sequences.
Calculation: (# whitelist + # corrected)/# pre-corrected records × 100%
Example: (8,049,279 + 114,962)/8,498,079 × 100% = 96.1%

Processing step: Step 9, Subheading 3.5. Text
Metric: UMI saturation rate
Target value: 25–50%
Debugging workflow:
1. This metric is not interpretable if the top two values are not of reasonable quality.
2. If >50%, sequencing is saturated and may represent a low-quality library (unless
purposefully sequenced to saturation).
3. If <25%, additional sequencing is recommended.
Calculation: [1 - (#Final UMIs/#pre-sorted records)] × 100%
Example: [1 - (6,120,282/8,164,241)] × 100% = 25.0%

(a) By virtue of the kite reference, every k-mer up to 1 mismatch will be accounted for and then collapsed during the
quantification step. In this sense, though kallisto is a “pseudo-alignment” algorithm, the quantifications are absolute and
effectively a fast dictionary-based quantification
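
The footnote above describes the reference as a dictionary that already contains every barcode variant within one mismatch. The sketch below illustrates that idea in plain Python. It is a conceptual illustration only and does not reproduce kite's featuremap.py: the feature names and barcodes are hypothetical, and dropping ambiguous variants is an assumption made for this example.

```python
def one_mismatch_variants(barcode, alphabet="ACGT"):
    """All sequences at Hamming distance exactly 1 from barcode."""
    variants = set()
    for pos, base in enumerate(barcode):
        for alt in alphabet:
            if alt != base:
                variants.add(barcode[:pos] + alt + barcode[pos + 1:])
    return variants


def build_lookup(features):
    """Map each barcode and its 1-mismatch variants to a feature name.

    Sequences reachable from more than one feature are dropped
    (an assumption of this sketch).
    """
    lookup = {}
    ambiguous = set()
    for name, barcode in features.items():
        for seq in {barcode} | one_mismatch_variants(barcode):
            if seq in lookup and lookup[seq] != name:
                ambiguous.add(seq)
            lookup[seq] = name
    for seq in ambiguous:
        del lookup[seq]
    return lookup


# Hypothetical 15-nt antibody barcodes.
features = {"CD4": "A" * 15, "CD8": "C" * 15}
lookup = build_lookup(features)

observed = "AAAAAAACAAAAAAA"  # one sequencing error relative to the CD4 barcode
print(lookup.get(observed))
print(len(lookup))
```

Each 15-nt barcode contributes itself plus 45 single-base variants, so error-tolerant matching reduces to an exact dictionary lookup.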

1. Build the sample sheet csv that specifies the indices for both the
ATAC library and the tag library (see Note 11).
2. Demultiplex sequencing data by running cellranger-atac
mkfastq. For example:
$ cellranger-atac mkfastq --id=asap_seq_demux --run=/path/to/flow_cell --csv=sample_sheet.csv

Fig. 4 Schematic of library structure and computational preprocessing for ASAP-seq tag libraries. Colors
represent specific technical attributes of the read library. Colored arrows represent the data transformations in
the asap_to_kite.py tool. The resulting fastq files mimic scRNA-seq data and can be used in kallisto | bustools
for single-cell protein abundance estimation. For TotalSeq-B, both UMIs can be used for abundance
estimation, but this requires executing “kallisto bus” with custom parameters

Table 4
Linking experimental reagents to downstream bioinformatics workflows

Antibody family | Compatible bridge oligo | Indexing PCR primer | Read configuration | asap_to_kite parameter
TotalSeq-A (not hashtags) | BOA | RPxx | Figure 4a | -j TotalSeqA (default)
TotalSeq-A hashtags | BOA | D7xx | Figure 4a | -j TotalSeqA (default)
TotalSeq-B (including hashtags) | BOB | D7xx | Figure 4b | -j TotalSeqB

3. The resulting output folder (asap_seq_demux) will contain
.fastq.gz files for both the scATAC and protein tag libraries.

3.5 Process Sequencing Data

This section outlines the steps to take raw sequencing data and
generate counts matrices of features per cell. As a reference, we
include Box 2, which contains a summary of values from a real-world
ASAP-seq library that was processed with the outlined workflow. In
Table 3, we provide context for ideal values associated with various
steps in this pipeline, including ideas for debugging executions that
do not meet quality control standards.
1. Build a mismatch-aware antibody barcode map using kite (see
Note 12):
$ python kite/featuremap/featuremap.py FeatureBarcodes.csv --header
2. Build a kallisto index from the mismatch-aware .fasta file
produced by kite:
$ kallisto index -i FeaturesMismatch.idx -k 15 FeaturesMismatch.fa
3. For convenience in processing, define a bash variable related to
the specific library/sample to run the subsequent steps:
$ sample="ASAP_tag_Sample_ID"

4. Reformat sequencing reads for compatibility with kallisto using
the asap_to_kite toolkit. A summary of the transformation of
the data is shown in Fig. 4; the exact parameters depend on the
antibody library used (see Table 4):
$ python asap_to_kite_v2.py -f asap_seq_demux/flow_cell -s $sample -o "${sample}_pro"
5. Pseudo-align the antibody barcode sequences to the established
reference:
$ kallisto bus -i FeaturesMismatch.idx -x 10xv2 -t 8 -o $sample "${sample}_pro_R1.fastq.gz" "${sample}_pro_R2.fastq.gz"
6. Correct for sequencing errors in the bead barcodes:
$ bustools correct -w 737K-cratac-v1.txt "${sample}/output.bus" -o "${sample}/output_corrected.bus"
7. Sort the bus file:
$ bustools sort -o "${sample}/output_sorted.bus" "${sample}/output_corrected.bus"
8. Generate the protein × cell counts matrix:
$ bustools count -o ${sample} --genecounts -g FeaturesMismatch.t2g -e "${sample}/matrix.ec" -t "${sample}/transcripts.txt" "${sample}/output_sorted.bus"
9. Assess the number of unique UMIs captured:
$ bustools text "${sample}/output_sorted.bus" -p | wc -l
10. The resulting files, including "${sample}.mtx", "${sample}.barcodes.txt",
and "${sample}.genes.txt", provide a Matrix Market
representation of the counts matrix.
11. In parallel, libraries for accessible chromatin can be processed
with appropriate bioinformatics pipelines, such as cellranger-
atac count (see Note 13).
12. For ASAP-seq libraries where mtDNA was also captured, addi-
tional processing of the mitochondrial genome is facilitated by
the mgatk package, which takes the outputs from the
cellranger-atac count to generate per cell heteroplasmy esti-
mates and a determination of high-confidence variants that
may be useful for clonal lineage tracing. We refer to the
mtscATAC-seq chapter for additional information on proces-
sing these mtDNA variants from the ASAP-seq libraries.
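
The counts matrix written in step 8 is a sparse text file in Matrix Market coordinate format. The sketch below shows how such a file is laid out by parsing a tiny, made-up example with the standard library only. In practice an established reader such as scipy.io.mmread would normally be used instead; the assumption that rows correspond to barcodes and columns to features should be verified against the accompanying barcodes and genes files.

```python
import io


def read_mtx(handle):
    """Parse a Matrix Market coordinate file into {(row, col): value}.

    Indices are kept 1-based, as stored in the file.
    """
    header = handle.readline()
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("not a coordinate Matrix Market file")
    line = handle.readline()
    while line.startswith("%"):  # skip optional comment lines
        line = handle.readline()
    n_rows, n_cols, n_entries = (int(x) for x in line.split())
    entries = {}
    for _ in range(n_entries):
        row, col, value = handle.readline().split()
        entries[(int(row), int(col))] = float(value)
    return (n_rows, n_cols), entries


# Made-up matrix with 2 rows, 3 columns, and 3 non-zero counts.
example = io.StringIO(
    "%%MatrixMarket matrix coordinate real general\n"
    "2 3 3\n"
    "1 1 12\n"
    "1 3 4\n"
    "2 2 7\n"
)
shape, entries = read_mtx(example)

print(shape)
print(entries)
```

Only non-zero entries are stored, which keeps the file small for sparse single-cell protein count data.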

Box 2: Example of Quality Metrics from Running Steps in Subheading 3.5. Key Metrics Are Indicated After the
Associated Computational Step

Step 3.5.5: Pseudoalignment.
processed 9,809,110 reads, 8,498,079 reads pseudoaligned

Step 3.5.6: Correct.
Found 737,280 barcodes in the whitelist.
Number of hamming dist 1 barcodes = 20,309,952.
Processed 8,498,079 bus records.
In whitelist = 8,049,279.
Corrected = 114,962.
Uncorrected = 333,838.

Step 3.5.7: Sort.
Read in 8,164,241 BUS records.

Step 3.5.9: Text.
Read in 6,120,282 BUS records.
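
The three quality metrics in Table 3 can be recomputed from the values in Box 2. The sketch below applies the calculations given in Table 3 to those numbers and compares them with the stated target values. The variable and function names are illustrative only.

```python
# Values reported in Box 2.
reads_processed = 9_809_110
reads_pseudoaligned = 8_498_079
in_whitelist = 8_049_279
corrected = 114_962
final_umis = 6_120_282


def pct(numerator, denominator):
    """Percentage of numerator relative to denominator."""
    return 100.0 * numerator / denominator


# Antibody barcode pseudo-alignment rate (target: 85%).
pseudoalign_rate = pct(reads_pseudoaligned, reads_processed)

# Bead barcode alignment rate (target: 90%).
records_after_correction = in_whitelist + corrected
barcode_rate = pct(records_after_correction, reads_pseudoaligned)

# UMI saturation rate (target: 25-50%).
umi_saturation = 100.0 - pct(final_umis, records_after_correction)

print(f"pseudo-alignment rate: {pseudoalign_rate:.1f}%")
print(f"bead barcode alignment rate: {barcode_rate:.1f}%")
print(f"UMI saturation rate: {umi_saturation:.1f}%")
print("pseudo-alignment target met:", pseudoalign_rate >= 85.0)
print("barcode target met:", barcode_rate >= 90.0)
print("saturation within 25-50%:", 25.0 <= umi_saturation <= 50.0)
```

The number of records after correction (8,164,241) matches the number read in by the sort step in Box 2.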

3.6 Perform For downstream analyses, we recommend utilizing the Seurat/


Multimodal Analysis Signac toolkit for multimodal analyses of ASAP-seq data. While
other tools for scATAC-seq analyses, such as ArchR, provide flexi-
ble metadata annotations that facilitate incorporating protein levels
and mtDNA variants in broad cellColData slots, there is currently
limited out-of-the-box functionality for the multimodal capabilities
derived from ASAP-seq. Conversely, the Seurat/Signac toolkit
facilitates the normalization of protein abundances via the
centered-log-ratio (CLR) transformation (used before annotation
and association analyses with protein data) and complete suite of
functions needed for analysis of the accessible chromatin arm of
ASAP-seq. Notably, a comprehensive code repository of down-
stream analyses of ASAP-seq data, including trimodal analyses of
mtDNA, proteins, and chromatin accessibility and integrative ana-
lyses with CITE-seq data, is available online: https://github.com/
caleblareau/asap_reproducibility. We refer to this code repository
resource for more involved analysis vignettes but note the key
functions required for downstream/multimodal analyses of
ASAP-seq data.
1. Import chromatin accessibility counts via the customary
import functions in the Signac package.
counts <- Read10X_h5(filename = "filtered_peak_bc_matrix.h5")
2. Create a Seurat object via the ChromatinAssay functionality.
seurat_obj <- CreateSeuratObject(counts = CreateChromatinAssay(counts, ...), ...)
3. Import the output counts data from kite using the read.table
or fread function into a data matrix object (kite_counts).
Append this matrix to the Seurat object for just the cells that
are present in the object.
seurat_obj[["ADT"]] <- CreateAssayObject(counts =
kite_counts[,colnames(seurat_obj)])
4. The resulting seurat_obj should have multiple assay slots that
will enable both scATAC-based and protein-based analyses. We
refer the reader to appropriate scATAC (https://satijalab.org/
signac/articles/pbmc_vignette.html) and protein (https://
satijalab.org/seurat/articles/multimodal_vignette.html) vign-
ettes, respectively.
5. Generally speaking, we have elected to analyze our ASAP-seq
data utilizing the chromatin accessibility counts for feature
reduction, clustering, and two-dimensional projections while
keeping the protein data as an independent annotation of cell
features and cell states. However, we have also found that
utilizing bioinformatics methods, such as weighted nearest
neighbor (WNN), can enhance dimensionality reduction [5].
This can be accomplished by following the existing WNN tutorial
while using the latent semantic indexing (LSI) dimensions rather than
the principal components analysis (PCA) dimensions for the chromatin
accessibility modality:
https://satijalab.org/seurat/articles/weighted_nearest_neighbor_analysis.html
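To make the centered-log-ratio (CLR) normalization of protein tag counts mentioned above concrete, the following is a minimal sketch of a CLR transform with a pseudocount of one. It is an illustration only and is not Seurat's exact implementation, which differs in detail; the function name and the example counts are hypothetical.

```python
import math


def clr(counts):
    """Centered-log-ratio transform with a pseudocount of 1.

    Each value becomes log1p(count) minus the mean of log1p over all
    values, so the transformed values sum to zero.
    """
    if not counts:
        raise ValueError("counts must not be empty")
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    logs = [math.log1p(c) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical tag counts for one antibody across four cells.
example_counts = [0, 10, 100, 1000]
for raw, norm in zip(example_counts, clr(example_counts)):
    print(f"{raw:>5} -> {norm:+.3f}")
```

Whether the transform is applied across cells for each protein or across proteins within each cell is a choice made in the analysis toolkit; the sketch is agnostic and simply centers whatever vector it is given.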

4 Notes

1. While planning an experiment the first step is to choose a family
of protein detection reagent. We recommend one of two types
of TotalSeq™ reagents from BioLegend: TotalSeq™-A or
TotalSeq™-B. These are commonly used for CITE-seq [2] or
10x Genomics “feature barcoding” with 3′ scRNA-seq, respec-
tively. While it is theoretically possible to couple protein detec-
tion using TotalSeq™-C reagents to scATAC with a specifically
designed bridge oligo, the sequence characteristics of the
TotalSeq™-C species make bridge oligo annealing inefficient,
and, more importantly, the amplification handle on
TotalSeq™-C reagents makes it impossible to separately amplify
the protein tag library from the ATAC library.
2. Purchase the specific bridge oligo for the TotalSeq™ reagents
you will use. Note that each family of TotalSeq reagent neces-
sitates several dependent downstream reagents/processing
steps (see Tables 1 and 4). While we have not used antibody:
oligo reagents other than TotalSeq™ products, the bridge
oligo strategy, with appropriate design modifications, should
be compatible with other products (e.g., BD AbSeq reagents,
10× CellPlex reagents).
3. Recent versions of the software and commonly used reference
genomes are available on the 10× support website:
https://support.10xgenomics.com/single-cell-atac/software/pipelines/latest/installation.
Certain applications may require
assembling a custom reference, particularly for cell inputs:
https://support.10xgenomics.com/single-cell-atac/software/pipelines/latest/advanced/references.
4. Some dependency packages are also required to run the work-
flow, depending on the exact use case, and are documented
alongside the complementary tools.
5. The current version of the asap_to_kite toolkit contains custom
Python scripts for reformatting the
sequencing data. Depending on the library input (either TotalSeqA
or TotalSeqB, or a mix), these scripts must be
run with custom parameters. See the GitHub repository for
more details.
6. If mtDNA retention is desired, use LLL lysis buffer.
7. So far we have used about 0.5–1 μg of antibodies during the
intracellular staining.
8. This extra step is not essential when using TSA products, but
increases efficiency in TSB capture.
9. You can use either as input in the tag indexing reaction or
combine when working with large antibody panels to increase
input complexity.
10. If KAPA qPCR is not an available option, use the molarity of
the expected fragments as measured by BioA.
11. An online tool to facilitate building the sample sheet is available:
https://support.10xgenomics.com/single-cell-atac/software/pipelines/latest/using/bcl2fastq-direct.
We note that
the index used for the tag libraries will not be available from
the tool and must be entered manually.
12. By default, the kite tool produces an off-by-one mismatch
k-mer dictionary. When using the kallisto tool for read
mapping, there is no error tolerance or incorporation of
sequence base qualities. Thus, building a mismatch index for
all possible off-by-one changes is essential to optimize data
yield.
13. The execution of this software can be performed modularly
without information from the antibody tag libraries. Other
single-cell ATAC preprocessing workflows can also be utilized
at this point.
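The off-by-one mismatch dictionary described in Note 12 can be illustrated with a short sketch. This is not the kite source code; the function names, tag names, and tag sequences below are hypothetical, and the rule of dropping variants that are ambiguous between two tags is an assumption made for the example.

```python
def hamming1_variants(seq, alphabet="ACGT"):
    """Return every sequence that differs from seq by one substitution."""
    variants = []
    for i, base in enumerate(seq):
        for alt in alphabet:
            if alt != base:
                variants.append(seq[:i] + alt + seq[i + 1:])
    return variants


def build_mismatch_lookup(tags):
    """Map exact tag sequences and their unambiguous one-mismatch
    variants to the tag name.

    A variant is dropped if it equals another exact tag or could have
    arisen from more than one tag.
    """
    exact = set(tags.values())
    owners = {}
    for name, seq in tags.items():
        for variant in hamming1_variants(seq):
            if variant not in exact:
                owners.setdefault(variant, set()).add(name)
    lookup = {seq: name for name, seq in tags.items()}
    for variant, names in owners.items():
        if len(names) == 1:
            lookup[variant] = next(iter(names))
    return lookup


# Hypothetical antibody tags (real tags are longer, e.g., 15 nt).
example_tags = {"tag_A": "AAAAA", "tag_B": "CCCCC"}
lookup = build_mismatch_lookup(example_tags)
print(len(lookup))            # exact tags plus their one-mismatch variants
print(lookup.get("AAGAA"))    # a read with one error still resolves to tag_A
```

Because exact matching against such a dictionary tolerates a single sequencing error per tag, it recovers reads that an exact-match-only index would discard.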

Acknowledgments

C.A.L. is supported by a Stanford Science Fellowship and a Parker
Institute of Cancer Immunotherapy Scholarship.

References
1. Mimitou EP, Cheng A, Montalbano A et al (2019) Multiplexed detection of proteins, transcriptomes, clonotypes and CRISPR perturbations in single cells. Nat Methods 16:409–412
2. Stoeckius M, Hafemeister C, Stephenson W et al (2017) Simultaneous epitope and transcriptome measurement in single cells. Nat Methods 14:865–868
3. Peterson VM, Zhang KX, Kumar N et al (2017) Multiplexed quantification of proteins and transcripts in single cells. Nat Biotechnol 35:936–939
4. Triana SH, Vonficht D, Jopp-Saile L et al (2021) Single-cell proteo-genomic reference maps of the hematopoietic system enable the purification and massive profiling of precisely defined cell states. bioRxiv
5. Hao Y, Hao S, Andersen-Nissen E et al (2021) Integrated analysis of multimodal single-cell data. Cell 184:3573–3587.e29
6. Satpathy AT, Granja JM, Yost KE et al (2019) Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion. Nat Biotechnol 37:925–936
7. Lareau CA, Duarte FM, Chew JG et al (2019) Droplet-based combinatorial indexing for massive-scale single-cell chromatin accessibility. Nat Biotechnol 37:916–924
8. Cusanovich DA, Daza R, Adey A et al (2015) Multiplex single cell profiling of chromatin accessibility by combinatorial cellular indexing. Science 348:910–914
9. Ma S, Zhang B, LaFave LM et al (2020) Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell 183:1103–1116.e20
10. Mimitou EP, Lareau CA, Chen KY et al (2021) Scalable, multimodal profiling of chromatin accessibility, gene expression and protein levels in single cells. Nat Biotechnol. https://doi.org/10.1038/s41587-021-00927-2
11. Lareau CA, Ludwig LS, Muus C et al (2020) Massively parallel single-cell mitochondrial DNA genotyping and chromatin profiling. Nat Biotechnol. https://doi.org/10.1038/s41587-020-0645-6
12. Stoeckius M, Zheng S, Houck-Loomis B et al (2018) Cell hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics. Genome Biol 19:224
13. McGinnis CS, Patterson DM, Winkler J et al (2019) MULTI-seq: sample multiplexing for single-cell RNA sequencing using lipid-tagged indices. Nat Methods 16:619–626
14. Corces MR, Trevino AE, Hamilton EG et al (2017) An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat Methods 14:959–962
Chapter 14

Concomitant Sequencing of Accessible Chromatin and Mitochondrial Genomes in Single Cells Using mtscATAC-Seq
Leif S. Ludwig and Caleb A. Lareau

Abstract
Mitochondria are unique organelles of eukaryotic cells that carry their own multicopy, circular
genome. In most mammals, including humans and mice, the size of the mitochondrial chromosome is ~16,000 base pairs
and, unlike nuclear DNA, mitochondrial DNA (mtDNA) is not densely compacted. This makes mtDNA
highly accessible to enzymes such as the Tn5 transposase, which is commonly used for accessible chromatin
profiling of nuclear chromatinized DNA. Here, we describe a method for the concomitant sequencing of
mtDNA and accessible chromatin in thousands of individual cells via the mitochondrial single-cell assay for
transposase accessible chromatin by sequencing (mtscATAC-seq). Our approach extends the utility of
existing scATAC-seq products and protocols as we (1) fix cells
using formaldehyde to retain mitochondria and their mtDNA within the originating cell, (2)
modify lysis conditions to permeabilize cells and mitochondria, and
(3) optimize bioinformatic processing protocols to collectively
increase mitochondrial genome coverage for downstream analysis. We discuss the essentials of the
experimental and computational methodologies to generate and analyze thousands of multiomic profiles of
single cells over the course of a few days, enabling the profiling of accessible chromatin and mtDNA
genotypes to reconstruct clonal relationships and to study mitochondrial genetics and disease.

Key words Single cell multiomics, Accessible chromatin profiling, Mitochondrial DNA, Somatic
mutation, Lineage tracing, Pathogenic mutation, Mitochondrial disease

1 Introduction

Single cell genomic approaches have revolutionized our ability to
comprehensively characterize cellular states and cell types, revealing
a previously unappreciated heterogeneity and diversity of human
cells in health and pathology. More recently, these efforts have
focused on the integration and/or simultaneous detection of mul-
tiple high-dimensional data types from the same single cells,
thereby enabling an unprecedented depth of phenotyping of the
building blocks of our organs [1]. Within this realm, the single-cell

Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols,
Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_14,
© The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023


assay for transposase accessible chromatin by sequencing (scATAC-seq)
has enabled the characterization of cell type and state-specific
gene regulatory elements and transcription factor activities that
orchestrate gene expression activity. Notably, the initial versions
of the bulk ATAC protocol yielded up to 50% of sequencing reads
that mapped to mitochondrial DNA (mtDNA), which were originally
perceived as a nuisance of the assay [2, 3]. Indeed, mtDNA is a
relatively small circular genome (16,569 bp in humans) that is
present in multiple copies per mitochondrion and thus per cell, and it
is readily tagmented by the Tn5 transposase used for ATAC-seq.
Leveraging these features of mtDNA and the ATAC-seq work-
flow, we recently described mitochondrial single-cell assay for trans-
posase accessible chromatin by sequencing (mtscATAC-seq), which
enables the massively parallel single cell mitochondrial DNA geno-
typing and concomitant accessible chromatin profiling across
thousands of cells. Conceptually, mtscATAC-seq extends the typi-
cal scATAC-seq workflow by using whole but fixed and permeabi-
lized cells as input, though the overall protocol remains largely
unchanged (Fig. 1). Due to the ease and power of this multimodal
method, mtscATAC-seq thereby enables multiple avenues of
research. One area includes the study of fundamentals of mitochon-
drial genetics, mutational patterns, and their consequences on cel-
lular function, including those of pathogenic mtDNA mutations
associated with congenital mitochondrial disorders, such as myo-
clonic epilepsy with ragged red fibers (MERRF) or mitochondrial
encephalomyopathy with lactic acidosis and stroke-like episodes

Fig. 1 Schematic of the mtscATAC-seq reaction. Fixed and permeabilized whole cells are used as input into
the Tn5 transposition reaction wherein both mtDNA and accessible chromatin are transposed within cells
before being input into the 10x Chromium controller microfluidic device. These key differences are colored in
the schematic. The remaining steps (grayscale) closely match the standard 10x scATAC-seq workflow
(MELAS) [4, 5]. Moreover, somatic mtDNA mutations may be
used as clonal markers to reconstruct cellular relationships and
dynamics, thereby enabling lineage and clonal tracing studies of
human cells in vivo in combination with cell (pheno)typing as
demonstrated in human hematopoiesis, different types of leukemia,
and solid cancers [6–8].
Here, we outline the essentials of the experimental workflow,
which we extensively tested with primary human hematopoietic
cells and the 10x Genomics scATAC v1 and NextGEM v1.1 kits.
We describe best practices that should, in principle, translate to
other cell types and organs of interest and important modifications
to standard scATAC-seq protocols, which primarily isolate nuclei
and/or specifically deplete mitochondria. We illustrate the (pre-)
processing of the resulting sequencing data to enable mtDNA
genotyping alongside accessible chromatin profiling and outline
avenues for further analysis.

2 Materials

2.1 Cell Processing, Fixation, and Lysis

1. Phosphate buffered saline (PBS).
2. FACS buffer: PBS with 1% FBS. Filtered at 0.45 μm, store at 4 °C.
3. Formaldehyde, 16%.
4. Glycine solution, 2.5 M.
5. Lysis buffer: 10 mM Tris–HCl pH 7.4, 10 mM NaCl, 3 mM
MgCl2, 0.1% NP40, 1% bovine serum albumin (BSA). Prepare
fresh and keep on ice until use (see Note 1).
6. Wash buffer: 10 mM Tris–HCl pH 7.4, 10 mM NaCl, 3 mM
MgCl2, 1% BSA. Prepare fresh and keep on ice until use (see
Note 1).
7. Flowmi Cell Strainer 40 μm.

2.2 mtscATAC-Seq Library Preparation

1. 10x Genomics Chromium Next GEM Single Cell ATAC
Library & Gel Bead Kit, 16 or 4 rxns (see Note 2).
2. 10x Genomics Chromium Next GEM Chip H Single Cell Kit,
48 or 16 rxns.
3. 10x Genomics Single Index Kit N, Set A, 96 rxns.

2.3 Quality Control and Sequencing

1. Qubit dsDNA HS Assay Kit.
2. Agilent Bioanalyzer High Sensitivity DNA Analysis Kit.
3. Illumina NovaSeq or NextSeq reagent kits (see Note 3).

2.4 Computational Resources

1. 10x Genomics Cell Ranger ATAC package (see Note 4)
(https://support.10xgenomics.com/single-cell-atac/software/pipelines/latest/what-is-cell-ranger-atac).
2. mgatk package and dependencies
(https://github.com/caleblareau/mgatk) (see Note 5).

3 Methods

The protocol described here has been optimized for the use of
hematopoietic cell lines and primary human hematopoietic cells,
including peripheral blood or bone marrow-derived mononuclear
cells that have been obtained via the use of standard approaches
such as Ficoll-based gradient centrifugation. Specific populations of
interest may be enriched for example via flow cytometry-based
sorting. A high viability of cells (>95%) and a low residual
granulocyte/neutrophil content (<3%) are essential to obtain high-quality
mtscATAC-seq data. For primary hematopoietic cells, we have
processed these fresh or cryopreserved them using standard prac-
tices (e.g., in 90% FBS with 10% DMSO) with no significant loss of
data quality following thawing and processing of cells when com-
bined with sorting to ensure high viability of the input cell popula-
tion. For the centrifugation of cells, we recommend the use of
DNA LoBind microcentrifuge tubes and favor swinging-bucket
centrifuges compared to fixed angle rotors. For mtscATAC-seq
library preparation, we follow the Chromium Next GEM Single
Cell ATAC Reagent kits v1.1 user guide from 10x Genomics
(CG000209 Rev F) and only briefly describe modified steps as
outlined in Subheading 3.2.

3.1 Cell Processing, Fixation, and Lysis

1. Transfer 1 × 10^5 to 1 × 10^6 live cells to a 1.5 mL microcentrifuge
tube and spin at 400 × g for 5 min at 4 °C. Discard the
supernatant without disrupting the cell pellet, resuspend and
wash cells in 1–1.5 mL FACS buffer, and spin at 400 × g for
5 min at 4 °C.
2. Discard the supernatant without disrupting the cell pellet,
gently flick the tube to loosen the cell pellet, and carefully
and completely resuspend the cells in 450 μL of room
temperature PBS.
3. Fix cells by adding formaldehyde to a final concentration of 1%
(e.g., by adding 30 μL of 16% formaldehyde), followed by
inversion of the tube for complete mixing. Incubate at room
temperature for 10 min and occasionally invert the tube.
4. Quench the fixation reaction by adding a glycine solution to a
final concentration of 0.125 M and invert the tube for com-
plete mixing. Add 950 μL PBS or FACS buffer, invert the tube
2–3 times, and spin at 400 × g for 5 min at 4 °C. Discard the
supernatant without disrupting the cell pellet, gently flick the
tube to loosen the cell pellet, resuspend cells and repeat the
wash with 1–1.5 mL FACS buffer, and spin at 400 × g for 5 min
at 4 °C.
5. Discard the supernatant without disrupting the cell pellet,
gently flick the tube to loosen the cell pellet, and add
200–300 μL ice-cold lysis buffer. Gently pipette up and down
3 times to completely resuspend the cells. Incubate on ice for
3 min, before adding 1 mL of ice-cold wash buffer and spin at
500 × g for 5 min at 4 °C (see Note 6).
6. Discard the supernatant without disrupting the cell pellet,
gently flick the tube to loosen the pellet, and resuspend cells
in freshly prepared 1x nuclei buffer provided by the 10x Geno-
mics scATAC-seq kit. Aim for a concentration of 2000–7500
cells/μL, as validated by counting an aliquot of cells mixed with
trypan blue using a hemocytometer (e.g., Neubauer
Improved) or a ThermoFisher Countess II or III automated
cell counter (see Note 7). If cell clumps are abundant, the cell
suspension may be filtered using, for example, 40 μm Flowmi
cell strainers. Immediately proceed with the next steps of the
protocol.
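The fixation and quenching volumes in steps 3 and 4 follow from a simple dilution calculation. The sketch below solves for the stock volume to add so that the combined volume reaches the target concentration; the function name is our own, and the only inputs are the volumes and concentrations stated in the protocol and in Subheading 2.1.

```python
def stock_volume_to_add(start_volume_ul, stock_conc, final_conc):
    """Volume of stock (in μL) to add to start_volume_ul so that the
    mixture reaches final_conc.

    Solves stock_conc * v = final_conc * (start_volume_ul + v) for v.
    Concentrations may be in any unit as long as both use the same one.
    """
    if not 0 < final_conc < stock_conc:
        raise ValueError("final_conc must be positive and below stock_conc")
    return final_conc * start_volume_ul / (stock_conc - final_conc)


# Step 3: 16% formaldehyde added to 450 μL of cells in PBS for 1% final.
formaldehyde_ul = stock_volume_to_add(450, stock_conc=16, final_conc=1)

# Step 4: 2.5 M glycine added to the fixed suspension for 0.125 M final.
fixed_volume_ul = 450 + formaldehyde_ul
glycine_ul = stock_volume_to_add(fixed_volume_ul, stock_conc=2.5, final_conc=0.125)

print(f"formaldehyde: {formaldehyde_ul:.1f} μL")
print(f"glycine:      {glycine_ul:.1f} μL")
```

The formaldehyde result reproduces the 30 μL given as the example in step 3; the glycine volume comes out at roughly 25 μL for the resulting 480 μL suspension.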

3.2 mtscATAC-Seq Library Preparation

1. Adjust the cell concentration as desired with 1x nuclei buffer,
following the recommendations by 10x Genomics. We typically
aim for a concentration of about 2500 cells/μL and note that
only 5 μL of cell suspension may be used for the tagmentation
reaction.
2. The cells are mixed with the transposition mix on ice in a
suitable PCR tube, followed by transposition at 37 °C before
proceeding with GEM generation and barcoding using linear
PCR and post-GEM incubation cleanup. Please follow the
detailed instructions of the 10x Genomics user guide for
these steps without modifications (see Note 8).
3. For the library construction step involving the index PCR of
the mtscATAC-seq sample, we typically conduct 1–2 additional
cycles of PCR (see Note 9) before cleaning up of the libraries as
described.
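Adjusting the cell concentration in step 1 is again a dilution calculation. The following minimal sketch computes how much 1x nuclei buffer to add to reach the ~2500 cells/μL we typically aim for, and how many cells a 5 μL input then contains. The function names and the measured concentration and volume in the example are hypothetical.

```python
def buffer_to_add_ul(volume_ul, measured_cells_per_ul, target_cells_per_ul):
    """Volume of buffer (μL) needed to dilute a suspension to the target."""
    if target_cells_per_ul <= 0:
        raise ValueError("target concentration must be positive")
    if measured_cells_per_ul < target_cells_per_ul:
        raise ValueError("suspension is already below the target; "
                         "dilution cannot increase the concentration")
    return volume_ul * (measured_cells_per_ul / target_cells_per_ul - 1)


def cells_in_input(cells_per_ul, input_volume_ul=5):
    """Number of cells contained in the tagmentation input volume."""
    return cells_per_ul * input_volume_ul


# Hypothetical count: 20 μL of suspension at 8000 cells/μL.
extra_buffer = buffer_to_add_ul(20, 8000, 2500)
print(f"add {extra_buffer:.1f} μL of 1x nuclei buffer")
print(f"cells per 5 μL input: {cells_in_input(2500)}")
```

Because dilution is easy but re-concentrating requires another centrifugation, it is safer to start from a small resuspension volume and dilute after counting, as described in Note 7.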

3.3 Quality Control and Sequencing

1. The yield of the mtscATAC-seq libraries is determined using a
Qubit dsDNA HS Assay kit following the manufacturer’s
recommendations. We typically use 1 μL of the library and
obtain 5–20 ng/μL, depending on cell type and cell
input.
2. The size distribution of the mtscATAC-seq library is assessed
using an Agilent 2100 Bioanalyzer system and a High Sensitiv-
ity DNA Analysis Kit using 1–10 ng of the library. Typical
bioanalyzer traces of libraries prepared with the original 10x
scATAC-seq and the modified mtscATAC-seq protocols are
shown in Fig. 2 (see Note 10). We typically set region gates at
100 and 9000 bp to assess the molarity and yield in ng/μL of
the mtscATAC-seq library via the Agilent Bioanalyzer software
and compare the yields to the Qubit results, which should be
largely concordant (±10–20% variance, see Note 11).
3. For sequencing we follow the recommendations by 10x Geno-
mics and typically use Illumina NovaSeq and NextSeq reagent
kits using paired-read sequencing (Read 1 and 2: 50–100 bp,
Index Read 1: 8 bp, Index Read 2: 16 bp). Longer reads will
increase mitochondrial genome coverage to more comprehen-
sively capture genomic variants along sequenced DNA frag-
ments. We note that reads derived from mtDNA have a
median insert size of 120 bases (see Fig. 3), and fully covering

Fig. 2 Representative bioanalyzer traces of sequencing libraries. (a) Original scATAC-seq and (b) mtscATAC-
seq library of human peripheral blood mononuclear cells. We note the increased abundance of the
nucleosome-free (size <300 bp) region in the mtscATAC-seq library relative to the scATAC-seq library that
corresponds to the significant increase of captured mtDNA fragments

Fig. 3 Insert size distribution of mtscATAC-seq libraries. (a) Representative size distribution of accessible
fragments mapping to a nuclear chromosome and (b) the mitochondrial DNA chromosome. The distinctive
mono- and di-nucleosome peaks from ATAC-seq data appear only in the nuclear genome as mtDNA is not
compacted into nucleosomes. The median fragment lengths of each distribution are indicated
the majority of the bases in a molecule on a sequencing read
will be advantageous to detect low frequency/heteroplasmy
variants.
4. We typically recommend sequencing ~50,000 reads/cell
yielded in the library capture. This value may vary depending
on the cell input type, the overall viability of the input sample,
and the depth of the desired analysis (see Note 12).
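Two small calculations recur when planning the sequencing run: converting a mass concentration and mean fragment size into molarity, and budgeting reads from the per-cell target in step 4. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA; the function names, the example concentration, the mean fragment size, and the cell number are hypothetical.

```python
def library_molarity_nm(conc_ng_per_ul, mean_fragment_bp, g_per_mol_per_bp=660.0):
    """Approximate molarity (nM) of a dsDNA library.

    nM = (ng/μL * 1e6) / (g/mol per bp * mean fragment length in bp)
    """
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment size > 0")
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_fragment_bp)


def reads_required(n_cells, reads_per_cell=50_000):
    """Total reads needed for a library at the given per-cell depth."""
    if n_cells < 0 or reads_per_cell < 0:
        raise ValueError("inputs must be non-negative")
    return n_cells * reads_per_cell


# Hypothetical library: 10 ng/μL with a mean fragment size of 400 bp.
print(f"{library_molarity_nm(10, 400):.1f} nM")

# Hypothetical capture of 5000 cells at ~50,000 reads/cell.
print(f"{reads_required(5000):,} reads")
```

For the two-pass strategy in Note 12, the same function can be called with reads_per_cell=25_000 for the first pass.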

3.4 Computational Processing and Analyses

A feature of the mtscATAC-seq protocol that we emphasize is that
each of the multiple modalities (mtDNA genotypes and accessible
chromatin) is contained within a single library, which provides a
convenient bioinformatics workflow and mitigates the risk of
uncoupling modalities from a single capture. Furthermore, we
have designed the mtscATAC-seq workflow to capitalize on multi-
ple cores/threads via parallel computing.
1. Demultiplex mtscATAC-seq sequencing data using cellranger-
atac mkfastq (see Note 13).
$ cellranger-atac mkfastq --id=mtscatac_seq_fastqs --run=/path/to/flow_cell --csv=sample_sheet.csv
2. Download a NUMT-modified reference genome (see Notes 14
and 15). This approach results in more uniform coverage of the
mitochondrial genome by eliminating multimapping biases
with NUMT regions in the nuclear genome that are highly
homologous to sequences within mtDNA (Fig. 4, see
Note 16).
3. Align the full mtscATAC-seq libraries to the blacklisted refer-
ence genome using the cellranger-atac ‘count’ function (see
Note 17).

Fig. 4 Mitochondrial genome coverage. (a) Circular and (b) linear representations of the mitochondrial genome
showing the differences in coverage of mtscATAC-seq (red) and scATAC-seq (blue). The same mtscATAC-seq
data obtained from hematopoietic cell lines [4] is shown in both panels. Notable differences in the coverage
plots are highlighted. Note that the resulting coverage is a function of cell type and sequencing depth. Primary
hematopoietic cells tend to have lower mean mtDNA coverage
$ cellranger-atac count --id=mtscatac_sample_output --reference=/path/to/masked/reference --sample=mtscatac_sample_output --fastqs=mtscatac_seq_fastqs/flow_cell
4. Genotype the mtDNA data using mgatk (see Note 18).
$ mgatk tenx -i mtscatac_sample_output/outs/possorted_bam.bam -n mtscatac_sample_mgatk -o mtscatac_sample_mgatk -bt CB -b mtscatac_sample_output/outs/filtered_peak_bc_matrix/barcodes.tsv
5. Collect processed data files necessary for downstream analysis.
These include relevant accessible chromatin summary files from
the cellranger-atac execution available in the
mtscatac_sample_output/outs directory and relevant mgatk output files in
mtscatac_sample_mgatk/final (see Note 19).
6. Interpret the presence and abundance of variable heteroplasmic
variants for downstream analyses in the
*.vmr_strand_plot.png file (see Note 20). An example plot derived from a typical
and high-quality mtscATAC-seq library is shown in Fig. 5.
7. For biological samples that span multiple libraries, one often
uses the cellranger-atac aggr function to generate a compre-
hensive peak set, peaks-by-cells matrix, and total fragments file.
However, the .bam file is not aggregated, meaning that the
mgatk output must come from the per-library processing as
described in step 4 of this subheading.
8. Perform interactive analyses using widely used tools such as
Signac [9]. We note that while other tools such as SnapATAC
[10] and ArchR [11] have full suites of interactive functionality
for chromatin accessibility data, Signac currently has the only

Fig. 5 Example of variant calling output from mgatk. Each dot is a distinct mtDNA
mutation. Heteroplasmic, low-quality, and homoplasmic mutations are
separated by these two dimensions (x-axis: heteroplasmy correlation between
strands; y-axis: variance-mean-ratio (VMR) of the allele frequencies for all cells
in the analysis)
Fig. 6 Quality control metrics for mtscATAC-seq data. The correlation between
the log10 number of nuclear chromatin accessibility fragments and the mean
mtDNA coverage per cell is shown for peripheral blood mononuclear cells. The
cells with a low fraction of reads in peaks (FRIP) and proportionally lower mtDNA abundance represent residual
granulocytes

built-in functionality for directly working with mtscATAC-seq
data, in particular including the mitochondrial DNA genotyping
described here (see Notes 21 and 22).
9. Identify high-quality cells for downstream analysis. We note
that mtDNA-specific per-cell quality control metrics, such as
mean mtDNA coverage, can be incorporated for quality control
and analysis of populations. In our experience,
cells that have high-quality accessible chromatin profiles
also have well-captured amounts of mtDNA (Fig. 6, see
Note 23).
10. Identify lineage-biased mtDNA variants. The power of the
mtscATAC-seq workflow is the concomitant inference of cell
state via chromatin accessibility and clonal relationships via the
mtDNA mutation profiles. Appropriate statistical tests that we
have used in the past include the Kruskal-Wallis and
Chi-Squared tests of association (see Note 24).
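As an illustration of the association tests named in step 10, the following minimal sketch computes a Pearson chi-squared statistic for a contingency table of cell-state clusters by variant status. It is a generic textbook calculation and not the code used in our analyses; the function names and the example counts are hypothetical, and the p-value helper is valid only for one degree of freedom (a 2 × 2 table).

```python
import math


def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for an
    r x c contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if total == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one count")
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def p_value_one_dof(statistic):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical counts: rows are two cell-state clusters; columns are
# cells carrying / not carrying a given mtDNA variant.
example_table = [[30, 70],
                 [5, 95]]
stat, dof = chi_squared_statistic(example_table)
print(f"chi-squared = {stat:.2f}, dof = {dof}, p = {p_value_one_dof(stat):.2e}")
```

When many variants are tested, the resulting p-values should be corrected for multiple testing, and small expected counts call for an exact or permutation-based test instead.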
We emphasize that the biological questions of interest within a
specific library will largely dictate the exact computational workflow
after step 6. Examples of custom scripts used for varied down-
stream analyses, including nucleotide enrichment analyses, trajec-
tory inferences, clonal bias, and longitudinal sampling, are available
online: https://github.com/caleblareau/mtscATACpaper_
reproducibility.
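The variance-mean ratio (VMR) shown on the y-axis of Fig. 5 can be illustrated with a short sketch. This is not the mgatk implementation: the function names are our own, the use of the population variance is an assumption, and the example allele frequencies are hypothetical.

```python
from statistics import fmean, pvariance


def heteroplasmy(alt_reads, total_reads):
    """Per-cell allele frequency of a variant; None without coverage."""
    if total_reads == 0:
        return None
    return alt_reads / total_reads


def variance_mean_ratio(allele_frequencies):
    """Variance of per-cell allele frequencies divided by their mean."""
    mean = fmean(allele_frequencies)
    if mean == 0:
        return 0.0
    return pvariance(allele_frequencies, mu=mean) / mean


# Hypothetical homoplasmic variant: present at 100% in every cell.
homoplasmic = [1.0] * 10
# Hypothetical clonal variant: absent in most cells, ~50% in a few.
clonal = [0.0] * 8 + [0.5, 0.5]

print(variance_mean_ratio(homoplasmic))
print(round(variance_mean_ratio(clonal), 3))
```

A variant shared uniformly by all cells has no cell-to-cell variance and therefore a VMR of zero, whereas a variant confined to a subset of cells yields a high VMR, which is why this axis helps separate informative heteroplasmic variants from homoplasmic ones.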

4 Notes

1. Note that the lysis and wash buffers do not contain Tween 20, which
is used in many scATAC-seq workflows, including the
standard protocol by 10x Genomics. We omit Tween 20 as it
depletes mitochondria and the mtDNA within them [3, 4], thereby preventing
the sequencing and identification of mtDNA variants.
2. For mtscATAC-seq library preparation, we refer the reader to
the detailed instructions of the user guide by 10x Genomics,
which further includes a detailed list of reagents, consumables,
and best practices required to successfully complete the
protocol.
3. For sequencing we have successfully worked with the Illumina
NextSeq and NovaSeq reagent kits and respective sequencing
platforms. We typically have used kits with 150–200 cycles to
obtain high coverage of the mitochondrial genome for variant
calling enabled by the longer read lengths.
4. The 10x Genomics cellranger-atac software comes as a stable
binary that requires no installation aside from untarring the
requisite files and placing them in a stable directory for
execution.
5. A complete discussion of dependencies and installation instruc-
tions is available online: https://github.com/caleblareau/
mgatk/wiki/Installation.
6. Lysis time may need to be optimized depending on cell type.
Lysis efficacy may be assessed via the quantification of live/
dead cells and should be performed on unfixed cells. Please also
see the 10x Genomics demonstrated protocol Nuclei isolation
for single cell ATAC sequencing (CG000169 Rev. D) and the
troubleshooting section within.
7. To obtain a sufficiently high cell concentration for the tagmen-
tation reaction, we initially resuspend the cells in a small vol-
ume of 1x nuclei buffer to avoid the need to concentrate
further via an additional centrifugation step. The initial volume
is dependent on the starting cell number and for 300,000 cells
we would typically resuspend in 20–30 μL from which to
obtain a first cell count using 5 μL of the cell suspension. One
typically loses some cells during the upstream processing and it
is advisable to be more conservative before diluting the cell
concentration too much. For overloading of 10x channels, for
example when pooling multiple cell lines or cells of multiple
donors or when applying hashing-based approaches [12] to
enable downstream computational demultiplexing of the sam-
ple origin, a higher cell concentration will be required.
8. Given the fixation of cells prior to lysis/permeabilization, we
have attempted to incorporate a decrosslinking step as part of
the GEM incubation PCR program. Extended decrosslinking
at 60 °C or 72 °C for 1–12 h did not appear to significantly
improve mtscATAC-seq data quality. As such and given the
high temperatures during cycling, we recommend working
with the standard PCR conditions recommended by 10x
Genomics.
9. The number of cycles for the index PCR is a function of cell
input as well as cell type. Cell lines typically yield a higher
number of accessible chromatin fragments, and cells with
higher mtDNA content will also show higher yield, while the
fixation appears to diminish nuclear library complexity. As such,
the optimal numbers of cycles during the index PCR may need
to be determined empirically. A yield of 10–15 ng/μL in an
elution volume of 20 μL usually provides sufficient material for
downstream library preparation for sequencing.
10. For mtscATAC-seq bioanalyzer traces, the shape and size dis-
tribution is also a function of cell type and mtDNA content. We
have observed the first nucleosome-free peak (200–300 bp) to
also be lower or even higher than the second mono-
nucleosome peak (300–400 bp) as depicted in Fig. 2. Ulti-
mately, only sequencing enables one to obtain proper quality
control metrics to assess the success of the experiment.
11. For mtscATAC-seq library quantification for sequencing, we
prefer the outlined method and have found it to be sufficiently
reliable and less time consuming than the qPCR-based
approaches that are being recommended by 10x Genomics or
other protocols.
12. As an alternative, we use a two-pass sequencing
strategy in which we first sequence ~25,000 reads/cell and then
examine the library complexity of the overall run based on the
CellRanger-ATAC quality control file. Based on this estimated
complexity and the percent duplicates, we perform additional
sequencing, typically until the total read count equals or slightly
surpasses the estimated complexity and the duplicate rate exceeds ~40%.
13. cellranger-atac mkfastq wraps the Illumina bcl2fastq software,
which must also be in your environment. A description of this
software and support for building the sample sheet based on
the supplied indices is available online:
https://support.10xgenomics.com/single-cell-atac/software/pipelines/latest/using/mkfastq.
14. The widely used prebuilt references are available on the 10x
Genomics website:
https://support.10xgenomics.com/single-cell-atac/software/pipelines/latest/advanced/references.
15. We recommend generating a custom reference using a
pre-generated blacklist of NUMTs and other blacklist features.
This is achieved by hard-masking the .fasta reference file in the
cellranger-atac reference as we describe online:
https://github.com/caleblareau/mgatk/wiki/Increasing-coverage-from-10x-processing.
A repository for handling .bed files of NUMT regions for common
reference genomes, as well as custom code to generate these
reference files for uncommon reference genomes, is available
online: https://github.com/caleblareau/mitoblacklist.
16. We note that this approach is highly efficacious because
NUMTs are present in individual cells at a much lower copy
number than mtDNA and are not necessarily accessible to the
Tn5 enzyme used for scATAC-seq. Other approaches for miti-
gating mapping biases in the mtDNA genome include remap-
ping of multimapping reads or mapping to a shifted reference.
Our recommended approach of mapping to the modified ref-
erence provides a one-step execution that is very parsimonious.
17. This is the most computationally intensive part of the protocol.
We recommend specifying a minimum of 12 cores for efficient
processing. Further, we recommend using cellranger-atac ver-
sion 2.0+, which has been greatly improved for computational
efficiency and runtime performance.
18. The main inputs into this execution are the .bam file from
the cellranger-atac processing and the list of barcodes to be
analyzed downstream (we use all cells that pass the knee
threshold via the -b flag and specification, but any user-provided
list would suffice). A full list of user parameters for mgatk is
provided by the mgatk --help option. We note that, by default,
PCR duplicates (fragments that share the same cell barcode and
both transposition events) are removed; they can be retained
with the --keep-duplicates flag. Retaining PCR duplicates could
be useful in settings where the mtDNA copy number is
particularly high and multiple independent fragments with
identical transposition events are a possibility. Further, the
number of cores available for parallel processing can be
specified using the --ncores flag.
19. From the cellranger output folder, we typically recommend
retaining the fragments.tsv.gz (+ index) and single-cell .csv
files. However, the .bam file from this run is required for initial
genotyping with mgatk. For the mgatk output, we typically
only keep the *.rds file, which summarizes the rest of the
contents. Notably, the outputs of this folder are redundant,
and the .A/.C/.G/.T plain text files represent a sufficient
summary of the data for downstream analyses, though they
may be less convenient than the other files in the output folder.
20. The x-axis depicts the strand correlation in per-cell
heteroplasmy across all cells in the experiment. Mutations separated
on this axis are typically low quality (low strand correlation)
versus high quality (high strand correlation), and we employ a
density-based threshold to make these calls. The y-axis depicts
the per-variant variance-mean ratio (VMR) across all cells.
Mutations separated on the y-axis represent heteroplasmic
(high VMR) versus homoplasmic (low VMR) allele frequencies.
The population of mutations high for both values represents the
high-quality variants used for downstream analyses. However,
homoplasmic mutations may also be useful in certain analysis
settings, such as identification of the mtDNA haplogroup.
21. For interactive analyses using Signac, we recommend the fol-
lowing online vignette: https://satijalab.org/signac/articles/
mito.html. Many accessory functions have been built into Sig-
nac since version 1.0.0 to facilitate the importing, analysis, and
interpretation of mgatk data for multimodal analyses.
22. For use in ArchR, we recommend using the addCellColData()
function to append per-cell heteroplasmy values associated
with the mgatk output from the *.cell_heteroplasmic_df.tsv.
gz file.
23. In our experience, we typically recommend thresholding on a
minimum 10x or 20x mean per-cell coverage alongside other
commonly used metrics, such as transcription start site (TSS)
score, number of unique nuclear fragments, fraction of reads in
peaks, and/or nucleosome score. The mean coverage per cell
metric is computed automatically in the mgatk output, but is
simply defined as the mean per-base coverage of the mgatk
output for a given cell.
24. The Kruskal-Wallis test used in our current analyses is a
nonparametric test of association between clusters (identified
via the accessible chromatin cell state) and a continuous value
(per-cell heteroplasmy). If a relevant heteroplasmy cutoff can
be drawn (say at 5%, 10%, etc.), the association between
presence/absence of a variant and the cell state clusters is best
modeled using a Chi-squared test of association. These
statistical tests are performed independently via a loop over all
variants in the high-quality set identified by mgatk.
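The hard-masking described in Note 15 and the duplicate definition in Note 18 can be illustrated with a minimal Python sketch. This is not the cellranger-atac or mgatk implementation. The function names hard_mask and remove_duplicates and the toy inputs are hypothetical, and intervals are assumed to be 0-based and half-open, as in .bed files.

```python
def hard_mask(sequence, intervals):
    """Replace bases inside 0-based, half-open [start, end) intervals with 'N'."""
    bases = list(sequence)
    for start, end in intervals:
        # Clip each interval to the sequence bounds before masking.
        for i in range(max(start, 0), min(end, len(bases))):
            bases[i] = "N"
    return "".join(bases)


def remove_duplicates(fragments):
    """Keep the first fragment for each (barcode, start, end) combination.

    Two fragments count as duplicates here when they share the same cell
    barcode and both transposition events (fragment start and end).
    """
    seen = set()
    unique = []
    for barcode, start, end in fragments:
        key = (barcode, start, end)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Toy example: mask a hypothetical NUMT interval, then deduplicate fragments.
masked = hard_mask("ACGTACGT", [(2, 5)])
fragments = [("AAAC", 10, 200), ("AAAC", 10, 200), ("AAAG", 10, 200)]
deduplicated = remove_duplicates(fragments)
```

Masking the nuclear copies rather than the mitochondrial genome leaves mtDNA-derived reads with a single valid mapping location, which is the rationale given in Note 16.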
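The per-variant quantities in Note 20 and the presence/absence test in Note 24 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the mgatk code: strand correlation is assumed to be a Pearson correlation between per-cell estimates from the forward and reverse strands, the VMR is computed with the population variance, and all function names and toy values are hypothetical.

```python
from statistics import mean, pvariance


def variance_mean_ratio(heteroplasmy):
    """Variance-mean ratio (VMR) of one variant's heteroplasmy across cells."""
    m = mean(heteroplasmy)
    return pvariance(heteroplasmy) / m if m > 0 else 0.0


def pearson(x, y):
    """Pearson correlation, e.g., forward- versus reverse-strand estimates."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x)
    sy = sum((b - my) ** 2 for b in y)
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy) ** 0.5


def chi_squared_statistic(table):
    """Chi-squared statistic for a contingency table (rows x columns).

    Rows could be variant present/absent after applying a heteroplasmy
    cutoff, and columns could be cell state clusters.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip empty margins
                statistic += (observed - expected) ** 2 / expected
    return statistic
```

In practice a statistics library would be used to obtain p-values; the sketch only shows what each quantity measures for a single variant, which is then repeated in a loop over all high-quality variants.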

Acknowledgments

L.S.L. is supported by an Emmy Noether fellowship by the German
Research Foundation (DFG, LU 2336/2-1) and a Hector
Research Career Development Award. C.A.L. is supported by a
Stanford Science Fellowship and a Parker Institute of Cancer
Immunotherapy Scholarship. We are grateful to the Ludwig lab
for useful feedback in the generation of this protocol.
References
1. Nam AS, Chaligne R, Landau DA (2021) Integrating genetic and
non-genetic determinants of cancer evolution by single-cell
multi-omics. Nat Rev Genet 22:3–18
2. Buenrostro JD, Giresi PG, Zaba LC et al (2013) Transposition of
native chromatin for fast and sensitive epigenomic profiling of
open chromatin, DNA-binding proteins and nucleosome position.
Nat Methods 10:1213–1218
3. Corces MR, Trevino AE, Hamilton EG et al (2017) An improved
ATAC-seq protocol reduces background and enables interrogation
of frozen tissues. Nat Methods 14:959–962
4. Lareau CA, Ludwig LS, Muus C et al (2021) Massively parallel
single-cell mitochondrial DNA genotyping and chromatin
profiling. Nat Biotechnol 39:451–461
5. Walker MA, Lareau CA, Ludwig LS et al (2020) Purifying selection
against pathogenic mitochondrial DNA in human T cells. N Engl J
Med 383:1556–1563
6. Ludwig LS, Lareau CA, Ulirsch JC et al (2019) Lineage tracing in
humans enabled by mitochondrial mutations and single-cell
genomics. Cell 176:1325–1339.e22
7. Penter L, Gohil SH, Lareau C et al (2021) Longitudinal
single-cell dynamics of chromatin accessibility and mitochondrial
mutations in chronic lymphocytic leukemia mirror disease
history. Cancer Discov 11:3048.
https://doi.org/10.1158/2159-8290.CD-21-0276
8. Lareau CA, Ludwig LS, Sankaran VG (2019) Longitudinal
assessment of clonal mosaicism in human hematopoiesis via
mitochondrial mutation tracking. Blood Adv 3:4161–4165
9. Stuart T, Srivastava A, Madad S et al (2021) Multimodal
single-cell chromatin analysis with Signac. Nat Methods (in press)
10. Fang R, Preissl S, Li Y et al (2021) Comprehensive analysis of
single cell ATAC-seq data with SnapATAC. Nat Commun 12:1–15
11. Granja JM, Corces MR, Pierce SE et al (2021) ArchR is a scalable
software package for integrative single-cell chromatin
accessibility analysis. Nat Genet 53:403–411
12. Mimitou EP, Lareau CA, Chen KY et al (2021) Scalable,
multimodal profiling of chromatin accessibility, gene expression
and protein levels in single cells. Nat Biotechnol 39:1246.
https://doi.org/10.1038/s41587-021-00927-2
Part IV

Imaging Methods for Visualization of Accessible DNA


Chapter 15

ATAC-See: A Tn5 Transposase-Mediated Assay for Detection
of Chromatin Accessibility with Imaging
Yonglong Dang, Ram Prakash Yadav, and Xingqi Chen

Abstract
Assay of transposase-accessible chromatin with visualization (ATAC-see) is a transposase-mediated imaging
technology that enables direct imaging of the accessible genome in situ, followed by deep sequencing to
reveal the identity of the imaged elements. Here we image the spatial organization of the accessible genome
in HT1080 cells with this method.

Key words ATAC-see, Tn5 transposase, Chromatin accessibility, In situ imaging, Epigenetics, 3D
genome organization

1 Introduction

Eukaryotic genomes are extensively compacted in the nucleus to
form euchromatin and heterochromatin [1, 2]. Euchromatin contains
active regulatory elements whose accessibility controls gene activity,
whereas heterochromatin is mostly inactive and has low gene
activity [1, 2]. These accessible elements comprise approximately
2–3% of the genome [3–5] in any given cell type and include
enhancers, promoters, and other regulatory sequences critical for
developmental processes and disease progression [6, 7]. Nuclear
architecture and 3D genome organization are tightly linked to
gene expression, replication, and DNA repair [6–8]. We previously
reported ATAC-see [9], in which hyperactive Tn5 transposase loaded
with fluorescent dye-labeled DNA adaptors selectively inserts the
adaptors into accessible chromatin loci within fixed cells. The
fluorophores covalently inserted at open chromatin sites genome-wide
allow us to image these sites in intact cells. Thus,
ATAC-see decodes the molecular accessibility of chromatin by
detecting the inserted fluorophores. After imaging the spatial
organization of the accessible genome in 3D, the inserted adaptors still
allow deep sequencing to map open chromatin sites in the same
sample, in the same manner as ATAC-seq. Here, we image the spatial
organization of the accessible genome in HT1080 cells with this method.

Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols,
Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_15,
© The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023

2 Materials

All solutions are prepared with ultrapure water (18 MΩ-cm at
25 °C). Prepare and store reagents at room temperature (unless
indicated otherwise).

2.1 Hyperactive Tn5 Production
1. pTXB1-Tn5 plasmid.
2. T7 Express LysY/Iq E. coli strain.
3. LB medium: To 950 mL of Milli-Q water, add 10 g of
Tryptone, 10 g of sodium chloride (NaCl), and 5 g of Yeast
Extract, and mix until the powder is dissolved. Adjust the pH of
the solution to ~7.0 using sodium hydroxide (NaOH) and make the
final volume up to 1000 mL by adding Milli-Q water. Autoclave
using liquid cycle.
4. Isopropyl-β-d-1-thiogalactopyranoside (IPTG): Dissolve
2.38 g of IPTG in 8 mL of sterilized double-distilled water
(ddH2O) and bring the final volume to 10 mL to make 1 M IPTG
stock solution. Filter with a 0.22 μm filter and store aliquots
at -20 °C.
5. Proteinase inhibitor: Proteinase inhibitor tablet is dissolved in
sterilized ddH2O to make 50X stock. Make aliquots and store
at -20 °C.
6. Chitin resin.
7. Dithiothreitol (DTT): Dissolve 1.54 g of Dithiothreitol (DTT)
in 10 mL of sterilized ddH2O to make 1 M stock. Filter with a
0.22 μm filter and store the aliquots at -20 °C.
8. Bradford assay kit.
9. Ultracel 30-K column.
10. NuPAGE Novex 4–12% Bis–Tris gel.
11. Coomassie Brilliant Blue R-250: Add 100 mL of glacial acetic
acid to 400 mL of sterilized ddH2O. Dissolve 0.5 g of Coo-
massie R-250 dye in 500 mL methanol. Mix the acetic acid and
methanol solutions and filter through a Whatman No. 1 filter
to remove any particulate matter.
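
The stock recipes above follow from mass = molarity × volume × molecular weight. A minimal Python sketch of that arithmetic is given below; the function name is hypothetical, and the molecular weights (IPTG 238.30 g/mol, DTT 154.25 g/mol) are standard reference values that are assumed here rather than taken from this protocol.

```python
def grams_for_stock(molarity_mol_per_l, final_volume_ml, mol_weight_g_per_mol):
    """Mass of solute (g) needed for a stock of given molarity and final volume."""
    return molarity_mol_per_l * (final_volume_ml / 1000.0) * mol_weight_g_per_mol


# Assumed molecular weights (g/mol); check the supplier's data sheet.
IPTG_MW = 238.30
DTT_MW = 154.25

iptg_grams = grams_for_stock(1.0, 10.0, IPTG_MW)  # about 2.38 g for 10 mL of 1 M
dtt_grams = grams_for_stock(1.0, 10.0, DTT_MW)    # about 1.54 g for 10 mL of 1 M
```

Both listed masses correspond to a 1 M stock at a final volume of 10 mL.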

2.2 Tn5 Transposase Assembly
1. DNA adaptor oligos were synthesized at Integrated DNA
Technologies (IDT) with the following sequences (see Notes 1 and 2):
Tn5MErev, 5′-[phos]CTGTCTCTTATACACATCT-3′;
Tn5ME-A-ATTO590:
5′-/ATTO590/TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-3′;
Tn5ME-B-ATTO590:
5′-/ATTO590/GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-3′.
2. Tn5 transposases were produced according to Picelli et al. [10].
3. 2X dialysis buffer (DB) [9, 10]: To make 10 mL of 2X DB
buffer, mix 1 mL of 1 M HEPES-KOH (pH 7.2), 400 μL of
5 M NaCl, 40 μL of 0.5 M EDTA, 20 μL of 1 M DTT,
20 μL of Triton X-100, 2 mL of glycerol, and 6.52 mL of
sterilized ddH2O. Make aliquots and store at -20 °C.
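
The adaptor design above relies on Tn5MErev being complementary to the 3′ end of both labeled oligos, so that annealing produces a double-stranded mosaic end. A short Python sketch can check this for the listed sequences; the helper names are hypothetical, and the 5′ modifications ([phos], ATTO590) are omitted because they do not affect base pairing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def anneals_to_3prime_end(short_oligo, long_oligo):
    """True if short_oligo is fully complementary to the 3' end of long_oligo."""
    return long_oligo.endswith(reverse_complement(short_oligo))


# Sequences from Subheading 2.2, written 5' to 3' without modifications.
TN5MEREV = "CTGTCTCTTATACACATCT"
TN5ME_A = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
TN5ME_B = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"

a_ok = anneals_to_3prime_end(TN5MEREV, TN5ME_A)
b_ok = anneals_to_3prime_end(TN5MEREV, TN5ME_B)
```

The same check can be applied to adaptors carrying other 5′ modifications, since only the oligo sequence is compared.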

2.3 Tn5 Tagmentation
1. 1% Formaldehyde: Dilute the formaldehyde solution (16% stock)
16-fold in sterilized PBS to make 1% formaldehyde
(see Note 3).
2. Phosphate-buffered saline (PBS) solution: Add a PBS tablet to
ddH2O. Mix until the tablet is completely dissolved and
sterilize the PBS solution by autoclaving at 115 °C for 15 min.
3. 2X Tagment DNA (TD) buffer [11]: To make 10 mL 2X TD
buffer, mix 200 μL of 1 M Tris–HCl (pH 7.6), 200 μL of 0.5 M
MgCl2, 2 mL of N,N-Dimethylformamide (DMF), and 7.6 mL
of sterilized ddH2O. Make aliquots and store at -20 °C.
4. Lysis buffer: To make 10 mL lysis buffer, mix 100 μL of 1 M
Tris–HCl (pH 7.4), 20 μL of 5 M NaCl, 30 μL of 1 M MgCl2,
10 μL of IGEPAL CA-630, and 9.84 mL of sterilized dH2O.
Store the solution at +4 °C (see Note 4).
5. Glass coverslips for cell culture: Place the coverslips in a loosely
covered glass beaker containing 1 M HCl for 4 h at 50 °C. After
cooling down to room temperature, rinse the coverslips with
ddH2O. Next, wash 3 times with 50% EtOH, 70% EtOH, 95%
EtOH, respectively, for 30 min. Store coverslip in absolute
EtOH until further use.
6. Washing buffer: To make 1 L wash buffer, mix 1 mL of 10%
sodium dodecyl sulfate (SDS) and 100 mL of 0.5 M EDTA
with 899 mL of sterilized PBS (see Note 5).
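
Buffer recipes such as those above can be sanity-checked with C1 × V1 = C2 × V2. The Python sketch below does this for the lysis buffer (item 4), confirming that the component volumes add up to the stated total and reporting the final concentrations. The dictionary layout and function names are hypothetical, and IGEPAL CA-630 is treated as a 100% stock.

```python
def final_concentration(stock_conc, stock_volume, total_volume):
    """Concentration after dilution, from C1 * V1 = C2 * V2."""
    return stock_conc * stock_volume / total_volume


# Lysis buffer components: (stock concentration, volume added in mL).
LYSIS_BUFFER = {
    "Tris-HCl pH 7.4 (M)": (1.0, 0.100),
    "NaCl (M)": (5.0, 0.020),
    "MgCl2 (M)": (1.0, 0.030),
    "IGEPAL CA-630 (%)": (100.0, 0.010),
    "water": (0.0, 9.840),
}


def check_recipe(recipe):
    """Return the total volume and the final concentration of each solute."""
    total = sum(volume for _, volume in recipe.values())
    finals = {
        name: final_concentration(conc, volume, total)
        for name, (conc, volume) in recipe.items()
        if name != "water"
    }
    return total, finals


total_ml, finals = check_recipe(LYSIS_BUFFER)
```

For this recipe the volumes sum to 10 mL and the computed values match the working concentrations quoted for the permeabilization step (10 mM Tris–HCl, 10 mM NaCl, 3 mM MgCl2, 0.1% IGEPAL CA-630).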

2.4 Immunostaining of Mitochondria and Nuclear Lamina
1. Primary antibodies dilution (1:100): Dilute the antibodies
100 times (rabbit anti-Lamin B1 antibody and mouse
anti-mitochondria antibody) with antibody dilution reagent
(00–3218, Thermo Fisher Scientific) (see Note 6).
2. Vectashield Antifade Mounting Medium with 4′,6-diamidino-
2-phenylindole (DAPI).
3. Secondary antibodies dilution (1:500): Dilute the antibodies
500 times (goat anti-rabbit-ATTO488 and goat anti-mouse-
Atto647N) with antibody dilution reagent (see Note 6).
4. Washing buffer (1 L): Mix 0.5 mL of Tween-20 in the
1000 mL of sterilized PBS.
2.5 Cell Culture
HT1080 cells were cultured in DMEM/F-12, GlutaMAX™
supplement, 10% fetal bovine serum [12], and 1% Pen/Strep with
SecureSlip cell culture system.

2.6 Equipment
1. Cell culture incubator.
2. Confocal microscope.
3. Humidity chamber box.
4. Sonicator.
5. Heat block.

3 Methods

3.1 Hyperactive Tn5 Production
Hyperactive Tn5 was produced as previously described
[9, 10]. The pTXB1-Tn5 plasmid (60240, Addgene) was introduced
into the T7 Express LysY/Iq E. coli strain. An overnight E. coli
culture (10 mL) was inoculated into LB medium (500 mL) and grown
at 37 °C for 1.5 h [13]. When the cell density (OD600) reached 0.9,
Tn5 expression was induced by adding 0.25 mM IPTG for 4 h. The
E. coli pellet was resuspended in lysis buffer (20 mM HEPES-KOH
pH 7.2, 0.8 M NaCl, 1 mM EDTA, 10% glycerol, 0.2% Triton
X-100, complete proteinase inhibitor), followed by mild sonication
to lyse the cells. Chitin resin (10 mL) was added to the supernatant
and incubated for 1 h at 4 °C with slow rotation. The resin
was washed extensively with lysis buffer to remove unbound
material. Next, lysis buffer containing 100 mM DTT was added to
the resin, which was stored at 4 °C. After 48 h, protein was eluted
by gravity flow and collected in 1 mL fractions. An aliquot of each
fraction (1 μL) was measured with a detergent-compatible
Bradford assay [13], and the peak fractions were pooled
and dialyzed against 2X dialysis buffer (100 mM HEPES-KOH
pH 7.2, 0.2 M NaCl, 0.2 mM EDTA, 2 mM DTT, 0.2% Triton
X-100, 20% glycerol). Dialyzed Tn5 protein was concentrated
using an Ultracel 30-K column, and the quantity of Tn5 was measured
by Bradford assay and visualized on a NuPAGE Novex 4–12% Bis–
Tris gel followed by Coomassie blue staining [14].

3.2 Tn5 Transposome Assembly
1. Oligonucleotides (Tn5ME-A-ATTO590, Tn5ME-B-ATTO590,
Tn5MErev) were resuspended in TE buffer
(10 mM Tris, 0.1 mM EDTA, pH 8.0) to a final concentration
of 100 μM each.
2. Equimolar amounts of Tn5MErev/Tn5ME-A-ATTO590 and
Tn5MErev/Tn5ME-B-ATTO590 were mixed in separate
200 μL PCR tubes.
3. The two tubes of oligo mixtures were denatured on a
thermocycler for 5 min at 95 °C and cooled down slowly by
turning off the thermocycler.
4. The Tn5 transposome was assembled with the following
components: 0.25 vol Tn5MErev/Tn5ME-A-ATTO590 +
Tn5MErev/Tn5ME-B-ATTO590 (final concentration of
each double-stranded oligo is now 50 μM), 0.4 vol glycerol
(100% solution), 0.12 vol 2X dialysis buffer (100 mM HEPES-
KOH at pH 7.2, 0.2 M NaCl, 0.2 mM EDTA, 2 mM DTT,
0.2% Triton X-100, 20% glycerol), 0.1 vol SL-Tn5 (50 μM), and
0.13 vol of sterilized ddH2O.
5. The reagents were mixed gently, and the solution was incu-
bated for 1 h at 25 °C. After annealing, the Tn5 assembly is
stored at -20 °C.
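
The volume fractions in step 4 can be converted into pipetting volumes for any batch size. The following Python sketch does so and checks that the fractions sum to one; the function and dictionary names are hypothetical, and the 100 μL total in the example is an arbitrary choice.

```python
# Volume fractions of the Tn5 transposome assembly (step 4).
ASSEMBLY_FRACTIONS = {
    "annealed oligos (ME-A + ME-B)": 0.25,
    "glycerol (100%)": 0.40,
    "2X dialysis buffer": 0.12,
    "SL-Tn5 (50 uM)": 0.10,
    "sterilized ddH2O": 0.13,
}


def assembly_volumes(total_ul, fractions=ASSEMBLY_FRACTIONS):
    """Scale the volume fractions to a chosen total assembly volume (in uL)."""
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("volume fractions must sum to 1")
    return {name: fraction * total_ul for name, fraction in fractions.items()}


volumes = assembly_volumes(100.0)  # example batch of 100 uL
```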

3.3 Slide Preparation and Fixation
First, precleaned glass coverslips were placed in a 6-well cell
culture plate. Then, HT1080 cells were grown on the precleaned
glass coverslips until 80–90% confluent, fixed with 1% formaldehyde
[15] (Sigma-Aldrich) for 10 min at room temperature, and
quenched with 0.125 M glycine for 5 min at room temperature.

3.4 ATAC-see
1. Fixed HT1080 cells on the glass coverslip were permeabilized
with lysis buffer (10 mM Tris–HCl pH 7.4, 10 mM NaCl,
3 mM MgCl2, 0.1% IGEPAL CA-630) for 10 min.
2. Premixed (50 μL) transposase reaction solution (2.5 μL 2 mM
ATTO-Tn5, 25 μL 2X TD buffer, 22.5 μL water) was added
onto the slide, and the cells on the slide were incubated at
37 °C for 30 min.
3. After Tn5 tagmentation, the cells were washed 3 times with
washing buffer (0.01% SDS, 50 mM EDTA in PBS) at 55 °C
for 15 min each.

3.5 Immunostaining After ATAC-see
1. Cells were blocked with antibody dilution reagent for 1 h at
room temperature.
2. Primary antibodies (rabbit anti-LaminB1, ab16048, Abcam
and mouse anti-mitochondria, ab3298, Abcam) were diluted
(1:100) in the antibody dilution reagent and incubated
overnight at 4 °C.
3. After washing 3 times for 10 min each with washing buffer
(containing 0.05% Tween-20 in PBS), slides were incubated with
secondary antibodies (goat anti-rabbit-ATTO488, 18772-1ML-F,
Sigma-Aldrich; goat anti-mouse-Atto647N, 50185-1ML-F,
Sigma-Aldrich) diluted 1:500 for 45 min at room
temperature.
4. Finally, slides were washed with washing buffer, 3 times for
10 min each, mounted using Vectashield with DAPI (H-1200,
Vector labs), and imaged with confocal microscopy.
4 Notes

1. The fluorescent dye on the Tn5 adaptors can be replaced with
dyes other than Atto590 or with other large molecules, e.g.,
biotin. However, the modification must be on the 5′ end of the
oligos.
2. There could be free DNA oligos in the assembled ATTO-Tn5,
which could potentially introduce some unspecific signal for
ATAC-see. HPLC purification could be used to remove the free
DNA oligos.
3. Paraformaldehyde is a potential carcinogen, so all formalde-
hyde work must be conducted in a properly operating
fume hood.
4. The optimal concentration of IGEPAL CA-630 in the lysis buffer
for Tn5 tagmentation could be cell type specific; the
concentrations used in ATAC-seq for different cell lines can
serve as a reference.
5. Wear a mask while preparing a solution containing sodium
dodecyl sulfate (SDS).
6. As an alternative to antibody dilution reagent (00–3218, Thermo
Fisher Scientific), freshly prepared antibody dilution buffer
can also be used: dissolve 0.05 g of bovine serum albumin (BSA)
in 5 mL of sterilized PBS and filter-sterilize the solution with a
0.22 μm filter.

Acknowledgments

This work is supported by grants to X.C. from the Swedish
Research Council (VR-2016-06794, VR-2017-02074).

References

1. Babu A, Verma RS (1987) Chromosome structure: euchromatin and
heterochromatin. Int Rev Cytol 108:1–60.
https://doi.org/10.1016/s0074-7696(08)61435-7
2. Janssen A, Colmenares SU, Karpen GH (2018) Heterochromatin:
guardian of the Genome. Annu Rev Cell Dev Biol 34:265–288.
https://doi.org/10.1146/annurev-cellbio-100617-062653
3. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ
(2013) Transposition of native chromatin for fast and sensitive
epigenomic profiling of open chromatin, DNA-binding proteins
and nucleosome position. Nat Methods 10:1213–1218.
https://doi.org/10.1038/nmeth.2688
4. Corces MR et al (2018) The chromatin accessibility landscape of
primary human cancers. Science 362.
https://doi.org/10.1126/science.aav1898
5. Klemm SL, Shipony Z, Greenleaf WJ (2019) Chromatin
accessibility and the regulatory epigenome. Nat Rev Genet
20:207–220. https://doi.org/10.1038/s41576-018-0089-8
6. Bickmore WA, van Steensel B (2013) Genome architecture: domain
organization of interphase chromosomes. Cell 152:1270–1284.
https://doi.org/10.1016/j.cell.2013.02.001
7. Misteli T (2009) Self-organization in the genome. Proc Natl Acad
Sci U S A 106:6885–6886.
https://doi.org/10.1073/pnas.0902010106
8. Schneider R, Grosschedl R (2007) Dynamics and interplay of
nuclear architecture, genome organization, and gene expression.
Genes Dev 21:3027–3043. https://doi.org/10.1101/gad.1604607
9. Chen X et al (2016) ATAC-see reveals the accessible genome by
transposase-mediated imaging and sequencing. Nat Methods
13:1013–1020. https://doi.org/10.1038/nmeth.4031
10. Picelli S et al (2014) Tn5 transposase and tagmentation
procedures for massively scaled sequencing projects. Genome Res
24:2033–2040. https://doi.org/10.1101/gr.177881.114
11. Wang Q et al (2013) Tagmentation-based whole-genome bisulfite
sequencing. Nat Protoc 8:2022–2032.
https://doi.org/10.1038/nprot.2013.118
12. Honn KV, Singley JA, Chavin W (1975) Fetal bovine serum: a
multivariate standard. Proc Soc Exp Biol Med 149:344–347.
https://doi.org/10.3181/00379727-149-38804
13. Harlow E, Lane D (2006) Bradford assay. CSH Protoc 2006.
https://doi.org/10.1101/pdb.prot4644
14. Brunelle JL, Green R (2014) Coomassie blue staining. Methods
Enzymol 541:161–167.
https://doi.org/10.1016/B978-0-12-420119-4.00013-6
15. Fox CH, Johnson FB, Whiting J, Roller PP (1985) Formaldehyde
fixation. J Histochem Cytochem 33:845–853.
https://doi.org/10.1177/33.8.3894502
Chapter 16

NicE-viewSeq: An Integrative Visualization and Genomics
Method to Detect Accessible Chromatin in Fixed Cells
Pierre-Olivier Estève, Udayakumar S. Vishnu, Hang Gyeong Chin,
and Sriharsa Pradhan

Abstract
A novel genome-wide accessible chromatin visualization, quantitation, and sequencing method is
described, which allows in situ fluorescence visualization and sequencing of the accessible chromatin in
mammalian cells. The cells are fixed by formaldehyde crosslinking and processed using a modified nick
translation method, in which a nicking enzyme nicks one strand of DNA and DNA polymerase incorporates
biotin-conjugated dCTP, 5-methyl-dCTP, Fluorescein-12-dATP or Texas Red-5-dATP, dGTP, and dTTP.
This allows accessible chromatin DNA to be labeled for visualization and on-bead NGS library preparation.
This technology allows cellular-level chromatin accessibility quantification and genomic analysis of the
epigenetic information in the chromatin, particularly accessible promoters, enhancers, nucleosome
positioning, transcription factor occupancy, and other chromosomal protein binding.

Key words Open chromatin, Nicking enzyme, DNA Polymerase I, DNA labeling, Fluorescent dye,
dNTPs, Slides, NEBNext, Biotin, DNA library, Microscopy

1 Introduction

Eukaryotic chromosomes are built of packaged chromatin fibers, a
complex made of DNA, histones, and chromatin-associated proteins.
Each chromosome maintains its territory in the nucleus.
During development, spatial separation of active and inactive
fractions of the genome in the cell nucleus is crucial for gene expression
[1–3]. Depending on the transcriptional activity and degree of
compaction, chromatin can be functionally and structurally divided
into two types, euchromatin and heterochromatin, which are
spatially separated within the nucleus. Euchromatin is less
condensed and enriched in genes, encompassing the active fraction
of the genome. It is less densely packed and therefore more
accessible to the transcriptional machinery. This fraction of the
genome is also known as accessible chromatin [4]. The other
fraction, heterochromatin, is highly condensed, generally contains
few genes, and is mostly transcriptionally inactive [5]. This
general pattern of nuclear genome organization is
found in virtually all eukaryotic cell types. For nuclear structure
studies, specific protein or DNA staining, including
labeling technologies for visualization, is routinely performed.
Specific DNA sequences in an intact nucleus can be probed and
visualized by hybridization with fluorescently labeled DNA probes with
a complementary sequence. This technique is called fluorescence in
situ hybridization (FISH) [6–8]. In this case the hybridization
probe is itself made fluorescent, generally by the incorporation of
FITC-, rhodamine-, or far-red fluorophore-conjugated
deoxynucleotides that fluoresce upon excitation. Alternatively,
modified nucleotides (e.g., bromodeoxyuridine) or conjugated
deoxynucleotides may be used, which can be further revealed
by fluorescence-conjugated antibodies to the modified nucleotide
using immunocytochemistry/histochemistry methodologies.
Apart from sequence-specific interrogation, total nuclear DNA
can be fluorescently labeled using a number of fluorescent
DNA-binding dyes, such as DAPI or Hoechst, although these dyes exhibit
non-specific affinity for RNA as well as DNA sequence
preferences [9–12].

Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols,
Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_16,
© The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023
Labeling accessible chromatin was demonstrated for the first
time using a modified assay of transposase-accessible chromatin
with a hyperactive Tn5 transposon, which allowed direct imaging
of the accessible chromatin in situ, and deep sequencing to reveal
the identity of the imaged DNA elements. The basic principle relied
on a prokaryotic Tn5 transposon, which is loaded with
fluorescently labeled sequencing adapters, creating an active dimeric
transposome complex. The complex cuts the accessible
chromatin and simultaneously ligates the cargo sequences. This
technology is known as ATAC-see, and it provides information that
other chromatin accessibility technologies such as DNase-seq,
MNase-seq, FAIRE-seq, ATAC-seq alone, or NicE-seq could not
generate [13–19]. Indeed, ATAC-see is a direct modification of
ATAC-seq technology [13, 18]. ATAC-see has been used in a
variety of studies with protocol optimization [20, 21].
We subsequently modified our universal nicking enzyme-
assisted sequencing (UniNicE-seq) for direct fluorescent labeling
and sequencing of accessible chromatin genome-wide [22]. The
modified UniNicE-seq protocol included fluorescein-dATP or
Texas Red-5-dATP in the mix to allow sequential visualization,
cell sorting, and sequencing of the accessible regions from the
same sample. We named this technology Nicking Enzyme-
assisted viewing and Sequencing (NicE-viewSeq) [23]. It is a
versatile method that combines quantitative imaging and sequencing of
the accessible chromatin regions across various cell types using
NGS. This technology has been used to study the effect of an HDAC
inhibitor on chromatin accessibility both quantitatively and using
NGS (Figs. 1 and 2). Here, we describe a stepwise protocol for
NicE-viewSeq on cultured mammalian cells fixed with
formaldehyde.

[Figure 1 image: NicE-view with TexasRed-dATP (HUT 78 cells). Panel a: DAPI, - and + Nt.CviPII; rows DMSO and Romidepsin. Panel b: OCI Index (y-axis) for DMSO and Romidepsin.]

Fig. 1 NicE-view of HUT 78 cells treated or not with 1 μM of HDAC inhibitor (Romidepsin) for 6 h at 37 °C. (a)
The visualization of Texas Red-5-dATP (red) with or without Nt.CviPII. Nuclei are shown using DAPI staining
(blue). (b) The open chromatin index (OCI) (quantification of fluorescence incorporation)

Fig. 2 Representative IGV tracks of HUT 78 control and Romidepsin treated cells

2 Materials

Prepare all solutions using ultrapure water (e.g., Milli-Q water or
equivalent by purifying deionized water, to attain a conductivity of
18 MΩ-cm at 25 °C) and analytical grade reagents. Prepare all
reagents at room temperature and store at 4 °C (unless indicated
otherwise). Formaldehyde must be handled with gloves and in a
laboratory fume hood.
2.1 Crosslinking Cells on Micro Cover Glass
1. For HCT116 cells, culture in McCoy’s 5A medium (Thermo
Fisher Scientific #16600082) supplemented with 10% Fetal
Bovine Serum (GemCell #100-500); for MCF7 and HeLa cells,
use DMEM medium (Cytiva #SH30285.1) supplemented
with 10% Fetal Bovine Serum (GemCell #100-500). For
HUT 78 cells, use RPMI 1640 medium (Thermo Fisher
Scientific #61870036) supplemented with 10% Fetal Bovine
Serum (GemCell #100-500). Grow cells on micro cover glass
(VWR #48366067) in a 6-well plate format. HUT 78 cells
were grown in suspension and spun at 1000 rpm onto micro
cover glass before fixation with formaldehyde.
2. 16% formaldehyde (w/v), methanol free (Thermo Fisher Sci-
entific #28908).
3. 1X PBS (Thermo Fisher Scientific #70011-044).
4. 2.5 M Glycine (Sigma-Aldrich #G7126).

2.2 Accessible Chromatin Labeling
1. Prepare cytosolic buffer: 15 mM Tris–HCl pH 7.5, 5 mM
MgCl2, 60 mM KCl, 15 mM NaCl, 1% NP-40, and 300 mM
sucrose. Add 0.5 mM fresh DTT (NEB #B7705S).
2. Prepare 10X dNTP mix: 240 μM dATP, 180 μM 5-methyl-
dCTP (NEB #N0356S), 60 μM biotin-14-dCTP (Thermo
Fisher Scientific #19518018), 300 μM dGTP, 300 μM dTTP,
and 60 μM of Fluorescein-12-dATP (PerkinElmer
#NEL465001EA) or Texas Red-5-dATP (PerkinElmer
#NEL471001EA).
3. Prepare 800 μL of accessible chromatin labeling buffer for one
well of a 6-well plate: 80 μL of 10X NEBuffer™ 2 (NEB
#B7002S), 2.5 U of Nt.CviPII (NEB #R0626S), and 50 U of
DNA Polymerase I (NEB #M0209S) with 30 μM of each
dNTP including 6 μM of biotin-dCTP and 6 μM of Fluorescein-
12-dATP (PerkinElmer #NEL465001EA) or 6 μM of Texas
Red-5-dATP (PerkinElmer #NEL471001EA). Fluorescent
dNTPs are used for visualization and biotin-dCTP for on-bead
DNA library construction.
4. 37 °C incubator.
5. 0.5 M EDTA, pH 8.0 (Thermo Fisher Scientific #15575-038).
6. RNase A (Thermo Fisher Scientific #12091021).
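
The composition of the 10X dNTP mix (item 2) can be tabulated per base to see what a tenfold dilution delivers and what fraction of each pool is labeled. The Python sketch below does only that arithmetic; the data layout and function names are hypothetical, and the tenfold dilution to a 1X working concentration is an assumption based on the "10X" designation.

```python
# 10X dNTP mix from item 2: name -> (base, concentration in uM).
TEN_X_MIX_UM = {
    "dATP": ("A", 240.0),
    "Fluorescein-12-dATP or Texas Red-5-dATP": ("A", 60.0),
    "5-methyl-dCTP": ("C", 180.0),
    "biotin-14-dCTP": ("C", 60.0),
    "dGTP": ("G", 300.0),
    "dTTP": ("T", 300.0),
}


def working_totals(mix, dilution=10.0):
    """Total concentration per base (uM) after diluting the mix."""
    totals = {}
    for base, conc in mix.values():
        totals[base] = totals.get(base, 0.0) + conc / dilution
    return totals


def labeled_fraction(mix, labeled_name):
    """Fraction of a base's nucleotide pool contributed by one labeled analog."""
    base, labeled_conc = mix[labeled_name]
    pool = sum(conc for b, conc in mix.values() if b == base)
    return labeled_conc / pool


totals_1x = working_totals(TEN_X_MIX_UM)
biotin_fraction = labeled_fraction(TEN_X_MIX_UM, "biotin-14-dCTP")
```

A table like this makes it easy to compare the listed mix with the per-nucleotide concentrations quoted for the labeling buffer before preparing a new batch.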

2.3 Mounting Slides and Open Chromatin Index Using Zeiss LSM 880 Confocal Microscope
1. Microscope slides (VWR Vista Vision #16004-368).
2. ProLong gold antifade reagent with DAPI (Thermo Fisher
Scientific #P36935).
3. Visualization of Fluorescein-dATP or TexasRed-dATP and
Hoechst using 488, 561, and 405 nm laser, respectively.
2.4 NicE-viewSeq Library Construction
1. 65 °C heat block.
2. Proteinase K (NEB #P8107S).
3. Phenol:Chloroform:Isoamyl Alcohol 25:24:1 saturated with
10 mM Tris, pH 8.0, 1 mM EDTA (Sigma-Aldrich #P3803).
4. Ethanol (Sigma-Aldrich #E7023).
5. Monarch Genomic DNA Purification Kit (NEB #T3010S).
6. Covaris S2 sonicator.
7. Covaris microtubes (Covaris #500330).
8. NEB Ultra II DNA Library Prep Kit for Illumina (NEB
#E7645).
9. PCR machine.
10. Prepare High Salt Buffer: 10 mM Tris–HCl, pH 8, 2 M NaCl,
and 1 mM EDTA.
11. Prepare High Salt Buffer with Triton X-100: 10 mM Tris–HCl,
pH 8, 2 M NaCl, 1 mM EDTA, and 0.05% Triton X-100.
12. 1.5 mL DNA LoBind Eppendorf tubes (Eppendorf
#022431021).
13. Streptavidin Magnetic Beads (NEB #S1420S).
14. Rocking platform/end-over-end rotator (VWR #10136-084).
15. Nuclease-free water (Thermo Fisher Scientific #AM9932).
16. NEBNext® Oligos for Illumina (NEB #E7335S or E7500S).
17. NEBNext® Sample Purification Beads (NEB #E7104S) or AMPure
XP beads (Beckman Coulter #A63881).
18. 0.1X TE buffer (NEB #E7763AA).
19. Qubit fluorometer 2.0 and Qubit dsDNA HS Assay Kit
(Thermo Fisher Scientific #Q32854).
20. Bioanalyzer (Agilent 2100 Bioanalyzer with Agilent High
Sensitivity DNA Kit #5067-4626).

3 Methods

3.1 Crosslinking Cells on Micro Cover Glass

1. Grow between 5,000 and one million cells on micro cover glass
in 2 mL media in a 6-well plate (see Note 1). Remove media at
50–70% confluency and add 937.5 μL 1X PBS per well. Add
62.5 μL of 16% formaldehyde (1% final concentration) to
crosslink the cells for 10 min at RT on a rocking platform
(see Note 2).
2. Quench reaction by adding 125 mM glycine (52.5 μL of 2.5 M
stock) and incubate for 5 min at RT on a rocking platform.
3. Wash cells twice with 1X PBS.
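
The volumes in steps 1 and 2 follow from simple dilution arithmetic. The short Python sketch below recomputes the final formaldehyde and glycine concentrations from the volumes given above; the function name is ours and is only meant for checking adjusted volumes (for example, when scaling to a different well format).

```python
def final_concentration(stock_conc, stock_volume_ul, starting_volume_ul):
    """Concentration after adding a stock to an existing volume.

    The result is in the same units as stock_conc.
    """
    total_volume_ul = starting_volume_ul + stock_volume_ul
    return stock_conc * stock_volume_ul / total_volume_ul


# Step 1: 62.5 uL of 16% formaldehyde added to 937.5 uL of 1X PBS.
formaldehyde_pct = final_concentration(16.0, 62.5, 937.5)

# Step 2: 52.5 uL of 2.5 M (2500 mM) glycine added to the 1000 uL reaction.
glycine_mM = final_concentration(2500.0, 52.5, 1000.0)

print(f"formaldehyde: {formaldehyde_pct:.2f}% (w/v)")
print(f"glycine: {glycine_mM:.1f} mM")
```

With the volumes from the protocol this gives 1.00% formaldehyde and about 124.7 mM glycine, consistent with the nominal 125 mM.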
298 Pierre-Olivier Estève et al.

3.2 Accessible Chromatin Labeling

1. Add 1 mL of cytosolic buffer to the cells and incubate for
10 min at 4 °C (see Note 3).
2. Wash twice with 1X PBS. Nuclei can be visualized under the
microscope at this point (circular with smooth edges).
3. Add 800 μL of accessible chromatin labeling buffer and incubate
for at least 30 min (can be extended up to 2 h) at 37 °C away
from light (see Note 4).
4. Add 20 μL 0.5 M EDTA and 2 μL RNase A to each well to stop
the reaction. Incubate for 20 min at 37 °C to digest RNA.
5. Remove the supernatant and wash once with 1X PBS at 55 °C
to remove autofluorescence background (see Note 6).
6. Wash twice with 1X PBS at RT.
7. Remove and dry the cover glass for at least 30 min at RT away
from light.
8. Mount the coverslip on microscope slide using ProLong gold
antifade reagent with DAPI. At this point, slides are ready to be
visualized by confocal microscopy (see Note 7).

3.3 Mounting Slides and Open Chromatin Index Using Zeiss LSM 880 Confocal Microscope

1. To visualize Fluorescein-dATP or TexasRed-dATP and DAPI,
set the 488 or 561 nm laser power to 2% and the 405 nm diode
laser to 0.5% (Fig. 1a) (see Note 5).
2. Set the frame size to optimal, the scan time to 5–7 min, the
number of averaged image frames to 16, and the bit depth to 16.
3. Histograms will give the mean pixel intensity for Fluorescein-
dATP or TexasRed-dATP.
4. Open chromatin index (OCI) is defined by mean pixel inten-
sity/number of nuclei (Fig. 1b).
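
As a minimal illustration of the OCI definition in step 4, the following Python sketch divides a mean pixel intensity by a nuclei count. The function name and the example values are hypothetical and are not measurements from this protocol.

```python
def open_chromatin_index(mean_pixel_intensity, nuclei_count):
    """OCI = mean pixel intensity / number of nuclei in the field."""
    if nuclei_count <= 0:
        raise ValueError("nuclei_count must be positive")
    return mean_pixel_intensity / nuclei_count


# Hypothetical (mean pixel intensity, nuclei count) pairs for two fields.
fields = {"control": (5200.0, 40), "treated": (9100.0, 35)}
oci = {name: open_chromatin_index(intensity, n)
       for name, (intensity, n) in fields.items()}

for name, value in oci.items():
    print(f"{name}: OCI = {value:.1f}")
```

Normalizing by nuclei count lets fields of view with different cell densities be compared on the same scale.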

3.4 NicE-viewSeq DNA Preparation

1. After labeling chromatin on slides, cells can be removed using
1% SDS, 2 mg/mL proteinase K, and 200 mM NaCl at 65 °C
overnight.
2. Genomic DNA can be extracted and purified using phenol/
chloroform/isoamyl alcohol method or Monarch Genomic
DNA purification kit.
3. Take 200–500 ng of genomic DNA and sonicate (see Note 8).
Transfer the genomic DNA to a Covaris microtube and add 1X
TE buffer to 50 μL final volume. Sonicate using the following
settings to obtain 150 bp fragments. Insert the tube into
the holder, select “Open”, choose the program named
“Covaris 200 for 50μL” on the computer, and click the
“start” button. The parameters are: Intensity: 5; Duty Cycle: 10%;
Cycles per burst: 200; and Treatment time: 2 min.

3.5 NicE-viewSeq DNA Pull-Down on Beads

1. For DNA pull-down, transfer the biotinylated fragmented
DNA to DNA LoBind Eppendorf tubes.
2. For a low number of cells, add 15 μL of Streptavidin magnetic
beads. For a high number of cells, add 30 μL of beads. To
the beads, add 1 mL of 1X High Salt Buffer. Incubate the tube
for 2 h at 4 °C on an end-over-end rotator.
3. Place the Eppendorf tube on the magnetic rack. When solution
is clear, remove the liquid carefully using a pipet, and wash
beads for 5 min with 1 mL cold High Salt Buffer containing
0.05% Triton X-100.
4. Repeat the above wash steps 2 times.
5. Wash beads once with 1 mL 1X TE buffer for 5 min.
6. Resuspend beads in 50 μL 1X TE buffer.

3.6 NicE-viewSeq NGS Library Preparation

1. For end-repair per sample, combine the following in sequential
order: 50 μL of fragmented DNA on Streptavidin beads from
the above step; 3 μL of NEBNext Ultra II End Prep Enzyme
Mix; 7 μL of NEBNext Ultra II End Prep Reaction Buffer. Mix
well and incubate at 20 °C for 30 min and at 65 °C for 30 min
(use a PCR machine).
2. For adaptor ligation per sample, combine the following in
sequential order: 60 μL of End Prep Reaction Mixture; 30 μL
of NEBNext Ultra II Ligation Master Mix; 1 μL of NEBNext
Ligation Enhancer; 1 μL of 1:10 diluted (recommended by
NEB) NEBNext Adaptor for Illumina. Incubate for 2–16 h
at RT.
3. Add 3 μL USER enzyme and incubate for 15 min at 37 °C.
4. Place the Eppendorf tube on the magnetic rack. When solution
is clear, remove the liquid carefully using a pipet, and wash the
beads for 5 min with 1 mL cold High Salt Buffer containing
0.05% Triton X-100.
5. Repeat the above wash steps 2 times.
6. Wash the beads once with 1 mL 1X TE buffer for 5 min.
7. Resuspend the beads in 19 μL 0.1X TE buffer.

3.7 NicE-viewSeq NGS Library Amplification

1. For PCR amplification per sample, combine the following in
sequential order: 19 μL of Streptavidin beads, 3 μL Index
primer (10 μM), 3 μL Universal primer (10 μM), and 25 μL
NEB Ultra II Q5 Master Mix.
2. Run the amplification in a PCR machine with the following
parameters: initial denaturation, 30 s at 98 °C; then 10 cycles
of denaturation, 10 s at 98 °C; annealing, 30 s at 65 °C; and
extension, 45 s at 65 °C. The final extension is 5 min at 72 °C.
The amplified library may be held at 4 °C.

3.8 NicE-viewSeq NGS Library Purification Using NEBNext® Sample Purification Beads

1. Place PCR reaction on a magnetic rack to remove the magnetic
streptavidin-biotinylated-DNA bead complexes. Transfer the
supernatant that contains the PCR products to new DNA
LoBind tube and add 0.9X volume (45 μL) of NEBNext®
Sample Purification Beads.
2. Incubate for 5 min at RT and quick-spin the tube in a
microcentrifuge.
3. Put samples on magnetic rack to separate beads from the
supernatant. When the solution looks clear, carefully remove
the supernatant. DO NOT DISTURB THE BEADS.
4. Add 200 μL of freshly prepared 80% EtOH. DO NOT
REMOVE THE PCR TUBES FROM THE RACK OR RESUSPEND
THE BEADS. Wait for 30 s, remove the 80% EtOH
from the beads, and repeat once. After removing the supernatant
for a second time, quickly spin down the tubes and
completely remove the residual EtOH.
5. Air-dry the beads for 5 min while the tube is on the rack with
the lid open. DO NOT OVERDRY THE BEADS, THIS MAY
RESULT IN LOWER RECOVERY OF DNA.
6. Remove the tube from the magnet and resuspend the beads in
20 μL 0.1X TE buffer for 2 min at RT to elute the DNA.
7. Put the tube back on the magnetic rack until the solution is
clear, transfer the supernatant to a clean PCR tube, and store
it at -20 °C.
8. Measure the amount of DNA using the Qubit dsDNA HS protocol.
A successful library preparation should have a DNA
concentration of at least 1 ng/μL.
9. Analyze the DNA on the Bioanalyzer (Agilent DNA 1000
Chip) to assess the library quality (size distribution and
concentration). After Illumina sequencing, read mapping, and peak
analyses, genomic open chromatin regions can be visualized
using IGV browser (Fig. 2).

4 Notes

1. Cells should grow on glass until 80% confluency. 100% confluency
may impair diffusion of the nicking enzyme through the nuclei
and thereby reduce the open chromatin labeling efficiency.
2. Cells can be crosslinked with 1% up to 4% formaldehyde for
10 min at 4 °C. Below 1% formaldehyde, the DNA damage pathway
is activated and can lead to false-positive detection of open
chromatin [24].
3. Cytosolic extraction can be extended from 10 min up to 30 min
at 4 °C.

4. Chromatin labeling buffer should be made fresh each time and
all the enzymes (CviPII and DNA Polymerase I) stored at -20 °C
before being added to the chromatin labeling buffer.
5. Any fluorescent dye or biotin coupled with dNTPs can be used
for visualization or DNA sequencing, respectively.
6. Cells can be washed with either PBS at 55 °C or 3 times with
PBS containing 0.1% Tween 20 for 5 min at room temperature.
7. Slides should be dried for at least 30 min at RT and kept away
from light before mounting. Detection of open chromatin
index can be performed using a confocal or inverted
microscope.
8. For DNA library construction, low genomic DNA input (from
1 to 100 ng) can also be used. PCR cycle numbers need to be
determined accordingly.

Acknowledgments

This work was partly supported by NIH SBIR grant
R44HG011006 and New England Biolabs, Inc. to S.P. We thank
C. Carlow for critical reading of the manuscript, T. Evans,
D. Comb, Sir R.J. Roberts, and J. Ellard for encouragement.
Basic research support for H.G.C., P.O.E., U.S.V., and S.P. was
provided by New England Biolabs, Inc.

References
1. Jackson DA (2003) The principles of nuclear structure. Chromosom Res 11:387–401
2. Martins RP, Finan JD, Guilak F, Lee DA (2012) Mechanical regulation of nuclear structure and function. Annu Rev Biomed Eng 14:431–455
3. Nathanailidou P, Taraviras S, Lygerou Z (2020) Chromatin and nuclear architecture: shaping DNA replication in 3D. Trends Genet 36:967–980
4. Klemm SL, Shipony Z, Greenleaf WJ (2019) Chromatin accessibility and the regulatory epigenome. Nat Rev Genet 20:207–220
5. Volpe TA, Kidner C, Hall IM, Teng G, Grewal SI, Martienssen RA (2002) Regulation of heterochromatic silencing and histone H3 lysine-9 methylation by RNAi. Science 297:1833–1837
6. Langer-Safer PR, Levine M, Ward DC (1982) Immunological method for mapping genes on Drosophila polytene chromosomes. Proc Natl Acad Sci U S A 79:4381–4385
7. Amann R, Fuchs BM (2008) Single-cell identification in microbial communities by improved fluorescence in situ hybridization techniques. Nat Rev Microbiol 6:339–348
8. Volpi EV, Bridger JM (2008) FISH glossary: an overview of the fluorescence in situ hybridization technique. BioTechniques 45:385–409
9. Tarnowski BI, Spinale FG, Nicholson JH (1991) DAPI as a useful stain for nuclear quantitation. Biotech Histochem 66:297–302
10. Latt SA, Stetten G, Juergens LA, Willard HF, Scher CD (1975) Recent developments in the detection of deoxyribonucleic acid synthesis by 33258 Hoechst fluorescence. J Histochem Cytochem 23:493–505
11. Latt SA, Stetten G (1976) Spectral studies on 33258 Hoechst and related bisbenzimidazole dyes useful for fluorescent detection of deoxyribonucleic acid synthesis. J Histochem Cytochem 24:24–33
12. Bucevičius J, Lukinavičius G, Gerasimaitė R (2018) The Use of Hoechst Dyes for DNA Staining and Beyond. Chemosensors 6:18
13. Chen X, Shen Y, Draper W et al (2016) ATAC-see reveals the accessible genome by transposase-mediated imaging and sequencing. Nat Methods 13:1013–1020
14. Boyle AP, Davis S, Shulha HP, Meltzer P, Margulies EH, Weng Z, Furey TS, Crawford GE (2008) High-resolution mapping and characterization of open chromatin across the genome. Cell 132:311–322
15. Crawford GE, Holt IE, Whittle J, Webb BD, Tai D, Davis S, Margulies EH, Chen Y, Bernat JA, Ginsburg D, Zhou D, Luo S, Vasicek TJ, Daly MJ, Wolfsberg TG, Collins FS (2006) Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res 16:230
16. Cui K, Zhao K (2012) Genome-wide approaches to determining nucleosome occupancy in metazoans using MNase-Seq. Chromatin Remodel Method Mol Biol 833:413–419
17. Giresi PG, Kim J, McDaniell RM, Iyer VR, Lieb JD (2006) FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. Genome Res 17:877–885
18. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ (2013) Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods 10:1213–1218
19. Ponnaluri VKC, Zhang G, Estève PO et al (2017) NicE-seq: high resolution open chromatin profiling. Genome Biol 18:122
20. Pintacuda G, Wei G, Roustan C et al (2017) hnRNPK recruits PCGF3/5-PRC1 to the Xist RNA B-repeat to establish Polycomb-mediated chromosomal silencing. Mol Cell 68:955–969
21. Rodrigues CP, Herman JS, Herquel B et al (2020) Temporal expression of MOF acetyltransferase primes transcription factor networks for erythroid fate. Sci Adv 6:eaaz4815
22. Chin HG, Sun Z, Vishnu US et al (2020) Universal NicE-seq for high-resolution accessible chromatin profiling for formaldehyde-fixed and FFPE tissues. Clin Epigenetics 12:143
23. Estève PO, Vishnu US, Chin HG, Pradhan S (2020) Visualization and sequencing of accessible chromatin reveals cell cycle and post-HDAC inhibitor treatment dynamics. J Mol Biol 432:5304–5321
24. Vishnu US, Estève PO, Chin HG, Pradhan S (2021) One-pot universal NicE-seq: all enzymatic downstream processing of 4% formaldehyde crosslinked cells for chromatin accessibility genomics. Epigenetics Chromatin 14(1):53
Part V

Computational Analysis of Chromatin Accessibility Datasets


Chapter 17

ATAC-seq Data Processing


Daniel S. Kim

Abstract
ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) has gained wide popularity as a
fast, straightforward, and efficient way of generating genome-wide maps of open chromatin and guiding
identification of active regulatory elements and inference of DNA protein binding locations. Given the
ubiquity of this method, uniform and standardized methods for processing and assessing the quality of
ATAC-seq datasets are needed. Here, we describe the data processing pipeline used by the ENCODE
(Encyclopedia of DNA Elements) consortium to process ATAC-seq data into peak call sets and signal tracks
and to assess the quality of these datasets.

Key words ATAC-seq, Data pipeline

1 Introduction

ATAC-seq [1] (Assay for Transposase-Accessible Chromatin using
sequencing) is a ubiquitous method for generating genome-wide
maps of open chromatin (see the first chapter in this book). ATAC-
seq uses Tn5 transposase loaded with sequencing adaptors to
simultaneously fragment genomic DNA at accessible locations
and insert a sequencing library amplification adaptor sequence.
This library generation method thus produces genomic fragments
enriched for open chromatin regions. The straightforward and
optimized nature of this assay has made it widely used in genomics
as a way to identify open chromatin in various cell types. This assay
has also been used to analyze nucleosome positioning and infer
locations of DNA protein binding through analysis of accessible
sequence and footprinting analyses [2–4]. Because ATAC-seq is a
major assay type generated by the ENCODE (Encyclopedia of DNA
Elements) consortium [5], a standardized pipeline was necessary to
ensure high-quality reproducible data. Here, we describe the ENCODE
standard pipeline for processing ATAC-seq data, with additional tips
and commentary around data processing decisions.

Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols,
Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_17,
© The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023


The pipeline consists of the following processing steps. Reads
are trimmed to remove adapter sequences from the ends of reads, as
ATAC-seq library fragments can be shorter than the read length of
the sequencer. The trimmed reads are aligned and filtered to obtain
a high-quality set of alignments. These alignments are used to call
peaks, from which peaks that are reproducible across replicates are
identified, and to generate signal tracks for visualization on a
genome browser.
Finally, we describe useful quality control metrics and tools to help
troubleshoot ATAC-seq library generation and determine the qual-
ity of the ATAC-seq experiment.

2 Materials

This protocol assumes a basic working knowledge of the UNIX
command line. You can utilize the ENCODE pipeline as found at:
https://github.com/ENCODE-DCC/atac-seq-pipeline. This
pipeline will install all required software and manage file names
and locations.
Otherwise, install the following software before going through
each step in Subheading 3:
1. cutadapt (v1.9.1) [6].
2. Bowtie2 (v2.2.6) [7]. Set up Bowtie2 index as directed in the
Bowtie2 manual.
3. Samtools (1.7) [8].
4. Bedtools (2.26) [9].
5. Picard (v1.126) [10].
6. MACS2 (v2.1.0) [11].
7. UCSC tools (3.0.9, http://hgdownload.soe.ucsc.edu/downloads.html),
specifically bedGraphToBigWig.
8. SAMstats (v0.2.1, https://github.com/kundajelab/SAMstats).
Optionally, also install the following tools for specific quality
control steps: phantompeakqualtools
(https://github.com/kundajelab/phantompeakqualtools) for
cross-correlation, SPP [12] for cross-correlation, and deepTools
(v3.3.0) [13] for Jensen-Shannon distance.

3 Methods

Each step of the pipeline is described with its inputs and outputs.
For this step-by-step guide, we provide the pipeline for paired-end
ATAC-seq with two replicates, utilizing multimapping reads (our
recommended default experiment design and processing).
Adjustments to the pipeline for other experimental designs can be
found in Subheading 4. Utilizing the pipeline from ENCODE will
run these steps in an automated fashion.

3.1 Adaptor Detection and Read Trimming

1. Trim adapter sequence from the sequencing reads. Do this for
each FASTQ file (see Notes 1 and 2).
• Inputs: reads in FASTQ format ($FASTQ), adapter
sequence ($ADAPTER), adapter error rate ($ADAPTER_ERR_RATE,
default 0.2).
• Outputs: trimmed reads in FASTQ format ($TRIMMED_FASTQ).
• Command:
cutadapt -m 5 -e $ADAPTER_ERR_RATE -a $ADAPTER $FASTQ | gzip -nc > $TRIMMED_FASTQ
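
To make the purpose of this step concrete, the Python sketch below shows what 3′ adapter trimming does to a read that runs past the end of a short fragment. It is a deliberately simplified illustration with exact matching only and is not the algorithm cutadapt uses (cutadapt also tolerates mismatches up to the error rate set with -e); the function name and the min_overlap parameter are our own.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter (complete, or partial at the read end) from a read.

    Simplification: exact string matching only, no mismatches allowed.
    """
    # Case 1: the complete adapter occurs inside the read; everything from
    # the adapter start onward is non-genomic.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends part-way through the adapter, so only an
    # adapter prefix is visible at the 3' end. Try the longest prefix first.
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read


adapter = "AGATCGGAAGAGC"
fragment = "ACGTTGCAAC"  # hypothetical short genomic fragment

# Read-through: the read continues past the fragment into the adapter.
print(trim_3prime_adapter(fragment + adapter + "TTT", adapter))
# Partial read-through: only the first adapter bases were sequenced.
print(trim_3prime_adapter(fragment + "AGATC", adapter))
```

In both calls the genomic fragment alone is recovered. The -m 5 option in the command above additionally discards reads shorter than 5 bases after trimming, which this sketch does not do.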

3.2 Read Alignment and Post-alignment Filtering

1. Run Bowtie2 to align reads with up to k multimapping locations
allowed (see Notes 3–5).
• Inputs: reads in FASTQ format ($FASTQ1, $FASTQ2),
Bowtie2 index ($bwt2_idx), number of CPU threads
($nth_bwt2), number of multimapping locations allowed
per read ($multimapping, default 4), output file name
prefix ($prefix).
• Outputs: unfiltered alignments file in BAM format (${prefix}.bam,
in future steps referred to as $RAW_BAM), log file ($log).
• Command:
bowtie2 -k $((multimapping + 1)) -X2000 --mm --threads $nth_bwt2 -x $bwt2_idx -1 $FASTQ1 -2 $FASTQ2 2>$log | samtools view -Su /dev/stdin | samtools sort -o ${prefix}.bam -

2. Filter reads by read flags. Remove reads that are unmapped,
whose mate is unmapped, that are not primary alignments, or
that fail platform/vendor quality checks (see Notes 6–9).
• Inputs: unfiltered alignments file in BAM format ($RAW_BAM),
number of multimapping locations allowed, as set in
step 1 ($multimapping). Note that two intermediate filtered
BAM files are required, one for initial filtering
($TMP_FILT_BAM) and one after fixing read pairs
($TMP_FILT_FIXMATE_BAM), which can be deleted after
this step is complete.
• Outputs: flag filtered alignments in BAM format
($FLAG_FILT_BAM).
• Commands:
(1) samtools view -F 524 -f 2 -u $RAW_BAM | samtools sort -n /dev/stdin -o $TMP_FILT_BAM
(2) samtools view -h $TMP_FILT_BAM | assign_multimappers.py -k $multimapping --paired-end | samtools fixmate -r /dev/stdin $TMP_FILT_FIXMATE_BAM
(3) samtools view -F 1804 -f 2 -u $TMP_FILT_FIXMATE_BAM | samtools sort /dev/stdin -o $FLAG_FILT_BAM

3. Filter reads for duplicate alignments. These are read pairs
whose alignments have the same start position when mapped
to the genome (see Notes 10 and 11). This step produces a
final filtered BAM file with high quality alignments.
• Inputs: Picard tools MarkDuplicates.jar script, flag filtered
alignments in BAM format ($FLAG_FILT_BAM) from
the previous step. Note that an intermediate BAM with
marked duplicates will be generated ($TMP_DUPMARK_BAM)
that will replace the flag filtered alignment file.
• Outputs: final filtered alignments in BAM format
($FINAL_BAM) with index (${FINAL_BAM}.bai), duplicates
metrics file ($DUP_FILE_QC).
• Commands:
(1) java -Xmx4G -jar MarkDuplicates.jar INPUT=$FLAG_FILT_BAM OUTPUT=$TMP_DUPMARK_BAM METRICS_FILE=$DUP_FILE_QC VALIDATION_STRINGENCY=LENIENT ASSUME_SORTED=true REMOVE_DUPLICATES=false
(2) mv $TMP_DUPMARK_BAM $FLAG_FILT_BAM
(3) samtools view -F 1804 -f 2 -b $FLAG_FILT_BAM > $FINAL_BAM
(4) samtools index $FINAL_BAM
4. Convert BAM file to BEDPE format.
• Inputs: final filtered alignments in BAM format ($FINAL_BAM),
read name sorted.
• Outputs: final filtered alignments in BEDPE format
($FINAL_BEDPE).
• Command:
bedtools bamtobed -bedpe -mate1 -i $FINAL_BAM | gzip -nc > $FINAL_BEDPE

5. Convert BEDPE file to tagAlign format (BED file of reads) (see
Note 12).
• Inputs: final filtered alignments in BEDPE format
($FINAL_BEDPE).
• Outputs: final filtered alignments in tagAlign format
($FINAL_TA_FILE).
• Command:
zcat $FINAL_BEDPE | awk 'BEGIN{OFS="\t"}{printf "%s\t%s\t%s\tN\t1000\t%s\n%s\t%s\t%s\tN\t1000\t%s\n",$1,$2,$3,$9,$4,$5,$6,$10}' | gzip -nc > $FINAL_TA_FILE

6. Adjust read starts for transposase cut sites for base-pair resolution
alignments (see Note 13).
• Inputs: reads in tagAlign format ($FINAL_TA).
• Outputs: shifted reads in tagAlign format
($FINAL_TA_SHIFTED).
• Command:
zcat $FINAL_TA | awk -F $'\t' 'BEGIN {OFS = FS}{ if ($6 == "+") {$2 = $2 + 4} else if ($6 == "-") {$3 = $3 - 5} print $0}' | gzip -nc > $FINAL_TA_SHIFTED
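
Two details of this section are easy to check by hand: which reads the numeric masks passed to samtools view -F (524 and 1804) exclude, and what the awk command in step 6 does to each tagAlign record. The Python sketch below decodes a flag mask using the bit definitions from the SAM specification and applies the same +4/-5 offsets as the awk command. The function names are ours, and the sketch is only an illustration, not part of the ENCODE pipeline.

```python
# Bit definitions from the SAM format specification.
SAM_FLAGS = {
    1: "read paired",
    2: "read mapped in proper pair",
    4: "read unmapped",
    8: "mate unmapped",
    16: "read reverse strand",
    32: "mate reverse strand",
    64: "first in pair",
    128: "second in pair",
    256: "not primary alignment",
    512: "read fails platform/vendor quality checks",
    1024: "read is PCR or optical duplicate",
    2048: "supplementary alignment",
}


def decode_flag_mask(mask):
    """List the SAM flag bits that are set in a numeric mask."""
    return [name for bit, name in SAM_FLAGS.items() if mask & bit]


def shift_tn5(chrom, start, end, strand):
    """Apply the offsets from step 6: start + 4 on '+', end - 5 on '-'."""
    if strand == "+":
        start += 4
    elif strand == "-":
        end -= 5
    return chrom, start, end, strand


print(decode_flag_mask(524))
print(decode_flag_mask(1804))
print(shift_tn5("chr1", 100, 150, "+"))
print(shift_tn5("chr1", 100, 150, "-"))
```

Decoding shows that 524 covers unmapped reads, reads with an unmapped mate, and reads failing quality checks, while the non-primary (256) and duplicate (1024) bits are only added by the 1804 mask used in the later commands.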

3.3 Peak Calling

1. Call peaks (see Notes 14–16).
• Inputs: alignments in tagAlign format ($TAG), chromosome
sizes ($gensz), p-value threshold ($pval_thresh, default
0.1), smoothening window ($smooth_window, default
150), shift size ($shiftsize, default -$smooth_window / 2).
Note that this command also produces the pileup
files used to create bedGraph files for the signal tracks.
• Outputs: peak files, pileup bedGraph files.
• Command:
macs2 callpeak -t $TAG -f BED -n $prefix -g $gensz -p $pval_thresh --shift $shiftsize --extsize $smooth_window --nomodel -B --SPMR --keep-dup all --call-summits

2. Filter for ENCODE blacklisted regions (see Note 17).
• Inputs: peak file ($PEAK), blacklist ($BLACKLIST).
• Output: filtered peak file ($FILT_PEAK).
• Commands:
bedtools intersect -v -a $PEAK -b $BLACKLIST | awk 'BEGIN{OFS="\t"} {if ($5>1000) $5=1000; print $0}' | grep -P 'chr[\dXY]+[ \t]' | gzip -nc > $FILT_PEAK

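The blacklist filtering command combines three operations: dropping peaks that overlap any blacklisted interval, capping the score column at 1000, and keeping only chromosomes named chr followed by digits, X, or Y. A small Python sketch of the same logic on in-memory tuples is shown below; the function names and the toy peaks are hypothetical, and real data would be read from the narrowPeak and blacklist files.

```python
import re

# Same pattern as the grep in the command: chr + digits/X/Y only.
CANONICAL_CHROM = re.compile(r"^chr[\dXY]+$")


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def filter_peaks(peaks, blacklist):
    """Drop blacklisted or non-canonical peaks and cap scores at 1000."""
    kept = []
    for chrom, start, end, name, score in peaks:
        if not CANONICAL_CHROM.match(chrom):
            continue
        if any(chrom == b_chrom and overlaps(start, end, b_start, b_end)
               for b_chrom, b_start, b_end in blacklist):
            continue
        kept.append((chrom, start, end, name, min(score, 1000)))
    return kept


peaks = [
    ("chr1", 100, 200, "p1", 1500),  # kept, score capped to 1000
    ("chr1", 500, 600, "p2", 300),   # overlaps the blacklist, removed
    ("chrM", 10, 50, "p3", 900),     # non-canonical chromosome, removed
    ("chrX", 0, 100, "p4", 50),      # kept unchanged
]
blacklist = [("chr1", 550, 700)]

for peak in filter_peaks(peaks, blacklist):
    print(peak)
```

The linear scan over the blacklist is fine for a toy example; bedtools uses far more efficient interval lookups on real files.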
3.4 Identifying Replicate-Consistent Peaks

1. Run IDR framework for replicate consistent peaks (see Notes
18–22).
• Inputs: replicate peak files ($REP1_PEAK_FILE,
$REP2_PEAK_FILE), master peak list ($POOLED_PEAK_FILE),
IDR threshold ($IDR_THRESH).
• Output: peak file with IDR scores ($IDR_OUTPUT), IDR
peak file ($IDR_PEAKS).
• Commands:
(1) idr --samples $REP1_PEAK_FILE $REP2_PEAK_FILE --peak-list $POOLED_PEAK_FILE --input-file-type narrowPeak --output-file $IDR_OUTPUT --rank p.value --soft-idr-threshold $IDR_THRESH --plot --use-best-multisummit-IDR
(2) IDR_THRESH_TRANSFORMED=$(awk -v p=$IDR_THRESH 'BEGIN{print -log(p)/log(10)}')
(3) awk 'BEGIN{OFS="\t"} $12>='"$IDR_THRESH_TRANSFORMED"' {print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10}' $IDR_OUTPUT | sort | uniq | sort -k7n,7n | gzip -nc > $IDR_PEAKS

2. Filter for ENCODE blacklisted regions if needed (see Subheading
3.3, step 2).
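
Commands (2) and (3) of step 1 compare column 12 of the IDR output against a transformed threshold rather than the raw one. As the transformation in command (2) implies, that column is on a -log10 scale, so larger values mean more reproducible peaks. A minimal Python sketch of the same transformation and comparison follows; the function names are ours and the example values are illustrative.

```python
import math


def idr_to_scaled(idr_threshold):
    """-log10 transform of an IDR threshold, as in command (2)."""
    return -math.log10(idr_threshold)


def passes_idr(column12_value, idr_threshold):
    """Mirror of the awk test in command (3): column 12 >= transformed threshold."""
    return column12_value >= idr_to_scaled(idr_threshold)


threshold = 0.05  # illustrative IDR threshold
print(f"transformed threshold: {idr_to_scaled(threshold):.4f}")
print(passes_idr(2.0, threshold))  # a peak with IDR of 0.01 passes
print(passes_idr(1.0, threshold))  # a peak with IDR of 0.1 does not
```

For a threshold of 0.05 the transformed value is about 1.301, so only peaks whose column 12 value is at least that large are written to $IDR_PEAKS.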

3.5 Generating Signal Tracks

1. Generate fold-change bigWig file with MACS2 (see Note 23).
• Inputs: pileup bedGraph files, generated from MACS2 callpeak
($TREAT_PILEUP, $CONTROL_PILEUP), chromosome
sizes file ($gensz). Note that this produces
intermediate bedGraph files ($fc_bedgraph,
$fc_bedgraph_srt).
• Outputs: fold-change bigWig ($FC_BIGWIG).
• Commands:
(1) macs2 bdgcmp -t $TREAT_PILEUP -c $CONTROL_PILEUP --o-prefix $prefix -m FE
(2) slopBed -i ${prefix}_FE.bdg -g $gensz -b 0 | bedClip stdin $gensz $fc_bedgraph
(3) sort -k1,1 -k2,2n $fc_bedgraph > $fc_bedgraph_srt
(4) bedGraphToBigWig $fc_bedgraph_srt $gensz $FC_BIGWIG

2. Generate p-value bigWig files with MACS2.
• Inputs: tagAlign file ($TAG), pileup bedGraph files, generated
from MACS2 callpeak ($TREAT_PILEUP, $CONTROL_PILEUP),
chromosome sizes file ($gensz). Note
that this produces intermediate bedGraph files
($pval_bedgraph, $pval_bedgraph_srt).
• Outputs: p-value bigWig ($PVAL_BIGWIG).
• Commands:
(1) sval=$(wc -l <(zcat -f $TAG) | awk '{printf "%f", $1/1000000}')
(2) macs2 bdgcmp -t $TREAT_PILEUP -c $CONTROL_PILEUP --o-prefix $PREFIX -m ppois -S $sval
(3) slopBed -i ${PREFIX}_ppois.bdg -g $gensz -b 0 | bedClip stdin $gensz $pval_bedgraph
(4) sort -k1,1 -k2,2n $pval_bedgraph > $pval_bedgraph_srt
(5) bedGraphToBigWig $pval_bedgraph_srt $gensz $PVAL_BIGWIG

3. Optionally, generate count-signal tracks (see Note 24).
• Inputs: tagAlign files ($TA_FILE), chromosome sizes file
($GENSZ).
• Outputs: strand separated count-signal tracks
($POS_COUNT_BIGWIG, $NEG_COUNT_BIGWIG).
• Commands:
(1) zcat -f $TA_FILE | sort -k1,1 -k2,2n | bedtools genomecov -5 -bg -strand + -g $GENSZ -i stdin > TMP.POS.BED
(2) bedGraphToBigWig TMP.POS.BED $GENSZ $POS_COUNT_BIGWIG
(3) zcat -f $TA_FILE | sort -k1,1 -k2,2n | bedtools genomecov -5 -bg -strand - -g $GENSZ -i stdin > TMP.NEG.BED
(4) bedGraphToBigWig TMP.NEG.BED $GENSZ $NEG_COUNT_BIGWIG
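
In steps 1 and 2, the bedGraph is passed through slopBed, bedClip, and sort before conversion, because bedGraphToBigWig expects intervals that lie within the chromosome bounds and are sorted by chromosome and start. The Python sketch below illustrates the combined effect on in-memory records. It reflects our understanding of what these tools achieve together rather than their exact implementations, and the function name and toy data are hypothetical.

```python
def clip_and_sort(intervals, chrom_sizes):
    """Clamp bedGraph intervals to chromosome bounds, drop empty or unknown
    ones, and sort by chromosome name and start position."""
    cleaned = []
    for chrom, start, end, value in intervals:
        size = chrom_sizes.get(chrom)
        if size is None:
            continue  # chromosome absent from the sizes file
        start, end = max(start, 0), min(end, size)
        if start < end:
            cleaned.append((chrom, start, end, value))
    return sorted(cleaned, key=lambda rec: (rec[0], rec[1]))


chrom_sizes = {"chr1": 1000, "chr2": 500}
intervals = [
    ("chr2", 400, 600, 1.5),   # runs past the end of chr2, clamped to 500
    ("chr1", 900, 1000, 2.0),
    ("chr1", 0, 100, 0.5),
    ("chr2", 500, 520, 3.0),   # entirely off the chromosome, dropped
    ("chrUn", 0, 10, 1.0),     # not in the sizes file, dropped
]

for record in clip_and_sort(intervals, chrom_sizes):
    print(record)
```

Python sorts the chromosome names lexicographically here, so chr10 sorts before chr2, much as the plain sort -k1,1 in the commands above does.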

3.6 ATAC-seq Quality Control Evaluation

1. Calculate mitochondrial fraction (see Note 25).
• Inputs: unfiltered alignments file in BAM format ($RAW_BAM),
number of CPU threads ($nth). Note that these
commands will produce intermediate files: (1) an alignments
file without chrM alignments ($NON_MITO_BAM)
and (2) an alignments file with only chrM alignments
($MITO_BAM).
• Outputs: fraction of reads mapped to chrM.
• Commands:
(1) samtools idxstats $RAW_BAM | cut -f 1 | grep -v -P "^chrM$" | xargs samtools view $RAW_BAM -@ $nth -b > $NON_MITO_BAM
(2) samtools view -b $RAW_BAM -@ $nth chrM > $MITO_BAM
(3) samtools sort -n --threads 10 $NON_MITO_BAM -O SAM | SAMstats.sort.stat.filter.py --sorted_sam_file - --outf $non_mito_samstat_qc
(4) samtools sort -n --threads 10 $MITO_BAM -O SAM | SAMstats.sort.stat.filter.py --sorted_sam_file - --outf $mito_samstat_qc
(5) Let Rn = number of mapped reads in $non_mito_samstat_qc
and Rm = number of mapped reads in $mito_samstat_qc; the
fraction of mitochondrial reads is then Rm / (Rm + Rn).
2. Calculate read counts at each stage of filtering. This can be
done with any alignment file (BAM format) to determine why
reads are lost in processing and to guide future library genera-
tion as needed (see Note 26).
• Input: alignments in BAM format ($BAM).
• Output: mapped statistics ($MAPSTATS).
• Command:
samtools sort -n --threads 10 $BAM -O SAM |
SAMstats --sorted_sam_file - --outf $MAPSTATS

3. Estimate library complexity (see Notes 27 and 28).
• Inputs: final alignments file ($BAM), prefix for a temporary
read name sorted BAM file ($OFPREFIX).
• Outputs: Non-Redundant Fraction (NRF), PCR bottlenecking
coefficient 1 (PBC1), and PCR bottlenecking coefficient 2
(PBC2), written to ${PBC_FILE_QC} as the last three of seven
tab-separated columns (after total read pairs, distinct read
pairs, positions seen once, and positions seen twice).
• Commands:
(1) samtools sort -n $BAM -o ${OFPREFIX}.srt.tmp.bam
(2) bedtools bamtobed -bedpe -i ${OFPREFIX}.srt.tmp.bam | awk 'BEGIN{OFS="\t"}{print $1,$2,$4,$6,$9,$10}' | grep -v 'chrM' | sort | uniq -c | awk 'BEGIN{mt=0;m0=0;m1=0;m2=0} ($1==1){m1=m1+1} ($1==2){m2=m2+1} {m0=m0+1} {mt=mt+$1} END{printf "%d\t%d\t%d\t%d\t%f\t%f\t%f\n",mt,m0,m1,m2,m0/mt,m1/m0,m1/m2}' > ${PBC_FILE_QC}

4. Calculate cross-correlation metrics (see Notes 29 and 30).
• Inputs: BEDPE files ($FINAL_BEDPE_FILE), number of
reads to subsample ($NREADS), tagAlign file ($FINAL_TA_FILE)
used for standardized randomization, number
of compute threads ($NTHREADS). Note that an intermediate
file ($SUBSAMPLED_TA_FILE) will be generated which
can be deleted after this analysis.
• Outputs: cross-correlation scores ($CC_SCORES_FILE) and
plots ($CC_PLOT_FILE).
• Commands:
First, subsample the BEDPE or tagAlign file (default:
25M reads):
zcat $FINAL_BEDPE_FILE | grep -v "chrM" | shuf -n $NREADS --random-source=<(openssl enc -aes-256-ctr -pass pass:$(zcat -f $FINAL_TA_FILE | wc -c) -nosalt </dev/zero 2>/dev/null) | awk 'BEGIN{OFS="\t"}{print $1,$2,$3,"N","1000",$9}' | gzip -nc > $SUBSAMPLED_TA_FILE
Then use the following commands to run cross-correlation:
(1) Rscript $(which run_spp.R) -c=$SUBSAMPLED_TA_FILE -p=$NTHREADS -filtchr=chrM -savp=$CC_PLOT_FILE -out=$CC_SCORES_FILE
(2) sed -r 's/,[^\t]+//g' $CC_SCORES_FILE > temp
(3) mv temp $CC_SCORES_FILE
5. Calculate the Jensen-Shannon distance (JSD) metric (see
Note 31).
• Inputs: aligned reads in BAM format ($BAM), MAPQ
threshold ($MAPQ_THRESH, default 30), number of processors
($NTH).
• Outputs: fingerprint plots showing JSD ($JSD_PLOT) and
log ($JSD_LOG).
• Command:
plotFingerprint -b $BAM --labels rep1 --outQualityMetrics $JSD_LOG --minMappingQuality $MAPQ_THRESH -T "Fingerprints of different samples" --numberOfProcessors $NTH --plotFile $JSD_PLOT

6. Estimate GC bias (see Note 32).
• Inputs: filtered alignments file ($BAM), reference genome
($REF_FA).
• Outputs: GC bias plot ($GC_BIAS_PLOT) and log
($GC_BIAS_LOG) of results. The log can be used to replot
as desired.
• Command:
java -Xmx6G -XX:ParallelGCThreads=1 -jar picard.jar CollectGcBiasMetrics R=$REF_FA I=$BAM O=$GC_BIAS_LOG USE_JDK_DEFLATER=TRUE USE_JDK_INFLATER=TRUE VERBOSITY=ERROR QUIET=TRUE ASSUME_SORTED=FALSE CHART=$GC_BIAS_PLOT S=summary.txt

7. Fragment length statistics. This is for paired end only (see
Note 33).
• Inputs: final BAM file ($BAM).
• Outputs: data file with fragment length distribution
($INSERT_DATA), distribution plot ($INSERT_PLOT).
• Command:
java -Xmx6G -XX:ParallelGCThreads=1 -jar picard.jar CollectInsertSizeMetrics INPUT=$BAM OUTPUT=$INSERT_DATA H=$INSERT_PLOT VERBOSITY=ERROR QUIET=TRUE USE_JDK_DEFLATER=TRUE USE_JDK_INFLATER=TRUE W=1000 STOP_AFTER=5000000

8. Analyze TSS enrichment (see Note 34).
• Inputs: filtered final BAM file ($BAM), chromosome sizes file
($GENSZ), read length estimated from FASTQ ($READ_LEN),
TSS BED file ($TSS_BED).
• Output: TSS plot, TSS enrichment value at peak within a
desired output directory ($OUT_DIR).
• Command:
encode_task_tss_enrich.py --read_len $READ_LEN --nodup-bam $BAM --chrsz $GENSZ --tss $TSS_BED --out-dir $OUT_DIR

9. Calculate the fraction of reads in peaks (FRiP) (see Notes 35
and 36).
• Inputs: final filtered peak file ($PEAK), final filtered tagAlign
file ($TA).
• Output: text file with FRiP score ($FRIP).
• Commands:
(1) val1=$(bedtools intersect -a <(zcat -f $TA) -b <(zcat -f $PEAK) -wa -u | wc -l)
(2) val2=$(zcat $TA | wc -l)
(3) awk 'BEGIN {print '${val1}'/'${val2}'}' > $FRIP

10. Calculate the IDR quality control metrics (see Note 37).
Let Np = number of peaks passing IDR threshold by
comparing pooled pseudoreplicates and Nt = number of
peaks passing IDR threshold by comparing true replicates.
Calculate the Rescue Ratio = max(Np, Nt)/min(Np, Nt).
Let N1, N2 = number of peaks passing IDR threshold for
self-pseudoreplicates for replicate 1 and replicate 2, respectively.
Calculate the Self-consistency Ratio = max(N1, N2)/min
(N1, N2).
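
The two ratios in step 10 can be sketched in a few lines of Python. This is only an illustration: the function names and the `max_ratio` parameter are hypothetical, and the factor-of-2 cutoff is taken from Note 37 rather than from any pipeline code.

```python
def idr_qc_ratios(n_p, n_t, n_1, n_2):
    """Return (rescue_ratio, self_consistency_ratio) from IDR peak counts.

    n_p: peaks passing IDR for pooled pseudoreplicates (Np)
    n_t: peaks passing IDR for true replicates (Nt)
    n_1, n_2: peaks passing IDR for the self-pseudoreplicates of
              replicate 1 and replicate 2 (N1, N2)
    """
    if min(n_p, n_t, n_1, n_2) <= 0:
        raise ValueError("all peak counts must be positive")
    rescue = max(n_p, n_t) / min(n_p, n_t)
    self_consistency = max(n_1, n_2) / min(n_1, n_2)
    return rescue, self_consistency


def replicates_look_consistent(rescue, self_consistency, max_ratio=2.0):
    # Assumption: both ratios should stay within a factor of 2 (Note 37).
    return rescue <= max_ratio and self_consistency <= max_ratio
```

Because both ratios are defined as max/min, they are always at least 1, so only an upper bound needs to be checked.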

4 Notes

1. To test the pipeline, we recommend accessing a publicly
available dataset (e.g., GSE47753 [1]) and running through each
step to confirm proper outputs. Subsampling can be done (e.g.,
head -n 1000000 $FASTQ > $SUBSAMPLED_FASTQ, or if gzipped
then zcat $FASTQ | head -n 1000000 > $SUBSAMPLED_FASTQ)
to confirm that the pipeline works properly before running it
on potentially very deep sequencing libraries. Keep the line
count a multiple of 4, since each FASTQ record spans four
lines (1,000,000 lines correspond to 250,000 reads).
2. The adapter is the sequencing primer sequence used in trans-
position. Commonly the sequence is AGATCGGAAGAGC (Illu-
mina) but confirm that your library generation method uses
this primer. This trimming step is important as fragments gen-
erated by transposase cuts may be shorter than your read
length. As an example, consider an open chromatin site which
is 70 bp in length. It is possible for two transposases to bind in
that open chromatin site, generating a fragment that is less than
70 bp in length. If the read length is 100 bp, then the sequenc-
ing will read through the adapter on the 3′ end of the fragment,
leading to non-genomic adapter sequence in the read itself.
This non-genomic sequence needs to be trimmed off for
proper alignment of the reads.
3. Note that we run Bowtie2 with the -k parameter set to k + 1.
This is by design, as it allows us to distinguish between reads
that map to only k total positions and those that map to more
than k positions in downstream processing.
4. For running alignment for single-end reads, replace the
Bowtie2 command with:
bowtie2 -k $((multimapping + 1)) --mm -x $bwt2_idx --threads $nth_bwt2 -U <(zcat -f $fastq) 2>$log | samtools view -Su /dev/stdin | samtools sort - $prefix
Note that the key difference is in using parameter -U
instead of -1 and -2.
5. For running alignment with uniquely mapped reads only,
replace the Bowtie2 command with:
bowtie2 -X2000 --mm --threads $nth_bwt2 -x
$bwt2_idx -1 $fastq1 -2 $fastq2 2>$log | samtools
view -Su /dev/stdin | samtools sort - $prefix. Note
that the key difference is removal of the parameter -k.
6. Note the use of a custom script assign_multimappers.py.
This script can be found in the ENCODE ATAC-seq pipeline
GitHub repository. Briefly, this script looks at reads with
multiple alignments and only keeps reads that mapped to no
more than the desired number of locations. For example, if the
multimapping threshold is 4 but the read is found to map to
5 locations, the read is discarded (a read is only allowed to map
to a maximum of 4 locations). Downstream filtering by
samtools chooses one of the read alignments as primary and
discards the rest. Note that the MAPQ threshold is NOT used
when processing multimappers, as all multimappers fall below
the usual MAPQ threshold.
7. For filtering reads for single-ended read alignments (that are
multimapped) with samtools flags, use these read filtering
commands instead:
(1) samtools sort -n ${RAW_BAM} -o ${QNAME_SORT_BAM}
(2) samtools view -h ${QNAME_SORT_BAM} | $(which assign_multimappers.py) -k $multimapping | samtools view -F 1804 -Su /dev/stdin | samtools sort /dev/stdin -o ${FLAG_FILT_BAM}
Note that the key differences are sorting by read name
order first, no fixing read mates, and no filtering for read mates.
8. For filtering reads with uniquely mapped read alignments only
with samtools flags, use these read filtering commands instead:
(1) samtools view -F 1804 -f 2 -q ${MAPQ_THRESH}
-u ${RAW_BAM} | samtools sort -n /dev/stdin -o
${TMP_FILT_BAM}
(2) samtools fixmate -r ${TMP_FILT_BAM}
${TMP_FILT_FIXMATE_BAM}
(3) samtools view -F 1804 -f 2 -u ${TMP_FILT_FIXMATE_BAM} | samtools sort /dev/stdin -o ${FLAG_FILT_BAM}
Note the key differences are no filtering for multimappers,
immediate filtering with -F 1804, and use of MAPQ
thresholds. Note that the default MAPQ threshold is 30. This
threshold is aligner dependent, so if using a different aligner,
remember to adjust the MAPQ threshold.
9. For filtering reads for single-ended read alignments that are
uniquely mapped, use this read filtering command instead:
(1) samtools view -F 1804 -q ${MAPQ_THRESH} -u ${RAW_BAM} | samtools sort /dev/stdin -o ${FILT_BAM} -T ${FILT_BAM_FILE_PREFIX}
10. Duplicates are filtered as a conservative measure to avoid read
biases that may occur by using PCR duplicates. There is a
tradeoff to be considered in only using unique reads vs using
multimappers. While uniquely mapped reads are able to
definitively avoid reads that are PCR duplicates generated by
library amplification, there are a number of reads that are not
PCR duplicates but will be seen as multimapped because
multiple distinct genomic locations are nearly or completely
identical (as in the case of evolutionarily recent region
duplications, such as around hemoglobin genes HBA1 and
HBA2). To capture these evolutionarily recently duplicated
genomic regions, the ENCODE consortium retains some
multimappers (reads that do not map to more than four
distinct locations).
11. To filter duplicates in a single-end alignments file, adjust Com-
mand (3) to the following:
samtools view -F 1804 -b ${FILT_BAM_FILE} >
${FINAL_BAM_FILE}
12. The tagAlign format is an ENCODE format in which align-
ments are kept in a BED format, such that each line in the BED
file is an alignment. This format can be useful for quick com-
patibility with Bedtools. To generate a tagAlign of alignments
from a single-end library, change the command to the
following:
bedtools bamtobed -i ${FINAL_BAM} | awk 'BEGIN{OFS="\t"}{$4="N";$5="1000";print $0}' | gzip -nc > ${FINAL_TA}
Note that there is no BEDPE in a single-end library as
there are not paired reads.
13. Transposase activity leads to an offset cut that is 9 bp in length.
To approximate the center of the transposase binding site on
the sequence and achieve base-pair resolution information on
the genome, read starts (transposase cut sites) are adjusted to
get positions that are closer to the transposase center instead of
the cut site by adding 4 to the positive strand reads and sub-
tracting 5 from the negative strand reads.
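
The coordinate adjustment in Note 13 can be sketched as follows. This assumes BED/tagAlign-style records in which the read start (cut site) is the `start` coordinate for positive-strand reads and the `end` coordinate for negative-strand reads; the function name is hypothetical.

```python
def shift_tn5(chrom, start, end, strand):
    """Shift one tagAlign record toward the transposase center.

    Positive-strand reads: the 5' end is `start`, so add 4.
    Negative-strand reads: the 5' end is `end`, so subtract 5.
    """
    if strand == "+":
        start += 4
    elif strand == "-":
        end -= 5
    else:
        raise ValueError(f"unexpected strand: {strand!r}")
    return chrom, start, end, strand
```

Only the coordinate that marks the cut site is moved, so the other end of the record is left unchanged.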
14. Peaks are called with a loose p-value threshold to capture a large
set of possible peaks. Having more peaks aids the IDR frame-
work in determining the threshold for reproducible peaks.
Stricter thresholding is used in calculating reproducible peaks
in the next steps. If the IDR framework will not be used for
filtering peaks (as may be the case when only a single replicate is
present), we recommend setting a stricter p-value threshold for
peak calling (e.g., 0.01). Adjust the p-value as needed based on
your data quality and downstream IDR results.
15. It is helpful to clean up peak names after peak calling by repla-
cing the peak names with peak ID where the ID number is the
peak rank:
sort -k 8gr,8gr "$prefix"_peaks.narrowPeak | awk 'BEGIN{OFS="\t"}{$4="Peak_"NR ; print $0}' | head -n ${NPEAKS} | gzip -nc > $peakfile
16. To utilize MACS2 for ATAC-seq, we adjust the --extsize and
--shift parameters to fit ATAC-seq specifications. In
contrast to ChIP-seq, the ends of the reads (Tn5 transposase
cut sites) matter for ATAC-seq instead of the midpoints of the
reads (approximate midpoint of DNA-protein binding sites).
This requires adjusting read shifting to ensure that peak calling
is performed with read ends rather than read midpoints.
17. Blacklist regions are genomic regions known to produce sig-
nificant signal enrichment that can bias downstream analyses
due to significant amplification of noise and artifact. Blacklist
files can be found at: doi: 10.5281/zenodo.1491732 [14].
18. To run IDR, also run peak calling on a pooled set of reads. To
do so, simply concatenate the replicate tagAlign files and run
peaks on the pooled set of alignments. For example:
zcat ${REP1_TA_FILE} ${REP2_TA_FILE} | gzip
-nc > ${POOLED_TA_FILE}
19. When running IDR, we also recommend capping the input
peak files to a peak list of the top 300K peaks. Observationally,
across ENCODE accessibility datasets, we find that most cell
types have about 200K accessible regions. As such, we suggest
only using at most 300K peaks in IDR analysis as the top 300K
likely captures all the real accessibility regions in addition to
some regions that represent noise.
20. When only a single replicate is present, IDR can be run on self-
pseudoreplicates. To generate pseudoreplicates, take the single
replicate, shuffle reads, and split the reads into two equal parts.
Use the original replicate peak file as the new “pooled repli-
cate” peak file when running IDR.
• Commands for paired-ended datasets:
(1) nlines=$( zcat -f ${joined} | wc -l )
(2) nlines=$(( (nlines + 1) / 2 ))
(3) zcat -f ${joined} | shuf --random-source=<(openssl enc -aes-256-ctr -pass pass:$(zcat -f ${FINAL_TA_FILE} | wc -c) -nosalt </dev/zero 2>/dev/null) | split -d -l ${nlines} - ${PR_PREFIX}
(4) awk 'BEGIN{OFS="\t"}{printf "%s\t%s\t%s\t%s\t%s\t%s\n%s\t%s\t%s\t%s\t%s\t%s\n",$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' "${PR_PREFIX}00" | gzip -nc > ${PR1_TA_FILE}
(5) awk 'BEGIN{OFS="\t"}{printf "%s\t%s\t%s\t%s\t%s\t%s\n%s\t%s\t%s\t%s\t%s\t%s\n",$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' "${PR_PREFIX}01" | gzip -nc > ${PR2_TA_FILE}
• Commands for single-ended datasets:
(1) nlines=$( zcat -f ${FINAL_TA_FILE} | wc -l )
(2) nlines=$(( (nlines + 1) / 2 ))
(3) zcat -f ${FINAL_TA_FILE} | shuf --random-source=<(openssl enc -aes-256-ctr -pass pass:$(zcat -f ${FINAL_TA_FILE} | wc -c) -nosalt </dev/zero 2>/dev/null) | split -d -l ${nlines} - ${PR_PREFIX}
(4) gzip -nc "${PR_PREFIX}00" > ${PR1_TA_FILE}
(5) gzip -nc "${PR_PREFIX}01" > ${PR2_TA_FILE}


21. When two replicates are heavily imbalanced in read counts,
consider running IDR on pseudo-pseudoreplicates. To do so,
first use the steps in Note 20 to generate two pseudoreplicates
for each replicate. Then merge one pseudoreplicate from each
replicate with the pseudoreplicate from the other replicate. For
example, if replicate r1 produces pseudoreplicates r1pr1 and
r1pr2 and replicate r2 produces pseudoreplicates r2pr1 and
r2pr2, then merge r1pr1 and r2pr1 to get
pseudo-pseudoreplicate ppr1, and merge r1pr2 and r2pr2 to
get ppr2. Use the pooled peak file as the “pooled peak file”
in IDR.
22. In the ENCODE pipeline, IDR is run on multiple versions of
peak sets, including true replicates, pseudoreplicates, and
pooled pseudoreplicates. These peak files are all compared to
determine an optimal peak set (the largest number of peaks)
and a conservative peak set (the fewest peaks). Please see the
ENCODE pipeline website for further discussion of how these
comparisons with IDR can aid in creating robust and repro-
ducible region sets.
23. We recommend producing both fold-change enrichment sig-
nal tracks and p-value signal tracks. The p-value signal tracks
can often be more helpful in visualization, while downstream
analyses are best done with the fold-change enrichment signal
values.
24. The count tracks can be utilized in downstream base-pair
resolution analyses, such as in deep learning with BPNet [15].
25. The mitochondrial genome is very accessible as it has no
nucleosomes, and a large fraction of mitochondrial reads is a
known sign of poor library generation. It is important to check
the fraction of mitochondrial mapped reads before a deep
sequencing run if possible to determine how much sequencing
is necessary to get the desired read depth on the
non-mitochondrial genome.
26. Please note that samtools flagstat metrics track the number of
alignments in the file, not the read count. We provide the
SAMstats package as a way to calculate read counts in align-
ment files, to accurately capture read counts at each stage of
data processing.
27. Library complexity measures are PBC1, PBC2, and NRF.
PBC1 should be > 0.9, PBC2 > 10, and NRF > 0.9. The file
produced has this information in the following columns:
TotalReadPairs [tab] DistinctReadPairs [tab]
OneReadPair [tab] TwoReadPairs [tab]
NRF=Distinct/Total [tab] PBC1=OnePair/Distinct [tab]
PBC2=OnePair/TwoPair
28. To run library complexity measures for a single-end library,
change the command to the following:
bedtools bamtobed -i ${FILT_BAM_FILE} | awk 'BEGIN{OFS="\t"}{print $1,$2,$3,$6}' | grep -v 'chrM' | sort | uniq -c | awk 'BEGIN{mt=0;m0=0;m1=0;m2=0} ($1==1){m1=m1+1} ($1==2){m2=m2+1} {m0=m0+1} {mt=mt+$1} END{printf "%d\t%d\t%d\t%d\t%f\t%f\t%f\n",mt,m0,m1,m2,m0/mt,m1/m0,m1/m2}' > ${PBC_FILE_QC}
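
The same counting logic as the awk command above can be written out in Python to make the definitions of NRF, PBC1, and PBC2 explicit. This is a sketch, not pipeline code: the function name is hypothetical, and returning infinity when no position is seen exactly twice is a choice made here (the awk version would divide by zero).

```python
from collections import Counter


def library_complexity(positions):
    """Compute NRF, PBC1, and PBC2 from read (or read-pair) positions.

    positions: iterable of hashable keys, e.g. (chrom, start, end, strand).
    """
    counts = Counter(positions)
    total = sum(counts.values())                             # mt
    distinct = len(counts)                                   # m0
    seen_once = sum(1 for c in counts.values() if c == 1)    # m1
    seen_twice = sum(1 for c in counts.values() if c == 2)   # m2
    if total == 0:
        raise ValueError("no reads supplied")
    return {
        "NRF": distinct / total,
        "PBC1": seen_once / distinct,
        "PBC2": seen_once / seen_twice if seen_twice else float("inf"),
    }
```

For example, five reads at positions a, a, b, c, d give NRF = 4/5, PBC1 = 3/4, and PBC2 = 3/1.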
29. When cross-correlating forward strand alignment start posi-
tions to reverse strand alignment start positions, you can gen-
erate two peaks of cross-correlation, one that corresponds to
the read length and the other to the average fragment length.
This is more useful in ChIP-seq experiments, as there is a
characteristic fragment length (the length of DNA covered by
the DNA-binding protein), but can also be a useful metric in
ATAC-seq to confirm that the library is enriched for genomic
DNA fragments. The score file for cross-correlation has the
following columns:
# Filename <tab> numReads <tab> estFragLen
<tab> corr_estFragLen <tab> PhantomPeak <tab>
corr_phantomPeak <tab> argmin_corr <tab> min_corr
<tab> phantomPeakCoef <tab> relPhantomPeakCoef
<tab> QualityTag
The following columns are most important:
• Normalized strand cross-correlation coefficient (NSC) =
col9 in outFile
• Relative strand cross-correlation coefficient (RSC) = col10
in outFile
• Estimated fragment length = col3 in outFile, take the top
value.
30. For subsampling a single-end library:
zcat ${FINAL_TA_FILE} | grep -v "chrM" | shuf -n ${NREADS} --random-source=<(openssl enc -aes-256-ctr -pass pass:$(zcat -f ${FINAL_TA_FILE} | wc -c) -nosalt </dev/zero 2>/dev/null) | gzip -nc > ${SUBSAMPLED_TA_FILE}
31. We recommend using deepTools to calculate a
Jensen-Shannon distance (JSD), which provides a measure of
signal-to-noise ratio in the sequencing library. Please see
deepTools references for more details. We filter out blacklist
aligned reads before running JSD.
32. GC content sequence bias is a known phenomenon in next-
generation sequencing methods for chromatin [16]. This anal-
ysis can be used to determine how much GC bias is in your
experiment, and whether GC bias correction may be necessary.
In practice, GC bias is ubiquitous and should always be taken
into account in downstream analyses.
33. In a paired-end experiment, the fragment lengths generated
from transposition can be determined after alignment. As the
transposition sites (the ends of the fragments) are at accessible
DNA locations, fragments can be generated within
nucleosome-free regions (NFRs), or can span one or more
nucleosomes. NFR fragments are the most common, while
mono-nucleosomal fragments (fragments spanning one nucle-
osome) and di-nucleosomal fragments are rarer but often pres-
ent in a good library. These fragment patterns at genomic loci
can provide additional information about chromatin structure
and nucleosome positioning at loci of interest, such as with
V-plots [17]. To determine if such analyses may be possible, a
fragment length distribution plot can be useful to determine if
mono-nucleosomal and di-nucleosomal fragments are present.
Observationally, >40% of reads fall in NFRs (fragment
length 0–150 bp), and the number of mono-nucleosomal reads
may be approximately 40% of the NFR total.
34. The TSS enrichment is an important measure of signal-to-
noise ratio within an ATAC-seq dataset. To calculate the TSS
enrichment, use the following procedure (full code for this
procedure can be found at
https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/src/encode_task_tss_enrich.py).
For the TSS file, take a standard genomic annotation file (such
as a GTF file), select only the protein-coding genes, and use
the strand-aware start positions of the genes. Using these
TSSs, generate the read pileups around each TSS, from
2000 bp upstream to 2000 bp downstream. Combine all
these read pileups to get the aggregate read profile around
TSSs. Calculate the background read pileup as the average
read pileup in the 100 bp at either edge of this window. Then
normalize the aggregate profile by dividing it by the
background read pileup to get a fold-change signal. Please
note that TSS enrichment values are dependent on the
reference used. For hg19 refSeq, a TSS enrichment >10 is
ideal, though 6–10 is acceptable by ENCODE standards. For
GRCh38 refSeq, >7 is ideal, though 5–7 is acceptable. For
mm9 GENCODE, >7 is ideal, though 5–7 is acceptable. For
mm10 refSeq, >15 is ideal, though 10–15 is acceptable. Please
see the ENCODE data quality standards
(https://www.encodeproject.org/atac-seq/#standards) for the
latest updates to TSS enrichment thresholds.
35. The Fraction of Reads in Peaks (FRiP) score is a measure of
signal-to-noise. To calculate the FRiP, take your finalized peak
set and determine the fraction of reads that fall into these peak
regions. The higher the FRiP, the better signal-to-noise of the
dataset. A strong FRiP score, particularly important in foot-
printing analyses, is 0.4 or higher.
36. The FRiP calculation can also be used with any region set
desired. It may be of interest to calculate fraction of reads
within all known open chromatin regions, or in blacklist
regions.
37. Nt and Np should be within a factor of 2 of each other. If more
than 2, this suggests that the replicates are very different in
quality. N1 and N2 should be within a factor of 2 of each other.
If more than 2, this also suggests that the replicates are very
different in quality. Note that these metrics are simply based on
how many peaks were discovered in IDR analysis.
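
The background normalization described in Note 34 can be sketched as below. This is not the ENCODE implementation: the function name is hypothetical, the input is assumed to be an already aggregated per-base pileup across the window around all TSSs, and reporting the maximum of the normalized profile as the enrichment score is an assumption of this sketch.

```python
def tss_enrichment(aggregate_profile, edge_bp=100):
    """Normalize an aggregate TSS pileup by its edge background.

    aggregate_profile: per-base read pileup summed over all TSSs,
                       e.g. 4000 values for a +/- 2000 bp window.
    Returns (normalized_profile, enrichment_score).
    """
    profile = list(aggregate_profile)
    if len(profile) <= 2 * edge_bp:
        raise ValueError("profile is too short for the requested edges")
    # Background: mean pileup over the first and last `edge_bp` positions.
    edges = profile[:edge_bp] + profile[-edge_bp:]
    background = sum(edges) / len(edges)
    if background <= 0:
        raise ValueError("background pileup must be positive")
    normalized = [value / background for value in profile]
    return normalized, max(normalized)
```

A flat profile yields a score of 1, so values well above 1 indicate signal concentrated near TSSs relative to the window edges.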

References
1. Buenrostro JD, Giresi PG, Zaba LC et al (2013) Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods 10:1213–1218. https://doi.org/10.1038/nmeth.2688
2. Galas DJ, Schmitz A (1978) DNAse footprinting: a simple method for the detection of protein-DNA binding specificity. Nucleic Acids Res 5:3157–3170. https://doi.org/10.1093/nar/5.9.3157
3. Hesselberth JR, Chen X, Zhang Z et al (2009) Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat Methods 6:283–289. https://doi.org/10.1038/nmeth.1313
4. Li Z, Schulz MH, Look T et al (2019) Identification of transcription factor binding sites using ATAC-seq. Genome Biol 20:45. https://doi.org/10.1186/s13059-019-1642-2
5. ENCODE Project Consortium, Moore JE, Purcaro MJ et al (2020) Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583:699–710. https://doi.org/10.1038/s41586-020-2493-4
6. Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17:10. https://doi.org/10.14806/ej.17.1.200
7. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357–359. https://doi.org/10.1038/nmeth.1923
8. Li H, Handsaker B, Wysoker A et al (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079. https://doi.org/10.1093/bioinformatics/btp352
9. Quinlan AR, Hall IM (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26:841–842. https://doi.org/10.1093/bioinformatics/btq033
10. (2020) Picard Toolkit. Broad Institute
11. Feng J, Liu T, Qin B et al (2012) Identifying ChIP-seq enrichment using MACS. Nat Protoc 7:1728–1740. https://doi.org/10.1038/nprot.2012.101
12. Kharchenko PV, Tolstorukov MY, Park PJ (2008) Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat Biotechnol 26:1351–1359. https://doi.org/10.1038/nbt.1508
13. Ramírez F, Ryan DP, Grüning B et al (2016) deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res 44:W160–W165. https://doi.org/10.1093/nar/gkw257
14. Amemiya HM, Kundaje A, Boyle AP (2019) The ENCODE blacklist: identification of problematic regions of the genome. Sci Rep 9:9354. https://doi.org/10.1038/s41598-019-45839-z
15. Avsec Ž, Weilert M, Shrikumar A et al (2021) Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat Genet 53:354–366. https://doi.org/10.1038/s41588-021-00782-6
16. Meyer CA, Liu XS (2014) Identifying and mitigating bias in next-generation sequencing methods for chromatin biology. Nat Rev Genet 15:709–721. https://doi.org/10.1038/nrg3788
17. Henikoff JG, Belsky JA, Krassovsky K et al (2011) Epigenome characterization at single base-pair resolution. PNAS 108:18318–18323. https://doi.org/10.1073/pnas.1110731108

Chapter 18

Deep Learning on Chromatin Accessibility


Daniel S. Kim

Abstract
DNA accessibility has been a powerful tool in locating active regulatory elements in a cell type, but
dissecting the combinatorial logic within these regulatory elements has been a continued challenge in the
field. Deep learning models have been shown to be highly predictive models of regulatory DNA and have
led to new biological insights on regulatory syntax and logic. Here, we provide a framework for deep
learning in genomics that implements best practices and focuses on ease of use, versatility, and compatibility
with existing tools for inference on DNA sequence.

Key words DNA accessibility, ATAC-seq, DNase-seq, Deep learning, Machine learning

1 Introduction

DNA accessibility assays continue to be powerful tools in locating
active regulatory elements in a cell type, as these assays work
genome-wide and mark regions where DNA binding proteins are
interacting with the genome to produce cell-type-specific gene
regulation [1–5]. However, further analysis on accessible regions
is necessary to dissect regulatory logic encoded in these regulatory
regions. Deep learning models have emerged as highly predictive
models of regulatory DNA [6]. These models learn nonlinear
predictive functions that map DNA sequence to genome-wide
profiles of regulatory activity by learning predictive sequence
features and their higher order combinations. These models produce
highly robust and accurate mappings from DNA sequence to
molecular phenotypes like accessibility, suggesting that these
models capture the higher order regulatory logic that produces
accessibility and regulatory potential from a DNA sequence [7–9]. While
deep learning models have been criticized for their opaqueness in
interpretation, we and others have developed powerful interpreta-
tion methods to extract rules of cis-regulatory logic from these
black-box models [10–14]. These models in conjunction with
validated interpretation tools have provided new insights into DNA
sequence syntax and logic, and have become established tools in
computational biology.

Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols,
Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_18,
© The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023

Here, we provide a framework and best practices for building
deep learning models for genomics as well as guides for applying
interpretation tools to these models. Deep learning continues to be
a rapidly evolving field, so here we provide a high-level perspective.
We note that there are many reasonable ways to implement a deep
learning pipeline, including decisions around the programming
language and the deep learning framework. Here we have chosen
to share best practices that attempt to balance compatibility (both
backward and toward the future), ease of use, and versatility. To this
end, we provide a framework that is best implemented in Python
3, with Tensorflow 2.0 and TF Keras. Even more specifically, this
protocol is for building a classification model from DNA sequence
input to predict binarized accessibility (e.g., is an accessible peak
present/not present at this genomic region) in a single cell type.
Useful interpretation tools for analyzing such models are also
recommended. While we focus on a specific machine learning
problem in this protocol (mapping from DNA sequence to accessi-
bility), we think that the best practices and ideas here are widely
applicable to a variety of machine learning problems in genomics.
As such we hope that this protocol will be an effective starting point
for many possible types of deep learning in genomics.

2 Materials

This protocol assumes working knowledge of Python, bioinformatics,
and machine learning. As noted above, there are many ways to
do deep learning, and frameworks and tools are rapidly evolving. At
present, for deep learning frameworks we recommend using either
Tensorflow (Keras is now a part of Tensorflow 2.0) or Pytorch.
Both are best used in Python 3 for compatibility with existing
inference tools.
There are a variety of relevant resources for deep learning in
genomics. For plug-and-play models, the Kipoi model zoo has a
variety of models that may be of interest. For inference, the most
common tools used include DeepLIFT, SHAP, and TF-MoDISco.
Please note that you can run a deep learning pipeline without a
graphics processing unit (GPU), but it will be substantially
slower. We recommend running training, evaluation, and inference
with a GPU.

3 Methods

3.1 Data Processing and Data Loading

1. Start with a set of genomic intervals, such as a set of accessible
regions for a cell type (see Note 1). This will be your set of
genomic intervals that are labeled as positives.
2. Collect an informative set of negative genomic intervals (see
Notes 2 and 3). This includes flanking intervals (the genomic
intervals adjacent to the positive intervals on either side; our
default is to collect three extra bins on each side), random
intervals (intervals anywhere else in the genome that are not
positives), as well as intervals known to be accessible elsewhere
(e.g., in other cell types) that are not accessible in your set of
genomic intervals (see Note 4).
3. For the positive intervals and negative intervals selected, bin
these intervals into equal-size bins (see Note 5), using a stride
length to generate fixed-length examples across the selected
intervals (see Note 6). These bins are your genomic examples.
Default bins are 200 bp in length (e.g., an example is 200 base
pairs of genomic sequence), and our default stride length is
50 bp.
4. Set up labels for your examples. Positives should be labeled
with 1 and negatives are labeled with 0 (see Note 7).
5. Extend each example interval to your final interval length (see
Note 8). This step now adds the flanking sequences of each bin
to give more sequence context during training. The default
final length is 1000 bp. At this stage, you should have a set of
genomic intervals that are all 1000 bp in length and are each
associated with a label (1 or 0).
6. Optionally, pre-generate one-hot encodings for your regions
(see Notes 9–11). If you intend to use a standard data loader
for your desired deep learning framework, this will be necessary
to have appropriate inputs for training.
7. Build a data generator appropriate for your desired deep
learning framework. Many frameworks now provide the option
to create your own data loader if needed. If performing a
one-hot encoding on the fly, write a one-hot encoder in your
data generator to ensure the deep learning framework receives
a proper input with the label.
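
Steps 3 to 5 (binning with a stride, labeling, and extending to the final length) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and arguments are hypothetical, positive intervals are assumed to be non-overlapping and on the same chromosome, the "more than 50% overlap" labeling rule is taken from Note 6, and examples that would extend past a chromosome boundary (Note 8) are simply skipped.

```python
def make_examples(region_start, region_end, positives, chrom_len,
                  bin_size=200, stride=50, final_len=1000):
    """Tile one selected interval into labeled, extended examples.

    positives: list of (start, end) positive intervals (non-overlapping).
    Returns a list of (start, end, label) tuples of length `final_len`.
    """
    flank = (final_len - bin_size) // 2
    examples = []
    for bin_start in range(region_start, region_end - bin_size + 1, stride):
        bin_end = bin_start + bin_size
        # Total number of bases in this bin covered by positive intervals.
        overlap = sum(
            max(0, min(bin_end, p_end) - max(bin_start, p_start))
            for p_start, p_end in positives
        )
        # The label depends only on the central "active" bin.
        label = 1 if overlap > bin_size / 2 else 0
        ext_start, ext_end = bin_start - flank, bin_end + flank
        if ext_start < 0 or ext_end > chrom_len:
            continue  # would run off the chromosome; skip this example
        examples.append((ext_start, ext_end, label))
    return examples
```

Skipping boundary examples is only one option; padding or shifting the window would also keep inputs at a fixed length.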

3.2 Train a Model

1. Before training, determine your evaluation setup. We use a
cross-validation strategy based on splitting by chromosome
(see Note 12). For a tenfold cross-validation strategy, split
your chromosomes by size as equally as possible across ten
folds, then use eight folds for training, one fold for validation,
and one fold for testing (see Note 13).
2. Choose a model architecture and implement it in your desired
deep learning framework (see Notes 14 and 15).
3. Train the model using your desired deep learning framework.
Training will require your dataset, a model, a loss function, and
an optimizer. Use a Binary Cross Entropy loss (see Note 16)
with Adam optimizer (see Note 17). Default parameters for
Adam optimizer are: learning rate = 0.001, beta_1 = 0.9,
beta_2 = 0.999, epsilon = 1e-08. Set up a training regimen:
number of epochs to run the training data (default is 20) as well
as the metric to optimize (default is the loss) (see Note 18).
Adjust the parameters as desired to optimize training.
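
The chromosome-based split in step 1 can be sketched with a simple greedy balancing heuristic: assign chromosomes from largest to smallest, each to the fold with the smallest total size so far. This is one reasonable way to split "as equally as possible", not necessarily the procedure used by the authors; the function names are hypothetical, and taking the fold after the test fold as the validation fold is an arbitrary choice.

```python
import heapq


def split_chromosomes(chrom_sizes, n_folds=10):
    """Greedily balance chromosomes across folds by total size.

    chrom_sizes: dict mapping chromosome name to length in bp.
    Returns a list of `n_folds` lists of chromosome names.
    """
    if not 1 <= n_folds <= len(chrom_sizes):
        raise ValueError("n_folds must be between 1 and the chromosome count")
    heap = [(0, i) for i in range(n_folds)]  # (total size, fold index)
    folds = [[] for _ in range(n_folds)]
    ordered = sorted(chrom_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for chrom, size in ordered:
        total, i = heapq.heappop(heap)  # currently smallest fold
        folds[i].append(chrom)
        heapq.heappush(heap, (total + size, i))
    return folds


def train_valid_test(folds, test_idx):
    """Use one fold for testing, the next for validation, the rest for training."""
    valid_idx = (test_idx + 1) % len(folds)
    train = [c for i, fold in enumerate(folds)
             if i not in (test_idx, valid_idx) for c in fold]
    return train, folds[valid_idx], folds[test_idx]
```

Iterating `test_idx` over all folds gives the full cross-validation; each chromosome appears in exactly one of the three sets per iteration.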

3.3 Evaluation

1. Evaluate your model using only the held-out test data (see
Notes 19 and 20). Unlike training, where only an informative
set of negative regions is used, please use the entirety of the
held-out test chromosomes during evaluation. Useful measures
during evaluation of a classification model include the loss,
area under the precision-recall curve (AUPRC), and area
under the receiver operating characteristic curve (AUROC)
(see Note 21).
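
To make the two ranking metrics above concrete, the sketch below computes AUROC (trapezoidal rule over the ROC curve) and AUPRC (as average precision) in plain Python, treating tied scores as a single threshold. It is an illustration of the definitions only; the function name is hypothetical, and for genome-wide evaluation an established library implementation is likely a better choice.

```python
def auroc_and_auprc(labels, scores):
    """Return (AUROC, AUPRC) for binary labels (0/1) and real-valued scores."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = prev_tp = prev_fp = 0
    roc_area = 0.0
    avg_precision = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume every example tied at this score threshold.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        # Trapezoid under the ROC curve (in raw counts, scaled below).
        roc_area += (fp - prev_fp) * (tp + prev_tp) / 2.0
        # Average precision: gain in recall times precision at threshold.
        avg_precision += (tp - prev_tp) / n_pos * (tp / (tp + fp))
        prev_tp, prev_fp = tp, fp
        i = j
    return roc_area / (n_pos * n_neg), avg_precision
```

Because genome-wide evaluation is heavily imbalanced toward negatives, AUPRC is usually the more informative of the two numbers.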

3.4 Inference

1. As a starting point for downstream inference methods,
generate base-pair level contribution scores on genomic intervals of
interest (see Notes 22–24). To determine statistical significance
of these scores, create dinucleotide shuffled versions of the
sequences and generate base-pair level contribution scores on
those sequences to get an empirical null distribution of contri-
bution scores (see Note 25). We recommend tools such as
DeepLIFT, SHAP, or backpropagated gradients (see Note
26), some of which will also handle null sequence generation
and significance scoring for you.
2. For motif scanning, you can utilize your deep learning frame-
work to quickly scan using your database of interest. The usual
position-weight matrix (PWM) scan is a convolutional opera-
tion, so you can utilize the deep learning framework by initi-
alizing a convolutional layer with the weights set as the PWM
weights for each motif in your database (see Note 27).
3. For de novo motif discovery, use TF-MoDISco to take contri-
bution scores and find enriched patterns.
4. Combinatorial analyses can be performed on enriched motifs of
interest (see Note 28). You can utilize the model predictions on
the original sequence compared to combinatorial scrambling of
identified motif sites in the sequence. Motif scrambling
involves taking the underlying motif match sequence and shuf-
fling that sequence in place to generate a sequence that does
not contain that motif at that location anymore. Additionally,
you can use the Deep Feature Importance Map (DFIM)
method by scrambling motif sites and determining how the
contribution scores change compared to the original sequence.
Both of these analyses can give you further insight and
hypotheses of combinatorial logic in DNA sequence.
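
The in-place motif scrambling described in step 4 can be sketched as below. The function name, the fixed seed, and the retry loop are choices made for this illustration; the shuffle preserves the base composition of the site and leaves the flanking sequence untouched.

```python
import random


def scramble_motif(sequence, start, end, seed=0, max_tries=100):
    """Shuffle sequence[start:end] in place, leaving the flanks untouched.

    Retries until the shuffled site differs from the original; a site made
    of a single repeated base cannot change and is returned as-is.
    """
    if not 0 <= start < end <= len(sequence):
        raise ValueError("motif coordinates fall outside the sequence")
    original = sequence[start:end]
    site = list(original)
    rng = random.Random(seed)
    for _ in range(max_tries):
        rng.shuffle(site)
        if "".join(site) != original:
            break
    return sequence[:start] + "".join(site) + sequence[end:]
```

A shuffled site can still resemble the motif by chance, so it is worth rescanning the scrambled sequence before comparing model predictions with the original.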

4 Notes

1. Please see the previous chapter in this book for a processing
pipeline to generate peaks from an accessibility assay (such as
ATAC-seq).
2. It is often easier in downstream processing to set up different
files for positives and negatives. This can allow you greater
control over the ratio of positives to negatives by adjusting
how many examples come from each file in training.
3. We generally train with an equivalent number of positives and
negatives, so we select the negative set of examples to approxi-
mately equal the number of positive examples. This can be
adjusted to better optimize training and ensure the model has
seen an appropriate diversity of negative examples.
4. Note that training uses an informative set of negatives, but for
evaluation of the model it is best to evaluate the model against
genome-wide data. It is important to keep this consideration in
mind early as you set up your dataset, so that it is easy to switch
to a genome-wide dataset for downstream evaluation.
5. Deep learning models require a fixed size input. As such, region
sets must be binned to generate these fixed length inputs. To
adequately cover the regions of interest without generating too
many similar examples, a stride length of 50 bp is used between
bins. Bin size and stride length can be adjusted as desired.
6. The 200 bp bins are the “active” DNA sequence of interest in
the example. We consider the bin positive if more than 50% of
the bin overlaps a positive interval. Later, the region is
extended with additional flanking sequence to provide more
context around the “active” sequence to the model. It is
important to note that labeling is only done using the “active”
sequence, i.e., the 200 bp bin, which acts as a way to focus the
model on learning important features in the middle of the
sequence, utilizing surrounding contextual DNA sequence.
7. Note that in the case of a single positives region set (single task
model), this is trivial – simply label your positive bins with
1 and your negative bins with 0 – but in the case of multitask
models, this will be an important step to generate an appropri-
ate label set for each task (each region set), most commonly in
an array of dimensions (n, n_task) where n is the number of
examples and n_task is the number of tasks.
8. Please remember to check that extending the region intervals
does not cause the interval to exceed chromosome boundaries.
9. By convention, the one-hot encoding order is alphabetical (i.e.,
A is [1, 0, 0, 0]; C is [0, 1, 0, 0]; G is [0, 0, 1, 0]; and T is
[0, 0, 0, 1]).
10. We recommend not generating the one-hot encodings in
advance. Instead, encode on the fly: an interval lookup retrieves
the nucleotide sequence, and a lookup dictionary then converts
it to a one-hot encoding. This can be very helpful in reducing file
sizes and decreasing I/O time. It may require a custom data
loader for your model, though many deep learning frameworks
do have one-hot encoding data layers available.
11. There are a variety of options for storing your dataset (text files,
Python numpy arrays, etc.). For the purposes of this protocol
we recommend HDF5 files, as HDF5 is a standard format well
suited for machine learning datasets and widely used.
12. Think early about how you will manage chromosome splits
(our recommended setup for cross-validation of models). An
easy way to do so is to generate separate dataset files for each
chromosome, so that you only load specific chromosomes in
your data generator for various stages (training, validation,
testing). We recommend creating train/validation/test splits
by chromosome to prevent train/test contamination from
overlapping examples on the same chromosome.
13. A strategy for generating n equally sized folds is to do the
following. Order your chromosomes by size (largest first). Set
up n “buckets” for chromosomes. Place the first
n chromosomes into the n buckets. Then, do the following
iterative process until there are no more chromosomes: (1) find
the bucket that is smallest in terms of examples and (2) add the
largest remaining chromosome to that bucket.
14. Convolutional neural networks (CNNs) have been very effec-
tive for DNA sequence to accessibility models. We recommend
starting with a CNN architecture and adjusting from there as
desired. A useful tuned architecture is Basset [9].
15. If running a regression model, simply remove the final activa-
tion layer (often a softmax or sigmoid layer). This exposes the
logits as the final layer that can be float values
(vs. probabilities).
16. Loss functions are an active area of research in deep learning.
For classification, Binary Cross Entropy is an effective loss
function. Check whether the loss function is designed to oper-
ate after activation (after the sigmoid/softmax layer) or before
(on logits). For regression, mean-squared error loss is a reason-
able starting point.
17. Optimizers are an active area of research in deep learning.
Effective optimizers used in deep learning on sequence include
the Adam optimizer [15] and RMSprop [16]. Optimizer parameters
for Adam are as given in the Methods; default parameters
for RMSprop are learning rate = 0.002, decay = 0.98, and
momentum = 0.0.
18. Many deep learning frameworks come with a training function
that encapsulates the training process. This can simplify model
training but also make the process opaque. A fuller description
of deep learning requires much more explanation, but a brief
high-level explanation is provided here. Within this training
routine, the training data is fed to the model in batches. One
full iteration through a training dataset is an epoch. The data is
pushed forward through the model (interacting with model
weights) to generate predictions, which are compared to the
labels. The difference between the labels and predictions, as
measured by the loss function, is then pushed backward
through the model (backpropagation), and the optimizer
adjusts the model weights. The next batch is then pushed
forward through the model, interacting with the updated
weights, and so on. This continues until the training data is
all used, at which point the routine evaluates the performance
of the model using the validation data. If the performance of
the model is still improving,
the training routine will run another epoch and evaluate again
with the validation data. This continues until the routine has
hit the maximum number of epochs or is no longer improving
in performance on the validation data (most commonly based
on early stopping criteria).
19. Note that in evaluation, there is no loss function or optimizer
as these are only used in training.
20. We believe it is very important to determine performance of
the model in a genome-wide setting. This provides an accurate
view on how the model would perform in the true setting of
genome-wide prediction. Genome-wide evaluation also carries
the additional benefit of being a more comparable metric
across studies. As different studies will select their positives or
negatives in different ways, metrics calculated on subsets of the
genome can be biased or inaccurate measures of true
performance on the genome.
21. AUROC is known to be a very inflated metric in genome-wide
evaluation, as the number of negatives vastly outweighs the
number of positives. We recommend AUPRC as the more
accurate and meaningful metric to determine performance.
High performing accessibility models have AUPRCs of 0.6 or
more [17].
22. Sequences of interest can be dynamically accessible regions,
accessible regions around a locus of interest, accessible regions
that are known to be bound by a DNA binding protein, or any
other subset of interest. To generate contribution scores, you
can start with the gradients propagated back onto the input.
Given the labeling strategy above, it is important to only
perform interpretation on the actual example sequence (e.g.,
if the original bin was 200 base pairs, then interpretation
should only be performed on those 200 base pairs).
23. Various studies in the field have looked at convolutional filters
to interpret what the model has learned. While convolutional
filters have shown pattern weights that look like motifs, we do
not recommend analysis on weights in the model, as the model
learns a representation of the input DNA sequence that is
distributed across the entire layer and can be hard to interpret
in an isolated convolutional filter. As such, we recommend
backpropagation-based methods that reaggregate contribution
information back onto the DNA sequence itself, which gives a
more comprehensive view of the sequence features in relation
to each other.
24. Base pair contribution scores can also be used to dissect genetic
variation by taking known single nucleotide polymorphisms
(SNPs) and substituting each allele into the sequence to
compare the resulting scores. Of note,
variant analyses in deep learning can be unstable, as the
model was trained to predict accessibility and not variant
effects. Interpret results carefully and utilize multiple inference
methods when analyzing variants.
25. For inference methods, it is often very helpful to have empirical
reference distributions to determine if your contribution scores
or downstream results are significant. We recommend using
dinucleotide-shuffled sequences as the empirical null
sequences, which preserve the dinucleotide frequencies of the
sequence rather than just its nucleotide composition.
This is a more constrained but more biologically accurate null
sequence.
26. We do not recommend using the Integrated Gradients
method, as we have found that it tends to obscure
cell-type-specific features (which are important for highlighting
differences across cell types) and instead appears to highlight
general features in DNA sequences.
27. When scanning for motifs on contribution scores with a motif
database, it is important to also scan the original sequence.
PWMs are designed to give log-odds scores on a one-hot
sequence without weights on the base pairs, and can give high
scores to poor PWM matches on contribution scores. As such,
it is important to check that the sequence is also an appropriate
match by PWM score on the original sequence.
28. One of the greatest strengths of deep learning in genomics is
that it can build a high-performing mapping from DNA
sequence to molecular phenotypes like accessibility without
needing initial featurization. This suggests that this modeling
framework is able to capture DNA syntax effectively, including
parameters like variable spacing, motif density, and motif
counts. As such, syntax analysis will be an important area of
research with deep learning in genomics.
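The binning, labeling, and extension rules in Notes 5, 6, 7, and 8 can be sketched as below. This is an illustrative sketch under stated assumptions: coordinates are 0-based and half-open (BED-style), the positive intervals for each task are non-overlapping, the function names are hypothetical, and the flank length is a free parameter because the Notes do not fix one.

```python
def make_bins(region_start, region_end, bin_size=200, stride=50):
    """Tile a region with fixed-size bins whose starts are `stride` bp apart."""
    return [(s, s + bin_size)
            for s in range(region_start, region_end - bin_size + 1, stride)]


def positive_fraction(bin_interval, positives):
    """Fraction of the bin covered by positives (assumed non-overlapping)."""
    b0, b1 = bin_interval
    covered = sum(max(0, min(b1, p1) - max(b0, p0)) for p0, p1 in positives)
    return covered / (b1 - b0)


def label_bin(bin_interval, positives, threshold=0.5):
    """Label 1 only if MORE than `threshold` of the bin overlaps a positive."""
    return 1 if positive_fraction(bin_interval, positives) > threshold else 0


def label_matrix(bins, positives_per_task):
    """Build an (n, n_task) nested list of labels, one column per region set."""
    return [[label_bin(b, task) for task in positives_per_task] for b in bins]


def extend_bin(bin_interval, flank, chrom_length):
    """Add flanking context; return None if the chromosome bounds are exceeded."""
    start, end = bin_interval[0] - flank, bin_interval[1] + flank
    if start < 0 or end > chrom_length:
        return None
    return (start, end)
```

Labels are computed on the 200 bp bin before extension, so the flanks add context without affecting the label.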
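The on-the-fly one-hot encoding described in Notes 9 and 10 amounts to a dictionary lookup per base. The sketch below uses plain Python lists to stay dependency-free; mapping N (or any other non-ACGT character) to an all-zero row is an assumption, not something the Notes specify.

```python
# Alphabetical one-hot order from Note 9: A, C, G, T.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}
UNKNOWN = (0, 0, 0, 0)  # assumption: ambiguous bases become all zeros


def one_hot_encode(seq):
    """Convert a DNA string into a list of length-4 one-hot rows."""
    return [list(ONE_HOT.get(base, UNKNOWN)) for base in seq.upper()]
```

A data loader would call this on each sequence as it is fetched, so only intervals and labels need to be stored on disk.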
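The bucket strategy in Note 13 is a greedy assignment and can be sketched as follows. The sketch assumes chromosome "size" is measured as the number of examples per chromosome, and the function name is hypothetical.

```python
def make_folds(examples_per_chrom, n_folds):
    """Greedily assign chromosomes to folds of roughly equal example count.

    examples_per_chrom maps chromosome name -> number of examples.
    Returns (folds, sizes): lists of chromosome names and their totals.
    """
    # Largest chromosomes first.
    ordered = sorted(examples_per_chrom.items(),
                     key=lambda item: item[1], reverse=True)
    folds = [[] for _ in range(n_folds)]
    sizes = [0] * n_folds
    for i, (chrom, count) in enumerate(ordered):
        if i < n_folds:
            target = i  # seed each bucket with one of the n largest
        else:
            target = sizes.index(min(sizes))  # currently smallest bucket
        folds[target].append(chrom)
        sizes[target] += count
    return folds, sizes
```

For example, with counts of 10, 8, 6, 5, and 3 examples and two folds, the totals come out as 15 and 17.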
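Note 16 warns that a loss function may expect either probabilities (after the sigmoid) or logits (before it). The sketch below is framework-independent and shows both forms of binary cross entropy for a single example; the function names are hypothetical. The logit form uses the standard numerically stable rearrangement, and both forms agree when each receives the input it expects.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def bce_from_probability(p, y):
    """Binary cross entropy when the model output is a probability."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def bce_from_logit(x, y):
    """Binary cross entropy when the model output is a logit.

    Stable form: max(x, 0) - x*y + log(1 + exp(-|x|)).
    """
    return max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
```

Passing a probability to the logit form (or the reverse) silently yields a different loss value, which is why it is worth checking the convention of the framework in use.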
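The training routine outlined in Note 18 can be reduced to the skeleton below. It is a sketch only: `train_one_epoch` and `validation_loss` are hypothetical callables standing in for the framework-specific batch loop and validation pass, and the patience-based rule is one common early stopping criterion, not necessarily the one used in this protocol.

```python
def train_with_early_stopping(train_one_epoch, validation_loss,
                              max_epochs=50, patience=3):
    """Run epochs until validation loss stops improving or max_epochs is hit.

    train_one_epoch(): feeds all training batches through the model once
        (forward pass, loss, backpropagation, optimizer update).
    validation_loss(): returns the loss on the validation data.
    Returns (history, best): per-epoch validation losses and the best one.
    """
    best = float("inf")
    epochs_without_improvement = 0
    history = []
    for _ in range(max_epochs):
        train_one_epoch()
        loss = validation_loss()
        history.append(loss)
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # early stopping
    return history, best
```

In practice the model weights from the best epoch would also be saved so they can be restored after stopping.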
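The point made in Note 21 can be illustrated with a toy calculation. The sketch below computes AUROC as the fraction of positive-negative pairs ranked correctly and estimates AUPRC by average precision (one common estimator, chosen here as an assumption). It assumes at least one positive and one negative, and the scores are invented for illustration.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Same model behaviour, 10 times more negatives in the second set.
small_labels = [1, 0, 1] + [0] * 7
small_scores = [0.9, 0.8, 0.7] + [0.1] * 7
large_labels = [1] + [0] * 10 + [1] + [0] * 70
large_scores = [0.9] + [0.8] * 10 + [0.7] + [0.1] * 70
```

In this toy case AUROC is identical for both sets, while average precision drops once the negatives outnumber the positives more heavily, which is the behaviour that makes AUPRC the more informative metric genome-wide.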

References

1. Boyle AP, Davis S, Shulha HP et al (2008) High-resolution mapping and characterization of open chromatin across the genome. Cell 132:311–322. https://doi.org/10.1016/j.cell.2007.12.014
2. Song L, Crawford GE (2010) DNase-seq: a high-resolution technique for mapping active gene regulatory elements across the genome from mammalian cells. Cold Spring Harb Protoc 2010:pdb.prot5384. https://doi.org/10.1101/pdb.prot5384
3. Thurman RE, Rynes E, Humbert R et al (2012) The accessible chromatin landscape of the human genome. Nature 489:75–82. https://doi.org/10.1038/nature11232
4. Roadmap Epigenomics Consortium, Kundaje A, Meuleman W et al (2015) Integrative analysis of 111 reference human epigenomes. Nature 518:317–330. https://doi.org/10.1038/nature14248
5. Buenrostro JD, Giresi PG, Zaba LC et al (2013) Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods 10:1213–1218. https://doi.org/10.1038/nmeth.2688
6. Eraslan G, Avsec Ž, Gagneur J, Theis FJ (2019) Deep learning: new computational modelling techniques for genomics. Nat Rev Genet 20:389–403. https://doi.org/10.1038/s41576-019-0122-6
7. Alipanahi B, Delong A, Weirauch MT, Frey BJ (2015) Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 33:831–838. https://doi.org/10.1038/nbt.3300
8. Zhou J, Troyanskaya OG (2015) Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 12:931–934. https://doi.org/10.1038/nmeth.3547
9. Kelley DR, Snoek J, Rinn J (2016) Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res gr.200535.115. https://doi.org/10.1101/gr.200535.115
10. Shrikumar A, Greenside P, Kundaje A (2017) Learning important features through propagating activation differences. arXiv:1704.02685 [cs]
11. Lundberg SM, Lee S-I (2017) A unified approach to interpreting model predictions. In: Advances in neural information processing systems. Curran Associates, Inc.
12. Greenside P, Shimko T, Fordyce P, Kundaje A (2018) Discovering epistatic feature interactions from neural network models of regulatory DNA sequences. Bioinformatics 34:i629–i637. https://doi.org/10.1093/bioinformatics/bty575
13. Avsec Ž, Weilert M, Shrikumar A et al (2021) Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat Genet 53:354–366. https://doi.org/10.1038/s41588-021-00782-6
14. Shrikumar A, Tian K, Avsec Ž et al (2020) Technical note on transcription factor motif discovery from importance scores (TF-MoDISco) version 0.5.6.5. arXiv:1811.00416 [cs, q-bio, stat]
15. Kingma DP, Ba J (2017) Adam: a method for stochastic optimization. arXiv:1412.6980 [cs]
16. Hinton G (2012) Neural networks for machine learning, Lecture 6
17. Kim DS, Risca V, Reynolds D et al (2020) The dynamic, combinatorial cis-regulatory lexicon of epidermal differentiation. bioRxiv 2020.10.16.342857. https://doi.org/10.1101/2020.10.16.342857
INDEX

A Barcode ............................................ 9, 59, 156, 157, 159,


161, 165, 168, 173, 174, 176–177, 181, 182,
Absolute occupancy ............................................. 121–151 188, 191, 195–198, 204, 211–213, 216, 217,
Adaptase................................................................ 235, 242 220–226, 235, 236, 241, 246, 254, 260–263, 280
Adapters .................................................. 4–6, 8, 9, 15, 16,
Barcode collision ........................................................... 182
23, 27, 28, 32, 34, 36, 40, 54, 58, 64, 90–93, 104, Basset ............................................................................. 330
109, 111, 123, 125, 141, 146, 188, 191, 199, BedGraph ....................................... 33, 37, 113, 309, 310
200, 207, 210, 221, 294, 306, 307, 315
Bedtools............................. 306, 308, 309, 312, 317, 320
Adapter trimming ......................................................... 110 Beta-mercaptoethanol......................................... 125, 126,
Agarose ....................................................... 24, 25, 28, 29, 128–130, 138, 140
31, 33, 36, 86, 89, 91, 128
BigWig ...................................................64, 113, 219, 310
Alignment ................................................... 32, 64, 66, 82, BioAnalyzer ............................................6, 11, 42, 47, 54,
110, 111, 115, 135, 212, 215, 218, 220, 261, 59, 73, 81, 104, 109, 128, 134, 191, 210, 221,
306–309, 311–313, 315–321
243, 244, 254, 271, 273, 274, 279, 297, 300
Allele ..................................................................... 276, 281 Biotin .....................................................45, 206, 290, 301
AluI .............................................127, 142, 143, 145, 151 Biotin-14-dCTP ..................................................... 41, 296
Amplicon ................................65, 69, 106, 109–111, 180 Biotinylation ........................................................73, 78, 83
AMPure XP ..........................................24, 42, 56, 64, 66,
Bisulfite .............................. 108–110, 232–235, 239, 244
74, 88, 93, 95, 96, 103, 128, 133, 134, 253, 297 Blacklist................................ 82, 280, 309, 318, 321, 322
Antigen ................................................................. 250, 251 Blacklisted.................................................... 275, 309, 310
ArchR ........................ 195, 212, 220, 225, 264, 276, 281
Blocking...................................................... 166, 173, 174,
Area under the precision-recall curve 176, 190, 197, 204, 205, 255
(AUPRC) .................................................. 328, 331 Bovine papillomavirus (BPV) ......................................... 25
Area under the receiver-operator curve
Bovine serum albumin (BSA)............................... 60, 128,
(AUROC) ................................................. 328, 331 164, 166, 176, 182, 201, 202, 252, 271, 290
ASAP-seq .............................................................. 249–266 Bowtie.......................................................... 195, 217, 218
Assay for transposase-accessible chromatin using Bowtie2............................32, 64, 67, 135, 306, 307, 315
sequencing (ATAC-seq)....................3–17, 23, 24,
Bradford................................................................ 286, 288
33–35, 40, 53, 63, 71–84, 101, 109, 122, 155, Bustools ....................................................... 254, 262, 263
188, 219, 222, 225, 250, 251, 270, 274, 286, bwa-meth....................................................................... 104
290, 294, 305–322, 329
B & W buffer.................................... 79, 89, 93, 194, 206
ASTAR-seq .................................................................... 189
B/W/T buffer ...............................................89, 194, 206
ATAC-RSB buffer ................................................ 4, 5, 102
ATAC-see......................................................285–290, 294 C
ATP ...................................................................86, 89, 140
CaCl2 ................................................ 56, 57, 60, 192, 193
B cDNA................156, 177, 196, 197, 199, 206–210, 224
C. elegans ......................................................................... 17
Bacteria ........................................................ 63, 88, 90, 91 Cell lysis ........................................ 77, 105, 118, 139, 202
Bactopeptone .................................................................. 74 Cellranger ................. 254, 261, 263, 275, 276, 278–280
Binary alignment map (BAM)........ 32, 33, 82, 112, 114,
Cell wall ............................................................16, 17, 106
135–137, 212, 215, 218, 220, 307, 308, 311–314 Centered-log-ratio (CLR) ............................................ 264
BamHI ........................................................ 127, 131, 133, Chromatin .................................3, 21, 39, 53, 63, 71, 85,
137, 142–144, 146, 151
101, 121, 155, 188, 232, 249, 270, 285, 293, 305

Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols,
Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7,
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer
Nature 2023

Chromatin accessibility ................ v, 13, 71–96, 101–118, Diploid.......................................................................7, 116
155, 156, 187–227, 232, 233, 249–266, Disuccinimidyl glutarate (DSG)..................................... 60
269–281, 285–290, 293–301, 325–333 Dithiothreitol (DTT)................................. 35, 41, 89, 91,
Chromatin immunoprecipitation sequencing 103, 138, 157, 193, 200–202, 253, 286–289, 296
(ChIP-Seq) ............ 24, 33, 34, 60, 101, 317, 320 DNA fragmentation ........................................................ 45
Chromatin interactions.............................................85–96 DNA LoBind tube .................................. 47, 56, 200, 300
Chromatin looping ......................................................... 86 DNA methylation ............... 39, 102, 106, 156, 231–246
Chromium ....................................................253, 270–272 DNA Polymerase I .........................................41, 296, 301
Chromosome sizes ...................................... 310, 311, 314 DNA replication ........................................................72, 73
Circularization................................................................. 94 DNase ......................... 3, 4, 7, 22, 39, 54, 128, 139, 155
cis-regulatory elements (cREs) ........................... 3, 4, 101, DNase-seq .................................................... 4, 39, 40, 53,
108, 155, 188 54, 63, 71, 101, 155, 294
CITE-seq .............................................189, 252, 264, 265 DNase hypersensitivity (DHS)...................................39, 40
Click-IT .............................................................. 74, 78, 83 dNTP ............................................. 6, 13, 41, 88, 93, 165,
CO2 ............................................................................56, 75 166, 192, 203, 208, 234, 235, 241, 296, 301
Combinatorial indexing ......................156, 188, 189, 198 DOGMA-seq................................................................. 189
Confocal .....................................288, 289, 296, 298, 301 Doublets ..............................................197, 198, 220–222
Convolutional neural networks (CNNs) ..................... 330 Dounce ................................................164, 174, 237, 238
Coomassie.......................................................33, 286, 288 Droplet ....................................................... 155, 156, 188,
CoTECH ....................................................................... 189 189, 222, 249, 251, 252
Covaris .............................................................42, 45, 104, Drosophila ............................................... 72, 76, 105, 116
108, 109, 128, 133, 297, 298 Drosophila melanogaster (D. melanogaster) 72–74, 76, 82
CpG ............................................................ 103, 105–107, dsciATAC-seq ................................................................ 189
112, 113, 115, 116, 232, 233 dSMF .................................................................... 101–118
Cross-correlation......................................... 306, 312, 320 dTTP....................................................................... 41, 296
Crosslinked ...........................................17, 44, 48, 60, 65, Dynabeads .............................................88, 103, 192, 206
105, 182, 195, 196, 206, 300
Cross-linking .............................................................92–93 E
Cross-validation.................................................... 327, 330 EB buffer ..........................................................84, 88, 207
CTCF .................................................................... 115–117 EDTA....................................................33, 42, 44, 48, 50,
CuSO4 ............................................................................. 78 55, 65, 75, 86, 89, 92, 93, 103, 125, 127, 128,
Cutadapt ......................................... 64, 66, 104, 306, 307
133, 139, 141, 165, 174, 175, 182, 192–195,
Cut sites ............................. 144, 146–151, 309, 317, 318 235, 287–289, 297, 298
CutSmart .................... 93, 128, 131, 132, 140, 179, 180 EGTA...................................................................... 55, 127
Cytosolic buffer................... 41, 44, 48, 50, 51, 296, 298
EM-seq .........................................................103, 108–114
ENCODE............ 14, 54, 214, 305–307, 309, 310, 315,
D
317–319, 321
dATP ...................................................... 41, 294–296, 298 End repair ................... 28, 46, 49, 51, 58, 109, 141, 299
dCTP .....................................................41, 166, 177, 296 Enhancers ...............................................3, 14, 28, 34, 46,
Deep learning ...............................................319, 325–333 53, 58, 71, 85, 86, 108, 129, 134, 232, 285, 299
DeepTools ........................................ 33, 59, 64, 306, 320 Epigenome .................................................................... 156
Demultiplex ................................ 254, 260–262, 275, 278 Escherichia coli (E. coli) ........................... 65, 68, 286, 288
Desalting............................................................... 182, 190 Ethidium bromide ....................................................31, 33
Desulphonation.................................................... 234, 240 Ethylene glycol bis(succinimidyl succinate) (EGS)....... 60
3D genome.................................................................... 285 5-ethynyl-2′-deoxyuridine (EdU)...............72–76, 82, 83
dGTP ...................................................................... 41, 296 Euchromatin......................................................... 285, 293
4′,6-diamidino-2-phenylindole (DAPI) .................54–59, Exonuclease ......................... 40, 123, 143, 146, 148, 235
252, 287, 289, 294–296, 298 Exo/rSAP .................................................... 235, 241, 245
Digitonin ............4, 6, 16, 102, 192, 201, 202, 252, 256
Dimensionality reduction .................................... 225, 265 F
Dimethyl Formamide (DMF) ................................ 6, 164, FASTA ............................................................67, 262, 280
192, 194, 202, 287 FASTQ...........................32, 66, 212–214, 217, 307, 314
Dinucleosomal...................................................... 221, 222 FASTQC..............................................32, 64, 66, 82, 261
Fetal bovine serum (FBS) ........56, 88, 91, 252, 271, 272 High Salt Buffer .................. 42, 45, 46, 49, 50, 297, 299
Fetal Calf Serum.............................................................. 74 HindIII ................................................127, 142, 143, 151
FFPE ..........................................................................49–51 HiSeq ...................................................................... 95, 167
Ficoll gradient ................................................................... 7 Histone ...................................................3, 21, 24, 34, 40,
FITC .............................................................................. 294 63, 72, 121, 122, 156, 293
Fixation ................................48, 68, 91–92, 95, 200–202, Histone modifications...............................................21, 34
251–253, 255–257, 271–273, 279, 289, 296 H3K9me3........................................................... 33, 34, 60
Flowmi ......................................................... 253, 271, 273 H3K27me3 ...............................................................55, 60
Fluorescein .................................................. 294, 296, 298 Hoechst ....................................................... 238, 294, 296
Fluorescence-activated cell sorting (FACS)........... 16, 83, Homoplasmic ....................................................... 276, 281
234, 238, 239, 252, 271–273 HP1.................................................................................. 55
Fluorophores ........................................................ 285, 294 HPLC ..................................................182, 190, 235, 290
FokI......................................................158, 167, 173, 179 HT1080....................................................... 286, 288, 289
Footprints ...........13, 101, 108, 109, 111, 114–117, 305 HU................................................................................... 64
Formaldehyde............................................. 40, 41, 43, 48, Human papillomavirus (HPV) ....................................... 25
51, 55, 57, 60, 64, 65, 68, 86, 91, 139, 155, 252, Hybridization ............................................... 40, 103, 105,
255, 271, 272, 287, 289, 295–297, 300 106, 108–109, 196, 197, 204–206
Formaldehyde-assisted isolation of regulatory elements
sequencing (FAIRE-Seq) ..... 40, 63, 71, 155, 294 I
Fraction of reads in peaks (FRiP)............... 277, 314, 322 i5 .................. 9, 13, 15, 65, 90, 134, 167, 180, 181, 260
Fragment length......................................... 12, 59, 81, 82, i7 ...........................9, 13, 15, 65, 90, 167, 180, 181, 260
108, 144, 146, 210, 221, 223, 245, 314, 320, 321
IDR .................................... 309, 310, 314, 317–319, 322
FS-seq ........................................................................21–37 IGEPAL ...........................................................4, 5, 16, 55,
75, 102, 164, 194, 287, 289, 290
G
Illumina............................................ 9, 12, 13, 15, 23, 24,
GC bias ................................................................. 313, 321 27, 29, 30, 32, 42, 46, 47, 56, 59, 64, 74, 77, 82,
Gelatin ............................................................................. 75 83, 90, 95, 102, 109, 110, 116, 123, 124, 128,
Gene expression ........ v, 63, 85, 156, 212, 270, 285, 293 129, 133, 134, 142, 143, 146, 158, 167, 180,
GentleMACS ............................................... 191, 193, 200 181, 197, 211, 238, 244, 254, 259, 260, 271,
Glycerol.................................................33, 75, 86, 91, 92, 274, 278, 279, 297, 299, 300
116, 140, 157, 193, 195, 199, 200, 210, 287–289 Imidazole ............................................................ 86, 88, 89
Glycine ........................................................ 41, 43, 48, 55, Immuno-staining ................................................. 287, 289
57, 60, 65, 89, 92, 192, 193, 201, 252, 255, 271, Inaccessible ....................................................... 53–60, 108
272, 289, 296, 297 Insulators ......................................................................... 14
Glycogen................................................42, 44, 88, 93, 94 Integrated Genome Browser (IGB).................. 64, 67, 68
GpC ....................................................103, 105–107, 110, Isopropanol ....................... 42, 44, 86, 94, 128, 130, 132
112–116, 232–234, 238, 244 Isopropyl-β-d-1-thiogalactopyranoside
Graphical processing unit (GPU) ................................ 326 (IPTG) ................................................90, 286, 288

H J
H3 .............................................................. 3, 21, 163, 172 Jensen-Shannon distance (JSD) .......................... 313, 321
H4 .............................................................. 3, 21, 163, 172
H2A .............................................................................3, 21 K
Haploid .......................................................................... 116 KAc ................................................................................ 164
H2B .............................................................................3, 21 Kallisto .................................................254, 261–263, 266
HCT116 .................................41, 48, 54–56, 59, 60, 296
KAPA .................................... 13, 19, 166, 167, 181, 192,
HDAC .................................................................. 294, 295 208, 209, 211, 235, 242, 253, 254, 257, 258, 266
HDF5 ............................................................................ 330 Keras .............................................................................. 326
HEK293 ........................................................................ 222
Kipoi .............................................................................. 326
Hemocytometer .............................................................. 41 Kite.......................................................254, 261–264, 266
HEPES............................................................75, 287–289 KOAc .................................................................... 128, 132
Heterochromatin ........................... 59, 60, 285, 293, 294 KOH .............................................................127, 287–289
Heteroplasmic ...................................................... 276, 281
L Multimodal................................................. 210, 231, 233,
260, 264–265, 270, 281
Lambda ................................................................. 108, 112 Multiomics................................................v, 156, 189, 211
Lamin B1 ....................................................................... 287
Ligation ................................................28, 40, 46, 49, 51, N
58, 60, 94, 123, 125, 129, 133, 134, 141, 146,
156, 165, 176, 179, 190, 204–206, 294, 299 NaClO4 ................................................................. 128, 132
Lineage tracing.............................................................. 263 NEBNext High-Fidelity ..................................6, 8, 9, 104
Linker.................................................... 22, 23, 86, 91–93, NEB NEXT Ultra II FS DNA library prep Kit .......24–27
96, 158, 173, 190, 198, 226 NEB Ultra II DNA library prep Kit ................ 24, 27–32,
Low loss lysis (LLL)............................255, 256, 258, 266 42, 56, 297
Nextera ..................... 15, 64, 65, 68, 167, 180, 181, 193
M NextGEM ...................................................................... 271
Next generation sequencing (NGS) ...................... 22, 23,
MACS2 ......................................... 59, 306, 309, 310, 317 45, 47, 51, 54, 57–59, 72, 73, 86, 238, 244, 294,
MAPQ ..................................................... 59, 82, 313, 316 295, 299–300
MarkDuplicates ............................................................. 308 NextSeq ...........................................................47, 82, 167,
Matplotlib............................................................. 104, 115 211, 254, 259, 271, 274, 278
Maxima H............................................165, 175, 192, 203 NicE-viewSeq ....................................................... 293–301
M.CviPI ...............................................103, 105–107, 232 Nicking enzyme assisted sequencing
mdCTP ............................................................................ 41 (NicE-seq) ............................................ 39–51, 294
mESCs .......................................................................75, 76 Nicks ..................................................40, 47–50, 294, 300
Metaprofile .................................................. 114, 116, 219 Ni-NTA............................................................... 86, 88, 91
MethylDackel ...............................................105, 112–114 NlaIII .........................................................................88, 93
Methylome ...............................................v, 101, 189, 231 NotI-HF ............................................................... 167, 180
Methyltransferase ...................................4, 102, 105, 106, NovaSeq........................................................ 95, 167, 211,
116, 232–234, 238 238, 244, 254, 259, 271, 274, 278
MgAc2 ............................................................................ 164 NP40..............................41, 55, 204, 205, 252, 256, 271
mgatk ................................. 254, 263, 272, 276, 280, 281 Nt.CviPII....................................41, 47, 49, 50, 295, 296
MgCl2 ....................................................... 5, 6, 41, 56, 57, Nuclear periphery......................................................55, 59
60, 75, 89, 102, 103, 127, 157, 192, 194, 201, Nuclei isolation .................. 7, 16, 17, 73, 106, 157–164,
202, 234, 252, 256, 271, 287, 289, 296 174, 194, 196, 226, 233–234, 238–239, 278
Micrococcal nuclease (MNase).................. 22, 40, 54, 55, Nuclei isolation buffer (NIB) .................... 164, 174, 194,
57, 72, 122, 155 201, 203–206
Microscope ................................................. 41, 43, 44, 48, Nucleoid-associated proteins (NAPs) ......................63, 64
56, 75, 76, 296, 298, 301 Nucleosome............................... 3, 17, 21–37, 39, 40, 53,
Milli-Q ............................................................41, 286, 295 63, 71, 72, 101, 106, 117, 122, 127, 148, 155,
MinElute PCR Purification Kit ..........6, 74, 78, 104, 193 188, 231–246, 258, 274, 279, 281, 305, 319, 321
MiSeq.........................................................................25, 32 Nucleosome depleted regions (NDRs)...........39, 40, 232
Mitochondria..........................................7, 12, 16, 17, 40, Nucleosome-free region (NFR) ............. 21, 22, 258, 321
212, 218, 222, 223, 250, 251, 270, 271, 274, Nucleosome Occupancy and Methylome sequencing
277, 278, 287, 311, 319 (NOMe-seq) ............................................. 101–118
Mitochondrial DNA (mtDNA)................. 255, 258–260, Nucleosome positioning................................72, 305, 321
263, 264, 266, 270, 271, 274–281 NUMT.................................................................. 275, 280
Mitochondrial fraction.................................................. 311
Mitochondrial genome ........................... 16, 40, 82, 217, O
218, 259, 263, 269–281, 319
MluCI ........................................................................88, 93 OD600 ........................................... 65, 90, 129, 139, 288
MNase-seq.................................................. 40, 53, 54, 63, Oligos ................................................... 15, 24, 30, 89–90,
71, 72, 101, 122, 155, 294 129, 157, 173, 182, 190–191, 193–199, 226,
Monarch PCR & DNA Cleanup Kit ............................. 25 236, 251, 252, 286, 288, 290, 297
Mononucleosomal .......................................................... 14 OmniATAC .......................................................... 202, 226
Mouse embryonic fibroblast (MEF) ............................ 222 Open chromatin ....................................... 4, 7, 16, 23, 35,
M.SssI ...........................................................103, 105–107 39, 53, 63, 64, 72, 101, 116, 155–183, 188, 189,
mtscATAC ..................................................................... 255 285, 295, 296, 298, 300, 301, 305, 315, 333
ORE-seq ............................................................... 121–151
P Q
P5 ................... 9, 59, 158, 173, 197, 236, 253, 257, 258 Q5 ................................................... 46, 59, 129, 134, 299
P7 ................... 9, 59, 190, 197, 207–210, 237, 253, 258 qPCR ........................................................ 7, 9–13, 26, 35,
Paired-end .................................................. 13, 59, 82, 95, 56, 59, 65, 66, 104, 109, 167, 181, 183, 191,
110, 134, 142, 260, 306, 318, 321 207, 209, 211, 258, 266, 279
Paired-seq .....................................................155–183, 189 Quantification .........................11–13, 59, 109, 167, 183,
Paired-Tag ..................................................................... 189 210–212, 250, 251, 254, 258, 261, 278, 279, 295
Paraffin............................................................................. 49 Qubit .........................................6, 12, 42, 44, 47, 56, 58,
Paraformaldehyde ......................................................... 290 59, 64–66, 68, 73, 74, 81, 88, 89, 95, 104, 107,
PB buffer ......................................................................... 17 109, 128, 131, 133, 167, 179, 183, 191, 193,
PBST ..........................................................................45, 50 209–211, 244, 254, 258, 271, 273, 274, 297, 300
PCR............................................5, 25, 42, 56, 64, 74, 86,
104, 134, 157, 190, 235, 253, 273, 288, 297, 312 R
PCR duplicates ............................. 82, 112, 141, 280, 316 R................................................33, 59, 89, 135, 137, 165
Peak calling ............................ 14, 82, 219, 309, 317, 318
Read counts .......................................................... 312, 319
PEG...................................... 89, 192, 194, 203, 208, 235 Read mapping ...................................................... 111–112
Penicillin/streptomycin .................................................. 74 repli-ATAC ................................................................71–84
Permeabilization..................................251, 255–257, 279
Rescue ratio ................................................................... 314
PHAGE-ATAC.............................................................. 189 Resection ......................................................123, 146–148
Phenol................................................... 40, 42, 44–45, 86, Restriction enzyme....................... 4, 86, 93–94, 121–151
93, 94, 118, 128, 132, 140, 297, 298
Reverse cross-linking.................................................92–93
Phenol:Chloroform:Isoamyl Alcohol ................... 42, 297 Reverse transcriptase ..................156, 165, 175, 192, 203
PhiX ...........................................................................11–13 Reverse transcription (RT) primer ..................... 157, 173,
Phosphate-buffered saline (PBS) ....................... 7, 41–45,
175, 190, 195–197, 203
48, 50, 56, 57, 60, 74–76, 82, 83, 86, 92, 104, Rhodamine .................................................................... 294
107, 165, 176, 177, 182, 191, 193, 200–202, RNase A ...................................................... 42, 44, 48, 50,
234, 238, 252, 255, 271, 272, 287, 289, 290, 88, 93, 107, 128, 132, 296, 298
296–298, 301
RNase OUT ................................................ 157, 164, 165
Phusion High-Fidelity DNA Polymerase ........................ 9 RNA-seq ...................................................... 187, 188, 224
Picard .....................................................32, 220, 306, 308 Rolling circle amplification (RCA).................... 86, 94–96
Picolyl-Azide-PEG4-Biotin ......................................74, 78
Pipeline .............................................. 124, 211, 217, 220, S
244, 254, 262, 263, 305–307, 315, 319, 326, 329
PitStop2 ................................................................ 164, 165 Saccharomyces cerevisiae.............................. 124–130, 135,
PMSF ..............................................................91, 192, 206 138, 140, 142, 144, 146, 147
Position-weight matrix (PWM).................. 328, 332, 333 S-adenosylmethionine (SAM) ............................ 103, 107,
Primary antibodies ............................................... 287, 289 215, 234, 238, 311, 312
Primers.........................................6, 9, 15, 25, 27, 30, 42, SAMstats ................... 215, 218, 220, 306, 311, 312, 319
59, 91, 104, 109, 157, 167, 175, 180–182, Samtools .......................33, 64, 105, 111, 112, 135, 195,
190–191, 207, 210, 235–237, 239–242, 253 215, 217, 218, 220, 306–308, 312, 315–317, 319
Prokaryotic chromatin Openness Profiling sequencing SbfI-HF ................................................................ 167, 179
(POP-seq) ......................................................64–68 scATAC ..................................................... 4, 17, 188, 189,
Promoters .............3, 53, 71, 85, 86, 108, 114, 232, 285 212, 223, 249–251, 254, 255, 260–262, 264,
Protease inhibitor.................. 57, 75, 157, 164, 192, 193 265, 270, 271, 273–275, 278, 280
Protect-seq.................................................................53–60 scDNase ........................................................................... 40
Proteinase K .............. 42, 44, 48, 50, 58, 107, 128, 130, Schizosaccharomyces pombe................. 124, 125, 127, 128,
131, 139, 141, 166, 177, 192, 206, 233, 297, 298 130–132, 135, 138–140, 142, 144, 146, 147, 149
Pseudoreplicates .......................................... 314, 318, 319 sciATAC-seq .................................................................. 189
pTXB1 .................................................................. 286, 288 sci-CAR-seq ................................................................... 189
pUC19.................................................................. 108, 112 SciPy............................................................................... 104
pyBigWig .............................................................. 105, 195 sci-RNA-seq................................................................... 188
Python ........................................................ 104, 105, 195, scNMT-seq .................................................................... 189
254, 262, 263, 266, 326, 330 scNOMe-seq.................................................189, 231–246
Scythe............................................................................... 32 TapeStation.......................................................... 6, 11, 12,
Secondary antibodies ........................................... 287, 289 17, 66, 67, 104, 109, 128, 167, 183, 191, 193,
Seurat ..........................................195, 212, 217, 225, 264 209, 210, 254, 258
SHARE-seq .......................................................... 187–227 TD buffer ................................................. 4, 6, 8, 77, 193,
Shearing ...................................................... 104, 106, 108, 194, 210, 287, 289
123–125, 128, 133, 141, 144, 146–151 TDE1 ............................................................................... 77
Signac........................................................... 264, 276, 281 T4 DNA ligase ......... 165, 173, 176, 179, 192, 204, 205
Simian Virus 40 (SV40)............................. 21, 22, 24–27, T7 DNA ligase ................................................... 88, 89, 94
32, 33, 35–37 TEA-seq......................................................................... 189
Single-cell ..................................................... 4, 40, 55, 63, TE buffer .............................................. 42, 44–47, 49, 51,
155, 187, 231, 249, 269, 326 56, 74, 103, 128, 130–135, 198, 199, 206, 252,
Single-cell RNA-seq (scRNA-seq) ..................... 188, 189, 271, 287, 288, 297–300
212, 224, 249, 250, 262, 265 Template switching oligo (TSO) ........................ 190, 208
Single-end...............................13, 82, 110, 315–318, 320 Tensorflow ..................................................................... 326
Single molecule footprinting Terminal transferase ............................................. 166, 177
(SMF).............101–103, 105, 107–110, 115, 116 Texas Red ............................................................. 294–296
Single nucleotide polymorphism (SNP) ...................... 332 TF-MODisco........................................................ 326, 328
SMC ................................................................................. 64 Thermocycler ............................10, 73, 77, 94, 129, 133,
SnapATAC ..................................................................... 276 134, 157, 165, 167, 173, 175, 177–181, 198,
SNARE-seq ................................................................... 189 199, 203, 208, 237, 239, 241, 242, 245, 246, 288
Sodium acetate ................................................... 88, 93, 94 Thermomixer...................................................6, 8, 56, 58,
Sodium chloride (NaCl) .................................5, 6, 41, 42, 73, 82, 104, 107, 165, 166, 175–177, 191
55, 65, 75, 86, 88, 89, 91, 94, 102, 103, 157, 177, THPTA ......................................................................74, 78
192, 194, 195, 199, 201, 202, 234, 252, 256, Tissue dissociation ...................................... 193, 200, 226
271, 286–289, 296–298 Tn5 ......................................4–6, 8, 9, 15–17, 40, 63, 65,
Sodium dodecyl sulfate (SDS)..........................33, 42, 44, 68, 73, 86, 89–92, 96, 155–157, 165, 173, 175,
48, 57, 58, 75, 88, 91, 92, 94, 128, 140, 177, 182, 179, 184, 188, 193, 196, 199, 203, 210, 227,
192, 194, 287, 289, 290, 298 251, 270, 280, 285–290, 294, 305, 317
Sodium hydroxide (NaOH) ................................ 103, 286 Topologically associating domains (TADs) ................... 85
Somatic mutation.......................................................... 271 TotalSeq............................. 252, 259, 260, 262, 265, 266
Sonication ................................................... 24, 40, 45–48, Trac-looping ..............................................................85–96
58, 91, 123, 124, 288 Transcription factor...................................... 3, 32, 37, 71,
Sonicator........................42, 89, 109, 128, 133, 288, 297 72, 106, 116, 117, 232, 270
Sorbitol ................................................................. 126, 129 Transcription factor binding sites (TFBS) ...............32, 37
S phase ............................................................................. 72 Transcription start sites
Split-pool .............................................................. 188, 198 (TSSs)................................ 14, 114–116, 281, 321
SPRI beads .........................................166, 167, 177–180, Transcriptome ............................ 155–184, 187–227, 249
183, 209, 235, 241, 243, 245, 253 Transposase........................................... 4–6, 8, 15, 16, 23,
STAR.............................................................195, 214–216 40, 63, 71, 73, 77, 83, 86, 92, 96, 156, 193, 195,
Streptavidin....................... 42, 45, 49–51, 73, 74, 79, 84, 196, 250, 270, 285–290, 305, 309, 315, 317
88, 93, 94, 103, 109, 195, 196, 297, 299, 300 Transposome ........................40, 199–200, 210, 288–289
Sub-library ...........................................177, 182, 183, 206 TrimGalore ............................................................. 82, 104
Subnucleosomal .........................14, 17, 72, 82, 221, 222 Trimmomatic........................................................ 104, 110
Subsampling ........................................312, 313, 315, 320 Tris-Ac (Tris-acetate) ................... 89, 128, 164, 192, 202
Sucrose......................................16, 41, 75, 103, 157, 296 Tris-HCl ................................................... 5, 6, 41, 42, 55,
SUPERase IN.............................................. 157, 164, 165 65, 74, 75, 80, 86, 88, 89, 91, 102, 103, 128, 134,
SYBR Green ...................... 6, 9, 13, 24, 27, 30, 193, 208 157, 193, 194, 201, 202, 234, 235, 252, 256,
271, 287, 289, 296, 297
T Triton-X100 .........................................42, 45, 46, 49, 50,
TAE buffer....................................................................... 31 57, 88, 89, 164, 165, 175, 287, 289, 297, 299
tagAlign ...............................................308–314, 317, 318 TruSeq .................................................167, 180, 181, 253
Tagmentation ........................................... 65, 73, 75, 156, Trypan Blue ...........................................41, 191, 201, 273
164, 165, 174, 175, 179, 180, 182, 196, 199, TrypLE ......................................................................41, 43
209, 210, 273, 287, 289 Trypsin ................................................... 43, 56, 74–76, 82
TSS enrichment.......................................... 219, 220, 222, V
223, 314, 321, 322
TSS score ....................................................................... 220 Visualization ............................... 112, 293–301, 306, 319
Tween-20 ................................................. 4, 6, 16, 26, 42,
X
74, 75, 88, 102, 192, 194, 201, 202, 252, 255,
256, 278, 287, 301 10x Genomics ............................................ 253, 255–257,
265, 271–273, 278, 279
U
Y
UCSC Genome Browser ............................ 104, 195, 219
UCSC tools ................................................................... 306 Yeast ................................................... 7, 17, 74, 105, 106,
Unique molecular identifier (UMI)................... 197, 211, 116, 127, 130, 134, 138, 286
212, 214, 216, 224, 225, 260, 261 Yeast extract............................................................ 74, 127
Universal nicking enzyme-assisted sequencing
(UniNicE-seq) ......................................49, 51, 294 Z
Uracil ........................................................... 105, 108, 127
Zymo Clean & Concentrate.................................. 15, 208
USER enzyme ................................ 28, 46, 129, 134, 299
Zymolyase..................................... 17, 127, 129, 130, 139
