Genome Engineering With Cas9 and AAV Repair Templates Generates Frequent Concatemeric Insertions of Viral Vectors

nature biotechnology

Article https://doi.org/10.1038/s41587-024-02171-w

Genome engineering with Cas9 and AAV repair templates generates frequent concatemeric insertions of viral vectors

Fabian P. Suchy1,2,10, Daiki Karigane1,3,4,5,10, Yusuke Nakauchi1,3,4, Maimi Higuchi1,2, Jinyu Zhang1,2, Katja Pekrun2,6, Ian Hsu1,2,7, Amy C. Fan1,3,4,8, Toshinobu Nishimura1,2, Carsten T. Charlesworth1,2, Joydeep Bhadury1,2, Toshiya Nishimura1,2, Adam C. Wilkinson1,2,7, Mark A. Kay2,6, Ravindra Majeti1,3,4,10 & Hiromitsu Nakauchi1,2,9,10

Received: 14 September 2022
Accepted: 8 February 2024
Published online: xx xx xxxx

CRISPR–Cas9 paired with adeno-associated virus serotype 6 (AAV6) is
among the most efficient tools for producing targeted gene knockins. Here,
we report that this system can lead to frequent concatemeric insertions of
the viral vector genome at the target site that are difficult to detect. Such
errors can cause adverse and unreliable phenotypes that are antithetical
to the goal of precision genome engineering. The concatemeric knockins
occurred regardless of locus, vector concentration, cell line or cell type,
including human pluripotent and hematopoietic stem cells. Although these
highly abundant errors were found in more than half of the edited cells,
they could not be readily detected by common analytical methods. We
describe strategies to detect and thoroughly characterize the concatemeric
viral vector insertions, and we highlight analytical pitfalls that mask their
prevalence. We then describe strategies to prevent the concatemeric
inserts by cutting the vector genome after transduction. This approach is
compatible with established gene editing pipelines, enabling robust genetic
knockins that are safer, more reliable and more reproducible.

Genome engineering promises curative therapies for a multitude of genetic conditions, such as cystic fibrosis, epidermolysis bullosa and most blood disorders, including sickle cell disease1–3. Still, the tools available to genome engineers are imperfect; there is room to increase editing efficiency, decrease off-target effects and improve on-target fidelity4. The combination of CRISPR–Cas9 and adeno-associated virus serotype 6 (AAV6) has proven to be highly efficient for producing site-specific gene modifications5–8. As a virus, AAV has evolved to deliver DNA in a less toxic and more efficient manner than most other transfection protocols9. Typically, Cas9 ribonucleoprotein (RNP) is electroporated into cells and induces a double-stranded break at the genomic target site, whereas AAV6 delivers single-stranded DNA repair templates into the nucleus. The cell’s homology-directed repair machinery then repairs the

1Institute for Stem Cell Biology and Regenerative Medicine, Stanford University School of Medicine, Stanford, CA, USA. 2Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA. 3Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA. 4Department of Hematology, Stanford University School of Medicine, Stanford, CA, USA. 5Japan Society for the Promotion of Science, Tokyo, Japan. 6Department of Pediatrics, Stanford University, Stanford, CA, USA. 7MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK. 8Immunology Graduate Program, Stanford University School of Medicine, Stanford, CA, USA. 9Distinguished Professor Unit, Division of Stem Cell Therapy, Institute of Medical Science, University of Tokyo, Tokyo, Japan. 10These authors contributed equally: Fabian P. Suchy and Daiki Karigane; Ravindra Majeti and Hiromitsu Nakauchi.
e-mail: fsuchy@stanford.edu; rmajeti@stanford.edu; nakauchi@stanford.edu


Cas9-induced break using the AAV6-delivered DNA as a template. This approach has been used to make both small changes and large insertions in the genome of cells in vitro and in vivo10. The safety of AAV is well studied as it has been used in over 100 clinical trials worldwide11 for multiple purposes, including gene correction and enzyme replacement therapy.

The use of AAV6 to deliver a repair template has boosted gene knockin rates approximately 100-fold compared to plasmid transfection in various cell types. However, we found inconsistent results when comparing PCR-based genotyping methods, which raised questions about the fidelity of our targeted gene modifications. After developing new strategies to detect more complex and unexpected genotypes, our preliminary data suggested that half of the gene-edited cells contained target-site concatemeric knockins of the AAV genome. Although other groups reported anomalous insertions of AAV into Cas9 cut sites, those analyses used PCR preamplification12,13. In contrast, the highly abundant concatemeric knockins that we describe here have a distinct structure from previously reported anomalies, cannot be readily detected by common PCR genotyping strategies and are more abundant than other precision gene editing errors. In preclinical studies, concatemeric knockins could obscure conclusions and hinder reproducibility. In clinical studies, they could produce genetic knockouts or other unexpected phenotypes, yielding unintended and potentially dangerous outcomes. A comprehensive analysis of the frequency and potential consequences of concatemeric knockins has never been performed because these anomalies and their abundance are not readily quantified by conventional methods.

In the present study, to improve Cas9/AAV6-mediated gene editing, we (1) developed and validated detection strategies to quantify complex genotypes while showing why AAV-mediated concatemeric insertions are not detected by common analytical methods; (2) report here the frequency of concatemeric insertions at multiple loci of human pluripotent stem cells (PSCs) and hematopoietic stem and progenitor cells (HSPCs); and (3) developed an approach to greatly decrease the prevalence of concatemeric knockins, with minimal change to established preclinical and clinical gene editing pipelines. Using our optimized method, we decreased the prevalence of unwanted concatemeric knockins by approximately 90%, with a minimal change in knockin rate.

Results
Additional insertions detected by digital PCR
Because Cas9 RNP electroporation followed by AAV6 transduction enables high-efficiency editing, knockins can be created without selection, and purified populations can be isolated by subcloning and genotyping (Fig. 1a). Employing this method, CRE was knocked in to human induced PSCs (iPSCs) at the 3′ end of the CD14 locus, and 36 subclones were expanded. DNA was extracted from the subclones and analyzed by three common PCR methods (Fig. 1b,c and Extended Data Fig. 1a–d) to determine if the knockins were biallelic, monoallelic or wild-type. Three-primer in–out PCR consistently reported the same genotype when analyzing either the left (PCR-L) or the right (PCR-R) side of the editing site. However, two-primer PCR, which is designed to amplify the full region on both sides of the editing site (PCR-F), was not concordant with PCR-L and PCR-R. In eight of 36 samples, PCR-F was missing the upper (KI: knockin) band shown in PCR-L and PCR-R (Fig. 1d and Extended Data Fig. 1e). To further elucidate the genotype, an alternate method was developed, which uses droplet digital PCR (ddPCR) to count wild-type alleles (no KI) and CRE (KI) alleles (Extended Data Fig. 1f,g). Because each cell contains two copies of the CD14 locus, the sum of no KI and KI alleles should equal two per cell. This strategy found that more than half of the edited cells contained additional copies of CRE at integer intervals (Fig. 1e and Extended Data Fig. 1h,i). Additional copies were detected in all samples with discordant PCR analyses.

Additional insertions are concatemers
As is commonplace, the ddPCR analysis in Fig. 1e was performed with restriction enzymes to fragment the genomic DNA in a controlled manner. When rerunning the reaction without fragmentation, the number of detected additional insertions was greatly diminished in most of the subclones (Extended Data Fig. 1i). Because ddPCR is similar to limiting dilution assays, linked regions of the genome partition together in the same droplet when the DNA is not fragmented. Therefore, concatemeric knockins would be counted only once in non-fragmented DNA (Extended Data Fig. 1j,k). This suggests that the additional insertions are concatemeric in all subclones except number 12 (Extended Data Fig. 1i).

Southern blots can measure the size of large regions of DNA without PCR amplification. Because the concatemers were not readily detected by PCR, 11 subclones with various ddPCR genotypes were further expanded and analyzed by Southern blot (Fig. 1f and Extended Data Fig. 1l,m). The Southern blot revealed that many samples had larger knockin bands at regular intervals. As expected, most samples had only one or two bands; however, sample 12 displayed a third band of irregular size, which corresponds to the off-target insertion detected by ddPCR. The Southern blot analysis was 100% concordant with the ddPCR allele counting strategy. Collectively, these data suggest that concatemeric knockins frequently occur when using Cas9/AAV-mediated genome editing. These concatemers are difficult to detect by classic PCR genotyping strategies, but they are revealed by Southern blots and ddPCR after controlled genomic fragmentation (Fig. 1g). Because Southern blots generally require multiple days of preparation and consume more than 1,000-fold more starting material than ddPCR, ddPCR analyses were used for subsequent genotyping. Repeat experiments were performed at the AAVS1 locus and the genotype analyzed by PCR and ddPCR. Results similar to those at the CD14 locus were seen at the AAVS1 locus, with concatemeric insertions occurring in 39% of CD14 subclones and 36% of AAVS1 subclones (Fig. 1h and Extended Data Fig. 2).

Concatemers contain ordered repeats of full vector genome
The band sizes on the CD14 Southern blot were measured to determine the length of the concatemers (Fig. 2a). Bands were spaced at concise intervals, with an average distance of 2.6 kb between groups of concatemeric bands. Because this is the same size as the full AAV6 vector genome used to deliver the knockin template, multiple insertions could result from linked vector genomes as shown in Fig. 2b.

PCR was attempted across the potential head-to-tail concatemeric junction (Extended Data Fig. 3a–c). Indeed, bands appeared only in samples that contained concatemeric knockins. However, the PCR was inefficient and resulted in two weak bands (Fig. 2c). Sanger sequencing revealed that the bottom band was missing the inverted terminal repeats (ITRs) entirely, containing a perfect linkage of opposing left and right homology arms. Sanger sequencing failed on the upper band, so it was sequenced with nanopore technology instead. This revealed a unique chimeric ITR composed of two back-to-back ITRs missing their distal ends (Extended Data Fig. 3d and Supplementary File 1). Reamplification of gel-extracted upper and lower bands resulted in only the lower bands, suggesting that the lower band is an artifact of PCR that occurs with an increasing number of PCR cycles. No bands were seen when using primer combinations to detect head-to-head or tail-to-tail concatemers (Extended Data Fig. 3b,c). Collectively, these data suggest that (1) concatemeric knockins are joined by the viral vector ITRs, which creates a distinct chimeric ITR; (2) PCR amplification and subsequent sequencing across the chimeric ITR is difficult; and (3) the concatemers exist primarily in a head-to-tail orientation.

We next sought to test the regularity of the concatemeric knockins. Because the full-length concatemeric KI cannot be readily PCR amplified and directly sequenced, ddPCR was used to detect the stoichiometry of DNA segments corresponding to regions of the vector genome. For each additional CRE, there was also an additional left


Fig. 1 | Concatemeric knockins detected by digital PCR. a, Pipeline schematic for Cas9/AAV-mediated gene editing in PSCs. AAV6 vector ITRs are indicated as hairpins in orange. Bi, biallelic knockin; Mono, monoallelic knockin; WT, wild-type (no knockins). b, Schematic showing three PCR genotyping strategies. PCR-L = three-primer in–out PCR performed spanning the left homology arm. PCR-R = three-primer in–out PCR performed spanning the right homology arm. PCR-F = two-primer PCR with primers located outside of both homology arms. c, Schematic showing expected gel electrophoresis banding patterns and corresponding genotypes from PCR reactions in b. KI, band corresponding to knockin allele. No KI, band corresponding to allele without a knockin. d, PCR genotype of select subclones after knocking in CRE in the CD14 locus in human PSCs. Sample number is indicated on the top row, interpreted genotype on the bottom row. Red indicates samples with discordance between PCR-L/PCR-R and PCR-F. e, Digital PCR genotype of select samples indicated in d. Two copies per cell are expected (left axis). Extra copies are shown on the right axis, interpreted genotype at bottom. + indicates additional insertions. f, Southern blot of select samples indicated in d. g, Comparison of PCR-L, PCR-R, PCR-F, ddPCR and Southern blot genotype after knocking in CRE into the CD14 locus in human PSCs. Each row indicates an individual sample; sample number is indicated on the left. Red indicates unexpected result due to incorrect band size, missing information (for example, no band) or additional insertions. SB, Southern blot; WT, wild-type. h, Summary of genotype frequencies detected by different methods after knocking in CRE into the CD14 locus (left) and Ubc-CRE into the AAVS1 locus (right). Bar colors are the same as g.

homology arm and right homology arm detected at a 1:1 ratio (Fig. 2d and Extended Data Fig. 3e,f). This was true for all samples except sample 12, which had a partial off-target insertion verified by Southern blot, and sample 14, which had an end-joining mediated on-target knockin verified by PCR and sequencing. ddPCR failed to detect the ITR in all samples except sample 14 (Extended Data Fig. 3i; ITR −RE), suggesting


Fig. 2 | Characterization of concatemeric knockins. a, Consolidated plot of all Southern blot (SB) band sizes. Samples are 11 selected subclones from CRE knockin to the CD14 locus in human PSCs. Average size of each group is indicated by dashed lines. Red band indicates off-target insertion in subclone 12. b, Schematic showing viral vector genome size and theoretical arrangement of concatemeric knockin. Red arrows indicate primers used for junction PCR in c. c, Junction PCR of 11 selected samples (same as a). Primers span concatemer linkage site. Sample ID is shown on bottom. Red indicates that sample had off-target insertion as indicated by ddPCR. Genotype (ddPCR) is listed on top. B, biallelic knockin; M, monoallelic knockin; WT, wild-type (no knockins). + indicates the number of additional concatemeric insertions. +Off indicates off-target insertion(s). d, Plot comparing the copy number of concatemer segments measured by ddPCR. Eleven subclones were analyzed (same as c) after restriction enzyme digestions. The x axis indicates additional insertions of CRE per cell. The y axis indicates ITRs per cell (yellow), additional left homology arms per cell (LHA, purple) and additional right homology arms per cell (RHA, green). e, Schematic showing dual knockin and double selection. f, Flow cytometry plot of dual knockin as indicated in e. Performed on human PSCs at the TET2 locus, 5 d after editing. g, Graph indicating subclones with at least one concatemeric knockin. Subclones were sorted and expanded from high gate and low gate indicated in f (n = 48 high gate; n = 24 low gate). Blue indicates that the subclone had at least one concatemeric knockin with GFP and mCherry inserted in the same allele (GFP and mCherry linkage). Gray indicates all other concatemeric knockins. h, Representative fluorescent images of GFP and mCherry in subclones from f. All 16 images were processed identically. GFP and mCherry copies per cell indicated on image (detected by ddPCR). Gray scale bar (lower right) indicates 200 µm.
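The segment stoichiometry plotted in Fig. 2d lends itself to a simple consistency check. The sketch below is illustrative only and uses hypothetical function names and a hypothetical tolerance of 0.25 copies per cell. It encodes the two ratios reported in the Results: each additional CRE is accompanied by one additional LHA and one additional RHA, and two ITR regions are detected per additional CRE after restriction digestion.

```python
def expected_extra_segments(extra_cre):
    """Expected extra segment copies per cell for a regular concatemer."""
    return {"LHA": extra_cre, "RHA": extra_cre, "ITR": 2 * extra_cre}


def is_regular_concatemer(extra_cre, extra_lha, extra_rha, itr, tol=0.25):
    """Check measured ddPCR copies against the expected stoichiometry.

    tol is an assumed measurement tolerance in copies per cell, not a
    value taken from the paper.
    """
    expected = expected_extra_segments(round(extra_cre))
    measured = {"LHA": extra_lha, "RHA": extra_rha, "ITR": itr}
    return all(abs(measured[name] - expected[name]) <= tol for name in expected)


if __name__ == "__main__":
    print(expected_extra_segments(3))
    # Two extra CRE copies with matching arms and four ITR regions.
    print(is_regular_concatemer(2, 2.1, 1.9, 4.1))
    # A partial insertion lacking part of one arm fails the check.
    print(is_regular_concatemer(1, 0.4, 1.0, 2.0))
```

A clone that fails this check is not necessarily free of concatemers; it may instead carry a partial or end-joining mediated insertion, as described for samples 12 and 14.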

that the chimeric ITR formed in concatemeric knockins is more PCR resistant than the ITR found in end-joining mediated knockins that were previously reported12–14.

Because PCR inefficiencies can be due to nearby DNA secondary structures, ITR quantification was repeated with restriction enzymes that cut within the chimeric ITR, adjacent to the amplicon (Extended Data Fig. 2d). This strategy led to more efficient qPCR and ddPCR detection of the ITR (Extended Data Fig. 3g–i), revealing that two ITR regions exist for each additional insertion of CRE (Fig. 2d), which agrees with the schematic shown in Fig. 2b. To ensure that the detected ITRs were not episomal, a ddPCR linkage analysis was performed to determine whether the genomic locus was linked to the ITRs (Extended Data Fig. 3j). All loci with concatemers had ITRs linked to one or both CD14 alleles (Extended Data Fig. 3k). These data prove that the concatemeric knockins have ITR sequences that are integrated into the genomic locus, and these integrated chimeric ITRs are hidden without careful restriction enzyme digestion.

Concatemers affect gene expression
If ITRs are the drivers of concatemerization, then concatemeric knockins should occur between two viral vector particles carrying different DNA sequences. To test this, a dual knockin of mCherry and GFP was attempted in iPSCs at the TET2 locus (Fig. 2e). Five days after editing, the cells were analyzed and sorted by flow cytometry for subcloning (Fig. 2f). The single-positive quadrants (that is, mCherry only and GFP only) appeared bimodal, which likely corresponds to a monoallelic or biallelic knockin of a single color. The double-positive quadrant had a large, polydisperse region. This heterogenous gene expression could be due to concatemeric knockins. Twenty-four colonies were sorted and expanded from the dimmer, more homogeneous cluster (Fig. 2f, low gate), and 48 colonies were sorted and expanded from the brighter, polydisperse cluster (Fig. 2f, high gate). ddPCR linkage analysis was used to analyze the genotype (Extended Data Fig. 4a,b), revealing that 46 of 48 high gate subclones contained concatemers (Fig. 2g). In contrast, only four of 24 low gate subclones contained concatemers.
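The idea behind a ddPCR linkage analysis can be illustrated with generic droplet statistics. The following sketch is not the authors' pipeline, whose details are given in the Extended Data; the function names and droplet counts are hypothetical. It uses two standard facts: the mean number of target copies per droplet follows from the fraction of positive droplets by Poisson correction, and two unlinked targets should co-occur in a droplet only as often as chance predicts, so a large surplus of double-positive droplets suggests that the targets sit on the same DNA molecule.

```python
import math


def copies_per_droplet(positive, total):
    """Poisson-corrected mean copies per droplet from droplet counts."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def expected_double_positive(pos_a, pos_b, total):
    """Double-positive droplets expected if targets A and B are unlinked.

    pos_a and pos_b count every droplet positive for that target,
    including droplets positive for both.
    """
    return total * (pos_a / total) * (pos_b / total)


def linkage_excess(pos_a, pos_b, pos_ab, total):
    """Observed minus chance-expected double-positive droplets."""
    return pos_ab - expected_double_positive(pos_a, pos_b, total)


if __name__ == "__main__":
    total = 20000  # hypothetical droplet counts
    print(copies_per_droplet(2000, total))
    print(expected_double_positive(2000, 2000, total))
    print(linkage_excess(2000, 2000, 1800, total))
```

The same reasoning explains why unfragmented concatemers are undercounted: tandem copies on one molecule enter the same droplet and register as a single positive event.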


Fig. 3e (table of loci and cell lines; n, number of subclones analyzed):
Locus    Cell line   n
TET2     PT-iPSC     64
AAVS1    PT-iPSC     79
AAVS1    H9          57
AAVS1    WT-iPSC     58
HBB      PT-iPSC     65
HBB      H9          67
RUNX1    H9          68

Fig. 3 | Frequency and prevention of concatemers in human PSCs. a, Schematic showing delinking of AAV vector genomes. b, Schematic showing two methods to remove vector ITRs after transduction using Cas9. c, Flow cytometry plot 5 d after double knockin of Ubc-GFP and Ubc-mCherry into TET2 locus in PT-iPSCs. d, Change in knockin efficiency as determined by flow cytometry. IC and DC are shown relative to NC. n = 7 (seven test groups shown in e). ** and * indicate P = 0.0008 and P = 0.01, respectively. NS, no significance (P > 0.05). P values are from a two-sided paired t-test. e, Table of different loci and cell lines tested; legend for f. n indicates the number of subclones analyzed in each group. f, Clonal analysis of concatemer frequency after double knockin (determined by ddPCR). Comparison of NC, IC and DC methods. Loci and cell lines shown in e. 16–28 colonies were analyzed per datapoint; n = 458 total. Mean is indicated above the bar in bold. *** and **** indicate P = 2 × 10−6 and P = 2 × 10−7, respectively. P values are from a two-sided t-test. g, Bulk analysis of Ubc-GFP knocked in to HBB locus in H9 embryonic stem cells (ESCs) at various vector dilutions. GFP+ cells (determined by flow cytometry) are indicated by the gray line. The percent of KI alleles that contain a concatemeric insertion (determined by ddPCR) is indicated by the red line. Error bars indicate 1 s.d.; center indicates the mean.
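The per-clone frequency in Fig. 3f and the per-allele frequency in Fig. 3g are connected by a short calculation, which the Results summarize as a theoretical 58% for a biallelically edited PSC. The sketch below reproduces that arithmetic; it assumes that the two alleles acquire concatemers independently, which is the simplest model consistent with the stated figure and is not described further in the text. The function name is hypothetical.

```python
def p_clone_with_concatemer(p_allele, edited_alleles=2):
    """Probability that at least one edited allele carries a concatemer.

    Assumes each edited allele is concatemeric independently with
    probability p_allele (an assumption of this sketch).
    """
    if not 0.0 <= p_allele <= 1.0:
        raise ValueError("p_allele must be between 0 and 1")
    return 1.0 - (1.0 - p_allele) ** edited_alleles


if __name__ == "__main__":
    # About 35% of knockin alleles are concatemeric (bulk ddPCR).
    p = p_clone_with_concatemer(0.35, edited_alleles=2)
    print(f"biallelic clone: {p:.4f}")
    # A monoallelic clone has a single edited allele.
    print(f"monoallelic clone: {p_clone_with_concatemer(0.35, 1):.4f}")
```

With a per-allele rate of 0.35, the biallelic case gives 1 − 0.65² ≈ 0.58, close to the 59% mean observed for the NC group.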

Fluorescence imaging revealed that high gate subclones were brighter and had more well-to-well fluorescence variability than the low gate subclones, which was made more apparent with uniform image processing of the entire set of subclones (Fig. 2h and Extended Data Fig. 4c). The mean fluorescence intensities (MFIs) correlated strongly with the total number of insertions (Extended Data Fig. 4d,e). Linkage analysis revealed that most of the low gate subclones with concatemers had a monoallelic knockin genotype, such that both mCherry and GFP were inserted into a single allele, whereas the other allele did not contain a knockin. This is particularly problematic for gene editing strategies that employ dual knockins to achieve a selectable biallelic genotype because some cells will still contain an unedited allele. Overall, 50 of 72 subclones had concatemeric knockins, of which 21 of 50 had at least one mCherry and one GFP linked in the same allele. These data show that concatemeric knockins can also occur between vector particles carrying different DNA. Notably, concatemeric knockins may greatly change the level of gene expression.

ITR removal prevents concatemers in PSCs
If the ITRs are driving the concatemeric knockins, then removal of ITRs should result in more monomeric knockins (Fig. 3a). Two strategies were designed that use Cas9 RNPs to remove the viral vector ITRs after transduction with minimal modification to gene editing pipelines. For the first strategy, a single guide RNA (sgRNA) was designed to make an internal cut (IC) within the ITR at an endogenous PAM site (Fig. 3b). This sgRNA is complexed with Cas9 and electroporated into cells at the same time as the RNP that cuts the target gene. For the second strategy, the 23-bp target sequence recognized by the RNP that cuts the genomic locus was inserted into the vector genome on both ends flanking the homology arms, such that the gene-targeting RNP would also make two distal cuts (DCs) in the vector genome and remove the ITRs. IC and DC methods were compared to the original method (no cut (NC)) in three PSC lines and four loci, including the clinically relevant HBB locus. For all conditions, double-fluorescent knockins were selected as previously shown in Fig. 2e. Flow cytometry revealed a clear decrease in polydispersity when using IC or DC methodology, particularly in the double-positive quadrant (Fig. 3c and Extended Data Fig. 5a–h). The overall knockin rate (that is, percent of fluorescent cells) for NC ranged from 27% to 79%, which is similar to previously published values5. IC and DC treatments resulted in minor decreases in average overall editing efficiencies by 19% and 9%, respectively; however, the reduction in the double-positive cells was more pronounced (42% and 23% decreases, respectively; Fig. 3d and Extended Data Fig. 5i–k). From each condition (Fig. 3e), 16–28 subclones (458 total) were sorted and expanded from the double-positive population. ddPCR genotyping revealed that an average of 59% of the subclones in the NC group had concatemeric knockins (Fig. 3f). This decreased to 13.2% and 4.8% for the IC and DC groups, respectively. These data show that concatemeric knockins occur at high frequency in human PSCs, regardless of the cell line or locus. Notably, post-transduction removal of the viral vector ITRs significantly reduces the frequency of concatemeric knockins.
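The size of the effect can be made explicit from the mean clonal frequencies reported above (59% for NC, 13.2% for IC and 4.8% for DC). The following sketch simply computes the relative reduction for each ITR-removal strategy; the function name is hypothetical and the calculation uses only the group means, so it ignores the clone-to-clone and locus-to-locus variation shown in Fig. 3f.

```python
def relative_reduction(baseline, treated):
    """Fractional decrease of treated relative to baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - treated) / baseline


if __name__ == "__main__":
    # Mean percent of subclones with concatemers, as reported in the text.
    nc, ic, dc = 59.0, 13.2, 4.8
    print(f"IC vs NC: {relative_reduction(nc, ic):.1%} fewer clones with concatemers")
    print(f"DC vs NC: {relative_reduction(nc, dc):.1%} fewer clones with concatemers")
```

On these means, the DC strategy removes roughly nine in ten concatemer-containing clones, in line with the approximately 90% decrease stated in the introduction.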


Rate of concatemers unaffected by viral vector concentration
It is known that the multiplicity of AAV infection (MOI) will affect knockin rate8, but how MOI affects the rate of concatemeric knockins is unknown. For this analysis, Ubc-GFP was knocked in to the HBB locus at various MOI dilutions ranging from approximately 3,000 to 10. In this case, MOI refers to the number of vector genomes (determined by ddPCR titer) added per cell. To avoid the high sampling error that can be prevalent in small-population-size clonal analyses, a ddPCR linkage analysis strategy was developed to detect concatemeric alleles in a bulk, mixed population (Extended Data Fig. 6a,b).

As expected, flow cytometry revealed that the percent of positive cells decreased from 56% to 2% as MOI was reduced (Fig. 3g and Extended Data Fig. 6c). The positive cells were then sorted and expanded in each MOI group without subcloning. Bulk allele analysis showed that approximately 35% of the knockin alleles were concatemeric at all MOIs (Fig. 3g and Extended Data Fig. 6d–h). This was repeated in male PSCs at the IL2RG locus, located on the X chromosome, which yielded similar results (Extended Data Fig. 6i–m). In both cases, the concatemeric insertions contained an average of 3–4 repeats at all MOIs. Collectively, these data suggest that the concatemeric knockin rate is constant in human PSCs, regardless of locus, cell line, MOI or number of genomic target sites. If an allele has a knockin, there is a 35% chance that it is concatemeric. Thus, the theoretical frequency of at least one concatemeric knockin occurring in a biallelically edited PSC is 58%. This matches our empirical result from Fig. 3f.

ITR removal does not increase non-homologous end-joining knockin
ITR removal changes the structure of the vector genome, potentially altering its dynamics regarding the frequency of non-homologous end joining (NHEJ)-mediated insertions. This could cause imprecise knockins. To analyze this, in–out PCR was used to check for NHEJ-mediated knockins at the TET2 locus in iPSC subclones with seemingly correct ddPCR genotypes (that is, biallelic knockins without concatemers). All samples except for one subclone from the IC group had normal band sizes (Extended Data Fig. 7a–d). This indicates that NHEJ-mediated knockin events are uncommon in all groups, and subcloning is not an efficient method to measure their frequencies. As an alternative, flow cytometry was used to measure NHEJ-mediated insertions by repeating HBB and IL2RG knockin experiments while intentionally mismatching the Cas9 RNP genomic target sites and vector homology arms. When making a cut at the HBB locus, a vector that carried Ubc-GFP with homology arms targeting the IL2RG locus was added. The reciprocal experiment was also performed (Extended Data Fig. 7e,f). When cutting the HBB locus, insertions occurred in 0.53% and 0.50% of the cells in the NC and IC groups, respectively. When cutting the IL2RG locus, insertions occurred in 0.15% and 0.08% of the cells in the NC and IC groups, respectively. These data suggest that ITR removal does not increase the frequency of unwanted NHEJ-mediated knockins.

Concatemers also occur in HSPCs
CD34+ blood cells contain both hematopoietic stem and progenitor cells. Because the stem cell population (HSCs) is difficult to enrich from HSPCs, HSPCs are the targets of multiple gene therapies15. To determine if concatemeric knockins occur in HSPCs, Cas9/AAV6-mediated double-knockin experiments were repeated in human cord blood (CB)-derived CD34+ HSPCs. Because clonal HSPC expansion is ineffi-

Only a subset of the CD34+ HSPCs can enable long-term reconstitution of the blood after bone marrow transplantation. Therefore, it is necessary to determine whether concatemeric insertions occur in the functional stem cells or just the progenitor cells. To test this, Ubc-GFP was knocked in to human CD34+ cells at the HBB locus, and GFP+ cells were transplanted into immunodeficient mice a few days later (Fig. 4a). After 4 months, bone marrow was collected from the mice, and the human cells were isolated by flow cytometry. Because transplanted blood progenitor cells generally exhaust after 12 weeks in mice, the remaining human cells were likely derived from engrafted human HSCs16. Bulk ddPCR analysis revealed that an average of 32% of the edited human alleles had concatemeric knockins in the four mice from the NC group. This dropped to below 5% for the DC group (Fig. 4e). Notably, the average engraftment rate of edited cells was 0.22% and 2.5% for the NC and DC groups, respectively, suggesting that the DC approach did not impair HSC engraftment (Extended Data Fig. 8m). Collectively, these data demonstrate that Cas9/AAV6-mediated concatemeric knockins also occur at high frequency in human HSCs. Like in iPSCs, this can be minimized by post-transduction ITR removal.

Discussion
Here we report an analytical blind spot in the detection of repeated genetic sequences, particularly when linked by a region containing complex secondary structures. Our ddPCR assays found that approximately 35% of Cas9/AAV6-mediated KI alleles contain at least one hidden concatemeric insertion, regardless of loci or cell type. We developed drop-in-ready methods that decrease the number of these abnormal gene modifications by approximately 10-fold.

All data were generated using human HSCs14 and human PSCs17 because of their clinical relevance for cell and gene therapy. HSCs give rise to all lineages of blood cells and can be collected from a patient’s blood or bone marrow. The HSCs can then be genetically modified and transplanted back into the patient to cure conditions such as sickle cell disease3. In contrast, iPSCs can be derived from many different tissues, including skin or blood. iPSCs can be expanded indefinitely, enabling numerous and extensive genetic modifications in vitro. Because iPSCs resemble one of the earliest lineages of cells during embryonic development, patient-derived iPSCs can be differentiated into other cell types and transplanted back into the patient. This has been a proposed treatment for cystic fibrosis and epidermolysis bullosa1,2. Cas9/AAV6-mediated gene editing has been performed on both HSCs and iPSCs with remarkably high efficiencies5,6,8, and it is used in other clinically relevant cell types, including CAR-T cells18.

Many common gene editing goals are susceptible to inconsistent gene modifications. When creating repairs or knockins in coding regions, concatemeric insertions could result in frameshift-induced knockouts (Fig. 5a). A similar point is true if exchanging or adding exons, particularly at the 5′ end of the gene. In other cases, it is essential to create biallelic, selectable knockouts to ensure 100% gene disruption. As previously published, this can be performed by knocking in two different selectable markers into the target gene, one in each allele19. However, with concatemeric insertions, both selectable cassettes could be knocked in to a single allele, whereas the other remains intact (Fig. 5b). Concatemeric insertions can also cause problems related to inconsistent and high levels of gene expression. This could be critical for transgenic safeguards, such as inducible caspase-9 (ref. 20), which usually require addition of a small molecule to induce dimerization
cient and the DC method appeared more effective than the IC method, and initiate apoptosis. However, at high concentrations, inducible
only NC and DC groups were compared. When performing knockins caspase-9 can spontaneously dimerize and uncontrollably kill cells21.
at the TET2 and HBB loci, results were similar to those of PSCs, with an This is more likely to occur with concatemeric insertions because they
average of 58% and 7.0% of the subclones having concatemeric knockins often induce higher expression (Fig. 5c). In other cases, researchers
in the NC and DC groups, respectively. The experiment was repeated simultaneously knock in two different inserts into separate target
with male CB at the IL2RG locus (X chromosome), resulting in 38% and sites19. However, concatemerization could cause knockins to occur at
1.4% of colonies containing concatemeric knockins in the NC and DC the wrong locus (Fig. 5d), leading to numerous downstream problems.
groups, respectively (Fig. 4a–d and Extended Data Fig. 8a–l). A final concern pertains to the ITR itself, which has been shown to act as


[Figure 4, panels a–e; graphics not reproduced. a, Workflow schematic: CD34+ HSPCs undergo knockin with NC or DC vectors (LHA, GFP or mCherry, RHA), are sorted and expanded as colonies for 3 weeks before concatemer analysis, or are transplanted, with human blood cells recovered after 4 months. b, Relative KI efficiency (%), NC versus DC, for positive and double-positive cells. c, Table of test groups (Locus; KI; Type; n): TET2; GFP, mCh; colony; 42 and 31. HBB; GFP, mCh; colony; 33 and 33. IL2RG; GFP; colony; 46, 38 and 34. HBB; GFP; bulk (transplant); 4 mice and 4 mice. d, Colonies with concatemer (%), NC versus DC, for TET2 and HBB colonies and for IL2RG colonies. e, KI alleles with concatemer (%) in transplanted cells, NC versus DC.]

Fig. 4 | Frequency and prevention of concatemers in human HSPCs. a, Schematic showing strategy for analyzing Cas9/AAV-mediated concatemeric knockins in HSPCs. b, Change in knockin efficiency as determined by flow cytometry. DC is shown relative to NC. Test groups are shown in c. n = 9 for positive cells; n = 4 for double positive. * indicates P = 0.02. P values are from a two-sided paired t-test. c, Table of different loci and cell lines tested; legend for d and e. n indicates the number of subclones or mice analyzed in each group. d, Clonal analysis of concatemer frequency after double knockin (left) and single knockin (right) as determined by ddPCR. Comparison of NC and DC methods. Loci and cell lines are shown in c. 15–23 colonies were analyzed per datapoint; n = 258 total. *** and * indicate P = 0.0002 and P = 0.03, respectively. P values are from a two-sided t-test. e, Bulk ddPCR concatemer analysis of recovered human cells 4 months after HSPC transplantation. Before transplantation, HSPCs had Ubc-GFP knocked in to the HBB locus. *** indicates P = 0.0004. P values are from a two-sided t-test. Error bars indicate 1 s.d.; center indicates the mean.
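
As a quick arithmetic check on the HSPC colony data quoted in the text (58% versus 7.0% of subclones at TET2 and HBB; 38% versus 1.4% of colonies at IL2RG), the following sketch computes the fold reduction in concatemer-containing colonies between the NC and DC groups. The function name and dictionary labels are illustrative assumptions, not part of the study's analysis; only the four percentages come from the text.

```python
# Worked arithmetic: fold reduction in concatemer-containing colonies (NC vs DC).
# The percentages are the averages quoted in the text; all names are illustrative.

def fold_reduction(nc_percent: float, dc_percent: float) -> float:
    """Return how many times lower the DC frequency is than the NC frequency."""
    if dc_percent <= 0:
        raise ValueError("DC frequency must be positive to compute a ratio")
    return nc_percent / dc_percent

reported = {
    "TET2 and HBB (double knockin)": (58.0, 7.0),
    "IL2RG (single knockin, male CB)": (38.0, 1.4),
}

for label, (nc, dc) in reported.items():
    ratio = fold_reduction(nc, dc)
    print(f"{label}: NC {nc}% vs DC {dc}% is a {ratio:.1f}-fold reduction")
```

These are simple ratios of the reported averages rather than a statistical estimate; they are of the same order as the approximately 10-fold decrease stated in the Discussion.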

a transcriptional regulator22,23. Therefore, a concatemeric knockin may affect the expression of nearby genes, resulting in wildly unpredictable phenotypes (Fig. 5e).
As we have shown, concatemeric knockins are difficult to detect by classic PCR (Fig. 1d–h). ddPCR has been previously used to genotype cells by adapting in–out PCR to quantify on-target knockins24. However, in–out PCR must span a homology arm, resulting in long amplicons that can be difficult to multiplex and detect by ddPCR. Additionally, in–out PCR is not designed to detect concatemeric knockins, so an alternative method is needed. In the present study, we developed three types of ddPCR assays that can be used to simultaneously genotype and detect concatemeric knockins: (1) ‘allele counting’, (2) ‘linkage analysis’ and (3) ‘nuclease-mediated loss of linkage’. Each assay is modular and has strengths in different research settings. For example, allele counting with fragmentation is best for subcloned cells with no episomal vector DNA, particularly if researchers want to perform only a single run for easy screening. However, in bulk populations or samples that contain episomal vector DNA, allele counting cannot be used to


detect concatemeric knockins. In contrast, linkage analysis can detect concatemeric knockins in subcloned and bulk populations, even if a moderate amount of episomal vector DNA or off-target insertions are present. However, shortly after transduction with AAV, the amount of episomal vector DNA is too high and may fall outside the limits of reliable ddPCR linkage quantitation. In this case, the nuclease-mediated loss of linkage strategy can be employed to quantify knockins and concatemeric inserts (Extended Data Fig. 3m, bottom). Although not extensively discussed here, this unique assay uses site-specific nucleases (for example, restriction enzymes) to detect target sequences rather than using PCR amplicons. As such, nuclease-mediated loss of linkage is the best choice for genotyping and quantifying concatemeric knockins shortly after adding AAV or in regions that are difficult to amplify by PCR.

[Figure 5, schematic; graphics not reproduced. Panels: a, gene correction (Insert) versus gene knockout (Insert, ITR, Insert); b, selectable biallelic knockout (Ubc-GFP and Ubc-mCh, one per allele) versus one allele intact (Ubc-GFP, ITR, Ubc-mCh in a single allele); c, low gene expression (single Ubc-iCasp9) versus high gene expression (four Ubc-iCasp9 copies joined by ITRs); d, multi-site editing (Insert 1, Insert 2) versus insert into wrong site; e, knockin versus altered nearby gene expression.]

Fig. 5 | Abnormal phenotypes with concatemeric knockins. a, Concatemeric knockins often cause frameshift mutations (indicated by black stripes), resulting in gene knockout. b, Use of two reporters to make selectable biallelic knockouts fails if both reporters are inserted in one allele. c, Large concatemers inserted in a single site can increase gene expression to unwanted levels. d, Multi-site genome editing can result in knockin of template in the wrong locus. e, Enhancer activity of ITR can affect nearby gene expression.

Although the AAV genome is known to form linked episomal segments25, conventional analytical methods are unsuited to detecting them for several reasons. First, methods must be designed to consider detection of concatemeric insertions. In–out PCR, a more common genotyping method, will not detect concatemers. Second, concatemeric knockins are easiest to detect in subclones. This is because the concatemeric alleles are spread across a variety of sizes (for example, KI + 1, KI + 2, KI + 3, etc.), resulting in a higher proportion of wild-type and monomeric KI alleles in bulk samples. Because subcloning can be technically challenging, it is often avoided. Third, concatemeric alleles are bigger than monomeric KI alleles. This can lead to PCR bias and allelic dropout26,27. Therefore, strategies that do not require PCR preamplification across full concatemers are needed, such as Southern blot or ddPCR. However, obtaining enough DNA for Southern blot analysis can be challenging, particularly in cells that do not expand well, such as HSPCs. Fourth, even if using ddPCR methodology designed to count insertion numbers in subcloned cells, concatemeric knockins will not be detected unless first separated by the addition of nucleases (Extended Data Fig. 1j,k). The DNA must also be in proper form (for example, double stranded and properly annealed) for the nucleases to cut. Fifth, the ITR itself is considerably more difficult to detect by PCR in its linked, chimeric form (Extended Data Fig. 3g–i, sample 10). This is particularly misleading because we can easily amplify the ITR region when titering the vector in its single-stranded DNA form. Additionally, the ITR can be readily detected when inserted into the genome in an NHEJ-mediated manner (Extended Data Fig. 3g–i, sample 14). However, to detect the chimeric ITR, restriction enzymes must be added to dismantle nearby secondary structures. Sixth, although fluorescent concatemeric knockins can be detected by variable fluorescence intensities, the log-scaled axis visually attenuates this effect when analyzed by flow cytometry. The use of contour maps further minimizes the polydispersity caused by concatemeric knockins. In subclones, clear differences can be seen in the fluorescence intensities of multiple samples only if they are imaged and processed identically (Extended Data Fig. 4c). The difference is less noticeable if using common ‘auto-contrast’ options. For these reasons, the high frequency of Cas9/AAV6-mediated concatemeric knockins has not been reported.
We considered multiple strategies to prevent concatemeric knockins, including decreasing the MOI and performing intracellular removal of ITRs. As expected, lowering the MOI greatly diminished the number of KI cells and shifted most edited cells from biallelic to monoallelic genotypes. Low MOIs had no effect on the rate of concatemeric KIs. Mechanistically, this suggests that viral vector concatemeric linkages occur faster than genomic insertion. This could be explored in future experiments by altering the interval between vector transduction and Cas9 transfection while measuring the concatemer rate at various intervals.
It is also possible that concatemers may form within the viral vector particles before transduction. However, all vector genomes containing Ubc-GFP and Ubc-mCherry are more than 3,200 bp long. A single concatemeric vector genome would have to be larger than 6,400 bp, which is difficult to package in AAV. Additionally, many concatemeric repeats are more than 3 units long and some more than 10, suggesting that prepackaging is not the cause of concatemeric knockins. Finally, experiments were performed using two viral vectors from different preparations, in which approximately half of the concatemeric insertions contained DNA from both vectors knocked in to one allele (Fig. 2g). This suggests that linkage occurs after transduction, and it is unbiased between vectors originating from the same or different preparations.
Post-transduction removal of ITRs resulted in multiple advantages. First, the inter-cluster polydispersity was decreased after intracellular removal of the ITRs. Previous research suggests that the ITR primes the single-stranded AAV vector genome for second-strand synthesis28. It is only after second-strand synthesis occurs that the vector genes are expressed. Because MOI and second-strand synthesis may vary cell to cell, vector genes are expressed at varying levels leading to intercluster polydispersity. This fades after the vector DNA is diluted through numerous cell divisions. Because the DC method decreases the polydispersity, it is possible that Cas9 removes the ITR before second-strand synthesis occurs, leaving a long single-stranded DNA template. This is


possible because Cas9 has been shown to cut single-stranded DNA29. In contrast, the IC method may cut the ITR in a hairpin, resulting in a double-stranded break; however, we did not detect an increase in NHEJ-mediated knockins. Regardless, the DC method may have more appeal because only a single RNP is used, minimizing the chance of additional off-target cuts in the genome. In all cases, NC, IC and DC methods have similar concentrations of episomal vector DNA within the cell 5 d after editing. This suggests that the decreased polydispersity after ITR removal is due to decreased episomal expression, which ultimately leads to more defined clusters and fewer false positives. The second advantage of ITR removal is the clear decrease in concatemeric knockins. Because these concatemers can create unexpected problems (Fig. 5), their prevention is critical to ensuring reproducible, safe and high-quality gene modifications. However, ITR removal also resulted in a decrease in double-positive cells when performing dual-color knockins. This is not surprising because cells in the double-positive population also have the highest rate of concatemeric knockins. The decrease in editing efficiency is minor compared to the increase in bona fide knockins.
Gene-modified HSPCs can be transplanted to cure genetic diseases. Here we show that our methods can be immediately applied to existing pipelines to decrease the concatemer rate. Engraftment rates suggest that ITR removal does not negatively affect HSC function and may even enhance engraftment. The potential increase in engraftment efficiency could be due to reported genotoxic effects of the ITR30, which may be mitigated in the IC and DC methods. However, the disposition of the ITR fragment after cleavage is unclear, and more work is needed to analyze the effect of ITR removal on HSC function.
Although we did not reach a 0% concatemer rate, a 10-fold decrease is an important first step. Future experiments could modify the ITR or modulate DNA repair pathways to further decrease the concatemer rate. Regardless, having available tools to properly assess a genotype is critical, particularly when selecting a clonally expanded population of edited cells or when developing safety profiles.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41587-024-02171-w.

References
1. Vaidyanathan, S., McCarra, M. & Desai, T. J. Lung stem cells and therapy for cystic fibrosis. In Lung Stem Cells in Development, Health and Disease (eds Nikolić, M. Z. & Hoganheffield, B. L. M.) 306–321 (European Respiratory Society, 2021).
2. Itoh, M. et al. Footprint-free gene mutation correction in induced pluripotent stem cell (iPSC) derived from recessive dystrophic epidermolysis bullosa (RDEB) using the CRISPR/Cas9 and piggyBac transposon system. J. Dermatol. Sci. 98, 163–172 (2020).
3. Wilkinson, A. C. et al. Cas9-AAV6 gene correction of beta-globin in autologous HSCs improves sickle cell disease erythropoiesis in mice. Nat. Commun. 12, 686 (2021).
4. Khalil, A. M. The genome editing revolution. J. Genet. Eng. Biotechnol. 18, 68 (2020).
5. Martin, R. M. et al. Highly efficient and marker-free genome editing of human pluripotent stem cells by CRISPR–Cas9 RNP and AAV6 donor-mediated homologous recombination. Cell Stem Cell 24, 821–828 (2019).
6. Dever, D. P. et al. CRISPR/Cas9 β-globin gene targeting in human haematopoietic stem cells. Nature 539, 384–389 (2016).
7. Gaj, T. et al. Targeted gene knock-in by homology-directed genome editing using Cas9 ribonucleoprotein and AAV donor delivery. Nucleic Acids Res. 45, e98 (2017).
8. Charlesworth, C. T. et al. Priming human repopulating hematopoietic stem and progenitor cells for Cas9/sgRNA gene targeting. Mol. Ther. Nucleic Acids 12, 89–104 (2018).
9. Romero, Z. et al. Editing the sickle cell disease mutation in human hematopoietic stem cells: comparison of endonucleases and homologous donor templates. Mol. Ther. 27, 1389–1406 (2019).
10. Zheng, Y. et al. Efficient in vivo homology-directed repair within cardiomyocytes. Circulation 145, 787–789 (2022).
11. Kuzmin, D. A. et al. The clinical landscape for AAV gene therapies. Nat. Rev. Drug Discov. 20, 173–175 (2021).
12. Hanlon, K. S. et al. High levels of AAV vector integration into CRISPR-induced DNA breaks. Nat. Commun. 10, 4439 (2019).
13. Nelson, C. E. et al. Long-term evaluation of AAV-CRISPR genome editing for Duchenne muscular dystrophy. Nat. Med. 25, 427–432 (2019).
14. Koniali, L., Lederer, C. W. & Kleanthous, M. Therapy development by genome editing of hematopoietic stem cells. Cells 10, 1492 (2021).
15. Haltalli, M. L. et al. Hematopoietic stem cell gene editing and expansion: state-of-the-art technologies and recent applications. Exp. Hemat. 107, 9–13 (2022).
16. Notta, F. et al. Isolation of single human hematopoietic stem cells capable of long-term multilineage engraftment. Science 333, 218–221 (2011).
17. Soldner, F. & Jaenisch, R. Stem cells, genome editing, and the path to translational medicine. Cell 175, 615–632 (2018).
18. Moço, P. D., Aharony, N. & Kamen, A. Adeno-associated viral vectors for homology-directed generation of CAR-T cells. Biotechnol. J. 15, 1900286 (2020).
19. Bak, R. O. et al. Multiplexed genetic engineering of human hematopoietic stem and progenitor cells using CRISPR/Cas9 and AAV6. eLife 6, e27873 (2017).
20. Martin, R. M. et al. Improving the safety of human pluripotent stem cell therapies using genome-edited orthogonal safeguards. Nat. Commun. 11, 2713 (2020).
21. Straathof, K. C. et al. An inducible caspase 9 safety switch for T-cell therapy. Blood 105, 4247–4254 (2005).
22. Haberman, R. P., McCown, T. J. & Samulski, R. J. Novel transcriptional regulatory signals in the adeno-associated virus terminal repeat A/D junction element. J. Virol. 74, 8732–8739 (2000).
23. Flotte, T. R. et al. Expression of the cystic fibrosis transmembrane conductance regulator from a novel adeno-associated virus promoter. J. Biol. Chem. 268, 3781–3790 (1993).
24. Bak, R. O., Dever, D. P. & Porteus, M. H. CRISPR/Cas9 genome editing in human hematopoietic stem cells. Nat. Protoc. 13, 358–376 (2018).
25. Duan, D. et al. Circular intermediates of recombinant adeno-associated virus have defined structural characteristics responsible for long-term episomal persistence in muscle tissue. J. Virol. 72, 8568–8577 (1998).
26. Shestak, A. G. et al. Allelic dropout is a common phenomenon that reduces the diagnostic yield of PCR-based sequencing of targeted gene panels. Front. Genet. 12, 62033721 (2021).
27. Kanagawa, T. Bias and artifacts in multitemplate polymerase chain reactions (PCR). J. Biosci. Bioeng. 96, 317–323 (2003).
28. McCarty, D. et al. Adeno-associated virus terminal repeat (TR) mutant generates self-complementary vectors to overcome the rate-limiting step to transduction in vivo. Gene Ther. 10, 2112–2118 (2003).
29. Ma, E. et al. Single-stranded DNA cleavage by divergent CRISPR–Cas9 enzymes. Mol. Cell 60, 398–407 (2015).
30. Ferrari, S. et al. Choice of template delivery mitigates the genotoxic risk and adverse impact of editing in human hematopoietic stem cells. Cell Stem Cell 29, 1428–1444 (2022).


Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

© The Author(s), under exclusive licence to Springer Nature America, Inc. 2024


Methods

Ethical use of stem cells
All experiments were performed in accordance with ethical recommendations put forth by the International Society for Stem Cell Research in the 2021 Guidelines for Stem Cell Research and Clinical Translation. Oversight was conducted and approved by Stanford University’s Research Compliance Office and the corresponding Stem Cell Research Oversight panel.

PSC lines and maintenance
Four human PSC lines were used in this study. Patient-derived iPSCs (PT-iPSC, iSU223n)31,32, H9 embryonic stem cells (ESC, WiCell), H1 ESCs (WiCell) and wild-type iPSCs (WT-iPSC, SUN004.1)33 and their derivative subclones were routinely propagated feeder free in StemFit Basic02 medium supplemented with 20 ng ml−1 bFGF or Basic04 complete type (Ajinomoto) on cell culture plastics with iMatrix-511 (Nippi) basement membrane matrix, with single-cell dissociation by using TrypLE Express Enzyme (Gibco). Y-27632 dihydrochloride (Tocris) was added for 24 h after passage unless otherwise indicated. PT-iPSCs were used in Figs. 1 and 2 and Extended Data Figs. 1–4 and 7. In all other cases, specific PSC lines are indicated in the text and figures.

AAV vector production
AAV vector plasmids were cloned in the pAAV-MCS plasmid containing ITRs from AAV serotype 2 (AAV2)19. CD14 vector contained P2A and Cre. AAVS1 vectors contained a UBC promoter and CreERT or fluorescence reporter genes, such as GFP or mCherry. HBB, TET2, RUNX1 and IL2RG vectors contained UBC promoters and fluorescence reporter genes, such as GFP or mCherry. The homology arms for CD14, AAVS1, TET2 and IL2RG were approximately 400 bp (379–414 bp), and the left and right homology arms for HBB were 537 bp and 420 bp, respectively. Vectors with DC sites for ITR removal contained the same sequences as genomic sgRNA target sequences, with the external PAM (ExP) facing inward between the ITR and homology arm. AAV6 particles were produced in 293FT cells (Thermo Fisher Scientific) transfected using standard PEI transfection with ITR-containing plasmids and pDGM6 containing the AAV6 cap genes, AAV2 rep genes and adenovirus serotype 5 helper genes. Particles were harvested after 72 h, purified using the AAVpro Purification Kit (Takara Bio) according to the manufacturer’s instructions and then stored at −80 °C until further use. Viral vector titers measured as vector genomes per microliter (vg/µl) were determined by ddPCR analysis of serially diluted vector samples using primers and probes that target the vector ITR (Supplementary Table 1).

Electroporation and transduction of cells
All synthetic sgRNAs and Cas9 protein were purchased from Integrated DNA Technologies. The genomic sgRNA target sequences with PAM were: HBB, 5′-CTTGCCCCACAGGGCAGTAACGG-3′; IL2RG, 5′-TGGTAATGATGGCTTCAACATGG-3′; RUNX1, 5′-TACCTTGAAAGCGATGGGCAGGG-3′; AAVS1, 5′-GGGGCCACTAGGGACAGGATTGG-3′; TET2, 5′-TCATGGAGCATGTACTACAATGG-3′; and CD14, 5′-CTAGCGCTCCGAGATGCATGTGG-3′.
Cas9 RNP was made by incubating protein with sgRNA at a molar ratio of 1:2.5 at 25 °C for 10 min immediately before electroporation into CD34+ HSPCs or iPSCs/ESCs, with a final concentration of Cas9 at 150 ng µl−1 during electroporation. For experiments employing the IC method of ITR removal, sgRNA of ITR sequence (5′-GCGCGCTCGCTCGCTCACTGAGG-3′) was separately incubated with Cas9 and added at the same final concentration in addition to the other RNP. Both CD34+ HSPCs and iPSCs/ESCs were electroporated using the Lonza Nucleofector 4D (program DZ-100 for CD34+ HSPCs and program CB-150 for iPSCs/ESCs). Immediately after electroporation, AAV6 donor vectors were added at an MOI (vector genomes per cell) of 10,000 for CD34+ HSPCs, 5,000 for PT-iPSCs and 1,000 for WT-iPSCs/ESCs, unless indicated otherwise in the figures. After overnight incubation, cells were washed. For iPSCs/ESCs, Y-27632 dihydrochloride was added to media 48 h after electroporation.

Flow cytometry and subcloning
Cells were washed and stained with propidium iodide at a final concentration of 1 μg ml−1 or DAPI at a final concentration of 0.1 μg ml−1 right before analysis or sorting. Cell analysis and cell sorting were performed on a FACSAria II (BD Biosciences). To subclone edited PSCs, 5 d after electroporation, DAPI-negative and fluorescence-positive cells (just DAPI-negative cells for no-marker vector in CD14 P2A-Cre and AAVS1 UBC-CreERT2) were sorted into 96-well plates. After 7 d, the subclones were further expanded in 12-well plates. Edited PSCs that were not subcloned (data in Fig. 3g and Extended Data Figs. 6 and 7f) were sorted 5 d after electroporation into 12-well plates, followed by a second sort approximately 2 weeks later to ensure purity. For edited CD34+ cells, 3 d after electroporation, DAPI-negative and fluorescence-positive cells were sorted into 150 μl of myeloid differentiation media consisting of MyeloCult H5100 (STEMCELL Technologies) supplemented with SCF (20 ng ml−1), TPO (20 ng ml−1), Flt3-Ligand (20 ng ml−1), IL-6 (20 ng ml−1), IL-3 (20 ng ml−1), GM-CSF (20 ng ml−1) and G-CSF (20 ng ml−1). Approximately 3 weeks later, large fluorescent colonies were used for subsequent analysis.

DNA extraction
For ddPCR, qPCR and standard PCR analysis, DNA was extracted using a crude lysis buffer containing 0.1% SDS, 5 mM EDTA, 0.2 mg ml−1 proteinase K (Thermo Fisher Scientific), 15 mM Tris (pH 8.2) and 100–200 mM NaCl. For PSCs, 100 µl of lysis buffer was added to approximately 1 × 10^5 to 1 × 10^6 pelleted cells, followed by incubation at 55 °C for 10 min and heat inactivation at 80 °C for 10 min. For HSPCs, 35 µl of lysis buffer was added to approximately 5 × 10^3 to 1 × 10^5 cells, followed by the same heating protocol. All samples were homogenized by pipette trituration until smooth, and debris was pelleted through centrifugation at 6,000g for 3 min.
For Southern blots, genomic DNA was purified from approximately 1 × 10^7 cells using multiple columns from the DNeasy Blood and Tissue Kit (Qiagen) and following the recommended protocol with RNase A.

CB processing
Frozen mononuclear cells derived from cord blood (CB) were purchased from the New York Blood Center. CD34+ HSPCs were isolated using a human CD34 MicroBead Kit (Miltenyi Biotec). CD34+ HSPCs were cultured in HSPC expansion media consisting of StemSpan SFEM II (STEMCELL Technologies) supplemented with SCF (20 ng ml−1), TPO (20 ng ml−1), Flt3-Ligand (20 ng ml−1), IL-6 (20 ng ml−1) and UM171 (35 nM). Cells were cultured at 37 °C, 5% CO2 and 5% O2.

HSPC transplantation
Three days after electroporation/transduction, GFP+ edited CD34+ cells were sorted (2 × 10^5 to 3 × 10^5 cells) and transplanted into one femur of sub-lethally irradiated mice (200 rad, 2–24 h before transplant). Four months after transplantation, the human cell-engrafted mice were euthanized, and all bone marrow was harvested by crushing the bones. Non-specific antibody binding was blocked (human and mouse Fc block, BD Biosciences), and cells were stained (30 min, 4 °C, dark) with PE-Cy7-conjugated anti-mouse CD45.1 antibody (A20, Invitrogen), PE-Cy5-conjugated anti-mouse TER-119 antibody (TER-119, Invitrogen), V450-conjugated anti-human CD45 antibody (HI30, BD Horizon) and PE-conjugated anti-HLA-ABC antibody (W6/32, eBioscience) and analyzed by flow cytometry. For cell sorting, the engrafted human cells were isolated using human CD45 MicroBeads (Miltenyi Biotec). The enriched human cells were stained (30 min, 4 °C, dark) with V450-conjugated anti-human CD45 antibody and PE-conjugated anti-HLA-ABC antibody, isolated by flow cytometry and analyzed by digital PCR. Antibodies


were diluted in accordance with the manufacturerʼs recommendations Fig. 1b. Three-primer in–out PCR can be used to analyze either the left
with fewer than 1 × 106 cells per 50 µl. side (PCR-L) or the right side (PCR-R) of the knockin site. Two-primer
PCR spans the full knockin site (PCR-F). Both methods result in an
Mice and animal care expected banding pattern shown in Fig. 1c. SeqAmp DNA Polymerase
All mouse experiments were conducted according to a protocol (Takara Bio) was used for PCR amplification. All PCR reactions were
approved by the Institutional Animal Care and Use Committee (Stanford run following the manufacturerʼs recommendations at a final volume
Administrative Panel on Laboratory Animal Care, no. 22264) and of 20 µl with 0.5–1 µl of extracted DNA and 0.25 µM of each primer. All
adhering to the National Institutes of Health’s Guide for the Care and reactions were loaded on a 1% agarose gel in TAE mixed with 1:10,000
Use of Laboratory Animals. Six- to 8-week-old male or female NOD. SYBR Safe (APExBio). Two microliters of TrackIt 1 Kb Plus (Invitrogen)
Cg-Prkdcscid Il2rgtm1Wjl/SzJ (NSG) mice were used (The Jackson Labo- or GeneRuler 1 kb (Thermo Fisher Scientific) was used for ladders. PCR
ratory). All mice were housed in a pathogen-free animal facility in microi- product was diluted 1:6 with loading buffer, and 5–10 µl was loaded
solator cages, and the experimental protocol was approved by Stanford in each lane. In–out PCR was also used to analyze GFP and mCherry
University’s Administrative Panel on Lab Animal Care (no. 22264). knockins at the TET2 locus in Extended Data Fig. 7. In this case, only
two primers were used for PCR-L and PCR-R because all samples had a
Southern blot genotyping biallelic concatemer-free genotype. All primer sequences are listed in
Southern blot analysis was performed to genotype the CD14 locus after Supplementary Table 1.
knocking in CRE in PT-iPSCs. Standard methods for Southern blot with
genomic DNA were applied. In brief, 10 µg of each DNA sample was Junction-spanning PCR and sequencing
subjected to restriction digest with 200 units of NdeI (New England PCR was used to amplify a product spanning the concatemer junction
Biolabs) and 200 units of BlpI (New England Biolabs). Swiss-Webster as shown in Extended Data Fig. 3a using similar PCR conditions
albino mouse genomic DNA (Promega) was used as a negative control. as described above. Samples were the same as those analyzed by
For preparation of size standards, 6.6 × 106 copies of a 22-kb plasmid Southern blot. Then, 10 µl of PCR product was analyzed by gel elec-
that contained the right homology arm CD14 sequence was spiked into trophoresis. For sequencing, upper and lower bands were cut from
1.5 µg of mouse gDNA and digested in a volume of 100 µl with various the gel and purified with a NucleoSpin Gel and PCR Clean-Up Kit
combinations of enzymes to yield the desired fragment size. A total (Machery-Nagel). The bands were submitted for Sanger sequencing
of seven different size standard digests were performed separately (Elim Biopharmaceuticals) and Primordium Labs’ nanopore-based
and pooled only after addition of phenol-chloroform-isoamyl alcohol sequencing (Primordium).
(Invitrogen). Fragments were phenol-chloroform purified following
the overnight restriction digest and precipitated using one volume of qPCR
2-propanol after the addition of 1/10 volume of 3 M sodium acetate and ITR qPCR was performed on a QuantStudio 7 Flex Real-Time PCR System
5 µl of GlycoBlue (15 mg ml−1, Invitrogen). After an overnight incubation (Applied Biosystems) using TaqMan Fast Advanced Master Mix (Applied
at −20 °C, fragments were pelleted by centrifugation at 4 °C, washed Biosystems); final reaction volume was 10 µl in a 384-well plate. All
with 70% ethanol, air dried and resuspended in 40 µl of Tris-EDTA buffer samples were analyzed with ITR and CD14-LHA primer/probe sets
with gel loading buffer (Blue Juice, Invitrogen). Samples were loaded (listed in Supplementary Table 1) at a final concentration of 0.45 µM
onto a 10 × 6-inch 0.8% agarose gel using TAE as running buffer. The and 0.125 µM for primers and probes, respectively. The analyzed human
gel was run at 28 V for 24 h, incubated with denaturing buffer (3 M genomic DNA was extracted from PT-iPSCs with CRE knocked in to CD14
NaCl, 0.4 M NaOH) twice for 30 min with agitation and incubated in (sample 10 and sample 14 in Extended Data Fig. 1). Samples were diluted
transfer buffer (3 M NaCl, 8 mM EDTA) for 15 min with agitation. The gel to similar molar concentrations (final concentration of ~300 copies of
was then blotted overnight onto a positively charged nylon membrane CD14-LHA per microliter as determined by ddPCR without restriction
(Roche) using a piece of Whatman paper serving as a wick to transfer enzymes). Both samples were analyzed with and without 2.5 units of
the buffer and, thus, the DNA from the gel onto the membrane by the the restriction enzyme AhdI (New England Biolabs) added directly to
means of capillary action. After crosslinking (GS GeneLinker, Bio-Rad), the qPCR mixture. Thermocycler conditions were run in accordance
the membrane was pre-hybridized with 10 µg ml−1 salmon sperm DNA with the manufacturerʼs recommendations with the addition of an
in PerfectHyb Plus Hybridization buffer (Sigma-Aldrich) for a minimum initial 37 °C × 5-min incubation.
of 2 h at 60 °C with rotation. The probe was generated by amplification
of the 400-bp-long right homology arm sequence from the rAAV vector Microscopy
using primers RHA-F (5′-GCGTGGTCCCAGCCTGTGC-3′) and RHA-R Cells were imaged using the Operetta High Content Imaging System
(5′-GCAGCCCTAGCCAGGAGTC-3′). This fragment encompasses part (PerkinElmer) with GFP and mCherry fluorescent filter sets. Harmony
of the CD14 coding sequence, the 3′ UTR and some intergenic sequence version 3.5.2 (PerkinElmer) was used to export microscopy data. ImageJ
downstream of the CD14 gene. The amplicon was gel purified, and 10 ng 1.52p (National Institutes of Health) was used to subtract background
was labeled with [α-32P] dCTP using the BcaBEST Labeling Kit (Takara and measure the average GFP and mCherry fluorescent intensity within
Bio) according to the manufacturer’s instructions. Unincorporated the cell colonies. In Extended Data Fig. 4c, images are displayed using
nucleotides were removed with an Illustra MicroSpin G-25 column, and individualized or uniform imaging processing. For individualized
the probe was added to the pre-hybridized membrane. Hybridization image processing, auto-brightness-&-contrast was used to set display
was allowed to occur for 2–3 d at 60 °C with rotation. The membrane thresholds before converting and combining images. For uniform pro-
was washed twice under low-stringent conditions (2× SSC, 20 min at cessing, display thresholds were set to the same values for all images.
room temperature), followed by one wash under high-stringent con- Uniform processing was also used in Fig. 2h.
ditions (2× SSC with 0.1% SDS, 30 min at 60 °C). The membrane was
exposed onto a phosphoimager screen and visualized using a Personal Polydispersity in flow cytometry quadrants
Molecular Imager (Bio-Rad). Band sizes were determined by interpola- Polydispersity in flow cytometry plots was measured as relative change
tion between the log of adjacent size standards. in average absolute deviations. In brief, the four centroids of the cell
cluster in each quadrant were calculated separately, using GFP as the
PCR genotyping x axis and mCherry as the y axis. The average distance of each cell to
Two types of PCR genotyping strategies were used: three-primer in–out the respective centroid was then measured, and IC and DC measure-
PCR24 and two-primer PCR. Approximate primer locations are shown in ments were normalized to the control (NC). This was repeated for the


seven groups shown in Extended Data Fig. 5a–g, and the average of all To determine the total number of insertions after performing a
groups was plotted in Extended Data Fig. 5h. All measurements were knockin, this assay must be run with restriction enzymes that separate
performed with linear (not log-transformed) data points. concatemeric knockins (Extended Data Fig. 1j). When using a primer/
probe set to target the KI gene segment, extra insertions are indicated
Digital PCR genotyping: concept by copies per cell greater than 0, 1 or 2 for wild-type, monoallelic or bial-
Three methods of digital PCR genotyping were developed. Each lelic genotypes, respectively. However, if reanalyzed without restriction
method can detect the basic genotype (that is, wild-type, monoallelic enzymes, the number of measured insertions per cell should decrease
and biallelic) in subcloned cells, whereas the methods have varying if concatemeric insertions exist (Extended Data Fig. 1k). This assay can
capabilities to detect other more complex genotypes, such as concate- be used in this way to measure the number of concatemeric inserts per
meric knockins. All assays use relatively short amplicons (<150 bp), cell in subcloned samples.
which allows efficient multiplexing and shorter run times. Up to four Compared to the linkage analysis assay discussed below, the
primer/probe sets are multiplexed per reaction. Because only two allele counting assay has two advantages. First, the linkage analysis
channels are used (FAM and HEX), amplitude multiplexing34 is neces- assay requires at least two runs (that is, with and without restriction
sary and results in two-dimensional plots, as shown in Extended Data enzymes) to determine genotype and the total number of insertions.
Fig. 1g. Samples are often analyzed multiple times with various restric- In contrast, the allele counting assay requires only a single run (that
tion enzymes added directly to the ddPCR reaction. These restriction is, with restriction enzyme); thus, it works well as a quick screen to
enzymes are necessary and play direct roles in determining complex select subcloned cell lines with intended genotypes. Second, the
genotypes, as discussed below. references designed for this assay work in trans and can be located
anywhere on the genome. Therefore, they can be recycled for other
Allele counting. This genotyping strategy is a copy number variation assays.
(CNV) analysis, which measures the concentration of various genetic
segments and compares their ratios to determine copies per cell and Linkage analysis. This strategy compares the linkage of two or more
corresponding genotype. Four primer/probe sets are multiplexed gene segments to determine genotypes. Linkage can be determined by
(Extended Data Fig. 1f): assessing the number of digital PCR droplets that are positive for two
• Ref1 and Ref2: The first two primer/probe sets target two differ- or more gene segments targeted by different primer/probe sets. Three
ent reference sites. These serve both as a reference for overall to four primer/probe sets are multiplexed (Extended Data Fig. 4a):
cell number and a quantitation control. Concentrations of the • Ref1 and Ref2: The first two primer/probe sets target two dif-
two references should always measure at a 2:2 ratio (2 is used ferent cis reference sites that flank both sides of the intended
because most reference genes exist twice per cell). If the ratio is genomic knockin site. These primer/probe sets amplify short
off by more than 10%, there may be problems, such as abnormal regions located 0–1,000 bp outside the homology arms and
karyotypes, DNA degradation or improper gating, and repeats should be linked if DNA degradation has not occurred. Similar
should be considered. The two references can be in cis or in trans to the allele counting strategy described above, these primer/
with the knockin site. probe sets serve as references for overall cell numbers and
• No_KI: The third primer/probe set is designed to span the quantitation controls. Because these primer/probe sets are in cis
genomic cut site (that is, knockin site), such that the primers and and are nearby the editing site, these references also serve two
probe are at least 25 bp from the cut. This primer/probe set will additional purposes. First, digital PCR linkage analysis can be
detect the wild-type allele or alleles with indels, but it will fail if used to determine if these references are linked to knockin gene
there is a large knockin due to the increased distance between segments to determine genotypes. Second, the two references
the forward and reverse primers. This primer/probe set is used serve as linkage controls to ensure that the DNA is not consider-
to determine if a knockin is wild-type, monoallelic or biallelic. ably fragmented or degraded after purification, which would
It is important to ensure that larger Cas9-induced deletions diminish the measured linkage of nearby gene segments. We
(>25 bp) are infrequent at the measured loci before employing find that this works best if the two references have more than
this method. 75% linkage.
• KI: The fourth primer/probe set is designed to target the knockin • KI: Another primer/probe set is designed to target knockin
sequence, homology arm or any other part of the vector genome sequences or other parts of the vector genome, depending on
depending on the purpose. This primer/probe set can be used the purpose. Linkage analysis can be used to determine the
to determine if a knockin is concatemeric, off-target or NHEJ percentage of references that are linked to KI and infer complex
mediated. genotypes.

The concentration of each target and corresponding copies per cell are calculated as shown below.

$$C = \frac{\ln(n_{\text{total}}) - \ln(n_{\text{neg}})}{0.00085}$$

$$C' = \frac{C}{\left(\frac{C_{\text{Ref1}}}{a_{\text{Ref1}}} + \frac{C_{\text{Ref2}}}{a_{\text{Ref2}}}\right)/2}$$

$C$: volumetric concentration (copies/µl)
$C'$: cellular concentration (copies/cell)
$n_{\text{total}}$: total number of ddPCR droplets
$n_{\text{neg}}$: number of ddPCR droplets that are negative for a given measured target
$a$: alleles per cell for a given locus (usually 1 or 2)
The constant 0.00085 indicates the average droplet volume in microliters used in our ddPCR equipment. a = 2 for most experiments. a = 1 for reactions that use IL2RG references in male cells.

Linkage analyses are not bidirectionally equivalent. For example, in the case of a monoallelic knockin of GFP into a human autosome, 100% of GFP will be linked to Ref1 (written as GFP → Ref1 = 100%), but only 50% of Ref1 will be linked to GFP (written as Ref1 → GFP = 50%). Additionally, because Ref2 → GFP should be similar to Ref1 → GFP, the average is usually used to determine Ref → GFP.
To determine the basic genotype, this assay must not be run with restriction enzymes that cut within the knockin sequence or homology arms. Restriction enzymes that cut outside of the reference sites are acceptable, and they may be necessary to homogenize DNA and minimize digital PCR rain. However, this assay should be run a second time for all samples with restriction enzymes that separate potential concatemers to determine the total number of insertions as determined by allele counting CNV (Extended Data Fig. 4b).
The calculation for determining the percent linkage of Targ1 → Targ2 is:


$$\%L_{\text{Targ1}\to\text{Targ2}} = \left(1 - \frac{C_{\text{Targ1\_or\_Targ2}} - C_{\text{Targ2}}}{C_{\text{Targ1}}}\right) \times 100$$
Thermocycler conditions: 95 °C × 10 min; 50 cycles of 94 °C × 30 s and
60 °C × 60 s; 98 °C × 10 min; and hold at 4 °C. Cluster identification
was performed manually with QuantaSoft Analysis Pro version 1.0.596
%L: percent linkage (Bio-Rad). Cluster information was exported into Microsoft Excel for
C: volumetric concentration (copies/µl) further analysis. Forward primer, reverse primer and probe were at a
CTarg1_or_Targ2: determined by treating any droplet positive for Targ1 3.6:3.6:1 ratio. Primer and probe sequences are listed in Supplementary
and/or Targ2 as positive Table 1. Restriction enzymes and multiplexing parameters are listed in
Linkage analysis has multiple advantages over allele counting. Supplementary Table 2. Raw digital PCR cluster information is listed in
First, it is easier to design because there is more latitude for the cis Supplementary Table 3. NC/IC/DC PSC subclone genotypes and NC/DC
reference primer/probe sets than there is for the No_KI primer/probe HSPC subclone genotypes are summarized in Supplementary Table 4.
sets. Second, because the No_KI primer/probe set is not needed, it can Terminology:
be replaced by another primer/probe set. This is particularly useful • Insert: DNA sequence corresponding to a part of the vector
when analyzing double knockins, such as GFP and mCherry, because genome (for example, CRE, GFP, mCherry, etc.); the number of
both targets and references can be multiplexed in one reaction. Third, measured insertions may change if analyzed with and without
large deletions that would disrupt the No_KI primer/probe set would restriction enzymes that separate concatemers
be less disruptive because the references are located hundreds of • KI: an allele at the target locus (for example, CD14, AAVS1, etc.)
base pairs away. Fourth, episomal DNA or off-target insertions can be that contains at least one insert
distinguished from on-target insertions by measuring linkage between • No KI: an allele at the target locus that does not contain any
cis references and insert sequences. This is a unique property of this inserts
assay and is particularly useful if long-range PCR and Southern blots • On-target insert: insert that is knocked into target locus
are not practical. • Off-target insert: insert that exists in the cell but not at the target
Linkage analysis can also be used to detect ITRs or other viral site
vector regions linked to genomic loci, enabling the quantification • Extra insert: number of insertions greater than 1 or 2 for monoal-
of %-alleles-with-concatemeric-KI. Therefore, linkage analysis can lelic and biallelic knockins, respectively
measure the allelic abundance of concatemers in bulk edited samples • Concatemeric insert: extra on-target inserts
without subcloning. This would not work well with ITR CNV because • +/−RE: indicates an analysis is performed with (+) or without
the CNV alone does not distinguish between on-target and off-target (−) restriction enzymes (New England Biolabs) that cut sites
knockins. Finally, when coupled with restriction enzymes that cut only to separate concatemers (some −RE runs may have restriction
one side of inserts, linkage analysis can determine orientations and/or enzymes added to enhance ddPCR; however, they will not cut
sequences of multiple knockins. any sequence within the editing site nor any sequence flanked by
cis references)
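
The percent-linkage calculation for Targ1 → Targ2 described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, the droplet counts are assumed to be exported per cluster, and the double-negative droplet count is used to obtain the "Targ1 or Targ2" concentration. The droplet volume of 0.00085 µl is the value given in the text.

```python
import math

# Average droplet volume in microliters, as stated in the text.
DROPLET_VOLUME_UL = 0.00085


def concentration(n_total, n_neg):
    """Poisson-corrected concentration (copies/µl) from droplet counts."""
    if not 0 < n_neg <= n_total:
        raise ValueError("n_neg must satisfy 0 < n_neg <= n_total")
    return (math.log(n_total) - math.log(n_neg)) / DROPLET_VOLUME_UL


def percent_linkage(n_total, n_neg_targ1, n_neg_targ2, n_neg_both):
    """Percent of Targ1 copies that are linked to Targ2 (Targ1 -> Targ2).

    n_neg_targ1: droplets negative for Targ1
    n_neg_targ2: droplets negative for Targ2
    n_neg_both:  droplets negative for both targets; treating any droplet
                 positive for Targ1 and/or Targ2 as positive leaves exactly
                 these droplets as negatives (assumed input, hypothetical name)
    """
    c_targ1 = concentration(n_total, n_neg_targ1)
    c_targ2 = concentration(n_total, n_neg_targ2)
    c_either = concentration(n_total, n_neg_both)
    if c_targ1 == 0:
        raise ValueError("Targ1 was not detected")
    return (1 - (c_either - c_targ2) / c_targ1) * 100
```

Because the denominator is the concentration of Targ1 only, swapping the two targets generally changes the result, which is the directional behavior described for a monoallelic GFP knockin (GFP → Ref1 near 100%, Ref1 → GFP near 50%).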
Nuclease-mediated loss of linkage. This strategy employs nucleases
(that is, restriction enzymes, Cas9, etc.) to recognize and quantify gene To analyze genotype after knocking in CRE at the CD14 locus in
segments. Cis reference primer/probe sets are designed for linkage human iPSCs (results shown in Fig. 1e,g and Extended Data Fig. 1h,i),
analysis as described above. The ddPCR reaction is then run with and the ddPCR allele counting strategy was used. Two trans references were
without nucleases that target unique sites within knockins or other multiplexed with a CD14_No_KI primer/probe set and a CRE primer/
parts of the vector genome. If the recognition site exists between the probe set. The reaction was run with and without a restriction enzyme
reference primer/probe sets, linkage between the two reference sites that separates concatemers. C′CD14_No_KI rounding to 0, 1 or 2 was scored
will be lost. This can be quantified by the equation below: as biallelic, monoallelic and wild-type, respectively. Extra insertions
were determined by the equation:
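$$\%\text{alleles}_{\text{Target\_KI}} = \left(1 - \frac{\%L_{\text{with\_nuclease}}}{\%L_{\text{without\_nuclease}}}\right) \times 100$$
$$C'_{\text{Extra\_CRE}} = C'_{\text{CRE\_+RE}} - \left(2 - C'_{\text{CD14\_No\_KI}}\right)$$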
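
The two equations above, together with the copies-per-cell normalization used for allele counting, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: all function names are hypothetical, and generalizing the constant 2 in the extra-insertion equation to a parameter `a` is an assumption made here for loci with one allele per cell.

```python
def copies_per_cell(c_target, c_ref1, c_ref2, a_ref1=2, a_ref2=2):
    """Copies per cell of a target, normalized to two reference assays."""
    per_allele = (c_ref1 / a_ref1 + c_ref2 / a_ref2) / 2
    return c_target / per_allele


def basic_genotype(cpc_no_ki):
    """Score the No_KI copies per cell: 0, 1 or 2 after rounding."""
    calls = {0: "biallelic", 1: "monoallelic", 2: "wild-type"}
    # Python rounds exact halves to the nearest even integer; values that
    # close to a boundary should be treated as ambiguous anyway.
    return calls.get(round(cpc_no_ki), "undetermined")


def extra_insert_copies(cpc_insert_plus_re, cpc_no_ki, a=2):
    """Insert copies per cell beyond one per knocked-in allele (+RE run)."""
    return cpc_insert_plus_re - (a - cpc_no_ki)


def percent_alleles_with_target_ki(l_with_nuclease, l_without_nuclease):
    """Nuclease-mediated loss of linkage between the two cis references."""
    return (1 - l_with_nuclease / l_without_nuclease) * 100
```

For example, a clone with one No_KI copy per cell and about three CRE copies per cell in the +RE run would be scored as monoallelic with roughly two extra insertions.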
This strategy has three unique advantages. First, only two primer/
probe sets are used, lowering cost and complexity. Instead of KI or Similar primers/probes were used to analyze genotypes after
No_KI primer/probe sets, nucleases such as restriction enzymes are knocking in Ubc-CRE at the AAVS1 locus in human iPSCs (results shown
used, which many laboratories readily have. Second, PCR-resistant in Fig. 1h and Extended Data Fig. 2b,c). However, the AAVS1_No_KI reac-
sequences, such as sequences with inhibiting secondary structures, do tion did not run efficiently and required addition of the BspEI restric-
not have to be efficiently amplified by primer/probe sets to be detected. tion enzyme. This enzyme does not cut within the editing site, so it was
Third, the assay is agnostic to episomal vector DNA and can, therefore, added to all reactions using this primer/probe set to eliminate rain. The
detect knockins or concatemers shortly after editing without waiting reaction was also run with and without another restriction enzyme
for episomal DNA to dissipate. Although linkage analysis can also distin- that separates concatemers. Genotyping was performed similarly to
guish between episomal and on-target sequences, ddPCR reactions can the CD14 locus.
easily be overloaded and pushed outside of their dynamic range when To count the number of LHA and RHA gene segments after knock-
too many episomal copies of DNA exist, such as shortly after editing. ing in CRE at the CD14 locus in human iPSCs (results shown in Fig. 2d
and Extended Data Fig. 3e,f), the ddPCR allele counting strategy was
Digital PCR genotyping: procedure used. Two CD14 cis references were multiplexed with primer/probe sets
Each ddPCR reaction was prepared and analyzed with a QX200 targeting LHA and RHA. The reaction was run +/−RE. Extra LHA and RHA
ddPCR system (Bio-Rad) and ddPCR Supermix for Probes (No dUTP) insertions were accounted for by subtracting 2 from their measured
per Bio-Rad’s standard recommendations, unless otherwise stated. copies per cell in the +RE reaction. Vector ITRs inserted into the genome
QuantaSoft version 1.6.6.0320 (Bio-Rad) was used to operate the were also quantified. These reactions were run with the same two CD14
machine. All reactions were mixed to 25 µl and contained up to 1–5 µl cis references, a CRE primer/probe set and a primer/probe set targeting
of DNA. Then, 2.5–5 U of restriction enzyme was frequently added, the vector ITR region. The reaction was run three times (with no RE, with
and digestion was performed immediately before droplet generation AhdI and with MseI and AhdI) and analyzed by allele counting (Fig. 2d
at 37 °C × 5 min. For droplet generation, 20 µl was loaded into the and Extended Data Fig. 3i). To determine if concatemeric knockins
droplet generator cassette in groups of eight per Bio-Rad’s protocol. occurred in one or two alleles, the dataset was analyzed by linkage


analysis. Specifically, CD14_Ref1 → ITR and CD14_Ref2 → ITR were less than 100 bp were inserted flanking the homology arms, as shown
measured on the samples run with AhdI (Extended Data Fig. 3k, upper in Extended Data Fig. 6a. Primer/probe sets were designed to detect
and middle row). Genomic ITR insertions were further validated with these two sequences, termed ID1 and ID2. The bulk samples were first
nuclease-mediated loss of linkage, by analyzing CD14_Ref1 → CD14_Ref2 analyzed −RE with cis reference primer/probe sets and the GFP primer/
on the same +AhdI dataset (Extended Data Fig. 3k, bottom row). This probe set and then reanalyzed with the No_KI primer/probe set. These
method showed that sample 14 had at least one ITR inserted into the datasets were used to calculate percent KI alleles using both linkage
target site, but its orientation (that is, lack of linkage to references when analysis and allele counting:
adding AhdI) suggests that it was an NHEJ-mediated insertion. This is further supported by allele counting, which shows that the sample has an additional LHA and RHA but only one copy of CRE.

$$\%\text{KI} = \%L_{\text{Corr\_Avg\_Ref}\to\text{GFP}}$$

$$\%\text{KI} = \left(1 - \frac{C_{\text{No\_KI}}}{(C_{\text{Ref1}} + C_{\text{Ref2}})/2}\right) \times 100$$

To detect on-target concatemeric knockins, the samples were analyzed −RE with the cis references and ID primer/probe sets. The percent of KI alleles with concatemeric insertions was determined by:

$$\%L_{\text{Avg\_Ref}\to\text{ID}} = \frac{\%L_{\text{Ref1}\to\text{ID1\_or\_ID2}} + \%L_{\text{Ref2}\to\text{ID1\_or\_ID2}}}{2}$$

$$\%L_{\text{Corr\_Avg\_Ref}\to\text{ID}} = \frac{\%L_{\text{Avg\_Ref}\to\text{ID}}}{\%L_{\text{Avg\_Ref}\to\text{Ref}}} \times 100$$

$$\%\text{KI}_{\text{Concat}} = \frac{\%L_{\text{Corr\_Avg\_Ref}\to\text{ID}}}{\%\text{KI}} \times 100$$

All subclones (both PSCs and HSPCs) with knockins of Ubc-mCherry and/or Ubc-GFP were genotyped by ddPCR linkage analysis. Each locus had two cis references designed as described above, which were multiplexed with Ubc-mCherry and Ubc-GFP primer/probe sets. Each reaction was run with and without restriction enzymes that separate concatemeric knockins. For basic genotyping calculations, linkage analysis was performed by pooling Ubc-mCherry and Ubc-GFP clusters into a single ‘insert’ population. Because linkage analysis is unidirectional, the average linkages were determined by:

$$\%L_{\text{Avg\_Ref}\to\text{Ref}} = \frac{\%L_{\text{Ref1}\to\text{Ref2}} + \%L_{\text{Ref2}\to\text{Ref1}}}{2}$$

$$\%L_{\text{Avg\_Ref}\to\text{Ins}} = \frac{\%L_{\text{Ref1}\to\text{Ins}} + \%L_{\text{Ref2}\to\text{Ins}}}{2}$$

A correction can be made for DNA degradation:

$$\%L_{\text{Corr\_Avg\_Ref}\to\text{Ins}} = \frac{\%L_{\text{Avg\_Ref}\to\text{Ins}}}{\%L_{\text{Avg\_Ref}\to\text{Ref}}} \times 100$$

To determine the average size of concatemers, the samples were run with the cis references and GFP primer/probe sets +RE. The average concatemer size (including the first insert) was calculated by:

$$\text{Avg\_concat\_size} = \frac{C'_{\text{GFP\_+RE}} - C'_{\text{GFP\_-RE}}}{\%\text{KI}_{\text{Concat}}/100} + 1$$
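
The bulk-sample calculations (percent KI by allele counting, percent of KI alleles carrying concatemers, and average concatemer size) can be sketched as below. The functions transcribe the equations as written in this section; the names are hypothetical, and the linkage inputs are assumed to have already been averaged and corrected for DNA degradation.

```python
def percent_ki_by_allele_counting(c_no_ki, c_ref1, c_ref2):
    """Percent of alleles with a knockin, from No_KI and reference
    concentrations (copies/µl) measured in the same reaction."""
    return (1 - c_no_ki / ((c_ref1 + c_ref2) / 2)) * 100


def percent_ki_concat(l_corr_avg_ref_to_id, percent_ki):
    """Percent of KI alleles whose references are linked to ID1 or ID2,
    that is, KI alleles that carry a concatemeric insertion."""
    if percent_ki <= 0:
        raise ValueError("percent_ki must be positive")
    return l_corr_avg_ref_to_id / percent_ki * 100


def average_concatemer_size(cpc_gfp_plus_re, cpc_gfp_minus_re, pct_ki_concat):
    """Average number of inserts per concatemer, including the first insert."""
    if pct_ki_concat <= 0:
        raise ValueError("pct_ki_concat must be positive")
    extra_copies = cpc_gfp_plus_re - cpc_gfp_minus_re
    return extra_copies / (pct_ki_concat / 100) + 1
```

With these definitions, a sample in which half of the alleles carry a knockin and a quarter of all alleles are linked to an ID sequence gives `percent_ki_concat(25.0, 50.0)`, meaning half of the KI alleles are concatemeric.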


KI and No KI alleles were calculated from the digital PCR run −RE:

$$C'_{\text{KI}} = \%L_{\text{Corr\_Avg\_Ref}\to\text{Ins}} \times a / 100$$

$$C'_{\text{No\_KI}} = a - C'_{\text{KI}}$$

Statistics and graphs
ddPCR one-dimensional and two-dimensional plots were generated
Basic genotypes were determined by rounding C′No_KI to the nearest with QuantaSoft Analysis Pro version 1.0.596 (Bio-Rad). Flow cytom-
integer. 0, 1 and 2 indicate biallelic, monoallelic and wild-type geno- etry graphs were generated with FlowJo version 10.8 (BD Biosciences).
types, respectively. All other graphs were generated in Microsoft Excel and PowerPoint.
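
The basic genotype call for subclones analyzed by linkage (−RE run) can be sketched as follows. This is an illustrative sketch with hypothetical names: it averages the two directional reference linkages, corrects the reference-to-insert linkage for DNA degradation, converts it to KI copies per cell and rounds the No_KI copies per cell.

```python
def genotype_from_linkage(l_ref1_to_ins, l_ref2_to_ins,
                          l_ref1_to_ref2, l_ref2_to_ref1, a=2):
    """Return (C'_KI, C'_No_KI, genotype call) from percent linkages (-RE).

    a is the number of alleles per cell at the edited locus.
    """
    l_avg_ref_to_ref = (l_ref1_to_ref2 + l_ref2_to_ref1) / 2
    l_avg_ref_to_ins = (l_ref1_to_ins + l_ref2_to_ins) / 2
    # Correct for linkage lost to DNA fragmentation or degradation.
    l_corr = l_avg_ref_to_ins / l_avg_ref_to_ref * 100
    cpc_ki = l_corr * a / 100
    cpc_no_ki = a - cpc_ki
    calls = {0: "biallelic", 1: "monoallelic", 2: "wild-type"}
    call = calls.get(round(cpc_no_ki), "undetermined")
    return cpc_ki, cpc_no_ki, call
```

The mapping of 0, 1 and 2 to biallelic, monoallelic and wild-type applies to loci with a = 2; for a locus with a single allele per cell the calls would need to be adapted, which this sketch does not attempt.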
All runs were reanalyzed with +RE, and the total copies per cell Linear regressions were measured in Microsoft Excel. Significance
of GFP and mCherry were calculated as described in allele counting. was calculated with two-sided t-tests using Microsoft Excel. For paired
Runs that did not have the following quality parameters were excluded: t-tests, the following function was used:
total droplet number >8,000; the concentration of Ref1 is no more p-value = T.TEST(array1,array2,2,1)
than 10% different than the concentration of Ref2; the average concen- For unpaired t-tests, the following function was used:
tration of Ref1 and Ref2 is >12 copies per microliter; %LAvg_Ref→Ref >75% p-value = T.TEST(array1,array2,2,2)
without fragmenting restriction enzyme and <15% with fragmenting
restriction enzyme; and the copies per microliter of GFP and mCherry Reporting summary
without fragmentation is not >20% the copies per microliter of GFP and Further information on research design is available in the Nature
mCherry with restriction enzyme. Portfolio Reporting Summary linked to this article.
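
The run-level quality criteria listed above can be collected into a single check. The sketch below is one possible reading, with several assumptions that the text does not specify: the 10% difference between Ref1 and Ref2 is taken relative to their mean, and the last criterion is interpreted as the insert concentration without fragmentation not exceeding the concentration with restriction enzyme by more than 20%. All names are hypothetical.

```python
from typing import List


def qc_failures(total_droplets: int,
                c_ref1: float,
                c_ref2: float,
                l_ref_ref_unfragmented: float,
                l_ref_ref_fragmented: float,
                c_insert_unfragmented: float,
                c_insert_fragmented: float) -> List[str]:
    """Return the list of failed quality criteria (empty list means pass)."""
    failures = []
    if total_droplets <= 8000:
        failures.append("total droplet number is not >8,000")
    ref_mean = (c_ref1 + c_ref2) / 2
    # Assumption: percent difference is measured against the mean reference.
    if ref_mean <= 0 or abs(c_ref1 - c_ref2) / ref_mean > 0.10:
        failures.append("Ref1 and Ref2 differ by more than 10%")
    if ref_mean <= 12:
        failures.append("average reference concentration is not >12 copies/µl")
    if l_ref_ref_unfragmented <= 75:
        failures.append("Ref-Ref linkage is not >75% without fragmentation")
    if l_ref_ref_fragmented >= 15:
        failures.append("Ref-Ref linkage is not <15% with fragmentation")
    # Assumption: unfragmented insert counts may exceed fragmented by <=20%.
    if c_insert_unfragmented > 1.20 * c_insert_fragmented:
        failures.append("insert concentration is >20% higher without RE")
    return failures
```

Returning the reasons rather than a single boolean makes it easier to see whether a run failed for loading, gating or DNA integrity reasons.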
If both runs (+/−RE) passed quality control, more complex geno-
types were determined. Off-target and concatemeric insertions were Data availability
determined by rounding each value in the following equations to the All data generated or analyzed during this study are included in the
nearest integer: published article and its supplementary information files. Regarding
digital PCR data, raw individual cluster information is included in Supplementary Table 3 in accordance with the 2020 guidelines regarding the minimum information necessary for publication of quantitative digital PCR experiments.

$$C'_{\text{Off-targ\_Ins}} = C'_{\text{Ins\_-RE}} - C'_{\text{KI}}$$

$$C'_{\text{Concat\_Ins}} = C'_{\text{Ins\_+RE}} - C'_{\text{KI}} - C'_{\text{Off-targ\_Ins}}$$
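
The off-target and concatemeric insert counts can be sketched as below. The function name is hypothetical, and each input is rounded to the nearest integer before the subtraction, following the description in the text.

```python
def complex_genotype(cpc_ins_minus_re, cpc_ins_plus_re, cpc_ki):
    """Return (off-target inserts, concatemeric inserts) per cell.

    cpc_ins_minus_re: insert copies per cell without restriction enzyme;
                      linked concatemer copies are counted only once
    cpc_ins_plus_re:  insert copies per cell with restriction enzyme;
                      concatemer copies are separated and counted individually
    cpc_ki:           knocked-in alleles per cell
    """
    # Note: Python rounds exact halves to the nearest even integer.
    ins_minus = round(cpc_ins_minus_re)
    ins_plus = round(cpc_ins_plus_re)
    ki = round(cpc_ki)
    off_target = ins_minus - ki
    concatemeric = ins_plus - ki - off_target
    return off_target, concatemeric
```

After substitution, the concatemeric count reduces to the difference between the +RE and −RE insert copies per cell, which matches the idea that restriction digestion reveals copies that were previously partitioned into the same droplet.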

Linkage between GFP and mCherry was determined to be true if the References
sum of %LGFP→mCh and %LmCh→GFP was more than 20% when analyzed −RE. 31. Nishimura, T. et al. Sufficiency for inducible Caspase-9 safety
For bulk genotyping, Ubc-GFP was knocked in to the target locus, switch in human pluripotent stem cells and disease cells. Gene
and DNA was extracted from a group of GFP+ cells for analysis. This Ther. 27, 525–534 (2020).
is distinct from previous genotyping analyses, which had all been 32. Chao, M. P. et al. Human AML-iPSCs reacquire leukemic properties
performed on DNA extracted from a pure, clonal population. Bulk after differentiation and model clonal variation of disease. Cell
genotyping was performed on the vector dilution series in PSCs Stem Cell 20, 329–344 (2017).
(Fig. 3g and Extended Data Fig. 6), and the human cells recovered 33. Ang, L. T. et al. Generating human artery and vein cells from
after HSPC xenotransplantation (Fig. 4e). For these experiments, the pluripotent stem cells highlights the arterial tropism of Nipah and
vector backbone was modified such that two unique DNA sequences Hendra viruses. Cell 185, 2523–2541 (2022).


34. Whale, A. S., Huggett, J. F. & Tzonev, S. Fundamentals of Southern blot. A.C.F. and Toshinobu Nishimura cloned plasmids. C.T.C.,
multiplexing with digital PCR. Biomol. Detect. Quantif. 10, J.B., Toshiya Nishimura and A.C.W. analyzed data. H.N., R.M., M.A.K.
15–23 (2016). and F.P.S. acquired funding and supervised the experiments. H.N. and
R.M. contributed equally. F.P.S. wrote the manuscript, and all authors
Acknowledgements edited and approved the final manuscript.
We thank K. C. Chan, M. Rivera, F. Zhao and S. Homma for laboratory
and administrative support. We thank S. Tsuji for scientific advice. Competing interests
This work was supported by grants from the National Institutes of R.M. is on the advisory boards of Kodikaz Therapeutic Solutions,
Health (NIH) (R01DK121851, H.N.; R21OD030009, H.N.; R21OD030529, Orbital Therapeutics, Pheast Therapeutics and 858 Therapeutics.
H.N. and R.M.; and R01HL064274, M.A.K.). F.P.S. received mentorship R.M. is a co-founder of and equity holder in Pheast Therapeutics,
and financial support from Stanford’s SPARK Translational Research MyeloGene and Orbital Therapeutics. H.N. is a co-founder of and
Program; the Stanford Clinical and Translation Science Award shareholder in Megakaryon, Century Therapeutics and Celaid
to Spectrum (UL1TR003142); the National Science Foundation Therapeutics. The remaining authors declare no competing interests.
Graduate Research Fellowship; and the Pat Tillman Fellowship.
D.K. was supported by the Japan Society for the Promotion of Science Additional information
(JP21J01690) and the Osamu Hayaishi Memorial Scholarship for Study Extended data is available for this paper at
Abroad. A.C.F. is supported by the Stanford Graduate Fellowship, https://doi.org/10.1038/s41587-024-02171-w.
the National Science Foundation Graduate Research Fellowship
Program and the Stanford Lieberman Fellowship. Toshiya Nishimura Supplementary information The online version contains supplementary
was supported by the Japan Society for the Promotion of Science material available at https://doi.org/10.1038/s41587-024-02171-w.
(JP18K14602 and JP18J00499). A.C.W. was supported by the NIH
(K99HL150218), the Leukemia & Lymphoma Society (3385-19) and the Correspondence and requests for materials should be addressed to
Edward P. Evans Foundation. J.B. was supported by the International Fabian P. Suchy, Ravindra Majeti or Hiromitsu Nakauchi.
Postdoc Grant from the Swedish Research Council (2017-00344) and
by the Assar Gabrielsson Foundation. Peer review information Nature Biotechnology thanks Casey Maguire
and the other, anonymous, reviewer(s) for their contribution to the
Author contributions peer review of this work.
F.P.S. and D.K. conceived the research, performed most experiments
and analyzed data. Y.N. performed flow cytometry and HSPC Reprints and permissions information is available at
transplant. M.H., J.Z. and I.H. performed ddPCR. K.P. performed the www.nature.com/reprints.


Extended Data Fig. 1 | Comparison of PCR, ddPCR, and Southern blot for FAM indicate probe color. High and Low indicate concentration of probe (used
genotyping subcloned colonies after AAV6/Cas9-mediated knockin. for ddPCR amplitude multiplexing), resulting in high or low clusters shown in
a, Schematic showing location of primers in 3-primer in-out PCR. Primers panel g. g, Representative two-dimensional ddPCR plot of reaction shown in
span left-homology arm. b, Schematic showing location of primers in 3-primer panel f. h, References from ddPCR genotyping the 36 samples shown in panel
in-out PCR. Primers span right-homology arm. c, Schematic showing location e. Average of the two references set to 2 copies/cell. Sample # shown at bottom.
of primers in 2-primer PCR genotyping. Primers span entire editing site.
d, Theoretical gel electrophoresis after genotyping as shown in panel a-c.
WT = wildtype; Mono = monoallelic knockin; Bi = biallelic knockin. e, Gel
electrophoresis from PCR genotyping. 36 subclones expanded after knocking
in CRE into CD14 locus in PT-iPSCs. Ladder size shown in kb. Expected band
sizes shown in panel a-c. Sample number indicated at top and bottom of gels:
L = ladder; N = no-template-control. Interpreted genotype indicated by circle
at top of each gel: white = WT; gray = monoallelic; black = biallelic; red = cannot
be determined. Top, middle, and bottom row of gel indicate PCR-L, PCR-R, and
PCR-F genotyping strategies, respectively. f, Schematic showing ddPCR allele
counting strategy for genotyping. Four ddPCR targets are multiplexed in a single
well for each sample. Primers indicated by arrows, and probes indicated by small
rectangles flanked by primers. Ref-1 and Ref-2 indicate trans references used
to determine overall cell number. RE indicates restriction enzymes. HEX and
i, ddPCR genotyping results. Normalized to references shown in panel h. The
reaction was run +/− restriction enzyme (RE) as shown in panel f. Interpreted
genotype indicated by circle at bottom graph: white = WT; gray = monoallelic;
black = biallelic; red = additional insertions. j, Schematic indicating concatemeric
knockin. Without restriction enzyme, concatemeric inserts will be linked.
k, Schematic indicating ddPCR droplet partitioning. Without restriction enzyme,
linked concatemeric inserts will partition in the same droplet and be counted
only once. l, Schematic showing Southern blot genotyping strategy. RE1 and
RE2 indicate two different restriction enzymes. Probe designed to bind right
homology arm. m, Southern blot of 11 select subclones. Size references
indicated on left in kb. Expected sizes shown in panel l. * indicates off-target
insertion. Interpreted genotype indicated by circle at top of blot: white = WT;
gray = monoallelic; black = biallelic; red = additional insertions.
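
The undercounting described for panels j and k can be illustrated with a toy model. The sketch below is not the authors' analysis: it assumes an idealized regime in which every DNA fragment lands in its own droplet, and the function name `positive_droplets` and the example alleles are hypothetical.

```python
def positive_droplets(copies_per_allele, digested):
    """Count droplets that score positive for the insert amplicon.

    copies_per_allele: number of tandem insert copies on each template
        molecule (0 = wildtype allele, 1 = single knockin, >1 = concatemer).
    digested: True if a restriction enzyme separated the tandem copies.
    Toy assumption: each DNA fragment occupies its own droplet.
    """
    if digested:
        # Every copy sits on its own fragment, so each copy is counted.
        return sum(copies_per_allele)
    # Linked copies travel together and light up a single droplet.
    return sum(1 for n in copies_per_allele if n > 0)


# Hypothetical sample: one WT allele, one single knockin, one 3-copy concatemer.
alleles = [0, 1, 3]
print(positive_droplets(alleles, digested=False))  # 2
print(positive_droplets(alleles, digested=True))   # 4
```

In this simplified picture, a gap between the digested and undigested counts is what flags additional linked insertions.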


Extended Data Fig. 2 | Comparison of PCR and ddPCR genotyping at AAVS1
locus. a, Gel electrophoresis from PCR genotyping. 36 subclones expanded
after knocking in Ubc-CRE into AAVS1 locus in PT-iPSCs. Ladder size shown in
kb. Sample number indicated at top and bottom of gels; L = ladder. Interpreted
genotype indicated by circle at top of each gel: white = WT; gray = monoallelic;
black = biallelic; red = cannot be determined. Top, middle, and bottom row
of gel indicate PCR-L, PCR-R, and PCR-F genotyping strategies, respectively.
b, References from ddPCR. Average of the two references set to 2 copies/cell.
Sample ID shown at bottom. c, ddPCR genotyping results. Normalized
to references shown in panel b. The reaction was run +/− restriction enzyme
(RE). Interpreted genotype indicated by circle at bottom graph: white = WT;
gray = monoallelic; black = biallelic; red = additional insertions or cannot
be determined.
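
The reference normalization in this legend (average of the two references set to 2 copies/cell) can be written out as a short calculation. This is a minimal sketch under that stated convention only; the function name `copies_per_cell` and the example concentrations are hypothetical and are not taken from the paper.

```python
def copies_per_cell(target_conc, ref1_conc, ref2_conc):
    """Scale a target concentration so the mean of two references = 2 copies/cell.

    All inputs are ddPCR concentrations in the same units (e.g. copies/uL).
    """
    ref_mean = (ref1_conc + ref2_conc) / 2.0
    if ref_mean <= 0:
        raise ValueError("reference concentrations must be positive")
    return 2.0 * target_conc / ref_mean


# Hypothetical readings: target at half the reference mean -> 1 copy per cell.
print(copies_per_cell(500.0, 980.0, 1020.0))  # 1.0
```

Under this convention a value near 1 would be read as a monoallelic insert and a value near 2 as biallelic, with larger values pointing to additional insertions.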


Extended Data Fig. 3 | Characterization of concatemers. a, Concatemer
junction-spanning PCR of 11 select samples (CRE knockin into CD14 locus in
PT-iPSC; same samples as used in Southern blot in Extended Data Fig. 1). Top:
schematic showing location of primers. Size of expected amplicon indicated
assumes viral ITRs are joined end-to-end. Bottom: gel electrophoresis of PCR
products. Top of gel indicates genotype previously determined by ddPCR
and Southern blot. Bottom of gel indicates sample number. L = ladder;
N = no-template control. Ladder size indicated in kb. F and R indicate forward and
reverse primers, respectively. b, Repeat of panel a with forward primer only.
c, Repeat of panel a with reverse primer only. d, Schematic showing sequence of
chimeric ITR mapped to regions of viral vector genome. Green triangle indicates
AhdI restriction enzyme cut site. Orange arrows and rectangle indicate primer
and probe sites for ITR ddPCR or qPCR (used in panel g-k). e, References from
ddPCR analysis of 36 subclones (same samples as in Extended Data Fig. 1).
Average of the two references set to 2 copies/cell. Sample number shown
at bottom. References are in cis: Ref-1 is upstream (5′) of LHA and Ref-2 is
downstream (3′) of RHA. f, ddPCR results of counting LHA and RHA. Analyzed
+/− restriction enzyme (RE) used to separate concatemeric regions. Normalized
to references shown in panel e. g, qPCR measurement of AAV ITR when analyzed
+/− AhdI RE. Y-axis is fluorescent intensity; x-axis is cycle number. Left graph
shows DNA extracted from sample #10 (concatemeric monoallelic KI). Right
graph shows DNA extracted from sample #14 (monoallelic KI by end-joining).
Δ indicates change in cycle threshold when run with and without AhdI. h, qPCR
measurement of control primer-probe set Cis Ref-1. Same samples and axis as
panel g. i, ddPCR results of counting ITR insertions in the 36 subclones +/− RE
(+RE is AhdI and MseI). Normalized to cis-references (not shown). j, Schematic
showing ddPCR linkage analysis of ITR and cis-references. Left indicates
concatemeric knockin with chimeric ITR inserted. Right indicates
end-joining-mediated knockin. Legend at bottom. Dashed lines indicate amplicons are linked.
Red X indicates linkage is lost when adding RE. k, Linkage heat map between
cis-reference sites and ITRs measured by ddPCR (analyzed with addition of AhdI).
Sample number shown at top. Amplicon sites and legend shown in panel j. Top
row indicates the % of Ref-1 sites bound to an ITR; middle row indicates the % of
Ref-2 sites bound to an ITR; bottom row indicates the % of Ref-1 sites
bound to Ref-2 sites.
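
The cycle-threshold shift (Δ) reported for the qPCR panels can be converted into a fold difference in amplifiable template using the standard exponential-amplification relationship. The sketch below is an illustration and not the authors' calculation: it assumes a constant per-cycle efficiency (1.0 means perfect doubling), and the function name `relative_template` and the Ct values are hypothetical.

```python
def relative_template(ct_without_re, ct_with_re, efficiency=1.0):
    """Amplifiable template with RE, relative to the same sample without RE.

    Each cycle multiplies product by (1 + efficiency), so a Ct that is
    delta cycles earlier implies (1 + efficiency) ** delta times more
    starting template. Values above 1 mean digestion increased the signal.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    delta_ct = ct_without_re - ct_with_re
    return (1.0 + efficiency) ** delta_ct


# Hypothetical Ct values, assuming perfect doubling per cycle.
print(relative_template(30.0, 27.0))  # 8.0
print(relative_template(25.0, 25.0))  # 1.0
```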


Extended Data Fig. 4 | Characterization of dual-knockin subclones.
a, Schematic indicating strategy for genotyping double knockin subclones
using ddPCR linkage analysis. Cis reference sites shown as small light-gray and
dark-gray squares. Ubc-GFP amplicon site indicated by small green square.
Ubc-mCherry amplicon site indicated by small red square. Dashed lines
indicate linkage, which can be measured by ddPCR. b, Schematic indicating
strategy for counting concatemers in double knockin subclones using ddPCR.
Scissors indicate restriction enzymes. Red X indicates linkage lost, such
that concatemeric inserts will be separated. c, Images of 72 subclones after
dual knockin of Ubc-GFP and Ubc-mCherry into the TET2 locus in PT-iPSCs.
Subclones were selected and expanded as shown in Fig. 2e. Left and right side
show the same images, processed differently. The left images were processed
using auto-contrast prior to stitching. The right images were processed with
uniform settings. The upper 24 colonies were selected from the low gate during
FACS and the lower 48 colonies were selected from the high gate as shown in
Fig. 2f. White scale bar (lower right) indicates 400 microns. d, Correlation of GFP
mean fluorescent intensity (MFI) and copies of Ubc-GFP per cell. MFI measured
from uniformly processed images. Ubc-GFP copies/cell measured as shown
in panel b. Dashed line indicates linear regression; equation and R² indicated
on graph. e, Correlation of mCherry MFI and copies of Ubc-mCherry per cell
(similar to panel d).
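
The linear regression and R² mentioned for the MFI-versus-copy-number panels follow the ordinary least-squares definitions. The following is a minimal sketch of that calculation; the function name `linear_fit` and the data points are hypothetical and are not values from the figure.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return slope, intercept, r2


# Hypothetical subclones: insert copies per cell vs. mean fluorescent intensity.
copies = [1, 2, 3, 4]
mfi = [12.0, 19.0, 31.0, 38.0]
slope, intercept, r2 = linear_fit(copies, mfi)
print(f"MFI = {slope:.2f} * copies + {intercept:.2f}, R^2 = {r2:.3f}")
```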


Extended Data Fig. 5 | Flow cytometry plots and editing rate in human PSCs
+/−ITR removal. a-g, Flow cytometry plots 5 days after double knockin of
Ubc-GFP and Ubc-mCherry into human PSCs. Loci (TET2, AAVS1, HBB, RUNX1)
and cell line (PT-iPSC, WT-iPSC, H9) shown at top. NC indicates no cut (ITR not
removed). IC and DC indicate internal cut and distal cut methods for ITR removal,
respectively. GFP and mCherry fluorescent intensity indicated on x- and y-axis,
respectively. Gating indicated by quadrants; % cells in each quadrant shown on
plot. Double positive cells were sorted into individual wells and used for later
clonal analyses. h, Change in polydispersity in flow cytometry quadrants in
panel a-g (n = 7, each panel in a-g is one measurement). Q1-Q4 indicate upper left,
upper right, lower right, and lower left quadrant, respectively. Polydispersity
measured as absolute average deviation and shown relative to NC. Error bars
indicate 1 standard deviation; center indicates mean. * indicates P = 0.02 to 0.05.
** indicates P = 0.005. P values are from two-sided paired t-test. i, Legend for
panel j-k. j-k, % cells positive for GFP and/or mCherry (j), and % double positive
(GFP and mCherry, k), as indicated by panel a-g.
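
The polydispersity measure named in panel h (absolute average deviation, shown relative to NC) can be sketched as follows. This is an assumption-laden illustration: the legend does not say whether "relative to NC" is a ratio or a difference, so a ratio is assumed here, and the function names and intensity values are hypothetical.

```python
def average_absolute_deviation(values):
    """Mean of |x - mean(x)| over a set of per-cell intensities."""
    if not values:
        raise ValueError("values must not be empty")
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


def relative_polydispersity(sample, no_cut):
    """Spread of a sample divided by the spread of the NC control (assumed ratio)."""
    baseline = average_absolute_deviation(no_cut)
    if baseline == 0:
        raise ValueError("NC control has zero spread")
    return average_absolute_deviation(sample) / baseline


# Hypothetical intensities within one quadrant.
nc = [1.0, 3.0, 5.0, 7.0]   # no cut: broad distribution
ic = [3.0, 4.0, 4.0, 5.0]   # after ITR removal: tighter distribution
print(relative_polydispersity(ic, nc))  # 0.25
```

A value below 1 in this sketch would correspond to a tighter fluorescence distribution than the NC condition.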


Extended Data Fig. 6 | Concatemer rate at various MOIs. a, Schematic
indicating ddPCR linkage analysis strategy for measuring KI frequency in bulk
(non-subcloned) samples. Cis reference sites shown as small light-gray and
dark-gray squares. Ubc-GFP amplicon site indicated by small green square.
Dashed lines indicate linkage, which can be measured by ddPCR. ID1 and ID2
are small (<100 bp) unique DNA sequences added outside the homology arms
in the viral vector genome as shown at the top. b, Schematic indicating ddPCR
linkage analysis strategy for measuring concatemeric KI frequency in bulk
(non-subcloned) samples. Similar depiction as panel a. ddPCR amplicons for
ID1 and ID2 shown as dark purple squares. c, Flow cytometry plots 5 days after
KI of Ubc-GFP into the HBB locus in H9 ESCs. Editing performed at various
MOIs indicated by number on each plot. Right gate on each plot indicates the
positive population, which was sorted and expanded for ddPCR analysis.
Side-scatter-area on y-axis and GFP fluorescent intensity on x-axis. Gating indicated
by trapezoid; % cells in each gate shown on plot. d, Flow chart indicating
subpopulations analyzed for panel e-h and j-m. e, Editing rate (%GFP positive
cells) at various MOIs as indicated by flow cytometry in panel c. f, Knockin rate
(%HBB alleles with GFP knockin) at various MOIs as indicated by ddPCR. Black line
indicates measurements performed by ddPCR linkage analysis as shown in panel
a. Blue dashed line indicates measurements performed by ddPCR allele counting
as shown in Extended Data Fig. 1f. Samples analyzed were GFP+ cells sorted and
expanded at each MOI as shown in panel c. The same samples are used for ddPCR
analysis in panel g-h. g, Concatemeric knockin rate (%GFP knockin alleles with
concatemer) as indicated by ddPCR. h, Average size of concatemeric knockin
(for example, 1 = single KI, 2 = a concatemeric knockin with a single repeat, etc.)
as indicated by ddPCR. i, Flow cytometry plots 5 days after KI of Ubc-GFP into the
IL2RG locus (X chromosome) in H1 (male) ESCs. Similar to panel c. j-m, Similar to
panel e-h. Samples analyzed are from panel i.


Extended Data Fig. 7 | Comparison of HDR- and NHEJ-mediated knockin rate
+/−ITR removal. a-d, In-out PCR and gel electrophoresis of subclones after
dual-knockin of Ubc-GFP and Ubc-mCherry into the TET2 locus in PT-iPSCs. Subclones
were selected from NC, IC, and DC groups that contained biallelic knockins
without concatemers (determined by ddPCR). Primer location and expected band
size shown on schematic at top of each gel. Larger than expected bands, indicated
by *, could result from NHEJ-mediated knockin. e, Schematic showing strategy for
comparing HDR- and NHEJ-mediated knockin rate by flow cytometry. Mismatched
homology arms (HA) measure NHEJ-mediated knockin. Matching homology arms
measure HDR-mediated knockin rate. f, Results of editing with matched versus
mismatched homology arms as measured by flow cytometry. Number at top of
graph corresponds to schematic in panel e. Experiment was performed with the
IC method of ITR removal (orange bars) or no removal (gray bars). GFP+ cells
measured by flow cytometer more than 2 weeks after editing.


Extended Data Fig. 8 | Flow cytometry plots and editing rate in human HSPCs
+/−ITR removal. a-d, Flow cytometry plots 5 days after double knockin of
Ubc-GFP and Ubc-mCherry into human CD34+ HSPCs. Loci (TET2, HBB) and repeat
number shown at top. NC indicates no cut (ITR not removed). DC indicates distal
cut method for ITR removal. GFP and mCherry fluorescent intensity indicated
on x- and y-axis, respectively. Gating indicated by quadrants; % cells in each
quadrant shown on plot. Double positive cells were sorted into individual wells
and used for later clonal analyses. e-g, Flow cytometry plots 5 days after knockin
of Ubc-GFP into male human CD34+ HSPCs at the IL2RG locus (X chromosome).
GFP fluorescent intensity indicated on x-axis; side-scatter-area on y-axis.
Gating indicated by rectangles; % cells in each gate shown on plot. GFP positive
cells were sorted into single wells and used for later clonal analyses. h-i, Flow
cytometry plots 5 days after knockin of Ubc-GFP in human CD34+ HSPCs at the
HBB locus. Gating indicated by rectangles; % cells in each gate shown on plot. GFP
positive cells were sorted and transplanted into mice for later analyses. j, Table
summarizing HSPC analyses. Legend for panel k-m. k, % cells positive for GFP
and/or mCherry as indicated by panel a-i. l, % cells positive for GFP and mCherry
as indicated by panel a-d. m, % CD45+ GFP+ human cells in mouse whole bone
marrow 4 months after transplantation. Transplanted cells initially sorted from
panel h-i and transplanted into 2 mice each (n = 8 total mice). Error bars indicate
1 standard deviation; center indicates mean.
