Professional Documents
Culture Documents
Sciadv Abi6508
Sciadv Abi6508
INTRODUCTION electronic excited states. Any two adjacent pyrimidine bases can
Cancer genome sequencing has identified millions of somatic dimerize via their 5,6 double bonds. The second most frequent lesion
mutations in different types of human cancer (1–4), in some cases produced by UVB irradiation is the pyrimidine (6-4) pyrimidone
pointing to a potential origin of these mutations. Cancers linked to photoproduct [(6-4) photoproduct]. In comparative mutagenesis
environmental exposures show tumor-specific mutational signatures experiments, the CPD has been shown to be at least five times more
In this study, we set out to investigate the CPD deamination we named circle-damage-seq. This method is based on DNA
mechanism with the goal of determining whether this specific pathway damage–specific repair enzymes and allows sensitive detection and
can be linked to the frequent C-to-T mutations found in melanoma. genome-scale mapping of DNA base damage. The general outline
To perform this work, we first developed a new method for base of this method is depicted in Fig. 1. The cells are first exposed to a
resolution and whole-genome analysis of DNA damage. We used DNA-damaging agent and then harvested. DNA is purified and
this new methodology, termed circle damage sequencing (circle- sonicated to an average size of 300 to 400 base pairs (bp), a suitable
damage-seq), to separately map CPDs and deaminated CPDs in range for DNA circularization and downstream sequencing analysis
human UVB-irradiated cells at the genome scale and at the single-base (see Methods). After sonication and end repair, a DNA circulariza-
level. Our data provide strong support for the CPD deamination tion step is carried out using a diluted ligation reaction with T4
model being the mechanistic cause of most melanoma mutations. DNA ligase. T4 ligase also repairs DNA single-strand breaks that would
contribute to background signals. Any remaining noncircularized
fragments are eliminated by applying an exonuclease mixture
RESULTS (Fig. 1A). Then, the lesion is excised with a lesion-specific DNA repair
Development of the circle-damage-seq method enzyme and abasic site–processing enzymes within the circularized
To determine mutational mechanisms at the whole-genome scale, it is DNA to introduce a 1-nucleotide (nt) gap. Next, a double-strand
critical to map the underlying DNA damage at single-base resolution break is created at the DNA damage position by removing the single
and genome wide. However, these methods need to be specific and nucleotide on the opposite strand of the gapped DNA molecules
require high sensitivity. Because of the limitations often associated with single-strand specific S1 nuclease, thus producing a double-
with currently used methods, we developed a new strategy, which strand break (Fig. 1B). The use of multiple enzymes has the potential
A B C
Circularization and DNA double-strand break Library sequencing
remove linear DNA (DNA repair enzymes) and analysis
Ligated bp Adaptor
DNA damage
N
Repair enzyme Nick:
Plasmid-safe ATP-dependent DNase
S1 nuclease DSB:
Damage selective
1-bp gap sequencing
at damage site
D O
CPD
NH2
CH3
N 5 5 N F
6 6 hg19: chr12:64,959,118-64,959,184
O N N O
E
UVB damage CPD:
Photolyase Repair:
S1 nuclease DSB:
Fig. 1. Outline of circle-damage-seq. (A) DNA containing damaged bases (red) is sonicated, circularized, and then treated with an exonuclease cocktail. (B) DNA lesions
(arrows) are processed into DNA double-strand breaks (DSB) by base excision repair enzymes and S1 nuclease. Sequencing adaptors are ligated to the breaks. (C) DNA
library preparation for DNA damage–specific sequence enrichment. Divergent paired-end reads with single-base gaps indicate the damaged base positions. (D) Structure
of a CPD at a TC sequence. (E) Method for mapping of UVB-induced CPDs at single-nucleotide resolution. (F) Divergent paired reads of CPD mapping by circle-damage-seq.
Purple and lavender segments represent reads mapped to the plus and minus strands, respectively. Green arrows indicate single-base gaps matching the 5′ base of a
pyrimidine dimer. ATP, adenosine triphosphate; DNase, deoxyribonuclease.
to increase background of the procedure. We carefully optimized T4-PDG) (21). This enzyme can also cleave abasic sites; therefore, it
each enzymatic step including enzyme concentrations, incubation is important to minimize depurination during DNA isolation steps.
temperatures, and incubation times to minimize nonspecific DNA CPD induction was confirmed by gel electrophoresis after cleavage
cleavage. Circle-damage-seq uniquely enables sequencing of both of DNA with T4-PDG and S1 nuclease. As shown in Fig. 1E, to create
sides of the single cleavage site in one DNA molecule using paired-end double-strand breaks at the CPD positions after T4-PDG incision,
sequencing. Closed circular DNA without any recognized damaged we first used AP endonuclease 1 (APE1) to cleave the 5′ side of the
bases is excluded from the breakage and ligation steps, resulting in sugar-phosphate backbone at pyrimidine dimers. We followed this
lesion-selective DNA library enrichment by polymerase chain reac- treatment by incubation with E. coli photolyase under long-wave
tion (PCR) (Fig. 1C). Notably, sequencing reads obtained from circle- UVA light to remove dimerized pyrimidine bases that remain after
damage-seq show a specific aligned read pattern, divergent reads the DNA glycosylase incision. Next, S1 nuclease treatment generated
with a single-base gap in the center representing the damaged DNA ligatable DNA double-strand breaks within the circularized DNA.
base, because each paired-end read is derived from a single opened The combined treatment creates a single-nucleotide gap representing
circular DNA molecule (Fig. 1C). Thus, circle-damage-seq provides the 5′ pyrimidine of the dimer. We aligned the sequencing reads from
high selectivity for DNA damage mapping at single-nucleotide circle-damage-seq to the human reference genome. We observed the
resolution with low background at reduced sequencing depth. This divergent paired reads with a single-base gap in the center and con-
method should be applicable for mapping of any type of DNA dam- firmed the matching bases in the gaps as pyrimidines that have another
age for which a DNA excision repair or cleavage enzyme is available. pyrimidine base in the 3′ position (Fig. 1F). We found this pattern with
We first tested the feasibility of circle-damage-seq by analyzing a high frequency at 5′TT and 5′TC sites and at lower frequency at
the endogenous DNA modification N6-methyladenine (N6mA) in 5′CT and 5′CC dinucleotide sequences, consistent with the known
Escherichia coli dam+ cells (fig. S1). DNA double-strand breakage at specificity of CPD formation at dipyrimidine sequences (10). Addi-
N6mA bases within circularized DNA was achieved through Dpn I tional examples of genomic CPD mapping are shown in fig. S2.
cleavage, which generates a blunt end by splitting the target DNA
between 5′GN6mA and TC sequences. N6mA could readily be Sequence specificity of CPD formation
mapped in E. coli (fig. S1). Previous work has focused mainly on the dinucleotide specificity of
genomic UV damage (10, 12, 13, 22–24). We now analyzed tri-
Genomic mapping of UVB-induced CPDs nucleotide sequences that reflect the sequence specificity of CPD for-
To map UVB-induced CPDs (Fig. 1D), the major mutagenic UVB- mation in human fibroblasts using ~100 million aligned, divergent
induced lesion at dipyrimidine sequences (14), we irradiated human read pairs with single-nucleotide gaps (Fig. 2). When normalized
A B
PyPyN3' 5'NPyPy
0.5 0.6
Event rate (relative to hg19 genome)
Event rate (relative to hg19 genome)
0.4
0.3
0.3
0.2
0.2
0.1 0.1
0.0 0.0
0.02 0.02
Event rate (relative to hg19 genome)
NT NT
0.01 0.01
0.00 0.00
Fig. 2. Sequence context of CPD formation in human fibroblasts irradiated with UVB. (A) Distribution plot of trinucleotide sequences (PyPyN3′) undergoing CPD
formation in UVB-irradiated human cells (green). NT, nontreated control (red). Some background is seen in the NT samples at non-dipyrimidine sequences (the four trinu-
cleotides to the right). (B) Distribution plot of trinucleotide sequences (5′NPyPy) undergoing CPD formation in UVB-irradiated human cells (green). Data in (A) and (B) are
normalized for the trinucleotide frequencies of the human genome.
dimerized pyrimidines at the 3′ side were 5′TTA and 5′TCT (CPD Genome-wide mapping of cytosine-deaminated CPDs
underlined), followed by 5′TCA (Fig. 2A). Thus, T is preferred in CPDs are known to be the most prevalent and most mutagenic
the first position of the dimer, and T and A are the preferred bases UVB-induced DNA lesions (14, 10). CPDs containing cytosine are
3′ to the most abundant CPDs (5′TT and 5′TC). The most highly unstable. Owing to saturation of the 5,6 double bond of cytosine
enriched trinucleotide sequences considering the 5′ neighboring upon dimer formation (Fig. 1D), they are prone to undergo hydro-
base of the CPDs were 5′CTT and 5′CTC, followed by 5′TTC and lytic cytosine deamination to form uracil, which could be a potentially
5′TTT (Fig. 2B) showing a preference for having another pyrimidine mutagenic pathway (33–35). We next applied the circle-damage-seq
5′ to a pyrimidine dimer. Divergent reads with single-base gaps method to analyze the extent of CPD cytosine deamination. We
were >300 times less frequent in nontreated (NT) cells than in irradiated human fibroblast cells with UVB and harvested them
UVB-irradiated cells with no particular enrichment of sequencing 24 and 48 hours later to allow time for deamination. To specifically
reads associated with such trinucleotides in the control samples map the deaminated CPDs, we applied a photolyase-mediated re-
[Fig. 2, A and B (bottom)]. versal of the CPDs first, followed by the excision of U bases by uracil
We also analyzed the preferred tetranucleotide sequence con- DNA glycosylase (UDG) in the circle-damage-seq method (Fig. 4A).
texts for CPD formation (fig. S3), which were generally consistent This method successfully displays CPDs containing deaminated
with the trinucleotide patterns. The sequence enrichments for the cytosines at single-base resolution and shows cytosine in the single-
5′NPyPyN tetranucleotide context (fig. S3C) were dominated by a base gap, as predicted (Fig. 4B and fig. S4A). With this method,
T in the second position and by having another pyrimidine as the deamination of CPDs that contain 5-methylcytosine cannot be
first base. This pattern is similar to one obtained after in vitro irra- detected because the deaminated base is thymine. We note that we
diation of DNA with UVB (25). also observed double-base gaps at CC bases, indicating likely double
To identify even broader possible sequence motifs, we obtained deamination events, but these were difficult to quantify because of
a base enrichment on 20-nt regions around the gapped base pair the unknown efficiency of UDG incision at UU sequences. There-
using ~100 million aligned, divergent read pairs (Fig. 3A). Position fore, we focused on the single C gaps.
11 represents the nucleotide in the gap between divergent reads (the The patterns of deaminated CPDs were largely independent of
5′ pyrimidine of the CPD). We observed a strong enrichment of total CPD patterns (fig. S4, B and C) because of their respective
5′TT or 5′TC contexts at positions 11 and 12, representing the differential sequence requirements. At a genomic scale, deaminated
pyrimidine dimers. In this motif, T or C was enriched at position CPDs were strongly depleted at many TSS regions and even more so
10, and T and A bases were enriched at position 13, consistent with than total CPDs (Figs. 4, C and D, and 5, A and B, and fig. S5).
the tri- and tetranucleotide patterns. Unexpectedly, no further Genes with CpG island (CGI) promoters showed a dip of total
A B
70
50
40
30
–1.5 TSS TES 1.5 kb
C D E
2.10
2.10 2.20
2.05
CPD coverage
CPD coverage
CPD coverage
2.05
2.00 2.10
2.00
1.95
2.00
1.90 1.95
–1.5 TSS TES 1.5 kb –1.5 TSS 1.5 kb –1.5 TES 1.5 kb
Genes
Genes
–1.5 TSS TES 1.5 kb –1.5 TSS 1.5 kb –1.5 TES 1.5 kb
Gene distance (bp) Gene distance (bp) Gene distance (bp)
hg19 chr2:27,710,100-27,721,800
Fig. 3. CPD mapping by circle-damage-seq shows the distribution of CPDs at the sequence and gene level. (A) Nucleotide composition of mapped CPD positions.
Position 11 and 12 represent UVB-damaged dipyrimidine sequences undergoing CPD formation. At each base position, the height of each letter represents the relative
frequency of that nucleic acid base. (B) G+C sequence enrichment along human genes (hg19). (C) Heatmap and metagene profiles of CPD distribution along all genes of
the hg19 human genome. CPD coverage is sorted from high (top, blue) to low (bottom, red). The CPD signals were mapped and binned in 50-bp windows from 1.5 kb
upstream of the transcription start site (TSS) and then normalized relative to gene length over the gene bodies to the transcription end site (TES) and 1.5 kb downstream
of the TES. (D) Heatmap and profile of CPD distribution around the TSS and 1.5 kb of flanking sequence of all hg19 genes. (E) Heatmap and profile of CPD distribution
around the TES and 1.5 kb of flanking sequence of all hg19 genes. (F) Example of CPD sequence read distribution along several genes on chr2. There is a reduced frequency
of CPDs near the TSS (blue arrows).
A UDG
APE1
UVB damage Deamination Photolyase S1 nuclease
Sequencing
library
Circularization
Circle damage sequencing
B C
NT
T Deamination
Deami
minatio
n on (0 h) Deamination
Deamin
nation
on (24 h) Deamination
Deamina
nation
a on (48 h)
hg19: chr11
TCC TCA
Genes
TCA
TCA
CCA
CCT TCC
Fig. 4. Mapping of deaminated CPDs at the sequence and gene level. (A) Outline of the method used for mapping of deaminated CPDs. The red “=” symbol indicates
a CPD. (B) Divergent paired reads of deaminated CPDs obtained by circle-damage-seq. Purple and lavender segments represent reads mapped to the plus and minus
strands, respectively. Green arrows indicate single-base gaps at cytosine bases matching the deaminated base of a pyrimidine dimer. The trinucleotide context of the
deaminated cytosine is indicated. (C) Heatmaps and metagene profiles of deaminated CPDs in UVB-irradiated human fibroblasts. Total signal is sorted from high to low
(top to bottom). Metagene profiles are shown for untreated cells (NT) and at 0, 24, and 48 hours following UVB irradiation. (D) Browser views of total CPDs (top) and cytosine-
deaminated CPDs (bottom) along the DKK3 gene on chromosome 11 in human fibroblasts. Note the reduced levels of deaminated CPDs at the TSS containing a CGI (red bar).
$ %
TSSs with CpG islands TSSs without CpG islands
Total CPDs Deaminated CPDs Total CPDs Deaminated CPDs
1.40 1.30
1.4
1.20
Coverage
Coverage
1.30 1.2
1.0
1.10
1.20
0.8
–1.5 TSS 1.5 kb –1.5 TSS 1.5 kb –1.5 TSS 1.5 kb –1.5 TSS 1.5 kb
4.0
4.0
3.5
3.5
3.0
3.0
2.5
2.5
Genes
Genes
2.0
2.0
1.5 1.5
1.0 1.0
0.5 0.5
0.0 0.0
–1.5 TSS 1.5 kb –1.5 TSS 1.5 kb –1.5 TSS 1.5 kb –1.5 TSS 1.5 kb
Gene distance (bp) Gene distance (bp) Gene distance (bp) Gene distance (bp)
&
Total CPDs Deamination CPDs
2.2
0.9
CPD coverage
2.1 0.8
0.7
2.0
0.6
–1.5 TES 1.5 kb –1.5 TES 1.5 kb
3
Genes
Genes
0
–1.5 TES 1.5 kb –1.5 TES 1.5 kb
Gene distance (bp) Gene distance (bp)
Fig. 5. Levels of total CPDs and deaminated CPDs at TSSs and TESs. (A) Human genes with G+C-rich promoters (CGIs) show low frequencies of total CPDs and deaminated
CPDs near TSS (red arrows) and increased levels just upstream of the TSS. (B) Human genes without CGI promoters do not show CPD depletion near the TSS but still show
reduced levels of deaminated CPDs near the TSS albeit at a lesser extent than at CGIs (see scales). (C) Heatmap and metagene profile of total CPDs and deaminated CPDs
near the TES ± 1.5 kb in human UVB-irradiated fibroblasts. The data in this figure were prepared using 100 million aligned, divergent read pairs with single-base gaps.
C-containing CPDs (the A-rule). A competing model proposes that aligned reads and observed a prominent enrichment of TCC, TCA,
cytosines within CPDs first undergo deamination to form uracil, be- TCT, CCT, and CCC sequences (Fig. 6, B to D). The trinucleotide
fore “error-free” polymerase bypass, e.g., by POLH, incorporating data also suggest that cytosine needs to be in the second position of a
adenines across the deaminated, uracil-containing CPDs (Fig. 6A). CPD to score as a prominent deaminated CPD (Fig. 6, C and D; see for
To evaluate the deamination model, we next determined the tri- example the much higher levels of deaminated CPDs at trinucleotides
nucleotide sequence preference of deaminated CPDs using ~100 million beginning with 5′TC versus those beginning with 5′CT). A broader
A B
UVB :
Probability
Deamination :
Mutation (C T) :
C
NPyPy for total CPDs
NPyN for deaminated CPDs and mutation
Percentage of trinucleotide sequences
D E
PyPyN for total CPDs, deaminated CPDs,
and mutation
Percentage of trinucleotide sequences
Fig. 6. Sequence specificity of deaminated CPDs in the UVB-irradiated human genome and melanoma mutations. (A) Outline of CPD deamination and potential
mutagenic consequences. (B) Sequence context of the occurrence of deaminated CPDs at 48 hours following UVB irradiation. Position 11 is the deaminated cytosine.
(C) Trinucleotide sequence specificity of total CPD formation, formation of deaminated CPDs, and the prominent mutational signature 7 (SBS7) in melanoma. For total CPDs,
we show the sequences in the context of 5′N-dipyrimidine (NPyPy; blue line plot, ranked according to levels; CPDs at NPyPy are underlined). For deaminated CPDs and
mutations, the NPyN (n = 32) trinucleotide context is shown (green and red bars, respectively). (D) Trinucleotide sequence specificity of total CPD formation, formation of
deaminated CPDs, and the mutation signature SBS7 in melanoma genomes. We show the sequences in the context 5′dipyrimidine-N (PyPyN) (n = 16). Data in (B) to (D)
are not normalized for total genomic trinucleotide frequencies (see also fig. S6). (E) Heatmap showing the cosine similarity scores for total CPDs (NPyPy, n = 32), deami-
nated CPDs, and the prominent mutational signature 7 (SBS7/SBS7a/SBS7b) in melanoma along with the 30 COSMIC v2 mutational signatures. We only used the C-to-T
and T-to-C mutation windows for comparisons with the COSMIC trinucleotide signatures. Dark blue colors indicate high similarity (see also fig. S7).
sequence analysis defined the preferred sequence context of deami- Other methods are available to accomplish DNA damage mapping
nated CPD formation as 5′-Py-C-T/A/C (deaminated C is underlined) by next-generation sequencing (22, 24, 37, 43). The advantages of
(Fig. 6B). These data were not corrected for trinucleotide frequencies circle-damage-seq over comparable DNA damage mapping tech-
of the human genome, which therefore leads to an underestimation niques are several-fold. (i) The single incision point of the modified
of the relative event frequencies at TCG or CCG sequences, but see base and the divergent reads emanating from this break position give
fig. S6 for the normalized values. The common occurrence of deam- a clear indication of where the damage is located. (ii) Background
inated CPDs at these preferred sequence contexts can at least partially signals are minimized by removal of DNA molecules that did not
be attributed to high levels of CPDs forming at the sequences, in par- undergo circularization. (iii) Most single-strand breaks that also
ticular at 5′TC (Fig. 2). This difference was also observed in an in vitro contribute to background are removed during circular ligation. (iv)
study of flanking sequence effects on CPD deamination and was The method represents DNA damage–specific sequencing because DNA
attributed to a more facile attack of water at the 3′-C than at the 5′-C circles or linear molecules without damaged bases are not sequenced,
because of greater steric interference from pi stacking at the 5′-C (40). thus contributing to lower background and lower sequencing costs.
Notably, the deaminated CPD trinucleotide distribution rep- The circle-damage-seq method depends on the ability to convert
resents an excellent match with the mutated trinucleotide sequences the modified base into a DNA single-strand break. Cleavage with S1
from melanoma genomes, particularly in its similarity to single-base nuclease on the opposite DNA strand then creates a damage-specific
substitution signature 7a (SBS7a) or the classical COSMIC (v2) mu- ligatable double-strand break. DNA glycosylase enzymes, which
tation signature 7 (SBS7; cosine similarities between 0.83 and 0.85) operate at the initial step of the base excision repair pathway, can
(Fig. 6E and fig. S7) (1, 23). These signatures are highly enriched in easily be adapted for this technique. By choosing the appropriate DNA
melanoma genomes, where the classical signature 7 appears like a glycosylase, for example, 8-oxoguanine DNA glycosylase (OGG1) for
composite of SBS7a and SBS7b. The similarity between the deami- 8-oxoguanine or alkyladenine DNA glycosylase (AAG) for alkylated
nation signature and SBS7b was somewhat lower (cosine similarity adenines, specific types of base damage can be mapped. The nucle-
0.78 to 0.79). The most enriched trinucleotide contexts for both the otide excision repair (NER) complex could be used for more bulky
deaminated CPDs and the melanoma mutation datasets were TCC, DNA lesions (44). The longer single-stranded gap remaining after
TCA, TCT, CCC, CCA, and CCT (Fig. 6, C and D). The sequences TCG excision with NER enzymes should be a good substrate for S1 nu-
and CCG represent a special case; they cannot be mapped in their clease, leading to an 11-nt gap for each divergent read pair after
deaminated form by our approach using uracil excision when they circle-damage-seq, for example, when benzo[a]pyrene DNA ad-
originally contain 5-methylcytosine, which deaminates to thymine, ducts are excised with the UvrABC complex from DNA (45, 46). It
and their levels are therefore underestimated. Using in vitro studies, it should also be feasible to map rare endogenous DNA bases with
occur before DNA replication. Since cytosine deamination in CPDs DNA ligase (NEB) in a final volume of 200 l at 16°C overnight.
is strictly a chemical reaction, it will be difficult to prevent it from Then, the reaction mixture was cleaned up with DCC-5 kit, and the
occurring, although CPD removal by skin-penetrating CPD-specific DNA was eluted in 42 l of 10 mM tris-HCl (pH 7.5). To remove
DNA repair enzymes appears to be a reasonable concept, as pro- noncircularized linear DNA from the circularized DNA pool, the
posed earlier (49), and these enzymes work on uracil-containing ligase-treated DNA was incubated with 2 l (10 U/l) of Plasmid-Safe
CPDs. In summary, our data support a specific mutational mecha- ATP-Dependent DNase (Lucigen; Middleton, WI) in 1× reaction
nism of how a prevalent form of sunlight-induced DNA damage buffer [33 mM tris-acetate (pH 7.5), 66 mM potassium acetate, 10 mM
induces specific types of mutations found at high frequency in magnesium acetate, 0.5 mM DTT, and 1 mM ATP] in a final
human skin cancer genomes. volume of 50 l at 37°C for 30 min and then at 70°C for 20 min to
stop the reaction. Then, the DNA was purified with 90 l (1.8×) of
AMPure XP beads (Beckman Coulter; Indianapolis, IN) and eluted
METHODS in 35 l of 10 mM tris-HCl (pH 8.0).
Cell culture Cleavage of DNA at CPD sites
Human skin fibroblasts cells were obtained from the American To generate DNA double-strand breaks at CPD sites, the circular-
Type Culture Collection (ATCC; catalog no. PCS-201-012) and ized DNA was treated with the following DNA repair enzymes.
grown using the Fibroblast Growth Kit (ATCC, catalog no. PCS- First, to specifically incise DNA at CPD positions, the circularized
201-041) supplemented with 2% fetal bovine serum. The cells were DNA was incubated in 1× NEBuffer 4 reaction buffer (NEB) [50 mM
used at passage number <5. potassium acetate, 20 mM tris-acetate (pH 7.9), 10 mM magnesium
acetate, and 1 mM DTT] and 0.4 l of bovine serum albumin
UVB irradiation and genomic DNA isolation (20 mg/ml) with 1 l (10 U/l) of T4-PDG (NEB) and 1 l (10 U/l)
For mapping of UVB-induced CPDs, fibroblasts at 80 to 90% of APE1 (NEB) in a final volume of 40 l at 37°C for 20 min. Then,
confluence on 10-cm culture plates were washed with 1× phosphate- the DNA was cleaned up with 72 l (1.8×) of AMPure XP beads and
buffered saline (PBS) and were irradiated in PBS with a UVB dose eluted in 25 l of 10 mM tris-HCl (pH 8.0). To revert the dimerized
of 1000 J/m2. The UVB lamp (Thermo Fisher Scientific, UVP 3UV pyrimidines remaining after T4-PDG incision, the DNA from above
lamp) had a peak spectral emission at 302 nm. The UVB dose was was treated with 3 l (0.25 g/l) of E. coli phrB photolyase (Novus
determined using a UVX radiometer with a UVB probe (Ultraviolet Biologicals; Centennial, CO) in 1× reaction buffer [50 mM tris-HCl
Products; Upland, CA). After irradiation, the cells were immediate- (pH 7.0), 50 mM NaCl, and 10 mM DTT] in a final volume of 50 l.
ly trypsinized and collected, and then, genomic DNA was isolated The reaction tube was placed at a distance of 15 cm under a UVA
10 mM MgCl2, 50 mM NaCl, and 1 mM DTT] and 0.2 mM dATP, double-strand breaks at Dpn I cleavage sites, the circularized E. coli
with 2 l (5 U/l) of Klenow fragment exo- (NEB) in a final volume DNAs were incubated in 1× CutSmart buffer (NEB) with 1 l (20 U/l)
of 55 l by incubation at 37°C for 30 min, and then, the reaction was of Dpn I restriction enzyme (NEB) in a final volume of 20 l at 37°C
heat-inactivated by further incubation at 70°C for 30 min. To per- for 1 hour. Then, circle-damage-seq library preparation and ampli-
form adaptor ligation, 30 l of NEBNext Ultra II Ligation Master fication by PCR were performed, as described above. The circle-
Mix, 1 l of NEBNext Ligation Enhancer (NEB), and 3 l of 1.5 M damage-seq library was sequenced as 75-bp paired-end reads on an
NEBNext adaptors for Illumina sequencing (NEB) [5′-phos-GATC- Illumina NextSeq500 platform to obtain ~15 million reads.
GGAAGAGCACACGTCTGAACTCCAGTC/ideoxyU/
ACACTCTTTCCTACACGACGCTCTTCCGATC*T-3′ and Bioinformatics analysis
5′-phos-GATCGGAAGAGCACACGTCTGAACTCCAGTC/ideoxyU/ All libraries were hard-clipped from the 3′ end to 2 × 80 nucleotide
ACACTCTTTCCTACACGACGCTCTTCCGATC*C-3′ (*; phos- length using Trim_Galore (www.bioinformatics.babraham.ac.uk/
phorothioate linkage)] were added to the dA-tailing reaction mix- projects/trim_galore/). Reads were then adapter-trimmed and quality-
ture in a total volume of 93 l and incubated at 20°C for 15 min, and trimmed (phred < 20) in a single pass using Trim_Galore default
this reaction step was followed by incubation with 3 l (1 U/l) of parameters. Trimmed reads were aligned to the hg19 reference
USER enzyme (NEB) at 37°C for 20 min. Then, the adaptor-ligated genome using BWA-MEM version 0.7.17 (50) with default param-
circle-damage-seq library was cleaned up with 96 l (1×) of AMPure eters. Duplicate alignments were marked and removed with the
XP beads and size-selected (300 to 1000 bp) with 0.5 volumes of “removeDups” SAMBLASTER option (51). SAMtools version 1.11
AMPure XP beads, and the DNA was eluted in 25 l of 10 mM (52) was then used to remove reads with low mapping quality
tris-HCl (pH 8.0). (MAPQ < 20) and to sort reads by genomic coordinates.
Library amplification and sequencing Divergently aligned read pairs (pairs facing away from each
To amplify the circle-damage-seq library by PCR, the eluted library other) with a single-nucleotide gap between reads were selected by
was mixed with 25 l of NEBNext Ultra II Q5 Master Mix (2×), 2.5 l selecting SAM records with TLEN of 3. TLEN equals the distance
of 10 M NEBNext i5, and 2.5 l of 10 M NEBNext i7 primers between the mapped end of the template and the mapped start of
(NEB) in a final volume of 50 l. The amplification reaction mixture the template. Records with TLEN = 3 represent divergently aligned
was incubated at 98°C for 30 s, and then, 12 cycles of PCR at 98°C reads with one nucleotide between mates. Sequences for the three
for 10 s and 65°C for 75 s were performed, followed by a final exten- nucleotides representing the last nucleotide of each read and the
sion step at 65°C for 5 min. The PCR product was purified with gapped base were identified using BEDtools (53). To visualize the
50 l (1×) of AMPure XP beads and eluted in 20 l of 10 mM tris- reads, bam files were produced by selecting SAM records with
Cosine similarity analyses were performed using R (version 4.0.2) 12. D. E. Brash, UV signature mutations. Photochem. Photobiol. 91, 15–26 (2015).
package “coop” www.R-project.org/ (57). Since we have only 32 or 13. J. Cadet, A. Grand, T. Douki, Solar UV radiation-induced DNA bipyrimidine photoproducts:
Formation and mechanistic insights. Top. Curr. Chem. 356, 249–275 (2015).
16 combinations of trinucleotides for CPDs and deamination avail-
14. Y. H. You, D. H. Lee, J. H. Yoon, S. Nakajima, A. Yasui, G. P. Pfeifer, Cyclobutane pyrimidine
able, we could only use the C-to-T and T-to-C mutation windows dimers are responsible for the vast majority of mutations induced by UVB irradiation
for the NPyPy (n = 32) and the C-to-T windows for the PyPyN in mammalian cells. J. Biol. Chem. 276, 44688–44694 (2001).
(n = 16) to compare with the COSMIC trinucleotide signatures. 15. M. F. Laughery, A. J. Brown, K. A. Bohm, S. Sivapragasam, H. S. Morris, M. Tchmola,
Other mutations, such as C to G, C to A, T to G, and T to A, were A. D. Washington, D. Mitchell, S. Mather, E. P. Malc, P. A. Mieczkowski, S. A. Roberts,
J. J. Wyrick, Atypical UV photoproducts induce non-canonical mutation classes
ignored. This explains some level of similarity between, for example, associated with driver mutations in melanoma. Cell Rep. 33, 108401 (2020).
SBS4 and SBS7, which are otherwise mostly unrelated (SBS4 is 16. A. Besaratinia, J. I. Yoon, C. Schroeder, S. E. Bradforth, M. Cockburn, G. P. Pfeifer,
dominated by C-to-A mutations). Wavelength dependence of ultraviolet radiation-induced DNA damage as determined by
laser irradiation suggests that cyclobutane pyrimidine dimers are the principal DNA
Data availability lesions produced by terrestrial sunlight. FASEB J. 25, 3079–3091 (2011).
17. J. Jans, W. Schul, Y.-G. Sert, Y. Rijksen, H. Rebel, A. P. M. Eker, S. Nakajima, H. van Steeg,
All data are available from the GEO database (accession number F. R. de Gruijl, A. Yasui, J. H. Hoeijmakers, G. T. J. van der Horst, Powerful skin cancer
GSE159807). protection by a CPD-photolyase transgene. Curr. Biol. 15, 105–115 (2005).
18. B. S. Strauss, The "A" rule revisited: Polymerases as determinants of mutational specificity.
DNA Repair (Amst) 1, 125–135 (2002).
Code availability 19. R. E. Johnson, S. Prakash, L. Prakash, Efficient bypass of a thymine-thymine dimer by yeast
All code for this project is available at https://github.com/ DNA polymerase, Poleta. Science 283, 1001–1004 (1999).
deanpettinga/CDseq. 20. C. Masutani, R. Kusumoto, A. Yamada, N. Dohmae, M. Yokoi, M. Yuasa, M. Araki, S. Iwai,
K. Takio, F. Hanaoka, The XPV (xeroderma pigmentosum variant) gene encodes human
SUPPLEMENTARY MATERIALS DNA polymerase eta. Nature 399, 700–704 (1999).
Supplementary material for this article is available at http://advances.sciencemag.org/cgi/ 21. E. H. Radany, E. C. Friedberg, A pyrimidine dimer-DNA glycosylase activity associated
content/full/7/31/eabi6508/DC1 with the v gene product of bacterophage T4. Nature 286, 182–185 (1980).
22. P. Mao, M. J. Smerdon, S. A. Roberts, J. J. Wyrick, Chromosomal landscape of UV damage
View/request a protocol for this paper from Bio-protocol.
formation and repair at single-nucleotide resolution. Proc. Natl. Acad. Sci. U.S.A. 113,
9057–9062 (2016).
23. M. Lindberg, M. Bostrom, K. Elliott, E. Larsson, Intragenomic variability and extended
REFERENCES AND NOTES
sequence patterns in the mutational signature of ultraviolet light. Proc. Natl. Acad. Sci. U.S.A.
1. L. B. Alexandrov, S. Nik-Zainal, D. C. Wedge, S. A. J. R. Aparicio, S. Behjati, A. V. Biankin,
116, 20411–20417 (2019).
G. R. Bignell, N. Bolli, A. Borg, A.-L. Børresen-Dale, S. Boyault, B. Burkhardt, A. P. Butler,
24. S. Premi, L. Han, S. Mehta, J. Knight, D. Zhao, M. A. Palmatier, K. Kornacker, D. E. Brash,
C. Caldas, H. R. Davies, C. Desmedt, R. Eils, J. E. Eyfjörd, J. A. Foekens, M. Greaves,
Genomic sites hypersensitive to ultraviolet radiation. Proc. Natl. Acad. Sci. U.S.A. 116,
39. R. Sabarinathan, L. Mularoni, J. Deu-Pons, A. Gonzalez-Perez, N. Lopez-Bigas, Nucleotide 52. H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis,
excision repair is impaired by binding of transcription factors to DNA. Nature 532, R. Durbin; 1000 Genome Project Data Processing Subgroup, The Sequence Alignment/
264–267 (2016). Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
40. V. J. Cannistraro, J. S. Taylor, Acceleration of 5-methylcytosine deamination 53. A. R. Quinlan, BEDTools: The Swiss-army tool for genome feature analysis. Curr. Protoc.
in cyclobutane dimers by G and its implications for UV-induced C-to-T mutation hotspots. Bioinformatics 47, 11.12.1-34 (2014).
J. Mol. Biol. 392, 1145–1157 (2009). 54. H. Li, Tabix: Fast retrieval of sequence features from generic TAB-delimited files.
41. V. J. Cannistraro, S. Pondugula, Q. Song, J. S. Taylor, Rapid deamination of cyclobutane Bioinformatics 27, 718–719 (2011).
pyrimidine dimer photoproducts at TCG sites in a translationally and rotationally 55. O. Wagih, ggseqlogo: A versatile R package for drawing sequence logos. Bioinformatics
positioned nucleosome in vivo. J. Biol. Chem. 290, 26597–26609 (2015). 33, 3645–3647 (2017).
42. J. Drost, R. van Boxtel, F. Blokzijl, T. Mizutani, N. Sasaki, V. Sasselli, J. de Ligt, S. Behjati, 56. F. Ramírez, D. P. Ryan, B. Grüning, V. Bhardwaj, F. Kilpert, A. S. Richter, S. Heyne, F. Dündar,
J. E. Grolleman, T. van Wezel, S. Nik-Zainal, R. P. Kuiper, E. Cuppen, H. Clevers, Use T. Manke, deepTools2: A next generation web server for deep-sequencing data analysis.
of CRISPR-modified human stem cell organoids to study the origin of mutational Nucleic Acids Res. 44, W160–W165 (2016).
signatures in cancer. Science 358, 234–238 (2017). 57. D. Schmidt, Co-operation: Fast correlation, covariance, and cosine similarity, R package
43. A. Canela, S. Sridharan, N. Sciascia, A. Tubbs, P. Meltzer, B. P. Sleckman, A. Nussenzweig, version 0.6-2 (2019).
DNA breaks and end resection measured genome-wide by end sequencing. Mol. Cell 63,
898–911 (2016). Acknowledgments: We thank D. Fu, E. Wolfrum and H. Lyong for help with bioinformatics
44. A. Sancar, Mechanisms of DNA excision repair. Science 266, 1954–1956 (1994). and biostatistics analysis. Funding: This work was supported by NIH grant CA228089 to G.P.P.
45. M. F. Denissenko, A. Pao, M. Tang, G. P. Pfeifer, Preferential formation of benzo[a]pyrene Author contributions: S.-G.J. and G.P.P. conceptualized and designed the project. S.-G.J. and
adducts at lung cancer mutational hotspots in P53. Science 274, 430–432 (1996). J.J. performed experiments. D.P., P.L., S.-G.J., and G.P.P. performed data analysis. G.P.P. and
46. M. S. Tang, J. B. Zheng, M. F. Denissenko, G. P. Pfeifer, Y. Zheng, Use of UvrABC nuclease S.-G.J. wrote the manuscript. All authors commented on the manuscript. Competing
to quantify benzo[a]pyrene diol epoxide-DNA adduct formation at methylated versus interests: The authors declare that they have no competing interests. Data and materials
unmethylated CpG sites in the p53 gene. Carcinogenesis 20, 1085–1089 (1999). availability: All data needed to evaluate the conclusions in the paper are present in the paper
47. K. Takasawa, C. Masutani, F. Hanaoka, S. Iwai, Chemical synthesis and translesion and/or the Supplementary Materials. All sequencing data are available from the GEO database
replication of a cis–syn cyclobutane thymine–uracil dimer. Nucleic Acids Res. 32, (accession number GSE159807). Additional data related to this paper may be requested from
1738–1745 (2004). the authors.
48. Q. Song, S. M. Sherrer, Z. Suo, J. S. Taylor, Preparation of site-specific T=mCG cis-syn
cyclobutane dimer-containing template and its error-free bypass by yeast and human Submitted 22 March 2021
polymerase . J. Biol. Chem. 287, 8021–8028 (2012). Accepted 14 June 2021
49. D. B. Yarosh, A. Rosenthal, R. Moy, Six critical questions for DNA repair enzymes in Published 30 July 2021
skincare products: A review in dialog. Clin. Cosmet. Investig. Dermatol. 12, 617–624 (2019). 10.1126/sciadv.abi6508
50. H. Li, R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform.
Bioinformatics 25, 1754–1760 (2009). Citation: S.-G. Jin, D. Pettinga, J. Johnson, P. Li, G. P. Pfeifer, The major mechanism of melanoma
51. G. G. Faust, I. M. Hall, SAMBLASTER: Fast duplicate marking and structural variant read mutations is based on deamination of cytosine in pyrimidine dimers as determined by circle
extraction. Bioinformatics 30, 2503–2505 (2014). damage sequencing. Sci. Adv. 7, eabi6508 (2021).
Science Advances (ISSN ) is published by the American Association for the Advancement of Science. 1200 New York Avenue NW,
Washington, DC 20005. The title Science Advances is a registered trademark of AAAS.
Copyright © 2021 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim
to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC).