
SCIENCE ADVANCES | RESEARCH ARTICLE

CELL BIOLOGY

The major mechanism of melanoma mutations is based on deamination of cytosine in pyrimidine dimers as determined by circle damage sequencing

Seung-Gi Jin1, Dean Pettinga2, Jennifer Johnson1, Peipei Li3, Gerd P. Pfeifer1*

Sunlight-associated melanomas carry a unique C-to-T mutation signature. UVB radiation induces cyclobutane pyrimidine dimers (CPDs) as the major form of DNA damage, but the mechanism of how CPDs cause mutations is unclear. To map CPDs at single-base resolution genome wide, we developed the circle damage sequencing (circle-damage-seq) method. In human cells, CPDs form preferentially in a tetranucleotide sequence context (5′-Py-T<>Py-T/A), but this alone does not explain the tumor mutation patterns. To test whether mutations arise at CPDs by cytosine deamination, we specifically mapped UVB-induced cytosine-deaminated CPDs. Transcription start sites (TSSs) were protected from CPDs and deaminated CPDs, but both lesions were enriched immediately upstream of the TSS, suggesting a mutation-promoting role of bound transcription factors. Most importantly, the genomic dinucleotide and trinucleotide sequence specificity of deaminated CPDs matched the prominent mutation signature of melanomas. Our data identify the cytosine-deaminated CPD as the leading premutagenic lesion responsible for mutations in melanomas.

Copyright © 2021 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC).

INTRODUCTION
Cancer genome sequencing has identified millions of somatic mutations in different types of human cancer (1–4), in some cases pointing to a potential origin of these mutations. Cancers linked to environmental exposures show tumor-specific mutational signatures (5). For example, smoking-associated lung cancers carry a prevalent C/G to A/T mutation signature, as initially defined by Alexandrov et al. (1, 6). Another example is urothelial carcinomas linked to consumption of plants that contain the mutagen aristolochic acid, which produces characteristic A-to-T transversions (7). The theory that a particular exposure is linked to a cancer is particularly compelling when it is supported by epidemiological evidence and by mechanistic studies that attempt to recapitulate the mutagenesis process in vitro or in vivo.

Most melanomas and nonmelanoma skin cancers are linked to sunlight exposure. The ultraviolet B (UVB) component of sunlight (280 to 315 nm) is strongly carcinogenic in mouse models (8), and of particular relevance is the wavelength above 300 nm, which reaches the earth’s surface. For most sequenced melanoma genomes, C-to-T mutations at dipyrimidine sequences (for example, at TC or CC) are highly prevalent, suggesting that UVB radiation from sun exposure is involved in their formation (9, 10). Individual melanoma genomes may harbor thousands of such mutations. These same mutations are observed in various experimental systems when UV radiation is used as the mutagen (11, 12). However, it is unknown what specific mechanism is responsible for these melanoma mutations.

UVB radiation induces various forms of DNA damage in exposed cells (13). The most frequent damage type is the cis-syn cyclobutane pyrimidine dimer (CPD), which arises after absorption of photons by DNA bases and occurs through the formation of electronic excited states. Any two adjacent pyrimidine bases can dimerize via their 5,6 double bonds. The second most frequent lesion produced by UVB irradiation is the pyrimidine (6-4) pyrimidone photoproduct [(6-4) photoproduct]. In comparative mutagenesis experiments, the CPD has been shown to be at least five times more mutagenic than the (6-4) photoproducts (14). One reason for this difference is the generally rapid global repair of (6-4) photoproducts (within minutes or hours) relative to CPDs, which can persist for days in irradiated cells or human skin. In addition to CPDs and (6-4) photoproducts, other DNA photoproducts can be produced in DNA at lower levels and may be relevant for atypical, non–C-to-T mutations in melanoma (15). In addition, (6-4) photoproducts are produced at even lower levels relative to CPDs when the irradiation wavelength is greater than 300 nm (16), which is the most physiologically relevant part of the UV spectrum. Experiments with transgenic mice in which either the CPDs or the (6-4) photoproducts were removed by a specific repair enzyme have shown that removal of CPDs provides powerful protection against UVB-induced skin cancers (17).

The high frequencies of C-to-T and CC-to-TT double mutations in melanoma and nonmelanoma skin cancers are therefore best explained by proposing the CPD as a premutagenic lesion. However, it has remained unclear how CPD lesions induce the characteristic C-to-T mutations in skin cancer. There are two major models: The first one proposes that CPDs that contain cytosine are replicated by DNA polymerases that incorporate adenine across from cytosine in an error-prone pathway. This model has been referred to as the “A-rule” for polymerase bypass of non-instructional DNA lesions (18). When the CPD contains thymine, pairing with adenine will not lead to a mutation. The lesion-tolerant DNA polymerase eta (POLH) has been shown to bypass TT CPDs correctly by incorporating two adenines (19, 20). A competing model suggests that cytosines, when part of a CPD, are much more susceptible to hydrolytic deamination, a pathway that generates CPDs containing uracil. When these U-containing dimers are then bypassed by a polymerase with incorporation of adenine, a C-to-T mutation is the outcome, and it occurs because of the deamination reaction and not because of a polymerase error.

1Department of Epigenetics, Van Andel Institute, Grand Rapids, MI 49503, USA.
2Bioinformatics and Biostatistics Core, Van Andel Institute, Grand Rapids, MI 49503, USA.
3Department of Neurodegenerative Science, Van Andel Institute, Grand Rapids, MI 49503, USA.
*Corresponding author. Email: gerd.pfeifer@vai.org

Downloaded from https://www.science.org on September 19, 2021

Jin et al., Sci. Adv. 2021; 7 : eabi6508 30 July 2021 1 of 13



In this study, we set out to investigate the CPD deamination mechanism with the goal of determining whether this specific pathway can be linked to the frequent C-to-T mutations found in melanoma. To perform this work, we first developed a new method for base resolution and whole-genome analysis of DNA damage. We used this new methodology, termed circle damage sequencing (circle-damage-seq), to separately map CPDs and deaminated CPDs in human UVB-irradiated cells at the genome scale and at the single-base level. Our data provide strong support for the CPD deamination model being the mechanistic cause of most melanoma mutations.

RESULTS
Development of the circle-damage-seq method
To determine mutational mechanisms at the whole-genome scale, it is critical to map the underlying DNA damage at single-base resolution and genome wide. However, these methods need to be specific and require high sensitivity. Because of the limitations often associated with currently used methods, we developed a new strategy, which we named circle-damage-seq. This method is based on DNA damage–specific repair enzymes and allows sensitive detection and genome-scale mapping of DNA base damage. The general outline of this method is depicted in Fig. 1. The cells are first exposed to a DNA-damaging agent and then harvested. DNA is purified and sonicated to an average size of 300 to 400 base pairs (bp), a suitable range for DNA circularization and downstream sequencing analysis (see Methods). After sonication and end repair, a DNA circularization step is carried out using a diluted ligation reaction with T4 DNA ligase. T4 ligase also repairs DNA single-strand breaks that would contribute to background signals. Any remaining noncircularized fragments are eliminated by applying an exonuclease mixture (Fig. 1A). Then, the lesion is excised with a lesion-specific DNA repair enzyme and abasic site–processing enzymes within the circularized DNA to introduce a 1-nucleotide (nt) gap. Next, a double-strand break is created at the DNA damage position by removing the single nucleotide on the opposite strand of the gapped DNA molecules with single-strand specific S1 nuclease, thus producing a double-strand break (Fig. 1B). The use of multiple enzymes has the potential

[Figure 1 graphic. Panels: (A) Circularization and removal of linear DNA (Plasmid-safe ATP-dependent DNase); (B) DNA double-strand break with DNA repair enzymes (adduct, nick, and DSB stages; repair enzyme, S1 nuclease); (C) Library sequencing and analysis (adaptor ligation, library enrichment, damage-selective sequencing, divergent paired reads with 1-bp gap in the center); (D) CPD structure; (E) UVB damage, T4-PDG & APE1, photolyase, S1 nuclease (CPD, nick, repair, and DSB stages); (F) read alignments at hg19 chr12:64,959,118-64,959,184 and hg19 chr12:64,943,869-64,943,929.]

Fig. 1. Outline of circle-damage-seq. (A) DNA containing damaged bases (red) is sonicated, circularized, and then treated with an exonuclease cocktail. (B) DNA lesions
(arrows) are processed into DNA double-strand breaks (DSB) by base excision repair enzymes and S1 nuclease. Sequencing adaptors are ligated to the breaks. (C) DNA
library preparation for DNA damage–specific sequence enrichment. Divergent paired-end reads with single-base gaps indicate the damaged base positions. (D) Structure
of a CPD at a TC sequence. (E) Method for mapping of UVB-induced CPDs at single-nucleotide resolution. (F) Divergent paired reads of CPD mapping by circle-damage-seq.
Purple and lavender segments represent reads mapped to the plus and minus strands, respectively. Green arrows indicate single-base gaps matching the 5′ base of a
pyrimidine dimer. ATP, adenosine triphosphate; DNase, deoxyribonuclease.
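The read geometry described in this caption can be illustrated with a minimal Python sketch. It assumes 1-based inclusive alignment coordinates, an uppercase A/C/G/T reference string, and hypothetical function names (`single_base_gap`, `dimer_strand`); it is not the authors' pipeline, only one way to recover the gapped base from a divergent read pair and to check whether that base is the 5′ base of a dipyrimidine on either strand.

```python
from typing import Optional

PYRIMIDINES = {"C", "T"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def single_base_gap(left_read_end: int, right_read_start: int) -> Optional[int]:
    """Return the 1-based coordinate of the gapped base, or None.

    left_read_end: last aligned base of the read lying to the left of the gap.
    right_read_start: first aligned base of the read lying to the right.
    Exactly one unaligned base must separate the two reads.
    """
    if right_read_start - left_read_end == 2:
        return left_read_end + 1
    return None


def dimer_strand(reference: str, gap: int) -> Optional[str]:
    """Report the strand on which the gapped base is the 5' base of a dipyrimidine."""
    i = gap - 1  # convert the 1-based coordinate to a 0-based index
    base = reference[i]
    # Plus strand: the 3' neighbour is the next base to the right.
    if base in PYRIMIDINES and i + 1 < len(reference) and reference[i + 1] in PYRIMIDINES:
        return "+"
    # Minus strand: read the complement; the 3' neighbour lies to the left.
    if (
        COMPLEMENT.get(base) in PYRIMIDINES
        and i >= 1
        and COMPLEMENT.get(reference[i - 1]) in PYRIMIDINES
    ):
        return "-"
    return None  # the gapped base does not start a dipyrimidine
```

In this sketch a read pair whose inner ends are separated by anything other than one base is simply discarded, which mirrors the idea that only single-base gaps are scored as damage positions.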


to increase background of the procedure. We carefully optimized each enzymatic step including enzyme concentrations, incubation temperatures, and incubation times to minimize nonspecific DNA cleavage. Circle-damage-seq uniquely enables sequencing of both sides of the single cleavage site in one DNA molecule using paired-end sequencing. Closed circular DNA without any recognized damaged bases is excluded from the breakage and ligation steps, resulting in lesion-selective DNA library enrichment by polymerase chain reaction (PCR) (Fig. 1C). Notably, sequencing reads obtained from circle-damage-seq show a specific aligned read pattern, divergent reads with a single-base gap in the center representing the damaged DNA base, because each paired-end read is derived from a single opened circular DNA molecule (Fig. 1C). Thus, circle-damage-seq provides high selectivity for DNA damage mapping at single-nucleotide resolution with low background at reduced sequencing depth. This method should be applicable for mapping of any type of DNA damage for which a DNA excision repair or cleavage enzyme is available.

We first tested the feasibility of circle-damage-seq by analyzing the endogenous DNA modification N6-methyladenine (N6mA) in Escherichia coli dam+ cells (fig. S1). DNA double-strand breakage at N6mA bases within circularized DNA was achieved through Dpn I cleavage, which generates a blunt end by splitting the target DNA between 5′GN6mA and TC sequences. N6mA could readily be mapped in E. coli (fig. S1).

Genomic mapping of UVB-induced CPDs
To map UVB-induced CPDs (Fig. 1D), the major mutagenic UVB-induced lesion at dipyrimidine sequences (14), we irradiated human skin fibroblasts with UVB and immediately isolated the DNA. We cleaved DNA at CPDs with T4 endonuclease V (also known as T4-PDG) (21). This enzyme can also cleave abasic sites; therefore, it is important to minimize depurination during DNA isolation steps. CPD induction was confirmed by gel electrophoresis after cleavage of DNA with T4-PDG and S1 nuclease. As shown in Fig. 1E, to create double-strand breaks at the CPD positions after T4-PDG incision, we first used AP endonuclease 1 (APE1) to cleave the 5′ side of the sugar-phosphate backbone at pyrimidine dimers. We followed this treatment by incubation with E. coli photolyase under long-wave UVA light to remove dimerized pyrimidine bases that remain after the DNA glycosylase incision. Next, S1 nuclease treatment generated ligatable DNA double-strand breaks within the circularized DNA. The combined treatment creates a single-nucleotide gap representing the 5′ pyrimidine of the dimer. We aligned the sequencing reads from circle-damage-seq to the human reference genome. We observed the divergent paired reads with a single-base gap in the center and confirmed the matching bases in the gaps as pyrimidines that have another pyrimidine base in the 3′ position (Fig. 1F). We found this pattern with a high frequency at 5′TT and 5′TC sites and at lower frequency at 5′CT and 5′CC dinucleotide sequences, consistent with the known specificity of CPD formation at dipyrimidine sequences (10). Additional examples of genomic CPD mapping are shown in fig. S2.

Sequence specificity of CPD formation
Previous work has focused mainly on the dinucleotide specificity of genomic UV damage (10, 12, 13, 22–24). We now analyzed trinucleotide sequences that reflect the sequence specificity of CPD formation in human fibroblasts using ~100 million aligned, divergent read pairs with single-nucleotide gaps (Fig. 2). When normalized for the known trinucleotide content of the human genome, the most frequent sequence contexts considering the base flanking the

[Figure 2 graphic. Panels: (A) PyPyN3′ and (B) 5′NPyPy. Top plots: UVB (1000 J/m2); bottom plots: NT. Y axis: event rate (relative to hg19 genome).]

Fig. 2. Sequence context of CPD formation in human fibroblasts irradiated with UVB. (A) Distribution plot of trinucleotide sequences (PyPyN3′) undergoing CPD
formation in UVB-irradiated human cells (green). NT, nontreated control (red). Some background is seen in the NT samples at non-dipyrimidine sequences (the four
trinucleotides to the right). (B) Distribution plot of trinucleotide sequences (5′NPyPy) undergoing CPD formation in UVB-irradiated human cells (green). Data in (A) and (B) are
normalized for the trinucleotide frequencies of the human genome.
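The normalization mentioned in this caption can be sketched as follows. This is a minimal illustration, assuming that an "event rate" is the number of damage events observed in a trinucleotide context divided by the number of times that context occurs in the reference (counted on both strands). The function names and the tiny example sequence in the usage note are hypothetical, and the authors' exact normalization may differ.

```python
from collections import Counter
from typing import Dict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def trinucleotide_counts(reference: str) -> Counter:
    """Count every overlapping trinucleotide on both strands of the reference."""
    counts: Counter = Counter()
    for strand in (reference, revcomp(reference)):
        for i in range(len(strand) - 2):
            counts[strand[i:i + 3]] += 1
    return counts


def normalized_rates(event_counts: Dict[str, int], genome_counts: Counter) -> Dict[str, float]:
    """Divide event counts by genomic occurrences of the same trinucleotide.

    Contexts that never occur in the reference are skipped to avoid
    division by zero.
    """
    return {
        tri: n / genome_counts[tri]
        for tri, n in event_counts.items()
        if genome_counts.get(tri)
    }
```

For example, `normalized_rates({"TTA": 1}, trinucleotide_counts("TTAA"))` divides one event by the two occurrences of TTA found across both strands of that toy sequence.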


dimerized pyrimidines at the 3′ side were 5′TTA and 5′TCT (CPD underlined), followed by 5′TCA (Fig. 2A). Thus, T is preferred in the first position of the dimer, and T and A are the preferred bases 3′ to the most abundant CPDs (5′TT and 5′TC). The most highly enriched trinucleotide sequences considering the 5′ neighboring base of the CPDs were 5′CTT and 5′CTC, followed by 5′TTC and 5′TTT (Fig. 2B) showing a preference for having another pyrimidine 5′ to a pyrimidine dimer. Divergent reads with single-base gaps were >300 times less frequent in nontreated (NT) cells than in UVB-irradiated cells with no particular enrichment of sequencing reads associated with such trinucleotides in the control samples [Fig. 2, A and B (bottom)].

We also analyzed the preferred tetranucleotide sequence contexts for CPD formation (fig. S3), which were generally consistent with the trinucleotide patterns. The sequence enrichments for the 5′NPyPyN tetranucleotide context (fig. S3C) were dominated by a T in the second position and by having another pyrimidine as the first base. This pattern is similar to one obtained after in vitro irradiation of DNA with UVB (25).

To identify even broader possible sequence motifs, we obtained a base enrichment on 20-nt regions around the gapped base pair using ~100 million aligned, divergent read pairs (Fig. 3A). Position 11 represents the nucleotide in the gap between divergent reads (the 5′ pyrimidine of the CPD). We observed a strong enrichment of 5′TT or 5′TC contexts at positions 11 and 12, representing the pyrimidine dimers. In this motif, T or C was enriched at position 10, and T and A bases were enriched at position 13, consistent with the tri- and tetranucleotide patterns. Unexpectedly, no further sequence enrichment was observed outside of this tetranucleotide context (positions 10 to 13). Our findings define a distinct tetranucleotide sequence preference, 5′-Py-T<>Py-T/A, for the major type of UVB damage in the human genome. This sequence is also found at sites hypersensitive to UV radiation (24).

Gene level distribution of CPDs
Next, using ~36 million aligned, divergent read pairs, we examined the global patterns of CPDs, with emphasis on their distribution along genes. CPD formation in cells is chiefly uniform along the genome and is largely dependent on DNA sequence, where 5′TY sequences show the strongest accumulation of CPDs. In the metagene profiles, we observed high levels of CPD formation immediately upstream of transcription start sites (TSSs) but a dip of CPDs right around the GC-rich TSS sequences themselves at many gene promoter regions (Fig. 3, B to D; see Fig. 3F and fig. S2 for specific gene examples). This finding suggests that transcription factors at promoter upstream regions may enhance CPD formation at many sites, consistent with previous studies (22, 24, 26–30), even in the absence of modulation by DNA repair. Melanoma mutations are also sharply enriched just upstream of the TSS (31).

The metagene profile of CPD formation reveals a notable spike of CPDs near the transcription end sites (TESs) of genes (Fig. 3E). Human and mouse genes have an AT-rich sequence context near the TES that contains polyadenylation signals (AATAAA) and their surrounding sequences (32), which may lead to an enrichment of CPDs containing thymine. However, a GC-rich sequence content is only inversely correlated with CPD frequency right at the TSS. Just upstream of the TSS, high GC content and CPD levels are correlated, directly pointing to a role of DNA-bound proteins in CPD induction rather than the DNA sequence itself.

Genome-wide mapping of cytosine-deaminated CPDs
CPDs are known to be the most prevalent and most mutagenic UVB-induced DNA lesions (14, 10). CPDs containing cytosine are unstable. Owing to saturation of the 5,6 double bond of cytosine upon dimer formation (Fig. 1D), they are prone to undergo hydrolytic cytosine deamination to form uracil, which could be a potentially mutagenic pathway (33–35). We next applied the circle-damage-seq method to analyze the extent of CPD cytosine deamination. We irradiated human fibroblast cells with UVB and harvested them 24 and 48 hours later to allow time for deamination. To specifically map the deaminated CPDs, we applied a photolyase-mediated reversal of the CPDs first, followed by the excision of U bases by uracil DNA glycosylase (UDG) in the circle-damage-seq method (Fig. 4A). This method successfully displays CPDs containing deaminated cytosines at single-base resolution and shows cytosine in the single-base gap, as predicted (Fig. 4B and fig. S4A). With this method, deamination of CPDs that contain 5-methylcytosine cannot be detected because the deaminated base is thymine. We note that we also observed double-base gaps at CC bases, indicating likely double deamination events, but these were difficult to quantify because of the unknown efficiency of UDG incision at UU sequences. Therefore, we focused on the single C gaps.

The patterns of deaminated CPDs were largely independent of total CPD patterns (fig. S4, B and C) because of their respective differential sequence requirements. At a genomic scale, deaminated CPDs were strongly depleted at many TSS regions and even more so than total CPDs (Figs. 4, C and D, and 5, A and B, and fig. S5). Genes with CpG island (CGI) promoters showed a dip of total CPDs near the TSS, but this dip was missing in genes with non-CGI promoters (Fig. 5, A and B). However, a lower level of deaminated CPDs was still visible around the TSS, even in non-CGI promoters (Fig. 5B). Overall, levels of deaminated CPDs increased with time (Fig. 4C). Since deamination of CPDs is strictly a chemical reaction, substantial levels of deaminated CPDs were found at the 0-hour time point because the deamination reaction is expected to continue during DNA processing in vitro. The reasons for the lower levels of deaminated CPDs at the TSS could be the reduced propensity of TSS sequences to be transiently single stranded (which favors cytosine deamination), the shielding of these regions by general transcription factors, and/or the particularly rapid repair of these regions (28, 36–38), which may proceed to some extent during the 48-hour deamination time span even at the relatively high UV dose that we used. At the TES, there is a spike of total CPDs but a dip of deaminated CPDs, which is likely based on DNA sequence features, i.e., AT richness of these regions (Fig. 5C). The highest level of CPD formation and deaminated CPD formation was found at the regions immediately upstream of the TSS of CGI promoters (Fig. 5A). This finding suggests that bound protein factors can sensitize DNA to UV light and promote mutagenesis (26–31, 39).

Sequence specificity of cytosine-deaminated CPDs and relationship to melanoma mutations
Although TT and TC are the sequences most susceptible to CPD formation (Fig. 2, A and B, and fig. S3), the thymine positions of CPDs are not very mutagenic because of incorporation of adenine opposite to thymine by POLH (19) or other DNA polymerases. However, the mechanism of how CPDs containing cytosines cause C-to-T mutations has remained unknown. Existing models include error-prone DNA polymerases that incorporate adenine across


[Figure 3 graphic. Panels: (A) sequence logo (y axis: probability); (B) % G+C (hg19 reference) from −1.5 kb of the TSS to 1.5 kb past the TES; (C to E) CPD coverage profiles and gene heatmaps across TSS to TES, around the TSS, and around the TES (x axis: gene distance in bp, ±1.5 kb); (F) browser view at hg19 chr2:27,710,100-27,721,800.]

Fig. 3. CPD mapping by circle-damage-seq shows the distribution of CPDs at the sequence and gene level. (A) Nucleotide composition of mapped CPD positions.
Positions 11 and 12 represent UVB-damaged dipyrimidine sequences undergoing CPD formation. At each base position, the height of each letter represents the relative
frequency of that nucleic acid base. (B) G+C sequence enrichment along human genes (hg19). (C) Heatmap and metagene profiles of CPD distribution along all genes of
the hg19 human genome. CPD coverage is sorted from high (top, blue) to low (bottom, red). The CPD signals were mapped and binned in 50-bp windows from 1.5 kb
upstream of the transcription start site (TSS) and then normalized relative to gene length over the gene bodies to the transcription end site (TES) and 1.5 kb downstream
of the TES. (D) Heatmap and profile of CPD distribution around the TSS and 1.5 kb of flanking sequence of all hg19 genes. (E) Heatmap and profile of CPD distribution
around the TES and 1.5 kb of flanking sequence of all hg19 genes. (F) Example of CPD sequence read distribution along several genes on chr2. There is a reduced frequency
of CPDs near the TSS (blue arrows).
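The TSS-centered profile in panel (D) can be approximated with a short Python sketch. It assumes damage positions and TSS coordinates on a single chromosome, a strand label of "+" or "-" per gene, and the 50-bp bins and 1.5-kb flanks stated in the caption. The function name `tss_profile` is hypothetical, and the gene-length normalization used for panel (C) is not reproduced here.

```python
from typing import Iterable, List, Tuple


def tss_profile(
    tss_sites: Iterable[Tuple[int, str]],
    damage_positions: Iterable[int],
    flank: int = 1500,
    bin_size: int = 50,
) -> List[int]:
    """Count damage events in fixed-width bins around each TSS.

    Offsets are measured in the direction of transcription, so that
    negative offsets are always upstream of the TSS regardless of strand.
    """
    n_bins = (2 * flank) // bin_size
    profile = [0] * n_bins
    positions = list(damage_positions)
    for tss, strand in tss_sites:
        for pos in positions:
            offset = pos - tss if strand == "+" else tss - pos
            if -flank <= offset < flank:
                profile[(offset + flank) // bin_size] += 1
    return profile
```

The nested loop is quadratic and only suitable for illustration; a genome-scale analysis would normally use sorted positions or an interval index.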


[Figure 4 graphic. Panels: (A) workflow: UVB damage, circularization, deamination, photolyase, UDG, APE1, S1 nuclease, sequencing library (circle damage sequencing); (B) read alignments on hg19 chr11 with trinucleotide contexts of the gapped cytosines (CCT, GCC, TCT, CCA, TCC, TCA); (C) deaminated CPD coverage profiles and gene heatmaps for NT and deamination at 0, 24, and 48 h (x axis: gene distance in bp, −3.0 kb of the TSS to 3.0 kb past the TES); (D) browser view at hg19 chr11:12,009,675-12,040,000 (DKK3) with tracks for total CPDs and deaminated CPDs.]

Fig. 4. Mapping of deaminated CPDs at the sequence and gene level. (A) Outline of the method used for mapping of deaminated CPDs. The red “=” symbol indicates
a CPD. (B) Divergent paired reads of deaminated CPDs obtained by circle-damage-seq. Purple and lavender segments represent reads mapped to the plus and minus
strands, respectively. Green arrows indicate single-base gaps at cytosine bases matching the deaminated base of a pyrimidine dimer. The trinucleotide context of the
deaminated cytosine is indicated. (C) Heatmaps and metagene profiles of deaminated CPDs in UVB-irradiated human fibroblasts. Total signal is sorted from high to low
(top to bottom). Metagene profiles are shown for untreated cells (NT) and at 0, 24, and 48 hours following UVB irradiation. (D) Browser views of total CPDs (top) and
cytosine-deaminated CPDs (bottom) along the DKK3 gene on chromosome 11 in human fibroblasts. Note the reduced levels of deaminated CPDs at the TSS containing a CGI (red bar).
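The trinucleotide labels in panel (B) can be derived from gap coordinates with a sketch like the one below. It assumes a 1-based gap coordinate, an uppercase A/C/G/T reference string, and hypothetical function names. A gap at G on the reference strand is read as a cytosine on the opposite strand, so the context is reported as the reverse complement.

```python
from collections import Counter
from typing import Iterable, Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def cytosine_context(reference: str, gap: int) -> Optional[str]:
    """Return the 5'-N-C-N-3' context of a gapped cytosine, or None.

    None is returned when the gapped base is not C on either strand or
    when the gap sits at the edge of the reference.
    """
    i = gap - 1  # 0-based index of the gapped base
    if i < 1 or i + 1 >= len(reference):
        return None
    window = reference[i - 1:i + 2]
    if window[1] == "C":
        return window
    if window[1] == "G":
        # The cytosine is on the minus strand: report its own 5'-to-3' context.
        return window.translate(_COMPLEMENT)[::-1]
    return None


def context_counts(reference: str, gaps: Iterable[int]) -> Counter:
    """Tally trinucleotide contexts over many gap positions."""
    counts: Counter = Counter()
    for gap in gaps:
        context = cytosine_context(reference, gap)
        if context is not None:
            counts[context] += 1
    return counts
```

Tallies of this kind are the raw input for the trinucleotide distributions compared with mutation signatures later in the article.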


[Figure 5 graphic. Panels: (A) TSSs with CpG islands and (B) TSSs without CpG islands, each showing coverage profiles and gene heatmaps for total CPDs and deaminated CPDs (x axis: gene distance in bp, ±1.5 kb around the TSS); (C) CPD coverage and deaminated CPD coverage profiles with gene heatmaps around the TES (x axis: gene distance in bp, ±1.5 kb).]

Fig. 5. Levels of total CPDs and deaminated CPDs at TSSs and TESs. (A) Human genes with G+C-rich promoters (CGIs) show low frequencies of total CPDs and deaminated
CPDs near the TSS (red arrows) and increased levels just upstream of the TSS. (B) Human genes without CGI promoters do not show CPD depletion near the TSS but still show
reduced levels of deaminated CPDs near the TSS, albeit to a lesser extent than at CGIs (see scales). (C) Heatmap and metagene profile of total CPDs and deaminated CPDs
near the TES ± 1.5 kb in human UVB-irradiated fibroblasts. The data in this figure were prepared using 100 million aligned, divergent read pairs with single-base gaps.

C-containing CPDs (the A-rule). A competing model proposes that cytosines within CPDs first undergo deamination to form uracil, before “error-free” polymerase bypass, e.g., by POLH, incorporating adenines across the deaminated, uracil-containing CPDs (Fig. 6A).

To evaluate the deamination model, we next determined the trinucleotide sequence preference of deaminated CPDs using ~100 million aligned reads and observed a prominent enrichment of TCC, TCA, TCT, CCT, and CCC sequences (Fig. 6, B to D). The trinucleotide data also suggest that cytosine needs to be in the second position of a CPD to score as a prominent deaminated CPD (Fig. 6, C and D; see for example the much higher levels of deaminated CPDs at trinucleotides beginning with 5′TC versus those beginning with 5′CT). A broader


[Figure 6 graphic. Panels: (A) scheme: UVB, deamination, bypass by Pol eta, mutation (C to T); (B) sequence logo (y axis: probability); (C) NPyPy for total CPDs and NPyN for deaminated CPDs and mutation (y axis: percentage of trinucleotide sequences; series: total CPDs, deaminated CPDs, mutation signature SBS7); (D) PyPyN for total CPDs, deaminated CPDs, and mutation (y axis: percentage of trinucleotide sequences; same series); (E) cosine similarity heatmap.]

Fig. 6. Sequence specificity of deaminated CPDs in the UVB-irradiated human genome and melanoma mutations. (A) Outline of CPD deamination and potential
mutagenic consequences. (B) Sequence context of the occurrence of deaminated CPDs at 48 hours following UVB irradiation. Position 11 is the deaminated cytosine.
(C) Trinucleotide sequence specificity of total CPD formation, formation of deaminated CPDs, and the prominent mutational signature 7 (SBS7) in melanoma. For total CPDs,
we show the sequences in the context of 5′N-dipyrimidine (NPyPy; blue line plot, ranked according to levels; CPDs at NPyPy are underlined). For deaminated CPDs and
mutations, the NPyN (n = 32) trinucleotide context is shown (green and red bars, respectively). (D) Trinucleotide sequence specificity of total CPD formation, formation of
deaminated CPDs, and the mutation signature SBS7 in melanoma genomes. We show the sequences in the context 5′dipyrimidine-N (PyPyN) (n = 16). Data in (B) to (D)
are not normalized for total genomic trinucleotide frequencies (see also fig. S6). (E) Heatmap showing the cosine similarity scores for total CPDs (NPyPy, n = 32),
deaminated CPDs, and the prominent mutational signature 7 (SBS7/SBS7a/SBS7b) in melanoma along with the 30 COSMIC v2 mutational signatures. We only used the C-to-T
and T-to-C mutation windows for comparisons with the COSMIC trinucleotide signatures. Dark blue colors indicate high similarity (see also fig. S7).
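The cosine similarity score used in panel (E) compares two trinucleotide profiles as vectors. The sketch below is a generic implementation, assuming each profile is a mapping from trinucleotide context to a non-negative frequency. The function name is hypothetical and the numbers in the usage note are illustrative placeholders, not data from the study.

```python
import math
from typing import Mapping


def cosine_similarity(p: Mapping[str, float], q: Mapping[str, float]) -> float:
    """Cosine of the angle between two profiles keyed by trinucleotide.

    Contexts missing from one profile are treated as zero. The score is
    1.0 for profiles with identical proportions and 0.0 for profiles
    that share no context.
    """
    keys = set(p) | set(q)
    dot = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in keys)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)
```

Because the score depends only on proportions, a call such as `cosine_similarity({"TCA": 3, "TCC": 4}, {"TCA": 6, "TCC": 8})` gives the same result as comparing a profile with itself, which is why raw counts and percentages can be compared directly.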


sequence analysis defined the preferred sequence context of deaminated CPD formation as 5′-Py-C-T/A/C (deaminated C is underlined) (Fig. 6B). These data were not corrected for trinucleotide frequencies of the human genome, which therefore leads to an underestimation of the relative event frequencies at TCG or CCG sequences, but see fig. S6 for the normalized values. The common occurrence of deaminated CPDs at these preferred sequence contexts can at least partially be attributed to high levels of CPDs forming at the sequences, in particular at 5′TC (Fig. 2). This difference was also observed in an in vitro study of flanking sequence effects on CPD deamination and was attributed to a more facile attack of water at the 3′-C than at the 5′-C because of greater steric interference from pi stacking at the 5′-C (40).
Notably, the deaminated CPD trinucleotide distribution represents an excellent match with the mutated trinucleotide sequences from melanoma genomes, particularly in its similarity to single-base substitution signature 7a (SBS7a) or the classical COSMIC (v2) mutation signature 7 (SBS7; cosine similarities between 0.83 and 0.85) (Fig. 6E and fig. S7) (1, 23). These signatures are highly enriched in melanoma genomes, where the classical signature 7 appears like a composite of SBS7a and SBS7b. The similarity between the deamination signature and SBS7b was somewhat lower (cosine similarity 0.78 to 0.79). The most enriched trinucleotide contexts for both the deaminated CPDs and the melanoma mutation datasets were TCC, TCA, TCT, CCC, CCA, and CCT (Fig. 6, C and D). The sequences TCG and CCG represent a special case; they cannot be mapped in their deaminated form by our approach using uracil excision when they originally contain 5-methylcytosine, which deaminates to thymine, and their levels are therefore underestimated. Using in vitro studies, it was found that a G flanking the 3′-side of a C in a CPD greatly accelerated the deamination of CPDs, followed second by a 3′-A (41). On the other hand, total CPDs, because of their enrichment with T-containing dimers, did not correlate well with the mutational signatures (Fig. 6, C and D) (cosine similarity between 0.10 and 0.16; Fig. 6E and fig. S7). Other cosine similarities of higher values were noted for SBS2, which is related to apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like (APOBEC)-induced cytosine deamination (not prevalent in melanoma), SBS11, which is due to temozolomide treatment of patients with glioma, or SBS30, thought to be due to rare mutations in the DNA repair enzyme NTHL1 (42).
We also focused on the dinucleotide specificity of deaminated CPD formation after considering only those positions in which the cytosine can be assigned unambiguously to only one position of the dimer by requiring a purine as the flanking base (e.g., 5′PuCT and 5′PuCC versus 5′TCPu and 5′CCPu) (fig. S8). These cases can distinguish deamination at the 5′C of a dimer from deamination of the 3′C of a dimer. The frequency of CPDs with 3′C deamination is about three times greater than that of CPDs with 5′C deamination, and melanoma C-to-T mutations are also three to four times more common at the 3′C position (fig. S8A).
In conclusion, the sequence-specific distribution of deaminated CPDs in the human genome presents an excellent match with the mutational signature of melanoma genomes, thus providing strong mechanistic support for a biochemical/biophysical pathway by which UVB mutagenesis leads to cancer mutations.

DISCUSSION
Our results are based on the development of a new method for mapping of DNA damage at base-level resolution in the human genome. Other methods are available to accomplish DNA damage mapping by next-generation sequencing (22, 24, 37, 43). The advantages of circle-damage-seq over comparable DNA damage mapping techniques are several-fold. (i) The single incision point of the modified base and the divergent reads emanating from this break position give a clear indication of where the damage is located. (ii) Background signals are minimized by removal of DNA molecules that did not undergo circularization. (iii) Most single-strand breaks that also contribute to background are removed during circular ligation. (iv) The method represents DNA damage–specific sequencing because DNA circles or linear molecules without damaged bases are not sequenced, thus contributing to lower background and lower sequencing costs.
The circle-damage-seq method depends on the ability to convert the modified base into a DNA single-strand break. Cleavage with S1 nuclease on the opposite DNA strand then creates a damage-specific ligatable double-strand break. DNA glycosylase enzymes, which operate at the initial step of the base excision repair pathway, can easily be adapted for this technique. By choosing the appropriate DNA glycosylase, for example, 8-oxoguanine DNA glycosylase (OGG1) for 8-oxoguanine or alkyladenine DNA glycosylase (AAG) for alkylated adenines, specific types of base damage can be mapped. The nucleotide excision repair (NER) complex could be used for more bulky DNA lesions (44). The longer single-stranded gap remaining after excision with NER enzymes should be a good substrate for S1 nuclease, leading to an 11-nt gap for each divergent read pair after circle-damage-seq, for example, when benzo[a]pyrene DNA adducts are excised with the UvrABC complex from DNA (45, 46). It should also be feasible to map rare endogenous DNA bases with respective cleavage enzymes, for example, 5-formylcytosine and 5-carboxylcytosine with thymine DNA glycosylase or 6-methyladenine with Dpn I at GATC sequences (fig. S1).
Using circle-damage-seq, we derived a genome-wide tetranucleotide consensus sequence for CPD formation, (5′PyPy<>PyT/A) with unexpectedly no further sequence enrichment outside of this context. However, this consensus sequence pattern alone did not match with the sequences commonly mutated in melanomas. We therefore focused on the analysis of a mutagenesis pathway that proposes deamination of cytosine within CPDs as a major premutagenic mechanism. We were able to specifically detect and map cytosine-deaminated CPDs by developing a variation of circle-damage-seq that uses photolyase reversal of the CPD and uracil excision as the initial steps. Although uracil-containing CPDs are known to exist in vitro and in vivo after UV irradiation (33–35), their relevance for UV mutagenesis has remained unclear. In vitro, synthetic CPDs containing uracil can be bypassed with incorporation of adenine by the lesion-tolerant DNA polymerase eta (47). If a similar bypass reaction occurs in vivo, perhaps catalyzed also by other DNA polymerases, the mutation occurs by deamination and not by a polymerase error. Note, in this context, that both the yeast and human forms of the Pol Eta enzyme insert G opposite 5-methylcytosine in a CPD with greater than 100:1 selectivity in vitro, favoring the idea that the C has to deaminate to U to efficiently cause the insertion of A (48).
The trinucleotide sequence patterns of CPD cytosine deamination and the C-to-T mutation patterns in melanoma provided an excellent match (Fig. 6), supporting the deamination mutagenesis model. The concept of CPD deamination being the major premutagenic step is particularly appealing for a lesion that is repaired inefficiently in most genomic regions and for human skin cells, which divide slowly, allowing sufficient time for deamination to


occur before DNA replication. Since cytosine deamination in CPDs is strictly a chemical reaction, it will be difficult to prevent it from occurring, although CPD removal by skin-penetrating CPD-specific DNA repair enzymes appears to be a reasonable concept, as proposed earlier (49), and these enzymes work on uracil-containing CPDs. In summary, our data support a specific mutational mechanism of how a prevalent form of sunlight-induced DNA damage induces specific types of mutations found at high frequency in human skin cancer genomes.

METHODS
Cell culture
Human skin fibroblast cells were obtained from the American Type Culture Collection (ATCC; catalog no. PCS-201-012) and grown using the Fibroblast Growth Kit (ATCC, catalog no. PCS-201-041) supplemented with 2% fetal bovine serum. The cells were used at passage number <5.

UVB irradiation and genomic DNA isolation
For mapping of UVB-induced CPDs, fibroblasts at 80 to 90% confluence on 10-cm culture plates were washed with 1× phosphate-buffered saline (PBS) and were irradiated in PBS with a UVB dose of 1000 J/m². The UVB lamp (Thermo Fisher Scientific, UVP 3UV lamp) had a peak spectral emission at 302 nm. The UVB dose was determined using a UVX radiometer with a UVB probe (Ultraviolet Products; Upland, CA). After irradiation, the cells were immediately trypsinized and collected, and then, genomic DNA was isolated using the Quick-DNA Miniprep Plus Kit (Zymo Research; Irvine, CA) according to the manufacturer’s instruction manual. For mapping deaminated CPDs, we replenished the culture medium of the cells after UVB irradiation with 4000 J/m² and incubated the cells for 24 or 48 hours at 37°C before harvesting and DNA isolation.

Circle-damage-seq mapping of CPDs
DNA preparation and circularization
To prepare fragmented genomic DNA, the DNA from UVB-irradiated cells was sheared at 4°C to an average length of 300 to 400 bp by sonication with a Covaris E220 sonicator (Covaris; Woburn, MA) under the following conditions: peak incident power (W), 140; duty factor, 10%; cycles per burst, 200 times, 80 s; then, the fragmented DNA was purified with DNA Clean & Concentrator-5 Kit (Zymo Research, DCC-5) according to the manufacturer’s protocol. DNA was eluted in 42 µl of 10 mM tris-HCl (pH 7.5). Next, the sonicated DNA was end-repaired to prepare blunt-ended DNA. One microgram of the sonicated DNA was incubated in 1× T4 DNA ligase buffer [New England Biolabs (NEB); Ipswich, MA] with the following components: 50 mM tris-HCl (pH 7.5), 10 mM MgCl2, 1 mM adenosine triphosphate (ATP), 10 mM dithiothreitol (DTT), 1 µl (2 U/µl) of RNase H (NEB), 1 µl (3 U/µl) of T4 DNA polymerase (NEB), 1 µl (10 U/µl) of T4 polynucleotide kinase (NEB), and 1 µl of 10 mM deoxynucleoside triphosphates (dNTPs) in a final volume of 50 µl at 24°C for 30 min; then, the reaction was heat-inactivated by further incubation at 70°C for 20 min. The DNA was purified using DCC-5 Kit (Zymo Research) and eluted in 100 µl of 10 mM tris-HCl (pH 7.5). The purified DNA was quantitated using NanoDrop (Thermo Fisher Scientific).
To circularize the DNA, 1 µg of the blunt-ended DNA was incubated in 1× T4 DNA ligase buffer (NEB) with 4 µl (400 U/µl) of T4 DNA ligase (NEB) in a final volume of 200 µl at 16°C overnight. Then, the reaction mixture was cleaned up with the DCC-5 kit, and the DNA was eluted in 42 µl of 10 mM tris-HCl (pH 7.5). To remove noncircularized linear DNA from the circularized DNA pool, the ligase-treated DNA was incubated with 2 µl (10 U/µl) of Plasmid-Safe ATP-Dependent DNase (Lucigen; Middleton, WI) in 1× reaction buffer [33 mM tris-acetate (pH 7.5), 66 mM potassium acetate, 10 mM magnesium acetate, 0.5 mM DTT, and 1 mM ATP] in a final volume of 50 µl at 37°C for 30 min and then at 70°C for 20 min to stop the reaction. Then, the DNA was purified with 90 µl (1.8×) of AMPure XP beads (Beckman Coulter; Indianapolis, IN) and eluted in 35 µl of 10 mM tris-HCl (pH 8.0).
Cleavage of DNA at CPD sites
To generate DNA double-strand breaks at CPD sites, the circularized DNA was treated with the following DNA repair enzymes. First, to specifically incise DNA at CPD positions, the circularized DNA was incubated in 1× NEBuffer 4 reaction buffer (NEB) [50 mM potassium acetate, 20 mM tris-acetate (pH 7.9), 10 mM magnesium acetate, and 1 mM DTT] and 0.4 µl of bovine serum albumin (20 mg/ml) with 1 µl (10 U/µl) of T4-PDG (NEB) and 1 µl (10 U/µl) of APE1 (NEB) in a final volume of 40 µl at 37°C for 20 min. Then, the DNA was cleaned up with 72 µl (1.8×) of AMPure XP beads and eluted in 25 µl of 10 mM tris-HCl (pH 8.0). To revert the dimerized pyrimidines remaining after T4-PDG incision, the DNA from above was treated with 3 µl (0.25 µg/µl) of E. coli phrB photolyase (Novus Biologicals; Centennial, CO) in 1× reaction buffer [50 mM tris-HCl (pH 7.0), 50 mM NaCl, and 10 mM DTT] in a final volume of 50 µl. The reaction tube was placed at a distance of 15 cm under a UVA lamp (Thermo Fisher Scientific) that has a peak spectral emission at 365 nm and incubated under UVA illumination for 90 min at room temperature. Then, the DNA was cleaned up with 90 µl (1.8×) of AMPure XP beads and eluted in 32 µl of 10 mM tris-HCl (pH 8.0). To cleave the opposite strand of the nicked DNA, the DNA from above was incubated with 1 µl (5 U/µl) of single-strand specific S1 nuclease (Thermo Fisher Scientific) in 1× reaction buffer [40 mM sodium acetate (pH 4.5), 300 mM NaCl, and 2 mM ZnSO4] in a final volume of 40 µl for 4 min at room temperature. We stopped the reaction by adding 2 µl of 0.5 M EDTA and 1 µl of 1 M tris-HCl (pH 8.0) to the reaction mixture and further incubated the mixture for 10 min at 70°C. The DNA sample was cleaned up with 72 µl (1.8×) of AMPure XP beads and eluted in 48 µl of 10 mM tris-HCl (pH 8.0).
Cleavage of DNA at deaminated CPD sites
To generate double-strand breaks at deaminated CPD sites, the circularized DNA was initially treated with 3 µl (0.25 µg/µl) of E. coli phrB photolyase, as described above, to revert the dimerized pyrimidines. Then, the DNA was cleaned up with 90 µl (1.8×) of AMPure XP beads and eluted in 35 µl of 10 mM tris-HCl (pH 8.0). To specifically incise DNA at deaminated CPD positions, the circularized DNA was incubated in 1× NEBuffer 4 reaction buffer (NEB) with 1 µl (5 U/µl) of UDG (NEB) and 1 µl (10 U/µl) of APE1 (NEB) in a final volume of 40 µl at 37°C for 20 min. Then, the DNA was cleaned up with 72 µl (1.8×) of AMPure XP beads and eluted in 32 µl of 10 mM tris-HCl (pH 8.0). To cleave the opposite strand of the nicked DNA, the DNA was then incubated with 1 µl (5 U/µl) of single-strand specific S1 nuclease and cleaned up as described above.
Adapter ligation
The double-strand cleaved DNA was subsequently A-tailed in 1× NEBNext dA-tailing reaction buffer (NEB) [10 mM tris-HCl (pH 7.9),


10 mM MgCl2, 50 mM NaCl, and 1 mM DTT] and 0.2 mM dATP, with 2 µl (5 U/µl) of Klenow fragment exo- (NEB) in a final volume of 55 µl by incubation at 37°C for 30 min, and then, the reaction was heat-inactivated by further incubation at 70°C for 30 min. To perform adaptor ligation, 30 µl of NEBNext Ultra II Ligation Master Mix, 1 µl of NEBNext Ligation Enhancer (NEB), and 3 µl of 1.5 µM NEBNext adaptors for Illumina sequencing (NEB) [5′-phos-GATCGGAAGAGCACACGTCTGAACTCCAGTC/ideoxyU/ACACTCTTTCCTACACGACGCTCTTCCGATC*T-3′ and 5′-phos-GATCGGAAGAGCACACGTCTGAACTCCAGTC/ideoxyU/ACACTCTTTCCTACACGACGCTCTTCCGATC*C-3′ (*; phosphorothioate linkage)] were added to the dA-tailing reaction mixture in a total volume of 93 µl and incubated at 20°C for 15 min, and this reaction step was followed by incubation with 3 µl (1 U/µl) of USER enzyme (NEB) at 37°C for 20 min. Then, the adaptor-ligated circle-damage-seq library was cleaned up with 96 µl (1×) of AMPure XP beads and size-selected (300 to 1000 bp) with 0.5 volumes of AMPure XP beads, and the DNA was eluted in 25 µl of 10 mM tris-HCl (pH 8.0).
Library amplification and sequencing
To amplify the circle-damage-seq library by PCR, the eluted library was mixed with 25 µl of NEBNext Ultra II Q5 Master Mix (2×), 2.5 µl of 10 µM NEBNext i5, and 2.5 µl of 10 µM NEBNext i7 primers (NEB) in a final volume of 50 µl. The amplification reaction mixture was incubated at 98°C for 30 s, and then, 12 cycles of PCR at 98°C for 10 s and 65°C for 75 s were performed, followed by a final extension step at 65°C for 5 min. The PCR product was purified with 50 µl (1×) of AMPure XP beads and eluted in 20 µl of 10 mM tris-HCl (pH 8.0). The purified library was quantitated using a Qubit 3.0 fluorometer and dsDNA HS Assay kit (Thermo Fisher Scientific). The size distribution of the circle-damage-seq library was determined using Bioanalyzer (Agilent; Santa Clara, CA), and the circle-damage-seq library was quantified using KAPA Library Quantification kit (Kapa Biosystems) according to the manufacturer’s instructions. The circle-damage-seq library was sequenced as 150-bp paired-end sequencing runs on an Illumina HiSeq 2500 platform to obtain between 10 million and 800 million reads. From the 800 million read samples, approximately 200 million paired aligned reads were retained specifically as divergent read pairs with single-base gaps after removal of duplicates.

Circle-damage-seq mapping of Dpn I cuts in the E. coli genome
As a pilot experiment, we initially performed the mapping of Dpn I cut sites to optimize the circle-damage-seq method. To test the sensitivity of circle-damage-seq toward different percentages of Dpn I–cleaved sites, we mixed E. coli DNAs from two different strains, dam(+) and dam(−), to prepare 0, 2, and 10% of dam(+) DNA in a background of dam(−) input DNA. We subjected this mixture to the circle-damage-seq procedure, as described above. Briefly, to prepare fragmented DNAs, Rsa I and Alu I restriction enzymes (NEB) that leave blunt ends were used to digest 1 µg of E. coli genomic DNA. The cleaved DNAs were subjected to A-tailing using a NEBNext Ultra II End Prep kit (NEB), followed by ligation of an annealed hairpin adaptor (5′-phos-CGGTGGACCGATGATC/ideoxyU/ATCGGTCCACCG*T-3′) using NEBNext Ultra II Ligation Master Mix. The adaptor-ligated DNAs were treated with USER enzyme and circularized with T4 DNA ligase, followed by treatment with Plasmid-Safe ATP-Dependent DNase. To generate double-strand breaks at Dpn I cleavage sites, the circularized E. coli DNAs were incubated in 1× CutSmart buffer (NEB) with 1 µl (20 U/µl) of Dpn I restriction enzyme (NEB) in a final volume of 20 µl at 37°C for 1 hour. Then, circle-damage-seq library preparation and amplification by PCR were performed, as described above. The circle-damage-seq library was sequenced as 75-bp paired-end reads on an Illumina NextSeq500 platform to obtain ~15 million reads.

Bioinformatics analysis
All libraries were hard-clipped from the 3′ end to 2 × 80 nucleotide length using Trim_Galore (www.bioinformatics.babraham.ac.uk/projects/trim_galore/). Reads were then adapter-trimmed and quality-trimmed (phred < 20) in a single pass using Trim_Galore default parameters. Trimmed reads were aligned to the hg19 reference genome using BWA-MEM version 0.7.17 (50) with default parameters. Duplicate alignments were marked and removed with the “removeDups” SAMBLASTER option (51). SAMtools version 1.11 (52) was then used to remove reads with low mapping quality (MAPQ < 20) and to sort reads by genomic coordinates.
Divergently aligned read pairs (pairs facing away from each other) with a single-nucleotide gap between reads were selected by selecting SAM records with TLEN of 3. TLEN equals the distance between the mapped end of the template and the mapped start of the template. Records with TLEN = 3 represent divergently aligned reads with one nucleotide between mates. Sequences for the three nucleotides representing the last nucleotide of each read and the gapped base were identified using BEDtools (53). To visualize the reads, bam files were produced by selecting SAM records with TLEN of 3 and visualized with the Integrative Genomics Viewer version 2.8.9. The tracks were displayed with a “view as pairs” option to show pairs together with a line joining the ends.
Single-base substitution mutational signatures associated with skin cancer (SBS7/SBS7a/SBS7b) were obtained from the COSMIC database (http://cancer.sanger.ac.uk/cosmic/signatures/SBS/). Normalized forms of trinucleotide sequence signatures were presented by dividing the numbers by the relative abundance of the hg19 genome trinucleotide frequencies.
When treating cells with UVB to produce CPDs, if the gapped nucleotide was T or C in the (+) strand, the damage would have occurred on the (+) strand, whereas a gapped A or G in the (+) strand would indicate that the damage must have occurred on the (−) strand. For deaminated CPDs, the gapped nucleotide would be a cytosine on the (+) strand that is flanked by a pyrimidine on either side. When there is a G as the gapped nucleotide, the damage would have occurred on the (−) strand and another purine is seen as a flanking base. The Dpn I treatment protocol probes 6mA formed at 5′GATC sequences from dam(+) E. coli, producing sequencing libraries with no gap between divergently aligned read pairs (fig. S1). Hence, alignment records with TLEN = 2 were selected.
Chromosome name, start position, stop position, and inferred strand (+/−) for all selected read pair loci were written to a BED file, sorted, and indexed with tabix (54) for downstream analyses. Logo plots were drawn using ggseqlogo (55) or Seq2Logo (www.cbs.dtu.dk/biotools/Seq2Logo/) for fig. S6. “bamCoverage,” “computeMatrix,” and “plotHeatmap” programs in deepTools (56) were used to generate heatmap plots of genomic regions. Gene-centered analysis included 1.5 kb upstream of the TSS and 1.5 kb downstream of the TES with average coverage calculated in non-overlapping 50-nt bins and normalized to gene length.
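The TLEN-based read-pair selection described under Bioinformatics analysis can be sketched in a few lines. This is a hedged illustration, not the authors' pipeline: it reads plain SAM text, takes TLEN from the ninth mandatory column, and compares its absolute value with the target, on the assumption that the two mates of a pair carry TLEN values of opposite sign. The function name `select_by_tlen` and the toy records are hypothetical.

```python
def select_by_tlen(sam_lines, target_tlen=3):
    """Return SAM records (as field lists) whose |TLEN| equals target_tlen."""
    selected = []
    for line in sam_lines:
        if line.startswith("@"):          # header line, not an alignment
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # SAM has 11 mandatory columns
            continue
        tlen = int(fields[8])             # column 9 holds TLEN
        if abs(tlen) == target_tlen:
            selected.append(fields)
    return selected

# Toy records: only the r1 pair has a one-nucleotide gap (|TLEN| = 3).
toy_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t81\tchr1\t91\t60\t10M\t=\t102\t3\tACGTACGTAC\tIIIIIIIIII",
    "r1\t161\tchr1\t102\t60\t10M\t=\t91\t-3\tTTGCATGCAA\tIIIIIIIIII",
    "r2\t99\tchr1\t500\t60\t10M\t=\t650\t160\tACGTACGTAC\tIIIIIIIIII",
]
for rec in select_by_tlen(toy_sam):
    print(rec[0], rec[2], rec[3], rec[8])
```

Calling the same function with `target_tlen=2` would correspond to the no-gap case described for the Dpn I libraries.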

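The strand-assignment rule described under Bioinformatics analysis can be written as a small lookup. The sketch below assumes the three bases (5′ flank, gapped base, 3′ flank) have already been extracted on the (+) strand. Reading "flanked by a pyrimidine on either side" as "at least one flanking pyrimidine" is an interpretation made here, and the function name is hypothetical.

```python
PYRIMIDINES = {"C", "T"}
PURINES = {"A", "G"}

def infer_damage_strand(trinucleotide, deaminated=False):
    """Infer the damaged strand from (+)-strand bases around the gap.

    trinucleotide: 5' flank, gapped base, 3' flank on the (+) strand.
    Returns "+", "-", or None when the context fits neither rule.
    """
    five, gap, three = trinucleotide.upper()
    if not deaminated:
        # Total CPDs: a gapped pyrimidine points to the (+) strand,
        # a gapped purine to the (-) strand.
        if gap in PYRIMIDINES:
            return "+"
        if gap in PURINES:
            return "-"
        return None
    # Deaminated CPDs: gapped C with a pyrimidine neighbour -> (+);
    # gapped G with a purine neighbour -> (-).
    if gap == "C" and (five in PYRIMIDINES or three in PYRIMIDINES):
        return "+"
    if gap == "G" and (five in PURINES or three in PURINES):
        return "-"
    return None

for context, mode in [("TCA", True), ("AGA", True), ("ACG", True), ("ATT", False)]:
    print(context, mode, infer_damage_strand(context, deaminated=mode))
```

Contexts that satisfy neither rule return `None` so that they can be counted or discarded explicitly instead of being assigned to a strand by default.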

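Normalizing trinucleotide counts by genomic abundance, as described under Bioinformatics analysis, amounts to one division per context. A minimal sketch follows, using made-up counts. Rescaling the result to sum to 1 is an added convenience and an assumption, not something stated in the text, and the function name is hypothetical.

```python
def normalize_by_background(event_counts, genome_counts, rescale=True):
    """Divide per-context event counts by the context's relative genomic abundance."""
    genome_total = sum(genome_counts.values())
    normalized = {}
    for context, count in event_counts.items():
        relative_abundance = genome_counts[context] / genome_total
        normalized[context] = count / relative_abundance
    if rescale:
        # Optional step (assumption): express the result as fractions of 1.
        total = sum(normalized.values())
        normalized = {k: v / total for k, v in normalized.items()}
    return normalized

# Hypothetical numbers: TCG is rare in the genome, so its share rises after normalization.
events = {"TCA": 300, "TCG": 100}
genome = {"TCA": 3000, "TCG": 500}
print(normalize_by_background(events, genome))
```

This illustrates why uncorrected data understate event frequencies at rare contexts such as TCG or CCG.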


Cosine similarity analyses were performed using the R (version 4.0.2; www.R-project.org/) package “coop” (57). Since only 32 or 16 combinations of trinucleotides are available for CPDs and deamination, we could only use the C-to-T and T-to-C mutation windows for the NPyPy (n = 32) and the C-to-T windows for the PyPyN (n = 16) to compare with the COSMIC trinucleotide signatures. Other mutations, such as C to G, C to A, T to G, and T to A, were ignored. This explains some level of similarity between, for example, SBS4 and SBS7, which are otherwise mostly unrelated (SBS4 is dominated by C-to-A mutations).

Data availability
All data are available from the GEO database (accession number GSE159807).

Code availability
All code for this project is available at https://github.com/deanpettinga/CDseq.

SUPPLEMENTARY MATERIALS
Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/7/31/eabi6508/DC1
View/request a protocol for this paper from Bio-protocol.

REFERENCES AND NOTES
1. L. B. Alexandrov, S. Nik-Zainal, D. C. Wedge, S. A. J. R. Aparicio, S. Behjati, A. V. Biankin, G. R. Bignell, N. Bolli, A. Borg, A.-L. Børresen-Dale, S. Boyault, B. Burkhardt, A. P. Butler, C. Caldas, H. R. Davies, C. Desmedt, R. Eils, J. E. Eyfjörd, J. A. Foekens, M. Greaves, F. Hosoda, B. Hutter, T. Ilicic, S. Imbeaud, M. Imielinski, N. Jäger, D. T. W. Jones, D. Jones, S. Knappskog, M. Kool, S. R. Lakhani, C. López-Otín, S. Martin, N. C. Munshi, H. Nakamura, P. A. Northcott, M. Pajic, E. Papaemmanuil, A. Paradiso, J. V. Pearson, X. S. Puente, K. Raine, M. Ramakrishna, A. L. Richardson, J. Richter, P. Rosenstiel, M. Schlesner, T. N. Schumacher, P. N. Span, J. W. Teague, Y. Totoki, A. N. J. Tutt, R. Valdés-Mas, M. M. van Buuren, L. van ‘t Veer, A. Vincent-Salomon, N. Waddell, L. R. Yates; Australian Pancreatic Cancer Genome Initiative; ICGC Breast Cancer Consortium; ICGC MMML-Seq Consortium; ICGC Ped Brain, J. Zucman-Rossi, P. Andrew Futreal, U. M. Dermott, P. Lichter, M. Meyerson, S. M. Grimmond, R. Siebert, E. Campo, T. Shibata, S. M. Pfister, P. J. Campbell, M. R. Stratton, Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
2. B. Vogelstein, N. Papadopoulos, V. E. Velculescu, S. Zhou, L. A. Diaz Jr., K. W. Kinzler, Cancer genome landscapes. Science 339, 1546–1558 (2013).
3. T. Helleday, S. Eshtad, S. Nik-Zainal, Mechanisms underlying mutational signatures in human cancers. Nat. Rev. Genet. 15, 585–598 (2014).
4. I. Martincorena, P. J. Campbell, Somatic mutation in cancer and normal cells. Science 349, 1483–1489 (2015).
5. G. P. Pfeifer, How the environment shapes cancer genomes. Curr. Opin. Oncol. 27, 71–77 (2015).
6. L. B. Alexandrov, Y. S. Ju, K. Haase, P. Van Loo, I. Martincorena, S. Nik-Zainal, Y. Totoki, A. Fujimoto, H. Nakagawa, T. Shibata, P. J. Campbell, P. Vineis, D. H. Phillips, M. R. Stratton, Mutational signatures associated with tobacco smoking in human cancer. Science 354, 618–622 (2016).
7. T. A. Rosenquist, A. P. Grollman, Mutational signature of aristolochic acid: Clue to the recognition of a global disease. DNA Repair (Amst) 44, 205–211 (2016).
8. F. R. de Gruijl, H. Rebel, Early events in UV carcinogenesis—DNA damage, target cells and mutant p53 foci. Photochem. Photobiol. 84, 382–387 (2008).
9. E. D. Pleasance, R. K. Cheetham, P. J. Stephens, D. J. McBride, S. J. Humphray, C. D. Greenman, I. Varela, M. L. Lin, G. R. Ordóñez, G. R. Bignell, K. Ye, J. Alipaz, M. J. Bauer, D. Beare, A. Butler, R. J. Carter, L. Chen, A. J. Cox, S. Edkins, P. I. Kokko-Gonzales, N. A. Gormley, R. J. Grocock, C. D. Haudenschild, M. M. Hims, T. James, M. Jia, Z. Kingsbury, C. Leroy, J. Marshall, A. Menzies, L. J. Mudie, Z. Ning, T. Royce, O. B. Schulz-Trieglaff, A. Spiridou, L. A. Stebbings, L. Szajkowski, J. Teague, D. Williamson, L. Chin, M. T. Ross, P. J. Campbell, D. R. Bentley, P. A. Futreal, M. R. Stratton, A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 463, 191–196 (2010).
10. G. P. Pfeifer, Mechanisms of UV-induced mutations and skin cancer. Genome Instability Disease 1, 99–113 (2020).
11. G. P. Pfeifer, A. Besaratinia, UV wavelength-dependent DNA damage and human non-melanoma and melanoma skin cancer. Photochem. Photobiol. Sci. 11, 90–97 (2012).
12. D. E. Brash, UV signature mutations. Photochem. Photobiol. 91, 15–26 (2015).
13. J. Cadet, A. Grand, T. Douki, Solar UV radiation-induced DNA bipyrimidine photoproducts: Formation and mechanistic insights. Top. Curr. Chem. 356, 249–275 (2015).
14. Y. H. You, D. H. Lee, J. H. Yoon, S. Nakajima, A. Yasui, G. P. Pfeifer, Cyclobutane pyrimidine dimers are responsible for the vast majority of mutations induced by UVB irradiation in mammalian cells. J. Biol. Chem. 276, 44688–44694 (2001).
15. M. F. Laughery, A. J. Brown, K. A. Bohm, S. Sivapragasam, H. S. Morris, M. Tchmola, A. D. Washington, D. Mitchell, S. Mather, E. P. Malc, P. A. Mieczkowski, S. A. Roberts, J. J. Wyrick, Atypical UV photoproducts induce non-canonical mutation classes associated with driver mutations in melanoma. Cell Rep. 33, 108401 (2020).
16. A. Besaratinia, J. I. Yoon, C. Schroeder, S. E. Bradforth, M. Cockburn, G. P. Pfeifer, Wavelength dependence of ultraviolet radiation-induced DNA damage as determined by laser irradiation suggests that cyclobutane pyrimidine dimers are the principal DNA lesions produced by terrestrial sunlight. FASEB J. 25, 3079–3091 (2011).
17. J. Jans, W. Schul, Y.-G. Sert, Y. Rijksen, H. Rebel, A. P. M. Eker, S. Nakajima, H. van Steeg, F. R. de Gruijl, A. Yasui, J. H. Hoeijmakers, G. T. J. van der Horst, Powerful skin cancer protection by a CPD-photolyase transgene. Curr. Biol. 15, 105–115 (2005).
18. B. S. Strauss, The "A" rule revisited: Polymerases as determinants of mutational specificity. DNA Repair (Amst) 1, 125–135 (2002).
19. R. E. Johnson, S. Prakash, L. Prakash, Efficient bypass of a thymine-thymine dimer by yeast DNA polymerase, Poleta. Science 283, 1001–1004 (1999).
20. C. Masutani, R. Kusumoto, A. Yamada, N. Dohmae, M. Yokoi, M. Yuasa, M. Araki, S. Iwai, K. Takio, F. Hanaoka, The XPV (xeroderma pigmentosum variant) gene encodes human DNA polymerase eta. Nature 399, 700–704 (1999).
21. E. H. Radany, E. C. Friedberg, A pyrimidine dimer-DNA glycosylase activity associated with the v gene product of bacteriophage T4. Nature 286, 182–185 (1980).
22. P. Mao, M. J. Smerdon, S. A. Roberts, J. J. Wyrick, Chromosomal landscape of UV damage formation and repair at single-nucleotide resolution. Proc. Natl. Acad. Sci. U.S.A. 113, 9057–9062 (2016).
23. M. Lindberg, M. Bostrom, K. Elliott, E. Larsson, Intragenomic variability and extended sequence patterns in the mutational signature of ultraviolet light. Proc. Natl. Acad. Sci. U.S.A. 116, 20411–20417 (2019).
24. S. Premi, L. Han, S. Mehta, J. Knight, D. Zhao, M. A. Palmatier, K. Kornacker, D. E. Brash, Genomic sites hypersensitive to ultraviolet radiation. Proc. Natl. Acad. Sci. U.S.A. 116, 24196–24205 (2019).
25. C. Lu, N. E. Gutierrez-Bayona, J. S. Taylor, The effect of flanking bases on direct and triplet sensitized cyclobutane pyrimidine dimer formation in DNA depends on the dipyrimidine, wavelength and the photosensitizer. Nucleic Acids Res. 49, 4266–4280 (2021).
26. G. P. Pfeifer, R. Drouin, A. D. Riggs, G. P. Holmquist, Binding of transcription factors creates hot spots for UV photoproducts in vivo. Mol. Cell. Biol. 12, 1798–1804 (1992).
27. S. Tornaletti, G. P. Pfeifer, UV light as a footprinting agent: Modulation of UV-induced DNA damage by transcription factors bound at the promoters of three human genes. J. Mol. Biol. 249, 714–728 (1995).
28. J. Hu, O. Adebali, S. Adar, A. Sancar, Dynamic maps of UV damage formation and repair for the human genome. Proc. Natl. Acad. Sci. U.S.A. 114, 6758–6763 (2017).
29. K. Elliott, M. Bostrom, S. Filges, M. Lindberg, J. Van den Eynden, A. Stahlberg, A. R. Clausen, E. Larsson, Elevated pyrimidine dimer formation at distinct genomic bases underlies promoter mutation hotspots in UV-exposed cancers. PLOS Genet. 14, e1007849 (2018).
30. P. Mao, A. J. Brown, S. Esaki, S. Lockwood, G. M. K. Poon, M. J. Smerdon, S. A. Roberts, J. J. Wyrick, ETS transcription factors induce a unique UV damage signature that drives recurrent mutagenesis in melanoma. Nat. Commun. 9, 2626 (2018).
31. D. Perera, R. C. Poulos, A. Shah, D. Beck, J. E. Pimanda, J. W. H. Wong, Differential DNA repair underlies mutation hotspots at active promoters in cancer genomes. Nature 532, 259–263 (2016).
32. B. Tian, J. Hu, H. Zhang, C. S. Lutz, A large-scale analysis of mRNA polyadenylation of human and mouse genes. Nucleic Acids Res. 33, 201–212 (2005).
33. R. B. Setlow, W. L. Carrier, F. J. Bollum, Pyrimidine dimers in UV-irradiated poly dI:dC. Proc. Natl. Acad. Sci. U.S.A. 53, 1111–1118 (1965).
34. N. Jiang, J.-S. Taylor, In vivo evidence that UV-induced C→T mutations at dipyrimidine sites could result from the replicative bypass of cis-syn cyclobutane dimers or their deamination products. Biochemistry 32, 472–481 (1993).
35. Y. Tu, R. Dammann, G. P. Pfeifer, Sequence and time-dependent deamination of cytosine bases in UVB-induced cyclobutane pyrimidine dimers in vivo. J. Mol. Biol. 284, 297–311 (1998).
36. Y. Tu, S. Tornaletti, G. P. Pfeifer, DNA repair domains within a human gene: Selective repair of sequences near the transcription initiation site. EMBO J. 15, 675–683 (1996).
37. J. Hu, S. Adar, C. P. Selby, J. D. Lieb, A. Sancar, Genome-wide analysis of human global and transcription-coupled excision repair of UV damage at single-nucleotide resolution. Genes Dev. 29, 948–960 (2015).
38. J. Jiang, W. Li, L. A. Lindsey-Boltz, Y. Yang, Y. Li, A. Sancar, Nucleotide excision repair hotspots and coldspots of UV-induced DNA damage in the human genome. bioRxiv 2020.04.16.045369 [Preprint]. 2021. https://doi.org/10.1101/2020.04.16.045369.


39. R. Sabarinathan, L. Mularoni, J. Deu-Pons, A. Gonzalez-Perez, N. Lopez-Bigas, Nucleotide excision repair is impaired by binding of transcription factors to DNA. Nature 532, 264–267 (2016).
40. V. J. Cannistraro, J. S. Taylor, Acceleration of 5-methylcytosine deamination in cyclobutane dimers by G and its implications for UV-induced C-to-T mutation hotspots. J. Mol. Biol. 392, 1145–1157 (2009).
41. V. J. Cannistraro, S. Pondugula, Q. Song, J. S. Taylor, Rapid deamination of cyclobutane pyrimidine dimer photoproducts at TCG sites in a translationally and rotationally positioned nucleosome in vivo. J. Biol. Chem. 290, 26597–26609 (2015).
42. J. Drost, R. van Boxtel, F. Blokzijl, T. Mizutani, N. Sasaki, V. Sasselli, J. de Ligt, S. Behjati, J. E. Grolleman, T. van Wezel, S. Nik-Zainal, R. P. Kuiper, E. Cuppen, H. Clevers, Use of CRISPR-modified human stem cell organoids to study the origin of mutational signatures in cancer. Science 358, 234–238 (2017).
43. A. Canela, S. Sridharan, N. Sciascia, A. Tubbs, P. Meltzer, B. P. Sleckman, A. Nussenzweig, DNA breaks and end resection measured genome-wide by end sequencing. Mol. Cell 63, 898–911 (2016).
44. A. Sancar, Mechanisms of DNA excision repair. Science 266, 1954–1956 (1994).
45. M. F. Denissenko, A. Pao, M. Tang, G. P. Pfeifer, Preferential formation of benzo[a]pyrene adducts at lung cancer mutational hotspots in P53. Science 274, 430–432 (1996).
46. M. S. Tang, J. B. Zheng, M. F. Denissenko, G. P. Pfeifer, Y. Zheng, Use of UvrABC nuclease to quantify benzo[a]pyrene diol epoxide-DNA adduct formation at methylated versus unmethylated CpG sites in the p53 gene. Carcinogenesis 20, 1085–1089 (1999).
47. K. Takasawa, C. Masutani, F. Hanaoka, S. Iwai, Chemical synthesis and translesion replication of a cis–syn cyclobutane thymine–uracil dimer. Nucleic Acids Res. 32, 1738–1745 (2004).
48. Q. Song, S. M. Sherrer, Z. Suo, J. S. Taylor, Preparation of site-specific T=mCG cis-syn cyclobutane dimer-containing template and its error-free bypass by yeast and human polymerase η. J. Biol. Chem. 287, 8021–8028 (2012).
49. D. B. Yarosh, A. Rosenthal, R. Moy, Six critical questions for DNA repair enzymes in skincare products: A review in dialog. Clin. Cosmet. Investig. Dermatol. 12, 617–624 (2019).
50. H. Li, R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
51. G. G. Faust, I. M. Hall, SAMBLASTER: Fast duplicate marking and structural variant read extraction. Bioinformatics 30, 2503–2505 (2014).
52. H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin; 1000 Genome Project Data Processing Subgroup, The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
53. A. R. Quinlan, BEDTools: The Swiss-army tool for genome feature analysis. Curr. Protoc. Bioinformatics 47, 11.12.1-34 (2014).
54. H. Li, Tabix: Fast retrieval of sequence features from generic TAB-delimited files. Bioinformatics 27, 718–719 (2011).
55. O. Wagih, ggseqlogo: A versatile R package for drawing sequence logos. Bioinformatics 33, 3645–3647 (2017).
56. F. Ramírez, D. P. Ryan, B. Grüning, V. Bhardwaj, F. Kilpert, A. S. Richter, S. Heyne, F. Dündar, T. Manke, deepTools2: A next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).
57. D. Schmidt, Co-operation: Fast correlation, covariance, and cosine similarity, R package version 0.6-2 (2019).

Acknowledgments: We thank D. Fu, E. Wolfrum and H. Lyong for help with bioinformatics and biostatistics analysis. Funding: This work was supported by NIH grant CA228089 to G.P.P. Author contributions: S.-G.J. and G.P.P. conceptualized and designed the project. S.-G.J. and J.J. performed experiments. D.P., P.L., S.-G.J., and G.P.P. performed data analysis. G.P.P. and S.-G.J. wrote the manuscript. All authors commented on the manuscript. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. All sequencing data are available from the GEO database (accession number GSE159807). Additional data related to this paper may be requested from the authors.

Submitted 22 March 2021
Accepted 14 June 2021
Published 30 July 2021
10.1126/sciadv.abi6508

Citation: S.-G. Jin, D. Pettinga, J. Johnson, P. Li, G. P. Pfeifer, The major mechanism of melanoma mutations is based on deamination of cytosine in pyrimidine dimers as determined by circle damage sequencing. Sci. Adv. 7, eabi6508 (2021).



The major mechanism of melanoma mutations is based on deamination of cytosine
in pyrimidine dimers as determined by circle damage sequencing
Seung-Gi Jin, Dean Pettinga, Jennifer Johnson, Peipei Li, Gerd P. Pfeifer

Sci. Adv., 7 (31), eabi6508.

View the article online


https://www.science.org/doi/10.1126/sciadv.abi6508
Permissions
https://www.science.org/help/reprints-and-permissions


Use of this article is subject to the Terms of service

Science Advances (ISSN 2375-2548) is published by the American Association for the Advancement of Science. 1200 New York Avenue NW,
Washington, DC 20005. The title Science Advances is a registered trademark of AAAS.
