
Research Article

Assessing the impacts of various factors on circular RNA reliability
Trees-Juen Chuang, Tai-Wei Chiang, Chia-Ying Chen

Genomics Research Center, Academia Sinica, Taipei, Taiwan
Correspondence: trees@gate.sinica.edu.tw
© 2023 Chuang et al. https://doi.org/10.26508/lsa.202201793 vol 6 | no 5 | e202201793

Circular RNAs (circRNAs) are non-polyadenylated RNAs with a continuous loop structure characterized by a non-colinear back-splice junction (BSJ). Although millions of circRNA candidates have been identified, determining circRNA reliability remains a major challenge because of various types of false positives. Here, we systematically assess the impacts of numerous factors related to circRNA identification, conservation, biogenesis, and function on circRNA reliability by comparisons of circRNA expression from mock and the corresponding colinear/polyadenylated RNA–depleted datasets based on three different RNA treatment approaches. Eight important indicators of circRNA reliability are determined. Analyses of the relative contribution to variability explained reveal that the relative importance of these factors in affecting circRNA reliability in descending order is the conservation level of circRNA, full-length circular sequences, supporting BSJ read count, both BSJ donor and acceptor splice sites at the same colinear transcript isoforms, both BSJ donor and acceptor splice sites at the annotated exon boundaries, BSJs detected by multiple tools, supporting functional features, and both BSJ donor and acceptor splice sites undergoing alternative splicing. This study thus provides a useful guideline and an important resource for selecting high-confidence circRNAs for further investigations.

DOI 10.26508/lsa.202201793 | Received 1 November 2022 | Revised 15 February 2023 | Accepted 15 February 2023 | Published online 27 February 2023

Introduction

CircRNAs are an emerging class of RNAs formed by cis-backsplicing with a covalently closed loop structure that are endogenously expressed as non-polyadenylated circular molecules (1, 2). Such a loop structure is characterized by a non-colinear (NCL) back-splice junction (BSJ) between a downstream splice donor site and an upstream splice acceptor site. Although most circRNAs are expressed at a much lower level than linear mRNAs (3, 4, 5), some are highly expressed (3, 6), even more abundant than their colinear counterparts (7, 8). In addition, circRNAs are more stable than their corresponding colinear mRNA isoforms (1, 9). Some circRNAs were further demonstrated to be evolutionarily conserved across species (3, 10, 11, 12). Accumulating evidence has demonstrated that circRNAs can participate in various biological processes, including competing with their host gene expression and splicing, regulating activities of miRNAs and RNA-binding proteins (RBPs), and acting as translation templates (2, 13). CircRNAs were reported to be especially abundant in brain and neuronal tissues (11, 14) and to have important regulatory roles in nervous system development and aging (15). Loss or dysregulation of circRNAs may affect brain function (12, 15, 16, 17), indicating the roles of circRNAs in the pathogenesis and progression of neurological diseases. In cancer studies, alteration of circRNA expression was shown to affect cell apoptosis, invasion, migration, and proliferation, revealing the potential of circRNAs to serve as biomarkers and therapeutic targets in cancer (18, 19). These findings suggest that circRNAs are an ancient and fine-tuned class of functional transcripts.

Various bioinformatic tools have been developed to identify BSJs using high-throughput transcriptome sequencing (RNA-seq) data (20), leading to a tremendous number of potential circRNA candidates in human and other species (21, 22, 23). However, circRNA detection is hampered by various types of false positives. An NCL junction that originates from sequencing error, in vitro artifact, ambiguous alignment, trans-splicing, or genetic rearrangement is often misjudged as a BSJ event by computational detection (1, 2). This is reflected in the limited overlap among the circRNA candidates identified by different tools (1, 24, 25, 26, 27) or databases (23), highlighting the issue of circRNA reliability. Although millions of circRNA candidates have been deposited in various databases, only a very limited number of candidates have been carefully curated to date (23). A guideline is therefore needed for evaluating the reliability of the detected circRNA candidates before performing time- and cost-consuming experimental validations.

Because circRNAs are NCL RNAs and lack 3′ polyadenylated tails, many circRNA candidates were detected in RNA-seq datasets derived from total RNAs (the ribosomal RNA–depleted RNAs without poly(A)-selection; “mock RNAs” for simplicity), which contain RNA-seq reads from both colinear and NCL RNAs (28). Because most colinear RNAs are digested by RNase R (3, 28, 29, 30), a detected circRNA candidate is more likely to be real if its normalized BSJ read count is enriched or not reduced after the treatment. Therefore, comparisons of BSJ read counts detected from mock and RNase


R–treated RNA datasets were often employed to evaluate the reliability of the detected circRNAs (24, 25, 26, 31). However, in some cases, such a mock-treated comparison approach may not be efficient for circRNA detection because of two inherent limitations. First, some colinear RNAs containing highly structured 3′ ends or G-quadruplexes are also resistant to RNase R (32). Second, some circRNAs may be susceptible to RNase R and digested by this exonuclease (33). Accordingly, in addition to RNase R–treated RNAs, we considered two other RNA treatments for mock-treated comparisons: RNAs treated with coupled A-tailing and RNase R digestion, and RNAs depleted of both ribosomal RNAs and polyadenylated RNAs. The former can efficiently digest most of the RNase R–resistant colinear RNAs stated above (32), and the latter can deplete most polyadenylated RNAs and retain circRNAs susceptible to RNase R (33). On the basis of comparisons of normalized BSJ read counts detected from mock and the corresponding colinear/poly(A)-depleted RNA datasets from three different RNA treatments, we present a systematic workflow to evaluate the impacts of various factors related to circRNA identification, conservation, biogenesis, and function on circRNA reliability. After determining statistically important indicators of circRNA reliability, we further assessed the relative influence of these important indicators on circRNA reliability. Collectively, we provide a useful guideline for selecting high-confidence circRNAs. All of the information related to circRNA identification, conservation, biogenesis, and function is freely available.

Results

Potential false positives in the circRNA database

As shown in our analysis flow (Fig 1A), we first extracted circRNA (or BSJ) candidates from the circAtlas database 2.0 (22) (580,654 candidates; see also Table S1). As in previous reports (1, 24, 25, 26, 27, 34), a considerable percentage (36.5%) of the examined circRNA candidates were detected by only one tool (Fig 1B). Such great discrepancies in the identification results among tools reflected the uncertainty of the computationally detected circRNA events (1, 24, 25, 26, 27). One cause of the false-positive calls may be alignment ambiguity with an alternative colinear explanation or multiple hits when detecting BSJs (1, 26, 35). Of the 580,654 circAtlas circRNAs, we found that 17.3% (100,183 circRNAs; Fig 1C) were potential false positives arising from an alternative colinear explanation (Fig S1) or multiple hits (see the Materials and Methods section). It was reported that the circRNAs detected by multiple tools tended to be more reliable than tool-specific circRNAs (36, 37). As expected, the percentages of circRNAs derived from ambiguous alignments significantly decreased with increasing numbers of detection tools (Fig 1D), reflecting the positive correlation between the level of circRNA reliability and the number of circRNA detection tools. For accuracy, we excluded the circRNA candidates arising from potential alignment ambiguity (100,183 events) and only considered the remaining circRNAs (480,471 events) for the following analyses of circRNA reliability.

Most circRNA candidates were detected from RNA-seq datasets derived from mock samples (the ribosomal RNA–depleted RNAs without poly(A)-selection), which contain RNA-seq reads from both circRNAs and colinear/polyadenylated RNAs (28). We employed comparisons of normalized BSJ read counts detected from mock and the corresponding colinear/poly(A)-depleted datasets to determine high-confidence circRNAs. To assess circRNA reliability comprehensively and increase the robustness of our analyses, we extracted 19 mock-treated sample pairs of different samples or studies from public datasets, which included three types of RNA treatment approaches: RNase R approach (group 1), A-tailing RNase R approach (group 2), and non-poly(A) approach (group 3) (see Tables 1 and S2). We examined the 480,471 circRNAs in all the mock samples (see the Materials and Methods section) and found that a considerable percentage of circRNAs (38–80%) were supported by only one BSJ read (Fig 1E). To minimize potential false positives, only the circRNAs expressed in the mock samples with a supporting BSJ read count ≥ 2 were considered in the analysis for each mock-treated sample pair. It was presumed that the circRNAs not depleted after the treatment (“not-depleted circRNAs”) were more likely to be bona fide circRNAs than those depleted because of the treatment (“depleted circRNAs”). We then compared not-depleted circRNAs with depleted ones in terms of various factors related to circRNA identification, conservation, biogenesis, and function (Table S3), and thereby assessed the importance of these factors in affecting circRNA reliability.

Factors related to circRNA identification

Regarding the circRNAs identified in the mock samples, the impacts of the factors related to circRNA identification (i.e., BSJ read count, number of detected tools, and evidence of full-length circular sequence) on circRNA reliability were assessed. First, we found that the supporting BSJ read counts were significantly higher in not-depleted circRNA candidates than in depleted ones in all mock-treated sample pairs examined (all FDR < 0.05; Fig 2A). The percentages of not-depleted circRNAs increased with increasing supporting BSJ read counts (Fig 2A). Second, we examined the influences of BSJs detected by multiple or single circRNA detection tools on circRNA reliability and found that the former were significantly enriched for not-depleted circRNAs (Fig 2B). Generally, the percentages of not-depleted circRNAs increased with increasing numbers of detection tools (Fig 2B). Third, we examined the influences of BSJs with or without the evidence of full-length circular sequences reconstructed by CIRI-full (43) on circRNA reliability and showed that the former were significantly enriched for not-depleted circRNAs, regardless of the RNA treatment approaches (Fig 2C). Because the performance of CIRI-full, a short read–based approach, for identifying full-length sequences of circRNAs is hampered by the length of short reads (43), we also extracted the full-length circRNA candidates identified by circFL-seq based on nanopore long reads (44). Such a long read–based approach provides direct support of full-length circular sequences from single-molecule long reads. Similarly, the BSJ events with the evidence of full-length circular sequences showed a significant enrichment for not-depleted circRNAs in all mock-treated sample pairs examined (Fig 2C).

Factors related to circRNA conservation

We then examined the relationship between circRNA reliability and conservation patterns at the three levels: species, tissues, and

Important indicators of circRNA reliability Chuang et al. https://doi.org/10.26508/lsa.202201793 vol 6 | no 5 | e202201793 2 of 16


Figure 1. The assessment of the impacts of various features on circRNA reliability.
(A) Flowchart of the overall analyses. (B) Distribution of the extracted circAtlas circRNA (BSJ) candidates (580,654 candidates) identified by 1, 2, 3, or 4 circRNA detection
tools. (C) Proportion of the 580,654 candidates derived from potential alignment ambiguity (with an alternative colinear explanation or multiple hits). (D) Comparisons of the percentages
of circRNA candidates derived from potential alignment ambiguity for the candidates detected by 1, 2, 3, or 4 tools. (E) Comparisons of normalized numbers of circRNA
candidates with supporting BSJ read count = 1 or ≥ 2 in all extracted mock samples of the 19 mock-treated sample pairs. For each sample, the percentage of circRNA
candidates supported by one BSJ read is shown.
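
The read-count filter and the not-depleted/depleted labelling used throughout the Results can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the reads-per-million normalization, the function names, and the toy counts are all hypothetical. The text itself only states that a candidate needs a supporting BSJ read count ≥ 2 in the mock sample and that a candidate is regarded as not depleted when its normalized BSJ read count is enriched or not reduced after the treatment.

```python
def normalize_rpm(bsj_reads, total_mapped_reads):
    """Reads per million mapped reads (an assumed normalization scheme)."""
    return bsj_reads * 1_000_000 / total_mapped_reads


def classify_pair(mock_counts, treated_counts, mock_total, treated_total, min_reads=2):
    """Label each circRNA of one mock-treated sample pair.

    Candidates with fewer than `min_reads` BSJ reads in the mock sample are
    skipped; the rest are 'not-depleted' if the normalized count in the treated
    sample is at least the normalized count in the mock sample.
    """
    labels = {}
    for bsj_id, mock_reads in mock_counts.items():
        if mock_reads < min_reads:
            continue  # single-read candidates are excluded from the analysis
        treated_reads = treated_counts.get(bsj_id, 0)
        mock_norm = normalize_rpm(mock_reads, mock_total)
        treated_norm = normalize_rpm(treated_reads, treated_total)
        labels[bsj_id] = "not-depleted" if treated_norm >= mock_norm else "depleted"
    return labels


# Hypothetical toy data: raw BSJ read counts per candidate.
mock = {"circA": 5, "circB": 1, "circC": 4}
treated = {"circA": 12, "circC": 1}
labels = classify_pair(mock, treated, mock_total=20_000_000, treated_total=10_000_000)
print(labels)
```

In this toy pair, circB is dropped by the read-count filter, circA is enriched after treatment, and circC is reduced.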

individuals (or samples). We found that the conservation levels of circRNAs at the three levels (Fig 3A–D) were all significantly higher in not-depleted circRNAs than in depleted ones, regardless of the RNA treatment approaches. The percentages of not-depleted circRNAs increased with increasing numbers of the conserved species (Fig 3A), tissues (Fig 3B), and samples (Fig 3D). Of note, in contrast with the number of tissues (Fig 3B), the levels of tissue specificity index were significantly lower in not-depleted circRNAs than in depleted ones (Fig 3C). We further showed that the evolutionary rates (as determined by phyloP (45) or phastCons (46)) of the sequences around the BSJs (see the Materials and Methods section) were generally higher in not-depleted circRNAs than in depleted ones (Fig S2). These results suggested that the BSJ events widely detected across species/samples were more reliable than those detected in a limited number of species/samples.

Factors related to circRNA biogenesis

We proceeded to examine the relationships between circRNA reliability and the factors related to circRNA biogenesis. Because



Table 1. Summary of RNA-seq data used in this study^a.

Approach (mock versus treated RNAs)^b | Sample pair ID | Resource (ref.)

Group 1: RNase R approach
rRNA- versus RNase R+ | HeLa_1 | PRJNA266072 (38)
rRNA- versus RNase R+ | Hs68 | SRP011042 (3)
rRNA- versus RNase R+ | HeLa_2, HEK293 | PRJNA266072 (39)
rRNA- versus RNase R+ | HeLa_3, K562 | PRJNA231721 (40)
rRNA- versus RNase R+ | Total_brain, cerebellum, and blood | PRJNA646808 (34)
rRNA- (KCl) versus RNase R+ (KCl) | HeLa_KCl | PRJNA541935 (32)
rRNA- versus RNase R+ (KCl) | HEK293T_KCl and SH-SY5Y_KCl | PRJNA752779 (41)
rRNA- versus RNase R+ (LiCl) | HEK293T_LiCl and SH-SY5Y_LiCl | PRJNA752779 (41)

Group 2: A-tailing RNase R approach
rRNA- (LiCl) versus A-tailing/RNase R+ (LiCl) | HeLa_A | PRJNA541935 (32)
rRNA- versus A-tailing/RNase R+ (LiCl) | HEK293T_A and SH-SY5Y_A | PRJNA752779 (41)

Group 3: non-poly(A) approach
rRNA- versus p(A)- | GM12878_pA- and K562_pA- | ENCODE (42)

^a The detailed information is given in Table S2.
^b rRNA-, RNase R+, A-tailing/RNase R+ (LiCl), and p(A)- represent rRNA-depleted RNAs, rRNA-depleted RNAs with RNase R digestion, rRNA-depleted RNAs treated with coupled A-tailing and RNase R digestion in LiCl-containing buffer, and rRNA-/poly(A)-depleted RNAs, respectively.

circRNAs were suggested to be generated by canonical spliceosomal machinery (1, 5, 47, 48, 49), we examined whether the BSJs of not-depleted circRNA candidates tended to agree with well-annotated exon boundaries of colinear transcripts or to be subject to alternative splicing (AS) based on previously annotated colinear transcripts (e.g., the Ensembl annotation) as compared with those of depleted circRNA candidates. Indeed, the BSJs that agreed with annotated exon boundaries (either donor or acceptor splice sites) showed a significant enrichment for not-depleted circRNAs, regardless of the RNA treatment approaches (Figs 4A and S3A). The BSJs with both donor and acceptor splice sites matching annotated exon boundaries from the same colinear transcript isoforms were also enriched for not-depleted circRNAs (Fig 4B). We also observed that the splice site strength of BSJs was generally stronger in not-depleted circRNAs than in depleted ones, regardless of the tools used for estimating the splice site strength (Fig S3B). Regarding the BSJs matching previously annotated exon boundaries of colinear transcripts, the BSJs (donor or acceptor splice sites) that were subject to AS were significantly enriched for not-depleted circRNAs across the mock-treated sample pairs examined (Fig S3C). This enrichment was particularly pronounced when both donor and acceptor splice sites of the BSJs were subject to AS (Fig 4C). These observations support the view that most circularized exons are recognized by the spliceosome during canonical splicing.

Next, previous studies demonstrated that backsplicing can be facilitated by reverse complementary sequences (RCSs) residing in the sequences flanking circularized exons (3, 8, 17, 29) and affected by the competition of RCSs across flanking regions (RCSacross) or within individual regions (RCSwithin) (29). We thus asked whether the BSJs with RCSacross in flanking regions were enriched for not-depleted circRNA candidates. However, we did not observe this trend (Fig 4D). We further examined the numbers of (RCSacross - RCSwithin) for the detected circRNAs and found no significant differences between not-depleted and depleted circRNAs in terms of the percentages of the BSJs with (RCSacross - RCSwithin) ≥ 1 (Fig 4E). In addition, a number of RBPs were demonstrated to play a role in regulating circRNA biogenesis by binding to cis elements in the sequences flanking circularized exons (1, 2). We then examined whether BSJs with RBPs binding to the flanking regions showed enrichment for not-depleted circRNAs and found that this factor was only slightly enriched for not-depleted circRNAs (Fig 4F). We also examined the BSJs with RCSacross or RBP-binding sites in flanking regions and did not observe a significant enrichment for not-depleted circRNAs (Fig 4G). Moreover, considering the minimum distance of the detected RBP-binding sites to the BSJs, the distance was only slightly shorter in not-depleted circRNAs than in depleted ones (Fig 4H). These results suggested that RCSs or RBPs binding to the flanking regions were not good indicators for evaluating circRNA reliability.

Factors related to circRNA function

We then examined the impact of functional features (miRNA binding, RBP binding, and seven types of evidence for circRNA coding potential) on circRNA reliability (see the Materials and Methods section). For each BSJ event, because the circularized sequence is shared with its colinear counterpart except for the BSJ donor and acceptor splice sites, we considered the predicted miRNA/RBP-binding sites and ORFs spanning the BSJ. Integrating these nine types of functional features, we observed that the numbers of supporting functional features were significantly higher in not-depleted circRNAs than in depleted ones in all mock-treated sample pairs (all FDR < 0.05; Fig 5). The percentages of not-depleted circRNAs increased with increasing numbers of supporting functional features (Fig 5). These results suggested that the strength of functional evidence for circRNA candidates was positively correlated with the level of circRNA reliability.

Relative effect of each individual factor on circRNA reliability

According to the above assessment results, we observed eight important indicators of circRNA reliability, each of which can



Figure 2. Impact of factors related to circRNA identification on circRNA reliability.
(A) Supporting BSJ read count, (B) number of detected tools, and (C) evidence of full-length circular sequence. For (B, C), the odds ratios, which were determined using
two-tailed Fisher’s exact test, represented the ratios of the occurrences of (B) circRNAs detected by multiple tools or (C) circRNAs with the evidence of full-length circular
sequence for non-depleted circRNAs to the occurrences of those for depleted circRNAs. The dashed lines represented odds ratio = 1. (A, B) For the bottom panels of (A, B),
the correlations between percentages of non-depleted circRNAs and (A) supporting BSJ read count or (B) number of detected tools are shown. For (C), the evidence of
full-length circular sequence was supported by CIRI-full (a short read-based approach, top) or circFL-seq (a long read-based approach, bottom). (A, B, C) P-values were
determined using Wilcoxon rank-sum test (WRST; greater or less; (A)) or two-tailed FET (B, C) and FDR adjusted across 19 mock-treated sample pairs for each examined
factor using Benjamini–Hochberg correction. For (A), the Wilcoxon effect sizes and the corresponding 95% confidence intervals are plotted (see also Table S4). The
number of mock-treated sample pairs that passed the statistical significance tests with FDR < 0.05 (or −log10(FDR) > 1.3) are represented in curly brackets.
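
The odds ratios and two-tailed Fisher's exact tests named in this caption can be illustrated with a small standard-library sketch. It assumes a 2×2 table whose rows are not-depleted and depleted circRNAs and whose columns are "feature present" and "feature absent" (for example, detected by multiple tools versus a single tool); the function names and the toy counts are hypothetical and not taken from the study.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the table [[a, b], [c, d]].

    a, b: not-depleted circRNAs with / without the feature
    c, d: depleted circRNAs with / without the feature
    Raises ZeroDivisionError if b or c is 0.
    """
    return (a * d) / (b * c)


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test P-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one. Suitable for small counts
    only; a library routine is preferable for real data.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(p for p in probs if p <= p_obs * (1 + 1e-7))
    return min(1.0, p_value)


# Hypothetical toy table: 3 of 4 not-depleted and 1 of 4 depleted circRNAs
# carry the feature.
print(odds_ratio(3, 1, 1, 3))
print(fisher_exact_two_tailed(3, 1, 1, 3))
```

An odds ratio above 1 corresponds to enrichment of the feature among not-depleted circRNAs, which is how the dashed line at odds ratio = 1 is read in the figure. For real datasets, an established implementation such as `scipy.stats.fisher_exact` would normally be used instead of this sketch.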

effectively distinguish between not-depleted and depleted circRNAs in all mock-treated sample pairs examined. The eight factors were (i) supporting BSJ read count; (ii) BSJs detected by multiple tools; (iii) full-length circular sequences; (iv) conservation level of circRNA (number of samples in which the circRNA was observed); (v) both BSJ donor and acceptor splice sites at the annotated exon boundaries; (vi) both BSJ donor and acceptor splice sites at the same colinear transcript isoforms; (vii) both BSJ donor and acceptor splice sites undergoing annotated AS events; and (viii) supporting functional features. We then employed a generalized linear model (GLM) with the eight factors and measured the relative contributions to variability explained (RCVE) (50) to evaluate the relative effect of each individual factor in affecting circRNA reliability for each mock-treated sample pair (see the Materials and Methods section) (Fig 6A and Table S4). We ranked the RCVE scores for each mock-treated sample pair and found that on average the most important features in descending order were the conservation level of circRNA, full-length circular sequences, supporting BSJ read count, both BSJ donor and acceptor splice sites at the same colinear transcript isoforms, supporting functional features, both BSJ donor and acceptor splice sites at the annotated exon boundaries, BSJs detected by multiple tools, and both BSJ donor and acceptor splice sites undergoing annotated AS events (Fig 6B).

Particularly, our result revealed that the conservation level of circRNA (number of samples in which the circRNA was observed) is the most



Figure 3. Impact of factors related to circRNA conservation on circRNA reliability.
(A, B, C, D) Impact of conservation factors at the (A) species, (B, C) tissue, and (D) individual (or sample) levels on circRNA reliability. For the bottom panels of (A, B, and D),
the correlations between percentages of non-depleted circRNAs and (A) number of conserved species, (B) number of conserved tissues, or (D) number of conserved
samples are shown. P-values are determined using WRST (greater or less) and FDR adjusted across 19 mock-treated sample pairs for each examined factor using
Benjamini–Hochberg correction. The Wilcoxon effect sizes and the corresponding 95% confidence intervals are plotted (see also Table S4). The number of mock-treated
sample pairs that passed the statistical significance tests with FDR < 0.05 (or −log10(FDR) > 1.3) are represented in curly brackets.
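
The Benjamini–Hochberg adjustment and the FDR < 0.05 threshold mentioned in the captions can be sketched as below. The procedure itself is standard; the function name and the four toy P-values are hypothetical (the study adjusts across 19 mock-treated sample pairs per factor).

```python
from math import log10


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P-values, one per mock-treated sample pair.
raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)

# FDR < 0.05 is equivalent to -log10(FDR) > about 1.3.
n_significant = sum(1 for q in fdr if -log10(q) > -log10(0.05))
print(fdr, n_significant)
```

The count `n_significant` plays the role of the number shown in curly brackets in the figures, that is, how many sample pairs pass the significance threshold for a given factor.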

dominant determinant of circRNA reliability common to all the examined mock-treated sample pairs except for the sample pair in blood (Fig 6B). Of note, the samples considered were the integration of samples across species, tissues, and individuals in circAtlas. It is known that circRNAs are often expressed in a cell type-/tissue-specific manner (4, 51). Indeed, we found that most (61%; 293,152 circRNAs) of the 480,471 circAtlas circRNAs were detected in one sample only (Table S3). We therefore asked whether the positive correlation between the percentage of not-depleted circRNAs and the number of samples in which the circRNAs were observed held when considering one cell type/tissue only. Of the examined mock-treated sample pairs, the RNA-seq data of five pairs (HeLa_1, HeLa_2, HeLa_3, HeLa_KCl, and HeLa_A) were all derived from HeLa cells, which were generated from different studies or conditions (Table S2). The five HeLa-based RNA-seq datasets can be regarded as replicates. As expected, the percentage of not-depleted circRNAs was strongly positively correlated with the number of samples in which the circRNAs were observed in all five HeLa-based pairs (Fig 6C, left). A similar trend was observed for the two K562-based pairs (Fig 6C, middle). We also examined the HeLa_1 and HeLa_2 mock-treated sample



Figure 4. Impact of factors related to circRNA biogenesis on circRNA reliability.
(A) Both BSJ sites at annotated boundaries, (B) both BSJ sites at the same isoform, (C) both BSJ sites undergoing alternative splicing (AS), (D) BSJs with #RCSacross > 0,
(E) BSJs with #(RCSacross - RCSwithin) > 0, (F) BSJs with RBPs binding to the flanking regions, (G) BSJs with #RCSacross > 0 or RBPs binding to the flanking regions, and
(H) minimum distance of RBP-binding sites to BSJs. For (A, B, C, D, E, F, G), the odds ratios represented the ratios of the occurrences of the examined factors of (A, B, C, D, E, F,
G) for non-depleted circRNAs to the occurrences of those for depleted circRNAs. The dashed lines represented odds ratio = 1. Odds ratios and P-values were determined
using two-tailed FET. P-values were FDR adjusted across 19 mock-treated sample pairs for each examined factor using Benjamini–Hochberg correction. For (H), P-values
were determined using WRST (greater or less) and FDR adjusted across 19 mock-treated sample pairs for each examined factor using Benjamini–Hochberg correction. The
Wilcoxon effect sizes and the corresponding 95% confidence intervals are plotted (see also Table S4). The number of mock-treated sample pairs that passed the
statistical significance tests with FDR < 0.05 (or −log10(FDR) > 1.3) are represented in curly brackets.

pairs, which were two replicates generated from the same group but different studies (Table S2), and observed similar results (Fig 6C, right). Furthermore, regarding the mock samples of the 19 mock-treated sample pairs, we extracted the HeLa-specific circRNAs that were detected in the examined HeLa samples only but not in the non-HeLa cell lines/tissues. For each of the five HeLa-based mock-treated sample pairs, the HeLa-specific circRNAs were divided into circRNAs detected in a single replicate only and circRNAs detected in multiple replicates. Our results revealed that the percentages of not-depleted circRNAs were significantly higher in circRNAs detected in multiple HeLa replicates than in circRNAs detected in a single replicate only for all five HeLa-based



Figure 5. Impact of functional features on circRNA reliability.
Nine types of functional features (see the Materials and Methods section) for circRNAs are examined. For the bottom panel, the correlations between percentages of
non-depleted circRNAs and number of supporting functional features are shown. P-values are determined using WRST (greater or less) and FDR adjusted across 19 mock-
treated sample pairs for each examined feature using Benjamini–Hochberg correction. The Wilcoxon effect sizes and the corresponding 95% confidence intervals are
plotted (see also Table S4). The number of mock-treated sample pairs that passed the statistical significance tests with FDR < 0.05 (or −log10(FDR) > 1.3) are represented
in curly brackets.

pairs (Fig 6D, left). This trend also held for the K562-based pairs (Fig 6D, right).

Discussion

The reliability of computationally detected circRNAs is not guaranteed because of various types of false positives. It is particularly difficult to pick out spurious BSJ (or circRNA) events derived from in vitro artifacts among a tremendous number of previously identified NCL junctions. Such in vitro artifacts may frequently arise from template switching in random hexamer-primed reverse transcriptase (RT) reactions (52), in which the enzyme can switch from one template to another with short homologous sequences at the NCL junction sites (53, 54). Of note, some NCL junctions passing RT-based validations were finally confirmed to be generated from in vitro artifacts by further validations (10). As RNA-seq data are generated from RT-based sequencing strategies, it is unavoidable that a number of spurious NCL events masquerade as genuine BSJs because of such RT-based artifacts (1). We asked whether the approaches based on comparisons of paired mock and treated samples can reflect circRNA reliability to a certain extent. We emphasize that, for the robustness of our analyses, the mock-treated comparisons were performed on 19 mock-treated sample pairs of different samples/studies based on three types of RNA treatment approaches (Table 1): RNase R approach (group 1), A-tailing RNase R approach (group 2), and non-poly(A) approach (group 3). Because highly structured 3′ ends or G-quadruplexes may influence the effect of the RNase R treatment (32), we also examined whether the presence of G-quadruplexes across the BSJs affected the percentages of not-depleted circRNAs and found no significant differences with or without the removal of G-quadruplexes across BSJs (Fig S4 and Table S3). Moreover, we extracted BSJs from two high-confidence circRNA datasets, in which the BSJ events had passed multiple experimental validations. For the first dataset (8), BSJs were supported by both avian myeloblastosis virus- and Moloney murine leukemia virus-derived RNA-seq reads (designated as “RT-independent circRNAs”). RT-independent circRNAs are regarded as highly accurate because comparisons of different RTase products can effectively detect RT-based artifacts (1, 10, 52, 53). For the second dataset (55), BSJs (or their functions) were previously validated by RT-based experiments and at least one type of non-RT–based experiment simultaneously in at least one human tissue or cell line (designated as “RT-/non-RT–validated circRNAs”; see the Materials and Methods section). We observed that the percentages of both RT-independent (Fig 7A) and RT-/non-RT–validated (Fig 7B) circRNAs were significantly higher in not-depleted circRNAs than in depleted circRNAs in all mock-treated sample pairs examined. Moreover, we compared circAtlas circRNAs with circRNAs from eight other publicly accessible circRNA databases and identified circAtlas-specific circRNAs. We presumed that circAtlas-specific circRNAs were less likely to be reliable than circRNAs collected in multiple databases. Indeed, the percentages of circAtlas-specific circRNAs were significantly lower in not-depleted circRNAs than in depleted circRNAs (Fig 7C).

For the relative influence of each individual factor on circRNA reliability, our RCVE analysis revealed that “conservation level of circRNA” and “full-length circular sequences” were the top two



Figure 6. Assessment of relative influence of each individual factor on circRNA reliability.
(A) RCVE scores of the examined factors for each mock-treated sample pair. The length of each color segment in each bar represents the RCVE score of the corresponding examined factor. (B) Ranking (top) and average ranking (bottom; see also the numbers in the parentheses) of the RCVE scores of the examined factors. (C) Correlations between the percentages of not-depleted circRNAs and the number of samples in which the circRNAs were observed in HeLa-based (left and right) or K562-based (middle) mock-treated sample pairs. Of note, the HeLa_1 and HeLa_2 mock-treated sample pairs were generated from the same group but different studies. (D) Comparisons of percentages of not-depleted circRNAs for HeLa-specific circRNAs (left) or K562-specific circRNAs (right) detected in a single replicate only or in multiple replicates. All P-values were determined using two-tailed Fisher’s exact test (FET).
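As a rough illustration of how the RCVE scores in (A) relate to the rankings in (B), the sketch below computes RCVE = (r²_all − r²_reduced) / r²_all for each factor and sorts the factors by score. It assumes that the r² values of the full and reduced models have already been obtained from fitted GLMs; the function names, factor labels, and numeric values are hypothetical and are not taken from the study.

```python
def rcve(r2_all: float, r2_reduced: float) -> float:
    """Relative contribution to variability explained for one factor.

    r2_all:     r2 of the model that includes all factors.
    r2_reduced: r2 of the model that omits the factor of interest.
    """
    if r2_all <= 0:
        raise ValueError("r2_all must be positive")
    return (r2_all - r2_reduced) / r2_all


def rank_factors(r2_all, r2_reduced_by_factor):
    """Return (factor, RCVE) pairs sorted from highest to lowest RCVE."""
    scores = {
        name: rcve(r2_all, r2_reduced)
        for name, r2_reduced in r2_reduced_by_factor.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical r2 values, for illustration only.
example_r2_all = 0.20
example_r2_reduced = {
    "number_of_samples": 0.12,
    "full_length_sequence": 0.15,
    "bsj_read_count": 0.18,
}
example_ranking = rank_factors(example_r2_all, example_r2_reduced)
```

In this toy input, the factor whose removal lowers r² the most receives the highest RCVE score and therefore ranks first.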



Figure 7. Robustness analyses of the approaches based on comparisons of paired mock and treated samples for assessing circRNA reliability.
(A, B, C) Comparisons of the percentages of (A) RT-independent circRNAs, (B) RT-/non-RT–validated circRNAs, and (C) circAtlas-specific circRNAs in the not-depleted and depleted circRNAs for all mock-treated sample pairs examined. RT-independent and RT-/non-RT–validated circRNAs represented high-confidence circRNAs (see text and the Materials and Methods section). CircAtlas-specific circRNAs were the circAtlas circRNAs that are not observed in eight other publicly accessible circRNA databases (see the Materials and Methods section). P-values were determined using two-tailed FET and FDR adjusted across 19 mock-treated sample pairs for each examined feature using Benjamini–Hochberg correction. *FDR < 0.05. **FDR < 0.01. ***FDR < 0.001.

dominant determinants of circRNA reliability (Fig 6B). For the most important factor in affecting circRNA reliability, because template switching events occur randomly during reverse transcription in vitro, a circRNA candidate observed in multiple samples or replicates is less likely to be generated from such an in vitro artifact. For the second most important factor in affecting circRNA reliability, full-length circular sequences can be reconstructed by Illumina short RNA-seq reads with the reverse overlap (RO) feature (43) or identified by single-molecule long reads (44). With the support of RO-merged reads, the BSJ events are less likely to originate from template switching because both ends of the paired-end reads are sequenced from the same fragment and reversely overlapped with each other by alignment. Similarly, with the direct support of full-length circular sequences from single-molecule long reads, the nanopore-based full-length circRNAs are less likely to be artifacts derived from template switching.
Our analyses also suggested that “supporting BSJ read count,” “supporting functional features,” and “BSJs detected by multiple tools” were good indicators for assessing circRNA reliability. Because the circularized sequence is shared with its colinear counterpart except for the BSJ sites, the supporting BSJ reads are the unique evidence characterizing a given circRNA event. It is known that most circRNAs are expressed at a low level (3, 4, 5); particularly, a considerable number of detected circRNA candidates were supported by one BSJ read only (e.g., Fig 1E). It is challenging to determine the reliability of such low-abundance circRNAs. Detection tools often set a threshold of supporting BSJ read count to minimize false positives. For example, circRNA_finder (56) and segemehl (57) filter circRNA candidates by requiring a BSJ read count of at least five. In addition to BSJ reads, the functional features of the predicted miRNA/RBP-binding sites or ORFs spanning the BSJs imply the relevance of the BSJ events to functional importance. These features provide another unique support of circRNA reliability. Regarding the factor of “BSJs detected by multiple tools,” a previous study reported that combining the results of multiple tools may compensate for the possible weak points of each single tool, and thereby increase the confidence of the detected circRNAs (58). This also reflects our observation of the negative correlation between the percentages of false-positive BSJs derived from alignment ambiguity and the number of tools detecting the BSJs (Fig 1D).
In terms of circRNA biogenesis, we showed that the factors of “both BSJ donor and acceptor splice sites at the same colinear transcript isoforms,” “both BSJ donor and acceptor splice sites at the annotated exon boundaries,” and “both BSJ donor and acceptor splice sites undergoing annotated AS events” were important for determining circRNA reliability. A possible reason is that circRNAs are also generated by canonical spliceosomal machinery (1, 47, 48, 49). Previous studies also reported that NCL transcript products (including circRNAs, fusion events, and trans-spliced transcripts) not matching well-annotated exon boundaries were more likely to be in vitro artifacts (53, 59, 60). Particularly regarding the transcripts generated during the transcriptional process, in addition to cis-backsplicing, an observed intragenic NCL junction may also arise from trans-splicing (1, 10). Although most of the detected NCL events may be derived from cis-backsplicing, previous studies based on comparisons of poly(A)-selected and poly(A)-depleted RNA-seq data suggested that a considerable percentage of detected NCL junctions were derived from trans-spliced RNAs (8, 61). Indeed, a number of NCL junctions detected in human cells were experimentally validated to be generated by trans-splicing rather than cis-backsplicing (10, 53, 62, 63). As trans-splicing occurs between two or more separate precursor mRNAs (64, 65), it is possible that an observed NCL junction with its two splice sites from different colinear transcript isoforms is derived from a trans-splicing event. We observed that the full-length circRNAs with the support of RO-merged short reads or single-molecule long reads, which were



principally not generated from trans-splicing, were significantly enriched for the BSJs with donor and acceptor splice sites at the same colinear transcript isoforms (Fig S5).
On the other hand, we showed that the factors related to RCS and RBPs binding to the flanking regions of BSJs were not good indicators for assessing circRNA reliability. Our observations reflect a previous notion that the existence of RCS is neither sufficient nor necessary for cis-backsplicing (2). Although the formation of some circRNAs was demonstrated to be facilitated by base-pairing between RCSs in the two flanking intronic sequences of the BSJs (3, 8, 29), only a limited number of RCS-circRNA correlations were experimentally confirmed. Multiple RNA pairings derived from different RCS pairs across flanking introns or within individual flanking introns can compete against each other and lead to the reduction of circRNA production (29, 47, 66) or alternative cis-backsplicing (30, 67), further complicating the determination and validation of the RCS-circRNA correlations. In addition, although repetitive elements, which contribute to RNA pairing, are abundant in mammalian genomes, circRNAs observed in non-mammalian species (e.g., Drosophila melanogaster (56) and Oryza sativa (68)) often lack RCSs. For the associations between RBPs and circRNA formation, different RBPs may play opposite roles (up- or down-regulation) in regulating circRNA expression. Some RBPs, such as FUS (69) and ADARs (70), were even shown to mediate circRNA production in a bidirectional manner. Because different RBPs may have overlapping capacities for binding to RCSs, further investigation is required to understand the joint (coordinate or competitive) effects of different RBPs on the regulation of each individual circRNA (2, 71).
This study systematically assessed the impacts of a dozen factors related to identification, conservation, biogenesis, and function on circRNA reliability on the basis of comparisons of mock-treated sample pairs from three different RNA treatment approaches. Of the examined factors, eight were shown to be important indicators of circRNA reliability in all mock-treated sample pairs examined, regardless of the RNA treatment approaches. We assessed the relative influence of each individual factor on circRNA reliability and suggested that the most important factors in descending order were the conservation level of circRNA, full-length circular sequences, supporting BSJ read count, both BSJ donor and acceptor splice sites at the same colinear transcript isoforms, supporting functional features, both BSJ donor and acceptor splice sites at the annotated exon boundaries, BSJs detected by multiple tools, and both BSJ donor and acceptor splice sites undergoing annotated AS events. All the circAtlas circRNA candidates and the corresponding factors examined are provided (Table S3). Importantly, we found a remarkable positive correlation between the number of supporting factors and the percentage of high-confidence circRNAs (i.e., RT-independent and RT-/non-RT–validated circRNAs) and a remarkable negative correlation between the number of supporting factors and the percentage of circAtlas-specific circRNAs (Fig 8). This revealed the additive effects of these factors on circRNA reliability. Particularly, more than 70% of circRNAs without any supporting factors were circAtlas-specific circRNAs, whereas such a percentage sharply declined to ≤3% when considering the circRNAs with at least five supporting factors (Fig 8), supporting the effectiveness of these factors in determining circRNA reliability before performing experimental validations. To the best of our knowledge, this study, for the first time, presents a useful guideline to systematically assess circRNA reliability while simultaneously accounting for varied types of factors, facilitating further functional investigation of this important class of non-canonical transcripts.

Materials and Methods

CircRNA candidates and RNA-seq data collection

As shown in Fig 1A, human circRNA candidates examined (Table S1) were extracted from circAtlas 2.0 (22), which contained 580,654 human circRNA candidates based on 240 samples collected from 19 tissues across seven vertebrate species. The related information such as circRNA identification factors (e.g., circRNA detection tools and full-length circular sequences) and conservation factors (e.g., conservation of circRNAs across species, tissues, and samples) were also downloaded from circAtlas 2.0. CircAtlas circRNAs were identified by CIRI2 (38, 72), find_circ (28), CIRCexplorer (73), or DCC (74). The full-length circular sequences were reconstructed by CIRI-full (43). All the analyses were based on the human reference genome (GRCh38) and the Ensembl annotation (version 100). The circRNA candidates that were potentially derived from alignment ambiguity (i.e., candidates that had an alternative colinear explanation or multiple matches against the human genome) were detected by NCLcomparator (35) with default parameters and were not included in the subsequent analyses for accuracy (Fig 1A). After that, 480,471 circAtlas circRNAs were retained. The remaining RNA-seq reads were aligned against the human reference genome using STAR with default parameters (75). For each circRNA (BSJ), we generated a pseudo sequence by concatenating the sequences flanking the BSJ (within −50 nucleotides of the donor site to +50 nucleotides of the acceptor site) and aligned all STAR-unmapped reads against the 100-bp BSJ-based pseudo sequence using BWA with default parameters (76). A STAR-unmapped read was determined as a BSJ read if the read matched to ≥80% of the BSJ-based pseudo sequence and spanned the junction boundary by ≥10 bp on both sides of the BSJ.
The mock-treated sample pairs (19 pairs) were extracted from publicly accessible databases and categorized into three groups according to the RNA treatment approaches (see Tables 1 and S2):

Group 1: RNase R approach, mock sample versus RNase R–treated sample (14 pairs).
Group 2: A-tailing RNase R approach, mock sample versus A-tailing RNase R–treated (RNAs treated with coupling A-tailing and RNase R digestion) sample (three pairs).
Group 3: non-poly(A) approach, mock sample versus non-poly(A)–selected (RNAs treated with depletion of both ribosomal RNAs and polyadenylated RNAs) sample (two pairs).

Of note, the downloaded mock-treated sample pairs should simultaneously satisfy the following criteria: (1) the number of the circRNA candidates detected in the mock samples should be greater than 600 and (2) more than one-third of the circRNA candidates detected in the mock samples should be detected in the



Figure 8. The correlation between number of supporting factors and the percentage of high-confidence circRNAs (RT-independent or RT-/non-RT–validated
circRNAs) and circAtlas-specific circRNAs for the 480,471 circAtlas circRNA candidates.
The right panel showed a clearer graph of the correlation between number of supporting factors and the percentage of circAtlas-specific circRNAs. The eight important
factors illustrated in Fig 6 except for the factor of “supporting BSJ read count” were considered because this factor was dependent on the examined samples and the
corresponding RNA-seq data. For the factors of “number of samples” and “number of supporting functional features,” three was used as a cutoff value. The detailed
information can be found in Table S3.
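The tallying behind this figure can be sketched as follows: count, for each circRNA candidate, how many of the seven factors support it, then report the percentage of flagged candidates (for example, high-confidence circRNAs) at each count. This is a minimal sketch under stated assumptions: the field names are hypothetical, the records are toy data, and the cutoff of three for the two count-based factors is read here as "at least three", which the caption does not spell out.

```python
from collections import defaultdict

# Hypothetical field names for the five yes/no factors.
BOOLEAN_FACTORS = (
    "full_length",
    "multiple_tools",
    "same_isoform",
    "annotated_exon_boundaries",
    "annotated_as_events",
)
CUTOFF = 3  # assumption: "three as a cutoff" is interpreted as ">= 3"


def count_supporting_factors(record):
    """Number of the seven factors that support one circRNA candidate."""
    count = sum(1 for name in BOOLEAN_FACTORS if record[name])
    count += int(record["n_samples"] >= CUTOFF)
    count += int(record["n_functional_features"] >= CUTOFF)
    return count


def percent_by_factor_count(records, flag):
    """Percentage of records with a true `flag`, grouped by factor count."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for record in records:
        k = count_supporting_factors(record)
        totals[k] += 1
        flagged[k] += int(bool(record[flag]))
    return {k: 100.0 * flagged[k] / totals[k] for k in sorted(totals)}


def make_record(n_true, n_samples, n_features, high_confidence):
    """Build a toy record in which the first `n_true` yes/no factors hold."""
    record = {name: i < n_true for i, name in enumerate(BOOLEAN_FACTORS)}
    record.update(
        n_samples=n_samples,
        n_functional_features=n_features,
        high_confidence=high_confidence,
    )
    return record


# Toy data, for illustration only.
toy_records = [
    make_record(0, 1, 0, False),
    make_record(2, 5, 3, True),
    make_record(2, 3, 3, False),
    make_record(5, 10, 4, True),
]
toy_percentages = percent_by_factor_count(toy_records, "high_confidence")
```

The same grouping can be reused with a different flag (for example, a hypothetical `circatlas_specific` field) to obtain the second curve.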

corresponding treated samples. For each RNA-seq dataset, duplicated RNA-seq reads were removed using FastUniq (77).

Treat/mock ratio calculation

For accuracy, only the circRNAs detected in the mock samples with the support of at least two BSJ reads were considered in the analysis for each mock-treated sample pair. The expression levels of circRNAs were calculated using the BSJ reads per million raw reads (RPM). For each considered circRNA candidate, we calculated the ratio of the circRNA expression detected in the treated samples to that detected in the corresponding mock samples (i.e., treat/mock ratio). The not-depleted and depleted circRNAs were defined as follows:

$$
\text{treat/mock ratio}
\begin{cases}
\geq 1, & \text{not-depleted circRNAs} \\
< 1, & \text{depleted circRNAs}
\end{cases}
$$

Extractions of various factors

The factors of CIRI-full–identified full-length circular sequence, number of circRNA detection tools, conservation patterns at the individual (or sample), tissue, and species levels (including tissue specificity index) were extracted from circAtlas 2.0 (22). The BSJ events with the direct support of full-length circular sequences from nanopore long RNA-seq reads were downloaded from Liu et al’s study (44). The evolutionary rates were determined by phyloP (45) or phastCons (46) scores, which were downloaded from the UCSC Genome Browser at https://genome.ucsc.edu/. For each BSJ, we considered the four regions around the BSJ: (1) within +1 to +10 nucleotides of the acceptor site; (2) within −10 to −1 nucleotides of the acceptor site; (3) within −10 to −1 nucleotides of the donor site; and (4) within +1 to +10 nucleotides of the donor site. The evolutionary rate of each region was measured by the average value of the phyloP (or phastCons) scores of the considered nucleotides (10 bp) within the region. The annotated exon boundaries and alternative splicing (AS) patterns were derived from the Ensembl annotation (version 100). The splice site strength of BSJs was estimated using MaxEntScan based on three scoring models (maximum entropy model, first-order Markov model, and weight matrix model) (78). The numbers of reverse complementary sequences (RCSs) across the flanking sequences (RCSacross; ±20,000 nucleotides of the BSJs) or within individual flanking sequences (RCSwithin) of the BSJs were calculated using CircMiMi (79) with the command line: circmimi_tools check RCS –dist 20,000 genome.fa circRNAs.tsv out.tsv. RBPs binding to the flanking regions (±1,000 nucleotides) of BSJs were determined according to CLIP-supported RBP-binding sites downloaded from ENCORI (80) at http://starbase.sysu.edu.cn/. G-quadruplexes across the BSJs were determined using the QGRS mapping algorithm with default parameters (81, 82). The predicted G-quadruplexes should span the BSJ boundaries by ≥5 bp on both sides of the BSJs. miRNA-binding sites across the BSJs were determined using miRanda 3.3a (83) with pairing score ≥ 155. RBP-binding sites across the BSJs were determined using RBPmap (84) with a high stringency level and conservation filter. The predicted miRNA (or RBP)-binding sites should span the BSJ boundaries by ≥5 (or ≥2) bp on both sides of the BSJs, respectively. The seven types of evidence for coding potential of more than 320,000 human circRNAs (including circAtlas circRNAs) were extracted from TransCirc (85). The seven types of evidence included ribosome/polysome-binding evidence, experimentally supported translation initiation site, internal ribosome entry site, predicted m6A modification site, circRNA-specific ORF, sequence composition score for the ORF, and mass spectrometry data–supported peptide across the BSJ. The translatable scores of circRNAs were downloaded from TransCirc.
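The treat/mock ratio calculation described in this section can be sketched as below. This is a minimal illustration, not the study's code: the function names and the example read counts are hypothetical, and RPM is taken here as BSJ reads divided by total raw reads, multiplied by one million.

```python
def rpm(bsj_reads: int, total_raw_reads: int) -> float:
    """BSJ reads per million raw reads."""
    return bsj_reads * 1_000_000 / total_raw_reads


def classify_circrna(mock_bsj, treated_bsj, mock_total, treated_total,
                     min_mock_reads=2):
    """Return (treat/mock ratio, label), or None if the candidate is skipped.

    Candidates with fewer than `min_mock_reads` BSJ reads in the mock
    sample are not considered, following the filter stated in the text.
    """
    if mock_bsj < min_mock_reads:
        return None
    ratio = rpm(treated_bsj, treated_total) / rpm(mock_bsj, mock_total)
    label = "not-depleted" if ratio >= 1 else "depleted"
    return ratio, label


# Hypothetical read counts, for illustration only.
enriched = classify_circrna(4, 10, 20_000_000, 25_000_000)
reduced = classify_circrna(10, 2, 10_000_000, 10_000_000)
skipped = classify_circrna(1, 5, 10_000_000, 10_000_000)
```

Normalizing by library size before taking the ratio keeps a deeper treated library from inflating the apparent enrichment of a candidate.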



Extractions of high-confidence circRNAs and circAtlas-specific circRNAs

The high-confidence circRNAs examined here included RT-independent and RT-/non-RT–validated circRNAs (see text). The RT-independent circRNAs (1,478 events), which were supported by both avian myeloblastosis virus- and Moloney murine leukemia virus-derived RNA-seq reads, were extracted from our previous study (8). The RT-/non-RT–validated circRNAs (553 events), in which the NCL junctions (or their functions) were validated by RT-based experiments and at least one type of non-RT–based experiments in human tissues/cell lines, were extracted from CircR2disease (version 2.0) (55). The circAtlas-specific circRNA candidates were the BSJ events that were collected in circAtlas only rather than in eight other publicly accessible circRNA databases (including CIRCpedia v2 (86), CircRic (87), MiOncoCirc v2.0 (88), TSCD (51), circBase (89), circRNADb (90), and exoRBase 2.0 (91)). All the circRNA candidates deposited in CircR2disease or non-circAtlas circRNA databases were not considered if the NCL junction coordinates were not available or could not be converted to the genomic coordinates on the GRCh38 assembly by liftOver (92). For the NCL junctions with non-assigned strands, the junction coordinates were matched to the circRNA candidates in circAtlas. The analyzed circRNA candidates, the related factors, and the treat/mock ratios of the corresponding mock-treated sample pairs are given in Table S3. All the coordinates in the table were one-based.

Statistics analysis

For enrichment analyses, we compared not-depleted circRNAs with depleted ones in terms of various factors related to circRNA identification, conservation, biogenesis, and function. P-values were determined using two-tailed Fisher’s exact test or Wilcoxon rank-sum test (WRST; greater and less) (see also the figure legends). P-values were further FDR adjusted across 19 mock-treated sample pairs for each examined feature using Benjamini–Hochberg correction. The Wilcoxon effect sizes and the corresponding 95% confidence intervals were calculated with the wilcox_effsize function implemented in the rstatix R package (93). To measure the relative effect of each individual factor in determining circRNA reliability, we first employed a GLM with all the examined factors using the statsmodels package (0.13.2) (94) in Python and then performed the RCVE (50) as follows:

$$
\mathrm{glm}(y \sim f_1 + f_2 + f_3 + f_4 + f_5 + f_6 + f_7 + f_8,\ \mathrm{family} = \mathrm{binomial})
$$

$$
\mathrm{RCVE} = \frac{r_{\mathrm{all}}^{2} - r_{\mathrm{reduced}}^{2}}{r_{\mathrm{all}}^{2}}
$$

In the GLM, $y$ indicated whether a circRNA candidate was a not-depleted circRNA (i.e., not-depleted circRNA = 1; depleted circRNA = 0). $f_1$ to $f_8$ represented the eight factors examined in the model: supporting BSJ read count; whether BSJs were detected by multiple tools; whether full-length circRNAs could be reconstructed for the BSJs; number of samples in which the circRNAs were observed; whether BSJs agreed to annotated exon boundaries; whether BSJs had both donor and acceptor splice sites matching annotated exon boundaries from the same colinear transcript isoforms; whether BSJs were subject to AS; and number of supporting functional features, respectively. In the RCVE formula, $r_{\mathrm{all}}^{2}$ and $r_{\mathrm{reduced}}^{2}$ represented the $r^{2}$ values calculated by the GLM including all the eight factors and excluding the factor of interest, respectively. The RCVE scores range from 0 to 1, with a higher RCVE score indicating a more important contribution of the examined factor to the model. The detailed results of the statistical significance tests for each examined factor are given in Table S4.

Data Availability

The data underlying this study are available in the text and in Tables S1–S4. Tables S1–S4 and all the codes used to generate the results are available at GitHub (https://github.com/TreesLab/circRNA_features).

Supplementary Information

Supplementary Information is available at https://doi.org/10.26508/lsa.202201793.

Acknowledgements

This work was supported by T-J Chuang, Genomics Research Center, Academia Sinica, Taiwan and the National Science and Technology Council, Taiwan (MOST 111-2311-B-001-021). We thank the related organizations for generating and providing the RNA-seq data used in this study. We also thank other members of T-J Chuang laboratory for helpful discussions.

Author Contributions

T-J Chuang: conceptualization, formal analysis, investigation, methodology, and writing (original draft, review, and editing).
T-W Chiang and C-Y Chen: software and methodology.

Conflict of Interest Statement

The authors declare that they have no conflict of interest.

References

1 Chen I, Chen CY, Chuang TJ (2015) Biogenesis, identification, and function of exonic circular RNAs. Wiley Interdiscip Rev RNA 6: 563–579. doi:10.1002/wrna.1294
2 Chen LL (2020) The expanding regulatory mechanisms and cellular functions of circular RNAs. Nat Rev Mol Cell Biol 21: 475–490. doi:10.1038/s41580-020-0243-y
3 Jeck WR, Sorrentino JA, Wang K, Slevin MK, Burd CE, Liu J, Marzluff WF, Sharpless NE (2013) Circular RNAs are abundant, conserved, and associated with ALU repeats. RNA 19: 141–157. doi:10.1261/rna.035667.112
4 Salzman J, Chen RE, Olsen MN, Wang PL, Brown PO (2013) Cell-type specific features of circular RNA expression. PLoS Genet 9: e1003777. doi:10.1371/journal.pgen.1003777



5 Guo JU, Agarwal V, Guo H, Bartel DP (2014) Expanded identification and characterization of mammalian circular RNAs. Genome Biol 15: 409. doi:10.1186/s13059-014-0409-z
6 Salzman J, Gawad C, Wang PL, Lacayo N, Brown PO (2012) Circular RNAs are the predominant transcript isoform from hundreds of human genes in diverse cell types. PLoS One 7: e30733. doi:10.1371/journal.pone.0030733
7 Zheng Q, Bao C, Guo W, Li S, Chen J, Chen B, Luo Y, Lyu D, Li Y, Shi G, et al (2016) Circular RNA profiling reveals an abundant circHIPK3 that regulates cell growth by sponging multiple miRNAs. Nat Commun 7: 11215. doi:10.1038/ncomms11215
8 Chuang TJ, Chen YJ, Chen CY, Mai TL, Wang YD, Yeh CS, Yang MY, Hsiao YT, Chang TH, Kuo TC, et al (2018) Integrative transcriptome sequencing reveals extensive alternative trans-splicing and cis-backsplicing in human cells. Nucleic Acids Res 46: 3671–3691. doi:10.1093/nar/gky032
9 Chen LL, Yang L (2015) Regulation of circRNA biogenesis. RNA Biol 12: 381–388. doi:10.1080/15476286.2015.1020271
10 Yu CY, Liu HJ, Hung LY, Kuo HC, Chuang TJ (2014) Is an observed non-co-linear RNA product spliced in trans, in cis or just in vitro? Nucleic Acids Res 42: 9410–9423. doi:10.1093/nar/gku643
11 Rybak-Wolf A, Stottmeister C, Glazar P, Jens M, Pino N, Giusti S, Hanan M, Behm M, Bartok O, Ashwal-Fluss R, et al (2015) Circular RNAs in the mammalian brain are highly abundant, conserved, and dynamically expressed. Mol Cell 58: 870–885. doi:10.1016/j.molcel.2015.03.027
12 Chen YJ, Chen CY, Mai TL, Chuang CF, Chen YC, Gupta SK, Yen L, Wang YD, Chuang TJ (2020) Genome-wide, integrative analysis of circular RNA dysregulation and the corresponding circular RNA-microRNA-mRNA regulatory axes in autism. Genome Res 30: 375–391. doi:10.1101/gr.255463.119
13 Huang A, Zheng H, Wu Z, Chen M, Huang Y (2020) Circular RNA-protein interactions: Functions, mechanisms, and identification. Theranostics 10: 3503–3517. doi:10.7150/thno.42174
14 You X, Vlatkovic I, Babic A, Will T, Epstein I, Tushev G, Akbalik G, Wang M, Glock C, Quedenau C, et al (2015) Neural circular RNAs are derived from synaptic genes and regulated by development and plasticity. Nat Neurosci 18: 603–610. doi:10.1038/nn.3975
15 Gasparini S, Licursi V, Presutti C, Mannironi C (2020) The secret garden of neuronal circRNAs. Cells 9: 1815. doi:10.3390/cells9081815
16 Piwecka M, Glažar P, Hernandez-Miranda LR, Memczak S, Wolf SA, Rybak-Wolf A, Filipchyk A, Klironomos F, Cerda Jara CA, Fenske P, et al (2017) Loss of a mammalian circular RNA locus causes miRNA deregulation and affects brain function. Science 357: eaam8526. doi:10.1126/science.aam8526
17 Mai TL, Chen CY, Chen YC, Chiang TW, Chuang TJ (2022) Trans-genetic effects of circular RNA expression quantitative trait loci and potential causal mechanisms in autism. Mol Psychiatry 27: 4695–4706. doi:10.1038/s41380-022-01714-4
18 Liu J, Zhang X, Yan M, Li H (2020) Emerging role of circular RNAs in cancer. Front Oncol 10: 663. doi:10.3389/fonc.2020.00663
19 Rajappa A, Banerjee S, Sharma V, Khandelia P (2020) Circular RNAs: Emerging role in cancer diagnostics and therapeutics. Front Mol Biosci 7: 577938. doi:10.3389/fmolb.2020.577938
20 Chen L, Wang C, Sun H, Wang J, Liang Y, Wang Y, Wong G (2021) The bioinformatics toolbox for circRNA discovery and analysis. Brief Bioinform 22: 1706–1728. doi:10.1093/bib/bbaa001
21 Xia S, Feng J, Chen K, Ma Y, Gong J, Cai F, Jin Y, Gao Y, Xia L, Chang H, et al (2018) CSCD: A database for cancer-specific circular RNAs. Nucleic Acids Res 46: D925–D929. doi:10.1093/nar/gkx863
22 Wu W, Ji P, Zhao F (2020) CircAtlas: An integrated resource of one million highly accurate circular RNAs from 1070 vertebrate transcriptomes. Genome Biol 21: 101. doi:10.1186/s13059-020-02018-y
23 Vromman M, Vandesompele J, Volders PJ (2021) Closing the circle: Current state and perspectives of circular RNA databases. Brief Bioinform 22: 288–297. doi:10.1093/bib/bbz175
24 Hansen TB, Veno MT, Damgaard CK, Kjems J (2016) Comparison of circular RNA prediction tools. Nucleic Acids Res 44: e58. doi:10.1093/nar/gkv1458
25 Zeng X, Lin W, Guo M, Zou Q (2017) A comprehensive overview and evaluation of circular RNA detection tools. PLoS Comput Biol 13: e1005420. doi:10.1371/journal.pcbi.1005420
26 Chen CY, Chuang TJ (2019) Comment on “A comprehensive overview and evaluation of circular RNA detection tools”. PLoS Comput Biol 15: e1006158. doi:10.1371/journal.pcbi.1006158
27 Szabo L, Salzman J (2016) Detecting circular RNAs: Bioinformatic and experimental challenges. Nat Rev Genet 17: 679–692. doi:10.1038/nrg.2016.114
28 Memczak S, Jens M, Elefsinioti A, Torti F, Krueger J, Rybak A, Maier L, Mackowiak SD, Gregersen LH, Munschauer M, et al (2013) Circular RNAs are a large class of animal RNAs with regulatory potency. Nature 495: 333–338. doi:10.1038/nature11928
29 Zhang XO, Wang HB, Zhang Y, Lu X, Chen LL, Yang L (2014) Complementary sequence-mediated exon circularization. Cell 159: 134–147. doi:10.1016/j.cell.2014.09.001
30 Zhang XO, Dong R, Zhang Y, Zhang JL, Luo Z, Zhang J, Chen LL, Yang L (2016) Diverse alternative back-splicing and alternative splicing landscape of circular RNAs. Genome Res 26: 1277–1287. doi:10.1101/gr.202895.115
31 Jakobi T, Uvarovskii A, Dieterich C (2019) circtools-a one-stop software solution for circular RNA research. Bioinformatics 35: 2326–2328. doi:10.1093/bioinformatics/bty948
32 Xiao MS, Wilusz JE (2019) An improved method for circular RNA purification using RNase R that efficiently removes linear RNAs containing G-quadruplexes or structured 3’ ends. Nucleic Acids Res 47: 8755–8769. doi:10.1093/nar/gkz576
33 Zhang Y, Yang L, Chen LL (2016) Characterization of circular RNAs. Methods Mol Biol 1402: 215–227. doi:10.1007/978-1-4939-3378-5_17
34 Rabin A, Zaffagni M, Ashwal-Fluss R, Patop IL, Jajoo A, Shenzis S, Carmel L, Kadener S (2021) SRCP: A comprehensive pipeline for accurate annotation and quantification of circRNAs. Genome Biol 22: 277. doi:10.1186/s13059-021-02497-7
35 Chen CY, Chuang TJ (2019) NCLcomparator: Systematically post-screening non-co-linear transcripts (circular, trans-spliced, or fusion RNAs) identified from various detectors. BMC Bioinformatics 20: 3. doi:10.1186/s12859-018-2589-0
36 Petkovic S, Muller S (2015) RNA circularization strategies in vivo and in vitro. Nucleic Acids Res 43: 2454–2465. doi:10.1093/nar/gkv045
37 Hansen TB (2018) Improved circRNA identification by combining prediction algorithms. Front Cell Dev Biol 6: 20. doi:10.3389/fcell.2018.00020
38 Gao Y, Wang J, Zhao F (2015) CIRI: An efficient and unbiased algorithm for de novo circular RNA identification. Genome Biol 16: 4. doi:10.1186/s13059-014-0571-3
39 Gao Y, Wang J, Zheng Y, Zhang J, Chen S, Zhao F (2016) Comprehensive identification of internal structure and alternative splicing events in circular RNAs. Nat Commun 7: 12060. doi:10.1038/ncomms12060
40 Mercer TR, Clark MB, Andersen SB, Brunck ME, Haerty W, Crawford J, Taft RJ, Nielsen LK, Dinger ME, Mattick JS (2015) Genome-wide discovery of human splicing branchpoints. Genome Res 25: 290–303. doi:10.1101/gr.182899.114
41 Li Y, Wang F, Teng P, Ku L, Chen L, Feng Y, Yao B (2022) Accurate identification of circRNA landscape and complexity reveals their pivotal roles in human oligodendroglia differentiation. Genome Biol 23: 48. doi:10.1186/s13059-022-02621-1



42 Davis CA, Hitz BC, Sloan CA, Chan ET, Davidson JM, Gabdank I, Hilton JA, Jain K, Baymuradov UK, Narayanan AK, et al (2018) The encyclopedia of DNA elements (ENCODE): Data portal update. Nucleic Acids Res 46: D794–D801. doi:10.1093/nar/gkx1081
43 Zheng Y, Ji P, Chen S, Hou L, Zhao F (2019) Reconstruction of full-length circular RNAs enables isoform-level quantification. Genome Med 11: 2. doi:10.1186/s13073-019-0614-1
44 Liu Z, Tao C, Li S, Du M, Bai Y, Hu X, Li Y, Chen J, Yang E (2021) circFL-seq reveals full-length circular RNAs with rolling circular reverse transcription and nanopore sequencing. Elife 10: e69457. doi:10.7554/elife.69457
45 Pertea M, Pertea GM, Salzberg SL (2011) Detection of lineage-specific evolutionary changes among primate species. BMC Bioinformatics 12: 274. doi:10.1186/1471-2105-12-274
46 Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al (2005) Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 15: 1034–1050. doi:10.1101/gr.3715005
47 Ashwal-Fluss R, Meyer M, Pamudurti NR, Ivanov A, Bartok O, Hanan M, Evantal N, Memczak S, Rajewsky N, Kadener S (2014) circRNA biogenesis competes with pre-mRNA splicing. Mol Cell 56: 55–66. doi:10.1016/j.molcel.2014.08.019
48 Starke S, Jost I, Rossbach O, Schneider T, Schreiner S, Hung LH, Bindereif A (2015) Exon circularization requires canonical splice signals. Cell Rep 10: 103–111. doi:10.1016/j.celrep.2014.12.002
49 Wang Y, Wang Z (2015) Efficient backsplicing produces translatable circular mRNAs. RNA 21: 172–179. doi:10.1261/rna.048272.114
50 Kvikstad EM, Tyekucheva S, Chiaromonte F, Makova KD (2007) A macaque's-eye view of human insertions and deletions: Differences in mechanisms. PLoS Comput Biol 3: 1772–1782. doi:10.1371/journal.pcbi.0030176
51 Xia S, Feng J, Lei L, Hu J, Xia L, Wang J, Xiang Y, Liu L, Zhong S, Han L, et al (2017) Comprehensive characterization of tissue-specific circular RNAs in the human and mouse genomes. Brief Bioinform 18: 984–992. doi:10.1093/bib/bbw081
52 Houseley J, Tollervey D (2010) Apparent non-canonical trans-splicing is generated by reverse transcriptase in vitro. PLoS One 5: e12271. doi:10.1371/journal.pone.0012271
53 Wu CS, Yu CY, Chuang CY, Hsiao M, Kao CF, Kuo HC, Chuang TJ (2014) Integrative transcriptome sequencing identifies trans-splicing events with important roles in human embryonic stem cell pluripotency. Genome Res 24: 25–36. doi:10.1101/gr.159483.113
54 McManus CJ, Duff MO, Eipper-Mains J, Graveley BR (2010) Global analysis of trans-splicing in Drosophila. Proc Natl Acad Sci U S A 107: 12975–12979. doi:10.1073/pnas.1007586107
55 Fan C, Lei X, Tie J, Zhang Y, Wu FX, Pan Y (2022) CircR2Disease v2.0: An updated web server for experimentally validated circRNA-disease associations and its application. Genomics Proteomics Bioinformatics 20: 435–445. doi:10.1016/j.gpb.2021.10.002
56 Westholm JO, Miura P, Olson S, Shenker S, Joseph B, Sanfilippo P, Celniker SE, Graveley BR, Lai EC (2014) Genome-wide analysis of drosophila circular RNAs reveals their structural and sequence properties and age-dependent neural accumulation. Cell Rep 9: 1966–1980. doi:10.1016/j.celrep.2014.10.062
57 Hoffmann S, Otto C, Doose G, Tanzer A, Langenberger D, Christ S, Kunz M, Holdt LM, Teupser D, Hackermuller J, et al (2014) A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection. Genome Biol 15: R34. doi:10.1186/gb-2014-15-2-r34
58 Gaffo E, Buratin A, Dal Molin A, Bortoluzzi S (2022) Sensitive, reliable and robust circRNA detection from RNA-seq with CirComPara2. Brief Bioinform 23: bbab418. doi:10.1093/bib/bbab418
59 Kim P, Yoon S, Kim N, Lee S, Ko M, Lee H, Kang H, Kim J, Lee S (2010) ChimerDB 2.0–A knowledgebase for fusion genes updated. Nucleic Acids Res 38: D81–D85. doi:10.1093/nar/gkp982
60 Al-Balool HH, Weber D, Liu Y, Wade M, Guleria K, Nam PLP, Clayton J, Rowe W, Coxhead J, Irving J, et al (2011) Post-transcriptional exon shuffling events in humans can be evolutionarily conserved and abundant. Genome Res 21: 1788–1799. doi:10.1101/gr.116442.110
61 Chuang TJ, Wu CS, Chen CY, Hung LY, Chiang TW, Yang MY (2016) NCLscan: Accurate identification of non-co-linear transcripts (fusion, trans-splicing and circular RNA) with a good balance between sensitivity and precision. Nucleic Acids Res 44: e29. doi:10.1093/nar/gkv1013
62 Takahara T, Kanazu SI, Yanagisawa S, Akanuma H (2000) Heterogeneous Sp1 mRNAs in human HepG2 cells include a product of homotypic trans-splicing. J Biol Chem 275: 38067–38072. doi:10.1074/jbc.m002010200
63 Flouriot G, Brand H, Seraphin B, Gannon F (2002) Natural trans-spliced mRNAs are generated from the human estrogen receptor-alpha (hER alpha) gene. J Biol Chem 277: 26244–26251. doi:10.1074/jbc.m203513200
64 Horiuchi T, Aigaki T (2006) Alternative trans-splicing: A novel mode of pre-mRNA processing. Biol Cell 98: 135–140. doi:10.1042/bc20050002
65 Gingeras TR (2009) Implications of chimaeric non-co-linear transcripts. Nature 461: 206–211. doi:10.1038/nature08452
66 Kelly S, Greenman C, Cook PR, Papantonis A (2015) Exon skipping is correlated with exon circularization. J Mol Biol 427: 2414–2417. doi:10.1016/j.jmb.2015.02.018
67 Zhang P, Zhang XO, Jiang T, Cai L, Huang X, Liu Q, Li D, Lu A, Liu Y, Xue W, et al (2020) Comprehensive identification of alternative back-splicing in human tissue transcriptomes. Nucleic Acids Res 48: 1779–1789. doi:10.1093/nar/gkaa005
68 Lu T, Cui L, Zhou Y, Zhu C, Fan D, Gong H, Zhao Q, Zhou C, Zhao Y, Lu D, et al (2015) Transcriptome-wide investigation of circular RNAs in rice. RNA 21: 2076–2087. doi:10.1261/rna.052282.115
69 Errichelli L, Dini Modigliani S, Laneve P, Colantoni A, Legnini I, Capauto D, Rosa A, De Santis R, Scarfo R, Peruzzi G, et al (2017) FUS affects circular RNA expression in murine embryonic stem cell-derived motor neurons. Nat Commun 8: 14741. doi:10.1038/ncomms14741
70 Shen H, An O, Ren X, Song Y, Tang SJ, Ke XY, Han J, Tay DJT, Ng VHE, Molias FB, et al (2022) ADARs act as potent regulators of circular transcriptome in cancer. Nat Commun 13: 1508. doi:10.1038/s41467-022-29138-2
71 Yang L, Wilusz JE, Chen LL (2022) Biogenesis and regulatory roles of circular RNAs. Annu Rev Cell Dev Biol 38: 263–289. doi:10.1146/annurev-cellbio-120420-125117
72 Gao Y, Zhang J, Zhao F (2018) Circular RNA identification based on multiple seed matching. Brief Bioinform 19: 803–810. doi:10.1093/bib/bbx014
73 Dong R, Ma XK, Chen LL, Yang L (2019) Genome-Wide annotation of circRNAs and their alternative back-splicing/splicing with CIRCexplorer pipeline. Methods Mol Biol 1870: 137–149. doi:10.1007/978-1-4939-8808-2_10
74 Cheng J, Metge F, Dieterich C (2016) Specific identification and quantification of circular RNAs from sequencing data. Bioinformatics 32: 1094–1096. doi:10.1093/bioinformatics/btv656
75 Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR (2013) STAR: Ultrafast universal RNA-seq aligner. Bioinformatics 29: 15–21. doi:10.1093/bioinformatics/bts635
76 Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–1760. doi:10.1093/bioinformatics/btp324
77 Xu H, Luo X, Qian J, Pang X, Song J, Qian G, Chen J, Chen S (2012) FastUniq: A fast de novo duplicates removal tool for paired short reads. PLoS One 7: e52249. doi:10.1371/journal.pone.0052249



78 Yeo G, Burge CB (2004) Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol 11: 377–394. doi:10.1089/1066527041410418
79 Chiang TW, Mai TL, Chuang TJ (2022) CircMiMi: A stand-alone software for constructing circular RNA-microRNA-mRNA interactions across species. BMC Bioinformatics 23: 164. doi:10.1186/s12859-022-04692-0
80 Li JH, Liu S, Zhou H, Qu LH, Yang JH (2014) starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res 42: D92–D97. doi:10.1093/nar/gkt1248
81 Frees S, Menendez C, Crum M, Bagga PS (2014) QGRS-conserve: A computational method for discovering evolutionarily conserved G-quadruplex motifs. Hum Genomics 8: 8. doi:10.1186/1479-7364-8-8
82 Menendez C, Frees S, Bagga PS (2012) QGRS-H predictor: A web server for predicting homologous quadruplex forming G-rich sequence motifs in nucleotide sequences. Nucleic Acids Res 40: W96–W103. doi:10.1093/nar/gks422
83 Enright AJ, John B, Gaul U, Tuschl T, Sander C, Marks DS (2003) MicroRNA targets in Drosophila. Genome Biol 5: R1. doi:10.1186/gb-2003-5-1-r1
84 Paz I, Kosti I, Ares M Jr., Cline M, Mandel-Gutfreund Y (2014) RBPmap: A web server for mapping binding sites of RNA-binding proteins. Nucleic Acids Res 42: W361–W367. doi:10.1093/nar/gku406
85 Huang W, Ling Y, Zhang S, Xia Q, Cao R, Fan X, Fang Z, Wang Z, Zhang G (2021) TransCirc: An interactive database for translatable circular RNAs based on multi-omics evidence. Nucleic Acids Res 49: D236–D242. doi:10.1093/nar/gkaa823
86 Dong R, Ma XK, Li GW, Yang L (2018) CIRCpedia v2: An updated database for comprehensive circular RNA annotation and expression comparison. Genomics Proteomics Bioinformatics 16: 226–233. doi:10.1016/j.gpb.2018.08.001
87 Ruan H, Xiang Y, Ko J, Li S, Jing Y, Zhu X, Ye Y, Zhang Z, Mills T, Feng J, et al (2019) Comprehensive characterization of circular RNAs in ~1000 human cancer cell lines. Genome Med 11: 55. doi:10.1186/s13073-019-0663-5
88 Vo JN, Cieslik M, Zhang Y, Shukla S, Xiao L, Zhang Y, Wu YM, Dhanasekaran SM, Engelke CG, Cao X, et al (2019) The landscape of circular RNA in cancer. Cell 176: 869–881.e13. doi:10.1016/j.cell.2018.12.021
89 Glazar P, Papavasileiou P, Rajewsky N (2014) circBase: a database for circular RNAs. RNA 20: 1666–1670. doi:10.1261/rna.043687.113
90 Chen X, Han P, Zhou T, Guo X, Song X, Li Y (2016) circRNADb: A comprehensive database for human circular RNAs with protein-coding annotations. Sci Rep 6: 34985. doi:10.1038/srep34985
91 Lai H, Li Y, Zhang H, Hu J, Liao J, Su Y, Li Q, Chen B, Li C, Wang Z, et al (2022) exoRBase 2.0: An atlas of mRNA, lncRNA and circRNA in extracellular vesicles from human biofluids. Nucleic Acids Res 50: D118–D128. doi:10.1093/nar/gkab1085
92 Hinrichs AS, Karolchik D, Baertsch R, Barber GP, Bejerano G, Clawson H, Diekhans M, Furey TS, Harte RA, Hsu F, et al (2006) The UCSC genome browser database: Update 2006. Nucleic Acids Res 34: D590–D598. doi:10.1093/nar/gkj144
93 Tomczak M, Tomczak E (2014) The need to report effect size estimates revisited. An overview of some recommended measures of effect size. Trends Sport Sci 1: 19–25.
94 Seabold S, Perktold J (2010) statsmodels: Econometric and statistical modeling with python. In Proceedings of the 9th Python in Science Conference. Austin, TX: SciPy Organizers. doi:10.25080/Majora-92bf1922-011

License: This article is available under a Creative Commons License (Attribution 4.0 International, as described at https://creativecommons.org/licenses/by/4.0/).

