Pan 08 Nature Genet
Deep surveying of alternative splicing complexity in the human transcriptome by

all hypothetical, additional 5′ to 3′ pairings of splice sites in the same set of genes (Fig. 1a; and Supplementary Methods online). Mining of a dataset of 15,702 multiexon UniGene clusters, each containing one or more locus-specific RefSeq cDNA, resulted in the compilation of
© 2008 Nature Publishing Group http://www.nature.com/naturegenetics
¹Banting and Best Department of Medical Research, University of Toronto, Toronto M5S 3E1, Canada. ²Department of Electrical and Computer Engineering, University of Toronto, Toronto M5S 3G4, Canada. ³Department of Molecular Genetics, University of Toronto, Toronto M5S 3E1, Canada. Correspondence should be addressed to B.J.B. (b.blencowe@utoronto.ca).
Received 17 July; accepted 19 August; published online 2 November 2008; addendum published after print 28 April 2009; doi:10.1038/ng.259
splice junctions representing evidence for skipping of one or more alternative cassette exons in mRNA-Seq read alignments. The tissue distribution of these junction reads was plotted as the percentage of junctions that appear in one to all six tissues. (d) Tissue distributions of new splice junctions detected in pairs of tissues. The size of each blue box indicates the number of junctions shared between a given pair of tissues, with the highest number of shared junctions corresponding to the largest box. Br, whole brain; Ce, cerebral cortex; He, heart; Li, liver; Lu, lung; Sk, skeletal muscle.
[Plot axis label: Number of tissues (1 to 6).]
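The two summaries described in the caption (the percentage of junctions seen in one to all six tissues, and the number of junctions shared by each pair of tissues) can be tabulated along the following lines. This is only a minimal sketch: the input format (a mapping from a junction identifier to the set of tissues in which it was detected), the function names and the toy data are all assumptions made for illustration and are not taken from the paper or its Supplementary Methods.

```python
from collections import Counter

# Tissue abbreviations as given in the figure caption.
TISSUES = ["Br", "Ce", "He", "Li", "Lu", "Sk"]


def tissue_count_distribution(junction_tissues):
    """Percentage of junctions detected in exactly k tissues, for k = 1..6.

    `junction_tissues` is a hypothetical mapping: junction id -> set of tissues.
    Junctions with an empty tissue set are ignored.
    """
    counts = Counter(len(t) for t in junction_tissues.values() if t)
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in range(1, len(TISSUES) + 1)}
    return {k: 100.0 * counts.get(k, 0) / total
            for k in range(1, len(TISSUES) + 1)}


def pairwise_shared(junction_tissues):
    """Count junctions detected in exactly two tissues, keyed by the tissue pair."""
    shared = Counter()
    for tissues in junction_tissues.values():
        if len(tissues) == 2:
            shared[tuple(sorted(tissues))] += 1
    return shared


# Toy data, invented purely to exercise the functions.
toy = {
    "j1": {"Sk"},
    "j2": {"Sk", "He"},
    "j3": {"Br", "Ce"},
    "j4": {"He", "Sk"},
    "j5": {"Br", "Ce", "He", "Li", "Lu", "Sk"},
}

dist = tissue_count_distribution(toy)
pairs = pairwise_shared(toy)
print(dist)
print(pairs)
```

In a plot like panel (d), the count for each tissue pair would set the size of the corresponding box.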
However, at increased levels of read coverage (that is, 16 to >500 reads per 100 nucleotides), alternative splicing events can be detected in 92–97% of multiexon genes (Supplementary Methods). This represents a substantial increase over a previous estimate (74%)⁴ for the proportion of multiexon genes that contain one or more alternative splicing event. Given that our analysis of mRNA-Seq data detects approximately half of known junctions, and that there is an almost linear increase in the detection rate of new junctions as data from each tissue is added (Fig. 1b), we predict that with full coverage the numbers of new junctions would be at least twice those detected in the present data.

To assess the degree to which known and new junctions detected in the mRNA-Seq data may represent tissue-dependent splicing events, we investigated the frequencies at which detected splice junctions formed by skipping of one or more exons are unique to indivi- [...] significantly more junctions that were detected in only one tissue, and these could represent tissue-specific or tissue-restricted [...] expressed but only detected in a single tissue. We also examined the tissue specificity of all (n = 439) new splice junctions that were detected in two of the six tissues. In the plot [...] tissue-specific alternative splicing events, in addition to new alternative splicing events in transcripts with tissue-restricted expression patterns. Supplementary Table 1 online lists genes with more than five new splice junctions. Many of these genes encode giant and other muscle-specific proteins, thus revealing a previously unappreciated degree of alternative splicing complexity in transcripts from muscle-specific genes. These findings are consistent with previous proposals that alternative splicing of transcripts encoding some of these proteins has an important role in controlling fundamental mechanical properties of muscle, such as tension and contractility¹⁴.

[Figure panels a and b. Axis labels: Number of AS events; Number of AS events per exon.]

As mRNA-Seq data affords the detection of alternative splicing events in transcripts irrespective of their length and associated splicing
complexity, the relationship between alternative splicing frequency relative to exon number in genes can be accurately assessed. Accordingly, we determined the median number of alternative splicing events per exon for genes with different numbers of total exons and with similar overall levels of Illumina read coverage (Fig. 2a). Despite the theoretical possibility of a quadratic (n²) increase in the number of alternative splicing possibilities as the number of exons per gene increases, our results indicate that the number of alternative splicing events per gene increases in a near linear fashion (Fig. 2a). Thus, notably, the frequency of alternative splicing detection per exon does not rise in genes with increasing numbers of exons, and this observation suggests that selection pressure may act to generally limit splicing complexity in large genes. This observation facilitates assessment of the total number of alternative splicing events in human tissues.

When considering that the frequency of new alternative splicing events detected in the six different tissues can be extrapolated to other human tissues, it is possible to derive an estimate for the total number of alternative splicing events that can be detected by comparable methods. By combining the rates of detection of new and known alternative splicing events afforded by mRNA-Seq and EST-cDNA data, respectively, we observed that the median number of alternative splicing events per exon is between 0.5 and 0.75 for genes with intermediate to high levels of Illumina sequence coverage (32–256 reads per 100 bases; see Fig. 2b). Given that 175,944 exons were mined from 15,702 multiexon human genes, we predict that on the order of 88,000–132,000 alternative splicing events of comparable abundance as those detected in the present study are expressed in major human tissues. This estimate further suggests that, on average, there are at least seven alternative splicing events per multiexon human gene.

An important question concerning high-throughput sequencing technologies is their capacity to generate reliable quantitative measurements for alternative splicing levels. To address this question, we compared estimates for percent exon inclusion from the mRNA-Seq data described above with percent inclusion estimates generated from profiling ~5,000 cassette-type alternative exons in the same six human tissues using our previously validated¹⁵, quantitative alternative splicing microarray system (unpublished data, Supplementary Fig. 2a and Supplementary Methods online). When applying a threshold of 20 or more reads per tissue that match any one of the three splice junction sequences representing inclusion and skipping of a cassette exon, there is a high correlation (r = 0.80, n = 1,548) between the alternative splicing microarray- and mRNA-Seq-derived predictions for percent inclusion. The correlation increases (r = 0.85, n = 546) when a threshold of 50 or more junction matching reads is used. Predictions for percent inclusion levels from the two systems also agree well for tissue-regulated alternative exons (Supplementary Fig. 2b and Supplementary Information). Together, the results described above show that mRNA-Seq data can be used to reliably measure alternative splicing levels, in addition to revealing important new insights into alternative splicing complexity in the human transcriptome.

Note: Supplementary information is available on the Nature Genetics website.

ACKNOWLEDGMENTS
We thank S. Luo, I. Khrebtukova and G. Schroth of Illumina Inc. for providing some of the mRNA-Seq datasets used in this analysis. We also thank M. Brudno, Y. Barash, J. Calarco and S. Ahmad for helpful suggestions and comments on the manuscript. B.J.B. and B.J.F. acknowledge support from the Canadian Institutes of Health Research and from Genome Canada through the Ontario Genomics Institute.

AUTHOR CONTRIBUTIONS
Q.P. created the exon and splice junction libraries and performed analyses of the mRNA-Seq, cDNA-EST and microarray data. O.S., L.J.L. and B.J.F. designed and implemented the logistic regression classifier and contributed to the analyses of tissue-specific alternative splicing events. The study was coordinated by B.J.B. The manuscript was prepared by B.J.B. and Q.P., with the participation of O.S., L.J.L. and B.J.F.

Published online at http://www.nature.com/naturegenetics/
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Matlin, A.J., Clark, F. & Smith, C.W. Nat. Rev. Mol. Cell Biol. 6, 386–398 (2005).
2. Blencowe, B.J. Cell 126, 37–47 (2006).
3. Ben-Dov, C., Hartmann, B., Lundgren, J. & Valcarcel, J. J. Biol. Chem. 283, 1229–1233 (2008).
4. Johnson, J.M. et al. Science 302, 2141–2144 (2003).
5. Sorek, R., Dror, G. & Shamir, R. BMC Genomics 7, 273 (2006).
6. Calarco, J.A. et al. Adv. Exp. Med. Biol. 623, 64–84 (2007).
7. Bainbridge, M.N. et al. BMC Genomics 7, 246 (2006).
8. Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Nat. Methods 5, 621–628 (2008).
9. Cloonan, N. et al. Nat. Methods 5, 613–619 (2008).
10. Sultan, M. et al. Science 321, 956–960 (2008).
11. Su, A.I. et al. Proc. Natl. Acad. Sci. USA 101, 6062–6067 (2004).
12. Zhang, W. et al. J. Biol. 3, 21 (2004).
13. Yeo, G., Holste, D., Kreiman, G. & Burge, C.B. Genome Biol. 5, R74 (2004).
14. Schiaffino, S. & Reggiani, C. Physiol. Rev. 76, 371–423 (1996).
15. Pan, Q. et al. Mol. Cell 16, 929–941 (2004).
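As a worked check of the extrapolation in the main text, the stated range of 88,000–132,000 events follows from multiplying the 175,944 mined exons by the reported median of 0.5 to 0.75 alternative splicing events per exon. The short sketch below reproduces that arithmetic. The function name and the per-gene division are illustrative assumptions: the per-gene values are simply the two bounds divided by the 15,702 genes, and this is not necessarily how the authors arrived at their per-gene statement.

```python
# Figures quoted in the main text.
N_EXONS = 175_944             # exons mined from multiexon genes
N_GENES = 15_702              # multiexon human genes
EVENTS_PER_EXON = (0.5, 0.75) # reported median range of AS events per exon


def estimate_total_events(n_exons, rate_low, rate_high):
    """Scale the number of exons by a low and a high per-exon event rate."""
    return n_exons * rate_low, n_exons * rate_high


low, high = estimate_total_events(N_EXONS, *EVENTS_PER_EXON)

# Totals, which round to the 88,000-132,000 range quoted in the text.
print(f"total events: {low:,.0f} to {high:,.0f}")

# Bounds divided by the gene count (an illustrative calculation only).
print(f"events per gene: {low / N_GENES:.1f} to {high / N_GENES:.1f}")
```

The midpoint of the per-exon range (0.625) gives roughly 110,000 events in total under the same multiplication.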