RNA Sequencing: An Introduction To Efficient Planning and Execution of RNA Sequencing (RNA-Seq) Experiments
RNA Sequencing
An introduction to efficient planning and execution of RNA sequencing
(RNA-Seq) experiments.
Selected Applications of RNA-Seq
Transcriptome studies are well suited to understanding disease mechanisms, developmental mechanisms, or responses to various stressors. Differential expression analysis of RNA-Seq data relies on the comparison of data sets obtained from experimental conditions (e.g. drug treatments) and controls to determine the difference in transcript abundance. The focus here is on messenger RNA (mRNA). In addition, non-coding miRNAs, which often have gene regulatory purposes, may be used to develop biomarkers specific to a medical condition. Such differentially expressed miRNAs can then be experimentally verified, for instance to develop diagnostic qPCR kits.
Experimental Design
For a successful experiment, many aspects, including experimental setup, sampling, and funding, have to be considered. In addition, the number of biological replicates and the number of reads produced for each replicate are essential parameters for producing valid results [5], especially for detecting the maximal number of differentially expressed genes, including rare transcripts. As gene expression analysis builds on counting reads from the respective transcription unit, single-end reads of 75 bases in length usually suffice for accurate mapping. However, paired-end sequencing (and in some cases longer reads, for instance as produced by Pacific Biosciences (PacBio) sequencing technologies) is required if highly accurate transcript quantification, determination of gene fusions or novel splice variant detection is envisaged. In contrast to single-end sequencing, paired-end sequencing enables reading both ends of a (c)DNA fragment. Generally, it is recommended to work with at least three replicates per experimental condition and to sequence 30 million single-end reads per replicate for eukaryotic organisms and 10 million single-end reads per replicate for prokaryotic organisms. For miRNA the read numbers may be halved. It is also worth mentioning that the External RNA Controls Consortium (ERCC) has developed a set of external RNA controls designed to mimic natural eukaryotic mRNA sequences [6]. These sequences may be spiked in after RNA isolation and can be used to estimate the uncertainty in the subsequent measurements.
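The read-depth recommendations above translate directly into a sequencing budget. The following is a minimal sketch of that arithmetic; the dictionary keys and the helper names `reads_per_replicate` and `total_reads` are hypothetical and chosen only for illustration, while the numbers are those given in the text.

```python
# Recommended single-end read depth per replicate, as stated in the text.
READS_PER_REPLICATE = {
    "eukaryote": 30_000_000,
    "prokaryote": 10_000_000,
}
MIN_REPLICATES = 3  # the text recommends at least triplicates per condition


def reads_per_replicate(organism: str, mirna: bool = False) -> int:
    """Return the recommended read count for a single replicate."""
    depth = READS_PER_REPLICATE[organism]
    # The text states that read numbers may be halved for miRNA.
    return depth // 2 if mirna else depth


def total_reads(organism: str, conditions: int,
                replicates: int = MIN_REPLICATES, mirna: bool = False) -> int:
    """Total reads to budget for an experiment (hypothetical helper)."""
    if replicates < MIN_REPLICATES:
        raise ValueError("at least three replicates per condition are recommended")
    return reads_per_replicate(organism, mirna) * conditions * replicates


if __name__ == "__main__":
    # Two conditions (e.g. treatment and control), three replicates each.
    print(total_reads("eukaryote", conditions=2))
```

For a eukaryotic mRNA experiment with one treatment and one control, this amounts to six libraries at 30 million reads each.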
RNA Isolation
Obtaining high-quality RNA is critical. RNA degradation is detrimental to the experiments since it may introduce 3' biases during polyadenylated (polyA) RNA enrichment or may distort the transcript profile by affecting different RNAs to different degrees. Thus, great care is needed to preserve the integrity of the input RNA. The acronym GIGO ("garbage in, garbage out") holds true in this case as well. A notable exception is RNA extracted from formalin-fixed paraffin-embedded (FFPE) tissue obtained by laser-assisted dissection methods, where a certain amount of degradation is unavoidable. Transcript profiles may also be distorted upon amplification of low amounts of input RNA, using either transcription-based or PCR-based amplification methods. However, with careful controls and a sufficient number of biological replicates, the adverse effects can be minimized.
Sequencing Library Construction
Depending on the desired RNA type (coding or non-coding) and the type of organism studied (eukaryote or prokaryote), different sequencing library types are constructed. For instance, to sequence mRNA, a polyA enrichment step is performed for eukaryotes, while a ribosomal RNA depletion step is carried out for prokaryotes. Kits for non-coding RNA library construction may use alternative techniques to enrich the relevant RNA fraction. The constructed libraries are stranded, meaning they retain the strand information of the sequenced molecule, which results in a more reliable quantification of gene expression [7]. One typical method to preserve the RNA strandedness incorporates uracil instead of thymine during second strand cDNA synthesis. After adapter ligation and before PCR amplification, uracil-DNA glycosylase (UNG) is added to degrade the second strand. As a result, all reads start in the same orientation, allowing the identification of the transcribed strand. A schematic depiction of how total RNA is turned into a sequenceable Illumina cDNA library is shown in Figure 2.
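The consequence of degrading the second strand can be illustrated with a short sketch. It assumes that read 1 is read from the surviving first-strand (antisense) cDNA, which is how dUTP-based protocols are commonly described; the orientation for a specific kit should be checked against its documentation. The function names are hypothetical.

```python
# Complement table for DNA bases; N is kept as N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def transcript_sequence_from_read1(read1: str) -> str:
    """Recover the sense (transcript) sequence from an antisense read 1.

    Assumption: only the first-strand cDNA survives UNG treatment,
    so read 1 is antisense to the original RNA.
    """
    return reverse_complement(read1)


def transcribed_strand(read1_alignment_strand: str) -> str:
    """Infer the genomic strand of the transcript from read 1's alignment strand."""
    if read1_alignment_strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Under the assumption above, the transcript lies on the opposite strand.
    return "-" if read1_alignment_strand == "+" else "+"


if __name__ == "__main__":
    print(transcript_sequence_from_read1("AAGC"))  # GCTT
    print(transcribed_strand("+"))                 # -
```

With an unstranded library this inference is not possible, because reads from both cDNA strands are present.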
[Figure 2: schematic of stranded cDNA library construction. Recoverable labels: first strand will be antisense; second strand will be sense; anti-sense DNA; sense DNA.]
[...] Figure 3, is especially well suited for RNA-Seq, as it is fast, accurate and cost-effective [8]. Sequenced reads are produced in the standard FASTQ format [9], which incorporates both sequence information and quality scores and can be further processed in downstream analyses.
Figure 3: Schematic of Illumina's paired-end sequencing workflow. [Recoverable panel labels: sequencing, indexing (index read), sequencing; flow cell.]
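To make the structure of a FASTQ record concrete, here is a minimal parsing sketch. It assumes unwrapped four-line records (header, sequence, separator, quality string) and a Phred+33 quality encoding; both are assumptions about the input and should be verified for the data at hand. The names `FastqRecord` and `read_fastq` are hypothetical.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # one Phred score per base


def read_fastq(handle: TextIO, phred_offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # Each quality character encodes a Phred score as ASCII code minus offset.
        scores = [ord(char) - phred_offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


if __name__ == "__main__":
    example = io.StringIO("@read1\nACGT\n+\nII5!\n")
    for record in read_fastq(example):
        print(record.name, record.sequence, record.qualities)
```

For routine work an established parser is preferable; the sketch only shows how sequence and quality information sit side by side in each record.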
Table 1. This excerpt of a table shows the main output of a differential gene expression analysis. In this experiment, two conditions with three replicates each are compared to each other. The table lists, from left to right, the gene identifier, boxplots representing the expression level distributions of the replicates, the log2 fold change of gene expression between condition 1 and condition 2, the probability value (p-value) of the log2 fold change, and the p-value adjusted for multiple testing.
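The two numeric columns described in the caption can be sketched as follows. This is an illustration under stated assumptions, not the method used to produce Table 1: the fold change is taken as condition 2 relative to condition 1 with a pseudocount of 1 to avoid division by zero, and the multiple-testing adjustment shown is the Benjamini-Hochberg procedure, which the source does not name. Dedicated differential expression tools model count data more carefully [5].

```python
import math
from typing import List, Sequence


def log2_fold_change(cond1: Sequence[float], cond2: Sequence[float],
                     pseudocount: float = 1.0) -> float:
    """log2 ratio of mean expression, condition 2 over condition 1 (assumed direction)."""
    mean1 = sum(cond1) / len(cond1)
    mean2 = sum(cond2) / len(cond2)
    return math.log2((mean2 + pseudocount) / (mean1 + pseudocount))


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Adjust p-values for multiple testing (Benjamini-Hochberg, an assumed choice)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical normalized counts for one gene, three replicates per condition.
    print(log2_fold_change([10, 12, 11], [47, 45, 49]))
    # Hypothetical raw p-values for four genes.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Adjusted p-values are never smaller than the raw ones, which is why a gene can look significant before correction and not after.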
Figure 5. Excerpt from the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway graph "TRANSCRIPTIONAL MISREGULATION IN CANCER", where colored nodes represent significantly up- or downregulated genes in the selected pathway. [Recoverable labels: neuroendocrine cancers (neuroblastoma, carcinoid); nodes MDM2, PTK2, TP53, MYCN, MAX, COMMD3-BMI1, SP1, ZBTB17, NTRK1, NGFR, MEN1, MLL, CDKN1B; color legend: fold change.]
Based on the differential gene expression results and depending on the content of gene information published in databases, gene set and pathway analysis may be carried out to illuminate the larger context of the involved metabolic processes as exemplified in Figure 5. A useful additional analysis in the case of miRNAs comprises a motif search to identify potential miRNA targets and to uncover additional, novel miRNAs [11]. The results of such analyses may be submitted to public databases such as miRNet [12] for further network-based visual analysis. Figure 6 depicts such a motif identified by a miRNA analysis.
Figure 6. A depiction of a significant de novo miRNA motif discovered in a miRNA-Seq analysis. A miRNA motif is a region that is well conserved in many of the analyzed sequences.
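The idea of a motif as a region conserved across many sequences can be illustrated with a deliberately simple k-mer count. This is a stand-in technique for illustration only, not the motif discovery method used in [11]; real tools also handle mismatches and statistical significance. The function name and the example sequences are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def conserved_kmers(sequences: Iterable[str], k: int,
                    min_fraction: float = 1.0) -> List[str]:
    """Return k-mers present in at least min_fraction of the sequences."""
    sequences = list(sequences)
    counts: Counter = Counter()
    for seq in sequences:
        # A set ensures each k-mer is counted once per sequence.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(kmers)
    threshold = min_fraction * len(sequences)
    return sorted(kmer for kmer, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    # Hypothetical RNA sequences sharing one exact 6-mer.
    example = ["ACGUGGAUCC", "UUGGAUCCAA", "CGGAUCCGUA"]
    print(conserved_kmers(example, k=6))
```

Lowering `min_fraction` relaxes the requirement from "present in every sequence" to "present in most sequences".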
Summary
RNA-Seq is not limited to questions of differential gene expression or identification of miRNA, which have been discussed in the previous sections. Table 2 lists common RNA-Seq applications. The table can serve as a guide for selecting an appropriate approach to a research question. Another application of RNA-Seq technology is, for example, de novo transcriptome assembly and annotation, which is useful when no annotated reference genome is available. In short, RNA is collected from as many different stages and tissues as possible. The total RNA is then enriched for polyadenylated mRNA. The pool of mRNA, which ideally represents all transcribed genes, is then normalized to reduce abundant mRNAs and enrich rare mRNAs. The normalized transcripts are sequenced, then assembled in a second step and annotated with various databases in a third step, resulting in a ready-to-use de novo transcriptome [13].
RNA-Seq provides a snapshot of the transcriptome in cells and cell populations, making it a very attractive and powerful method. The results of RNA-Seq experiments are complex, however, because they consist of a large amount of fragmented data. With the right approach, the challenge of extracting knowledge is reduced to a manageable task.
Table 2. This table provides an overview of common scientific questions in the field of RNA-Seq and summarizes the most important points that need to be considered in an RNA-Seq project. The table is intended as a quick reference guide.
Material and Resources | mRNA and availability of annotated reference genome | mRNA and availability of annotated reference genome | e.g. miRNA and availability of annotated non-coding RNA and reference genome | mRNA and availability of annotated reference genome | mRNA, missing annotated reference genome
Sample Preparation | Total RNA isolation; stranded polyA enriched sequencing library | Total RNA isolation; stranded ribo-depleted sequencing library | Total RNA isolation; non-coding RNA enriched sequencing library | Total RNA isolation; stranded polyA enriched sequencing library | Total RNA isolation; normalized mRNA sequencing library
Sequencing | 30 Mio single-end reads, 75 bp length | 10 Mio single-end reads, 75 bp length | 15 Mio single-end reads, 75 bp length | 50 Mio paired-end reads, 2 x 150 bp length | 20 Mio paired-end reads, 2 x 300 bp length
References
[1] Steven L. Salzberg. Open questions: How many genes do we have? BMC Biology. 2018;16(94). doi:10.1186/s12915-018-0564-x

[2] Sam Griffiths-Jones, Russell J. Grocock, Stijn van Dongen, Alex Bateman, Anton J. Enright; miRBase: microRNA sequences, targets and gene nomenclature, Nucleic Acids Research, Volume 34, Issue suppl_1, 1 January 2006, Pages D140–D144, https://doi.org/10.1093/nar/gkj112

[3] The Gene Ontology Consortium; Expansion of the Gene Ontology knowledgebase and resources, Nucleic Acids Research, Volume 45, Issue D1, 4 January 2017, Pages D331–D338, https://doi.org/10.1093/nar/gkw1108

[4] Minoru Kanehisa, Miho Furumichi, Mao Tanabe, Yoko Sato, Kanae Morishima; KEGG: new perspectives on genomes, pathways, diseases and drugs, Nucleic Acids Research, Volume 45, Issue D1, 4 January 2017, Pages D353–D361, https://doi.org/10.1093/nar/gkw1092

[5] Schurch NJ, Schofield P, Gierliński M, et al. How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? RNA. 2016;22(6):839-851. doi:10.1261/rna.053959.115

[6] Lemire A, Lea K, Batten D, et al. Development of ERCC RNA Spike-In Control Mixes. Journal of Biomolecular Techniques : JBT. 2011;22(Suppl):S46

[7] Zhao S, Zhang Y, Gordon W, et al. Comparison of stranded and non-stranded RNA-seq transcriptome profiling and investigation of gene overlap. BMC Genomics. 2015;16(1):675. doi:10.1186/s12864-015-1876-7

[8] Online at: https://emea.illumina.com/systems/sequencing-platforms/nextseq/applications.html?langsel=/ch/, accessed 14.09.2018

[9] Online at: http://maq.sourceforge.net/fastq.shtml, accessed 14.09.2018

[10] Sandrine Borgeaud, Lisa C. Metzger, Tiziana Scrignari, Melanie Blokesch, The type VI secretion system of Vibrio cholerae fosters horizontal gene transfer, Science 02 Jan 2015: Vol. 347, Issue 6217, pp. 63-67. DOI: 10.1126/science.1260064

[11] Bhupesh K. Prusty, Nitish Gulve, Suvagata Roy Chowdhury, Michael Schuster, Sebastian Strempel, Vincent Descamps, Thomas Rudel. HHV-6 encoded small non-coding RNAs define an intermediate and early stage in viral reactivation. npj Genomic Medicine. 2018;3(25). 10.1038/s41525-018-0064-5

[12] Yannan Fan, Keith Siklenka, Simran K. Arora, Paula Ribeiro, Sarah Kimmins, Jianguo Xia; miRNet - dissecting miRNA-target interactions and functional associations through network-based visual analysis, Nucleic Acids Research, Volume 44, Issue W1, 8 July 2016, Pages W135–W141, https://doi.org/10.1093/nar/gkw288

[13] Neves, R.C., Guimaraes, J.C., Strempel, S. et al., Transcriptome profiling of Symbion pandora (phylum Cycliophora): insights from a differential gene expression analysis, Org Divers Evol (2017) 17: 111. https://doi.org/10.1007/s13127-016-0315-1