
2021-02-16

RNA sequencing
BN335: BIOINFORMATICS II
RNA Sequencing and analysis
Hussein J. M. (PhD)

What is RNA-Seq
• Massively parallel sequencing method for transcriptome analyses
• Complementary DNA (cDNA) generated from RNA is sequenced using next-generation “short read” technologies
• Reads are aligned to a reference genome and a transcriptome map is constructed

Outline
• Intro to RNA-Seq
  – Biological Questions
  – Comparison with Other Methods
  – Overview RNA-Seq workflow
• Transcript Reconstruction from RNA-seq Reads
• RNA-Seq Applications
  – Annotation
  – Quantification
  – Other Applications
• Expression Profiling Steps and Software

Transcriptome
• The transcriptome is the complete set of transcripts in a cell, and their quantity, for a specific developmental stage or physiological condition
• Understanding the transcriptome is essential for
  – interpreting the functional elements of the genome
  – revealing the molecular constituents of cells and tissues
  – understanding development and disease


Aims/Goals of RNA-Seq
• Annotation
  – To determine the transcriptional structure of genes: identify genes, exons, splicing events
  – Discovery of novel genes or transcripts
• Quantification
  – To quantify the changing expression levels of each transcript during development and under different conditions
  – Differential gene expression: abundance of transcripts between different conditions

Library Preparation Methods
• The construction of sequencing libraries principally involves
  – Isolating the desired RNA molecules
  – Fragmenting or amplifying randomly primed cDNA molecules
  – Reverse-transcribing the RNA to cDNA
  – Enrichment
  – Ligating sequencing adaptors
Transcriptome: RNA World
• In addition to protein-coding mRNA, there is a diverse group of noncoding RNA (ncRNA) molecules that are functional
[Figure: Transcriptome: RNAs]
http://finchtalk.geospiza.com/2009/05/small-rnas-get-smaller.html

Overview of RNA-seq workflow
[Figure: RNA-seq library fragmentation and size selection strategies that influence interpretation and analysis.]
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004393 Griffith et al., 2015


RNA Isolation & Enrichment
• rRNA Depletion
  – One approach to eliminate rRNAs is based on sequence-specific probes that can hybridize to rRNAs
  – Alternatively, rRNAs are targeted by anti-sense DNA oligos and digested by RNase H, a method also known as probe-directed degradation (PDD)
  – The selection of an approach for enriching RNA transcripts of interest for sequencing depends on the goal of the experiment and many technical factors

RNA Isolation & Enrichment
• Selection of Poly(A)+ Transcripts
  – Sequencing of polyadenylated RNA is perhaps the most common application of RNA-Seq
  – The poly(A) tail provides technical convenience for enrichment of poly(A)+ RNA
  – Poly(A)+ RNA selection can be carried out with magnetic or cellulose beads coated with oligo-dT molecules
  – Polyadenylated RNAs can also be selected using oligo-dT priming for reverse transcription (RT)
  – Poly(A) purification is the preferred method to select poly(A)+ RNA unless a very low amount of RNA is available

Overview of RNA-seq workflow
A. Sequencing
B. Data analysis
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004393 Griffith et al., 2015


Quiz
RNA-Seq is now the method of choice to study gene expression and RNA. Justify.

Advantages of RNA-Seq
• Does not require existing genomic sequence
  – Unlike hybridization approaches
• Very low background noise
  – Reads can be unambiguously mapped
• Resolution
  – Up to 1 bp
• High-throughput
  – Better than Sanger sequencing of cDNA or EST libraries
• Cost
  – Lower than traditional sequencing
• Can reveal sequence variations (SNPs)

RNA-Seq Challenge: Transcript Reconstruction from RNA-seq Reads
[Figure labels: (Avg. ~2 kb); (Avg. ~300 b); (~75 to 150 b reads, SE or PE)]
Adapted from: http://www2.fml.tuebingen.mpg.de/raetsch/members/research/transcriptomics.html


Transcript Reconstruction from RNA-Seq Reads

Many tools to choose among:
• TopHat, HISAT, STAR, HISAT2, GSNAP, …
• Cufflinks, StringTie, IsoLasso, Bayesembler, Trip, Traph, CEM, TransComb, Scallop
• Trinity, Oases, SoapDenovoTrans, AbyssTrans, IDBA-Tran, Shannon, BinPacker, Bridger, …
• GMAP, BLAT, AAT, Spidey, Sim4, …

The “New Tuxedo” Suite: End-to-end Genome-based RNA-Seq Analysis Software Package
[Figure labels: HISAT, StringTie, Trinity, GMAP; source note: Nature Biotech, 2010]

Transcript Reconstruction from RNA-Seq Reads
What if it’s a new species and there is no reference genome?
• Trinity: End-to-end Transcriptome-based RNA-Seq Analysis Software Package
[Figure labels: TopHat, Cufflinks, Trinity, GMAP]


The Big Picture
RNA-Seq Applications – Annotation: Identify Known and Novel Transcripts
• Mapped Reads: known exons/gene
• Unmapped Reads: novel exon or gene? novel splice junctions?
Guttman, M. et al. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nature Biotechnology (2010)
Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology (2010)

Genome-Guided Transcript Reconstruction
• Splice-align reads to the genome
• Alignment segment piles => exon regions
• Large alignment gaps => introns
• Overlapping but different introns = evidence of alternative splicing
From Martin & Wang. Nature Reviews Genetics. 2011
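
The genome-guided rules above (alignment piles => exon regions, large gaps => introns, overlapping but different introns => alternative splicing) can be illustrated with a small sketch. It assumes spliced alignments are given as a 0-based start position plus a SAM-style CIGAR string, where the `N` operation marks a skipped reference region. The three reads and the helper names `split_alignment` and `merge_piles` are hypothetical, chosen only for illustration; real assemblers such as Cufflinks or StringTie do considerably more.

```python
import re

# One CIGAR operation: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def split_alignment(start, cigar):
    """Split one spliced alignment into exon blocks and introns.

    start is the 0-based leftmost reference position; intervals are half-open.
    """
    blocks, introns = [], []
    pos = start
    block_start = start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":  # skipped reference region: a large gap, read as an intron
            blocks.append((block_start, pos))
            introns.append((pos, pos + length))
            pos += length
            block_start = pos
        elif op in "MD=X":  # operations that consume reference bases
            pos += length
        # I, S, H and P do not advance along the reference
    blocks.append((block_start, pos))
    return blocks, introns


def merge_piles(blocks):
    """Merge overlapping alignment blocks into candidate exon regions."""
    merged = []
    for s, e in sorted(blocks):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


# Hypothetical toy alignments: (start, CIGAR)
reads = [(100, "50M200N50M"), (120, "30M200N70M"), (130, "20M500N50M")]

all_blocks, all_introns = [], set()
for start, cigar in reads:
    blocks, introns = split_alignment(start, cigar)
    all_blocks.extend(blocks)
    all_introns.update(introns)

exon_regions = merge_piles(all_blocks)
introns = sorted(all_introns)

print("exon regions:", exon_regions)
print("introns:", introns)
```

In this toy data two distinct introns start at the same position but end at different ones, which is the "overlapping but different introns" pattern the slide reads as evidence of alternative splicing.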


Genome-Guided Transcript Reconstruction
• Traverse paths through the graph to assemble transcript isoforms
• Reconstructed isoforms
From Martin & Wang. Nature Reviews Genetics. 2011

Annotation: Alternative Splicing

RNA-Seq Applications – Quantification: Expression Profiling
Mortazavi A., et al. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods (2008)

Need for Normalization
• More reads are mapped to a transcript if it is
  i) long
  ii) at higher depth of coverage
• Normalize such that
  i) features of different lengths
  ii) total sequence from different conditions
  can be compared


Quantifying Expression: RPKM
• RPKM: Reads Per Kilobase per Million mapped reads
• RPKM = C / (L × N)
  – C: Number of mappable reads on a feature (e.g. transcript, exon, etc.)
  – L: Length of feature (in kb)
  – N: Total number of mappable reads (in millions)
Mortazavi A., et al. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods (2008)

Quantifying Expression: FPKM
• FPKM: Fragments Per Kilobase of transcript per Million fragments mapped
  – Analogous to RPKM, but counts fragments rather than individual reads
  – The relative abundances of transcripts are described in terms of the expected biological objects (fragments) observed from an RNA-Seq experiment, which in the future may not be represented by a single read
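
As a minimal sketch of the RPKM and FPKM definitions above, the functions below take raw units (feature length in bases, total counts as plain integers) and convert them to kilobases and millions before applying C / (L × N). The function names and the example numbers are hypothetical, chosen only to make the unit handling explicit.

```python
def rpkm(count, length_bases, total_mapped):
    """Reads Per Kilobase per Million mapped reads: C / (L * N).

    count        -- C, reads mapped to the feature
    length_bases -- feature length in bases (converted to kb for L)
    total_mapped -- total mapped reads (converted to millions for N)
    """
    length_kb = length_bases / 1_000
    total_millions = total_mapped / 1_000_000
    return count / (length_kb * total_millions)


def fpkm(fragment_count, length_bases, total_fragments):
    """Same formula as RPKM, but the unit counted is the fragment.

    With paired-end data, both mates of a pair count as one fragment.
    """
    return rpkm(fragment_count, length_bases, total_fragments)


# Hypothetical feature: 500 reads on a 2,000-base transcript,
# in a library of 10 million mapped reads: 500 / (2 * 10).
single_end = rpkm(500, 2_000, 10_000_000)

# The same library counted as 250 fragments out of 5 million fragments.
paired_end = fpkm(250, 2_000, 5_000_000)

print(single_end, paired_end)
```

Dividing by length and by library size is what makes features of different lengths, and samples of different sequencing depth, comparable.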

RPKM Example

                    Gene A (600 bases)        Gene B (1100 bases)       Gene C (1400 bases)
Sample 1 (N = 6M)   C = 12                    C = 24                    C = 11
                    RPKM = 12/(0.6*6) = 3.33  RPKM = 24/(1.1*6) = 3.64  RPKM = 11/(1.4*6) = 1.31
Sample 2 (N = 8M)   C = 19                    C = 28                    C = 16
                    RPKM = 19/(0.6*8) = 3.96  RPKM = 28/(1.1*8) = 3.18  RPKM = 16/(1.4*8) = 1.43

RNA-Seq Data Analysis Summary
• The analysis pipeline of RNA-Seq consists of four fundamental analysis steps
  – Raw image data have to be converted into short read sequences
  – Short reads are aligned to the reference genome or transcriptome; if no reference is available, de novo assembly is used instead
  – The mapped reads are counted and the gene expression level is calculated (peak calling algorithms)
  – Statistical tests are applied to determine differential gene expression
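
The RPKM example and the counting step of the pipeline can be checked with a short script. The sketch below recomputes RPKM = C / (L × N) for the three genes in both samples, using the counts, gene lengths and library sizes from the example; note that 28/(1.1*8) evaluates to 3.18. The log2 ratio at the end is only an illustrative comparison between the two samples and is an assumption of this sketch: it is not a statistical test, which would need replicates and a dedicated method.

```python
import math

# Gene lengths in kb and per-sample data taken from the RPKM example.
gene_length_kb = {"A": 0.6, "B": 1.1, "C": 1.4}
samples = {
    "Sample 1": {"N": 6, "C": {"A": 12, "B": 24, "C": 11}},  # N in millions
    "Sample 2": {"N": 8, "C": {"A": 19, "B": 28, "C": 16}},
}


def rpkm(count, length_kb, total_millions):
    """RPKM = C / (L * N), with L in kb and N in millions of mapped reads."""
    return count / (length_kb * total_millions)


# RPKM per sample and gene, rounded to two decimals as in the example.
table = {
    name: {
        gene: round(rpkm(count, gene_length_kb[gene], data["N"]), 2)
        for gene, count in data["C"].items()
    }
    for name, data in samples.items()
}

# Illustrative log2 ratio of Sample 2 over Sample 1 (not a significance test).
log2_ratio = {
    gene: math.log2(table["Sample 2"][gene] / table["Sample 1"][gene])
    for gene in gene_length_kb
}

for name, values in table.items():
    print(name, values)
for gene, ratio in log2_ratio.items():
    print(f"Gene {gene}: log2(Sample 2 / Sample 1) = {ratio:.2f}")
```

Although Gene B has more raw reads in Sample 2 (28 versus 24), its normalized expression is lower there, because Sample 2 has the larger library.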


Conclusion
RNA-Seq
• Offers high-throughput quantitative
measurement of transcript abundance
• Expression levels correlate well with qPCR
• Costs continue to fall due to multiplexing
• Expected to replace microarrays for
transcriptomic studies
• Automated pipeline (TopHat/Cufflinks)
