
How does DESeq2 work?

Alexandre Thiery
Eva Hamrud
Centre for Craniofacial and Regenerative Biology
Learning Objectives:
• Understand the purpose of DESeq2
• Know about the different normalization methods and when each one
should be applied
• Know that DESeq2’s normalization accounts for both RNA
composition and library size
• Understand why gene-wise dispersion is calculated in DESeq2
• Understand the importance of multiple test correction
What is DESeq2?

DESeq2 is an R package used to find differential gene expression or peak accessibility between 2 or more conditions
Biological question: Which genes change their
expression when you treat your cells with a drug?
Experimental setup

Untreated: sample 1, sample 2, sample 3
Treated (drug treatment): sample 1, sample 2, sample 3

Collect data from each sample using RNA-seq

RNA-seq data

         Untreated  Untreated  Untreated  Treated   Treated   Treated
         sample 1   sample 2   sample 3   sample 1  sample 2  sample 3
Gene 1   45         43         42         145       157       126
Gene 2   0          0          0          12        42        4
Gene 3   1030       1011       1035       1025      1010      1025
Gene 4   1          2          0          0         1         5
Gene 5   10         35         12         53        54        100

Comparison of interest: Untreated vs Treated

[Figure: bar chart of Gene 1. Y-axis: gene expression (0 to 180). X-axis: Untreated samples 1 to 3 and Treated samples 1 to 3]

[Figure: frequency distributions of gene expression for Untreated and Treated samples. Label: differentially expressed gene]

[Figure: frequency distributions of gene expression for Untreated and Treated samples, with the Gene 3 row of the table highlighted. Label: not differentially expressed gene]

We need to account for technical variance

[Figure: frequency distributions of gene expression for Untreated and Treated samples]

Is this difference biological?

1. Normalization of read counts
2. Estimating gene-wise dispersion


1. Read normalization: accounting for gene length

[Figure: frequency distributions of gene expression for Gene X and Gene Y]

Normalization methods: RPKM, FPKM, TPM

Meeta et al. 2021
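
As a rough illustration of how length-aware normalization works, the sketch below computes RPKM and TPM for one sample. The counts and gene lengths are invented for the example (they are not from the slides), and the formulas are the commonly quoted definitions rather than any particular tool's implementation.

```python
# Hypothetical data: counts and gene lengths are made up for illustration.
counts = {"Gene X": 200, "Gene Y": 100}       # raw read counts in one sample
lengths_kb = {"Gene X": 4.0, "Gene Y": 1.0}   # assumed gene lengths in kilobases

total_reads_millions = sum(counts.values()) / 1e6

# RPKM: reads per kilobase of gene, per million reads in the sample.
rpkm = {g: counts[g] / lengths_kb[g] / total_reads_millions for g in counts}

# TPM: divide by gene length first, then rescale so the sample sums to one million.
rate = {g: counts[g] / lengths_kb[g] for g in counts}
tpm = {g: rate[g] / sum(rate.values()) * 1e6 for g in counts}

for g in counts:
    print(g, round(rpkm[g]), round(tpm[g]))
```

Gene X has more raw reads than Gene Y here, but once its greater length is taken into account it has the lower normalized value.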


1. Read normalization: accounting for sequencing depth

[Figure: frequency distributions of the expression of Gene X in Sample A and Sample B]

Normalization method: CPM

$$\mathrm{CPM}_x = \frac{\mathrm{Counts}_x}{\text{sample library size}} \times 10^{6}$$

Meeta et al. 2021

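
The CPM formula can be tried on the example count table from the earlier slides. This is only a sketch: it assumes the library size of each sample is the sum of the five genes shown, whereas a real library contains thousands of genes and millions of reads.

```python
# Counts from the example RNA-seq table (columns: untreated 1-3, treated 1-3).
counts = {
    "Gene 1": [45, 43, 42, 145, 157, 126],
    "Gene 2": [0, 0, 0, 12, 42, 4],
    "Gene 3": [1030, 1011, 1035, 1025, 1010, 1025],
    "Gene 4": [1, 2, 0, 0, 1, 5],
    "Gene 5": [10, 35, 12, 53, 54, 100],
}

# Library size per sample: column sums (an assumption for this toy table).
library_sizes = [sum(column) for column in zip(*counts.values())]

# CPM = counts / library size * 1e6, computed sample by sample.
cpm = {
    gene: [count / size * 1e6 for count, size in zip(row, library_sizes)]
    for gene, row in counts.items()
}

for gene, row in cpm.items():
    print(gene, [round(value) for value in row])
```
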
1. Read normalization: accounting for RNA composition

[Figure: frequency distributions of the expression of Gene X in Sample A and Sample B]

Normalization method: DESeq2’s median of ratios

Meeta et al. 2021

1. Read normalization: DESeq2 ‘median of ratios’

DESeq2 accounts for sequencing depth and RNA composition:

1) Calculates a size factor for each sample from the median of ratios, which prevents DE genes from affecting the normalisation
2) Divides the raw counts by this size factor to adjust for sequencing depth

For more details: Meeta et al. 2021
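
The two steps can be sketched in plain Python. The count matrix below is hypothetical: Sample B is sequenced at twice the depth of Sample A, Gene E is strongly differentially expressed, and Gene F contains a zero. The function is an illustrative re-implementation of the median-of-ratios idea, not DESeq2's own code.

```python
import math
from statistics import median

# Hypothetical counts for [Sample A, Sample B].
counts = {
    "Gene A": [100, 200],
    "Gene B": [50, 100],
    "Gene C": [10, 20],
    "Gene D": [30, 60],
    "Gene E": [20, 400],   # differentially expressed
    "Gene F": [0, 5],      # has a zero, so it is left out of the size factors
}

def size_factors(count_table):
    # Only genes with non-zero counts in every sample have a usable geometric mean.
    usable = [row for row in count_table.values() if all(c > 0 for c in row)]
    # Log of the per-gene geometric mean across samples (the pseudo-reference).
    log_geo_means = [sum(math.log(c) for c in row) / len(row) for row in usable]
    factors = []
    for j in range(len(usable[0])):
        # Log ratio of each gene's count in sample j to the pseudo-reference.
        log_ratios = [math.log(row[j]) - ref for row, ref in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

factors = size_factors(counts)
normalized = {
    gene: [c / f for c, f in zip(row, factors)] for gene, row in counts.items()
}
print([round(f, 3) for f in factors])
```

Because the median ignores the extreme ratio of Gene E, the size factors differ by a factor of 2, matching the difference in depth. The total counts of the two samples differ by much more than that, because Gene E inflates the total of Sample B.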


2. Variance changes with level of gene expression

Meeta et al. 2021


2. Calculating gene-wise dispersion using DESeq2

• Gene-wise dispersion estimates are obtained by maximum likelihood estimation
• The gene-wise dispersion estimates are then shrunk towards the fitted curve

[Figure: frequency distribution of the expression of Gene X]

Meeta et al. 2021
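
To give a feel for what a dispersion value measures, the sketch below uses a simple method-of-moments estimate based on the negative binomial relationship variance = mean + dispersion × mean². This is a swapped-in simplification, not DESeq2's procedure: DESeq2 uses maximum likelihood on normalized counts and then shrinks the estimates. The counts are the untreated replicates from the example table, used unnormalized for simplicity.

```python
from statistics import mean, variance

# Untreated replicates from the example table (raw counts, not normalized).
untreated = {
    "Gene 1": [45, 43, 42],
    "Gene 3": [1030, 1011, 1035],
    "Gene 5": [10, 35, 12],
}

def moment_dispersion(replicates):
    """Method-of-moments dispersion: (variance - mean) / mean^2, floored at 0."""
    mu = mean(replicates)
    if mu == 0:
        raise ValueError("dispersion is undefined when the mean count is zero")
    s2 = variance(replicates)  # sample variance (n - 1 in the denominator)
    return max((s2 - mu) / mu**2, 0.0)

dispersions = {gene: moment_dispersion(reps) for gene, reps in untreated.items()}
for gene, alpha in dispersions.items():
    print(gene, round(alpha, 3))
```

With only three replicates these per-gene values are very noisy, which is the motivation for sharing information across genes by shrinking towards the curve.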


Hypothesis testing

• For each gene, DESeq2 fits a negative binomial generalized linear model of gene expression, taking the dispersion estimates into account
• These models are used to test whether the mean expression differs significantly between conditions (using the Wald test)
• This is run for each gene independently

[Figure: frequency distributions of the expression of Gene 1 in Sample A and Sample B]

Meeta et al. 2021
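
The Wald test itself is a short calculation once a model has produced an estimate and its standard error. The sketch below shows only that final step; the fold change and standard error are hypothetical numbers, and fitting the model that produces them is not shown.

```python
import math

def wald_test(log2_fold_change, standard_error):
    """Return the Wald statistic and its two-sided p-value under a standard normal."""
    z = log2_fold_change / standard_error
    # Two-sided tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical estimate for one gene (values invented for illustration).
z, p = wald_test(log2_fold_change=1.7, standard_error=0.4)
print(f"z = {z:.2f}, p = {p:.2e}")
```

A large estimate relative to its standard error gives a large |z| and a small p-value. A gene with high dispersion has a larger standard error, so the same fold change is less significant.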


Multiple test correction: False Discovery Rate

• With a p-value threshold of 0.05, on average 1 out of 20 tests on genes with no real difference will be a false positive
• As there are thousands of genes in an RNA-seq dataset, multiple test correction is needed to keep the number of false positives under control
• By default, DESeq2 uses the Benjamini-Hochberg correction (FDR) to adjust p-values in relation to the number of genes being tested

[Figure: frequency distributions of expression for Gene 1 (Sample A vs Sample B), Gene 2 (Untreated vs Treated) and Gene 3 (Sample A vs Sample B)]

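
The Benjamini-Hochberg adjustment can be written in a few lines. The sketch below is a plain-Python version of the standard procedure, shown on hypothetical p-values; it is meant to illustrate the calculation, not to reproduce DESeq2's results table.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never decrease as the raw p-value increases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for four genes.
pvalues = [0.01, 0.04, 0.03, 0.005]
padj = benjamini_hochberg(pvalues)
print([round(p, 3) for p in padj])
```

Each raw p-value is scaled by the number of tests divided by its rank, so the adjustment becomes stricter as more genes are tested.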
Resources
• More in-depth explanation of differential gene expression analysis and DESeq2:
https://hbctraining.github.io/DGE_workshop_salmon/lessons/05_DGE_DESeq2_analysis2.html
• DESeq2 paper: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8
• Intro to DESeq2 YouTube video: DESeq2 Basics Explained | Differential Gene Expression Analysis | Bioinformatics 101
• Youtube video of maximum likelihood estimation: https://www.youtube.com/watch?v=XepXtl9YKwc&t=96s
• Youtube video of multiple test correction and FDR: https://www.youtube.com/watch?v=K8LQSvtjcEo
