
Web-based pipeline for the analysis and interactive visualization of single-cell RNAseq data


Adrian Shajkofci, Vincent Gardeux, Petra Schwalie, Bart Deplancke

Laboratory of Systems Biology and Genetics, EPFL, Lausanne, Switzerland
Swiss Institute of Bioinformatics, Lausanne, Switzerland

Introduction
Single-cell (SC) RNAseq measures the expression of thousands of genes in hundreds or even thousands of individual cells. These technologies have been applied in many domains, including embryology, cancer research, neurobiology, and microbiology, to characterize rare cell populations such as circulating tumor cells or to study mosaicism. In the past five years, pipelines specifically dedicated to SC analysis have emerged, such as SINCERA [1], SEURAT [2], MAST [3] and PAGODA [4]. However, they incorporate only a restricted set of algorithms, and their output remains very computational. In this work, we present a fully integrated, web-based pipeline for the complete analysis of SC RNAseq data after genome alignment, from the parsing, filtering, and normalization of the input count data files to the visual (2D and 3D) representation of cell clusters and other statistical features. Our software lets the user easily select and compare many common algorithms, as well as SC-specific tools such as SCDE [4] or t-SNE [5], and provides an interactive visualization of the results. Differential expression (DE) calculations and functional enrichment analysis help characterize clusters of cells, such as cell types or differentiation time points. The pipeline is designed to be modular, so that further tools can be added as they are developed, and computationally efficient, supporting concurrent queries and calculations.

Software pipeline
2D and 3D interactive visualization for:
- Principal Component Analysis (PCA)
- t-distributed Stochastic Neighbour Embedding (t-SNE) [5]
- Zero-Inflated Factor Analysis (ZIFA) [9]
- Cell colouring according to gene expression / gene sets / pathways

Normalization algorithms:
- Standard scaling and log
- Voom [6]
- Trimmed Mean of M-Values (TMM) [7]
- Standard Median Ratio (DESeq2) [8]
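
As a rough illustration of the median-ratio idea listed above, the sketch below computes one size factor per cell from a small genes-by-cells count matrix. It is not the code used by SCViz or by DESeq2; the function names and the input layout (a list of genes, each a list of raw counts per cell) are assumptions made for this example. Only genes with a nonzero count in every cell contribute, which can be restrictive for sparse single-cell data.

```python
import math
from statistics import median


def median_ratio_size_factors(counts):
    """Return one size factor per cell (hypothetical helper).

    counts: list of genes, each a list of raw counts, one value per cell.
    """
    n_cells = len(counts[0])
    log_ratios = [[] for _ in range(n_cells)]
    for gene in counts:
        # A gene with a zero count has no finite geometric mean, so skip it.
        if any(c <= 0 for c in gene):
            continue
        log_geo_mean = sum(math.log(c) for c in gene) / n_cells
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene is expressed in every cell")
    # The size factor is the median ratio of a cell's counts to the
    # per-gene geometric mean across cells.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts):
    """Divide every count by the size factor of its cell."""
    factors = median_ratio_size_factors(counts)
    return [[c / f for c, f in zip(gene, factors)] for gene in counts]
```

If one cell was sequenced twice as deeply as another, its size factor comes out twice as large, and the normalized values of the two cells become comparable.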

Filtering algorithms:
- Threshold
- Most highly expressed genes
- Coefficient of variation
- PAGODA [4] (SC specific)
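
The threshold and coefficient-of-variation filters above can be sketched in a few lines. This is only an illustration under assumed conventions: the function names, the `min_total` and `top_n` parameters, and the dictionary input (gene name mapped to per-cell expression values) are hypothetical and are not taken from SCViz.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (0.0 for an all-zero gene)."""
    m = mean(values)
    return pstdev(values) / m if m > 0 else 0.0


def filter_genes(expr, min_total=1, top_n=None):
    """Return gene names ranked by decreasing coefficient of variation.

    expr: dict mapping gene name -> list of expression values per cell.
    min_total: genes whose summed expression is below this are dropped.
    top_n: if given, keep only the top_n most variable genes.
    """
    # Threshold filter: drop genes with too little total expression.
    kept = {g: v for g, v in expr.items() if sum(v) >= min_total}
    # Variability filter: rank the remaining genes by CV.
    ranked = sorted(
        kept, key=lambda g: coefficient_of_variation(kept[g]), reverse=True
    )
    return ranked[:top_n] if top_n is not None else ranked
```

Ranking by variability keeps genes whose expression differs between cells, which are the ones most likely to separate clusters downstream.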

[Pipeline diagram: SCViz Platform, with modules Normalization, Dimensionality reduction, Filtering and Clustering]

Input Data

Input file:
- scRNA-seq read count data
- already normalized matrix
Processing:
- duplicates handling
- gene ID conversion to ENSEMBL and other databases
- robust parsing (missing columns, ...)
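
A minimal sketch of what tolerant parsing with duplicate handling might look like is given below. The poster does not say how SCViz resolves duplicates or missing values, so the choices here (summing rows that share a gene ID, treating missing or non-numeric cells as 0) are assumptions, as are the function name and the tab-separated layout with gene IDs in the first column.

```python
import csv
import io


def parse_count_matrix(text, delimiter="\t"):
    """Parse a genes-by-cells table into (cell_names, {gene: [values]}).

    Assumed layout: the first row is a header, the first column holds gene IDs.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    cells = header[1:]
    counts = {}
    for row in reader:
        if not row or not row[0].strip():
            continue  # skip blank lines and rows without a gene ID
        gene = row[0].strip()
        values = []
        for i in range(len(cells)):
            # Short rows (missing columns) are padded with zeros.
            raw = row[i + 1].strip() if i + 1 < len(row) else ""
            try:
                values.append(float(raw))
            except ValueError:
                values.append(0.0)  # assumption: unreadable entries count as 0
        if gene in counts:
            # Assumption: duplicate gene IDs are merged by summing their counts.
            counts[gene] = [a + b for a, b in zip(counts[gene], values)]
        else:
            counts[gene] = values
    return cells, counts
```

Other policies, such as keeping the most highly expressed duplicate or rejecting the file, would fit the same structure.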

Clustering algorithms:
- K-Means with automatic silhouette analysis
- Gaussian Mixture Models (GMM)
- Mean Shift
- Hierarchical Clustering with interactive tree cutting (Ward)
- SC3 [10] (SC specific) consensus clustering
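
Silhouette analysis, mentioned above for K-Means, scores how well each cell sits in its own cluster compared with the nearest other cluster; the number of clusters with the highest mean score can then be selected automatically. The sketch below computes the mean silhouette for a given labelling and picks the best of several candidate labellings. It is an illustration only: the function names are hypothetical, Euclidean distance is assumed, and the labellings are expected to come from a separate clustering step that is not shown.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points (coordinate tuples) for the given labels."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def pick_best(points, labelings):
    """Return the key (for example k) whose labelling has the highest score.

    labelings: dict mapping a candidate key -> list of cluster labels.
    """
    return max(labelings, key=lambda k: silhouette_score(points, labelings[k]))
```

Scores close to 1 indicate compact, well-separated clusters, while negative scores suggest that cells have been assigned to the wrong cluster.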

Differential expression

Species-specific functional annotation databases:
- Gene atlas
- Gene ontology (GO)
- KEGG Pathways
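
Functional enrichment against annotation databases such as those listed above is commonly assessed with a one-sided hypergeometric test, which asks how surprising the overlap between a list of selected genes and an annotated gene set is. The poster does not state which test SCViz uses, so the sketch below is an assumption, and the function names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_annotated, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn without replacement.

    n_universe: number of genes considered in total
    n_annotated: number of genes in the annotation term (e.g. a GO term)
    n_selected: number of genes in the user's list (e.g. top DE genes)
    n_overlap: number of selected genes that carry the annotation
    """
    upper = min(n_annotated, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def enrichment(selected, gene_set, universe):
    """Convenience wrapper taking collections of gene names."""
    universe = set(universe)
    selected = set(selected) & universe
    gene_set = set(gene_set) & universe
    overlap = len(selected & gene_set)
    return hypergeom_enrichment_p(len(universe), len(gene_set), len(selected), overlap)
```

In practice one p-value is computed per annotation term and the results are corrected for multiple testing before being reported.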

Functional enrichment


Technical characteristics
User-friendly interface written in HTML5 / JavaScript with WebGL components. Works in any modern browser without installation.
Server written in Java, with bindings for Python and R scripts and libraries. SQL database for enrichment regions and account management. RESTful API for HTTP communication.
Designed for concurrent queries and calculations on multi-processor and multicore systems.

Applications
Freely available public service
Local software for laboratories or whole universities

Figure 1. Screenshots of SCViz using scRNA-seq gene expression count data [11]. Panel A shows the main view with the addition of a 2D PCA plot; here, cells are colored according to the heart gene set from the Gene atlas (EMBL-EBI) database. Panels B and C show PCA plots with cells colored according to SC3 consensus clustering (in 2D and 3D, respectively). Panel D shows a hierarchical clustering plot with a user-defined tree cut point (in red), generating five clusters. Panel E shows the output of differential expression (Limma) of one cluster versus all other cells. Panel F shows the functional enrichment for the first 10 differentially expressed genes, suggesting that the selected cluster is composed of cardiac cells.

Conclusions
This web-based software can help biologists with little or no computational expertise interpret single-cell gene expression count data. It supports the most common analyses, such as filtering, normalization, clustering, DE calculation and functional enrichment, in a convenient way. Furthermore, visualization through PCA, t-SNE or ZIFA is interactive, so researchers can explore a broad range of views as well as the distribution of specific genes or gene set expression patterns among the cells. Finally, the web format allows the application to be run on any computer and any system without requiring installation or further updates.

Perspectives
We plan to keep the software up to date with new analysis and visualization tools, as well as additional functionalities. Functional annotation databases are ever-growing, fuelled by the efforts of dozens of laboratories, companies and consortia worldwide; the integration of these databases into our software is also planned. Additionally, we aim to develop an interface in which most parameters, such as the choice of algorithms or the number of clusters, are automatically optimized depending on the input data.

DE algorithms:
- Limma-Voom [6]
- edgeR [7]
- DESeq2 [8]
- SCDE [4] (SC specific)

Demonstration version
A demonstration version of the software is available at the following address:

http://www.singlecell.ch

References
[1] Guo et al., 2015
[2] Satija et al., 2015
[3] Finak et al., 2015
[4] Kharchenko et al., 2014
[5] Van der Maaten et al., 2008
[6] Law et al., 2014
[7] Robinson et al., 2010
[8] Love et al., 2014
[9] Pierson et al., 2015
[10] Kiselev et al., 2016
[11] Dueck et al., 2015

Acknowledgements
