Asap Poster Sibdays

Web-based pipeline for the analysis and interactive visualization
of single-cell RNAseq data

Adrian Shajkofci, Vincent Gardeux, Petra Schwalie, Bart Deplancke
Laboratory of Systems Biology and Genetics, EPFL, Lausanne, Switzerland
Swiss Institute of Bioinformatics, Lausanne, Switzerland
Introduction
Single-cell (SC) RNAseq measures the expression of thousands of genes in hundreds or even thousands of individual cells. These technologies have been applied in many domains, including embryology, cancer research, neurobiology, and microbiology, to characterize rare cell populations such as circulating tumor cells or to study mosaicism. In the past five years, pipelines specifically dedicated to SC analysis have emerged, such as SINCERA [1], SEURAT [2], MAST [3], and PAGODA [4]. However, they incorporate only a restricted set of algorithms, and their output remains hard to interpret without computational expertise. In this work, we present a fully integrated, web-based pipeline for the complete analysis of SC RNAseq data after genome alignment, from the parsing, filtering, and normalization of the input count data files to the visual (2D and 3D) representation of cell clusters and other statistical features. Our software allows the user to easily select and compare many common algorithms, as well as SC-specific tools such as SCDE [4] or t-SNE [5], and provides an interactive visualization of the results. Differential expression (DE) calculations and functional enrichment analysis help characterize clusters of cells, such as cell types or differentiation time points. The pipeline is designed to be modular, so that further tools can be added as they are developed, and computationally efficient, supporting concurrent queries and calculations.
Software pipeline
2D and 3D interactive visualization for :
- Principal Component Analysis (PCA)
- t-distributed Stochastic Neighbour Embedding (t-SNE) [5]
- Zero-Inflated Factor Analysis (ZIFA) [9]
- Cell colouring according to gene expression / gene sets / pathways
Normalization algorithms :
- Standard scaling and log
- Voom [6]
- Trimmed Mean of M-Values (TMM) [7]
- Standard Median Ratio (DESeq2) [8]
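As an illustration of the "Standard scaling and log" option, here is a minimal sketch that scales each cell to counts per million and applies log2(x + 1). The poster does not specify the exact scaling used, so the counts-per-million choice, the function name `log_cpm`, and the toy matrix are all assumptions made for this example.

```python
import math

def log_cpm(counts):
    """Scale each cell (column) to counts per million, then apply log2(x + 1).

    `counts` is a list of gene rows; each row holds one raw count per cell.
    Assumes every cell has at least one read (non-zero library size).
    """
    n_cells = len(counts[0])
    # Library size = total number of reads observed in each cell.
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_cells)]
    return [
        [math.log2(row[j] / lib_sizes[j] * 1e6 + 1.0) for j in range(n_cells)]
        for row in counts
    ]

# Hypothetical toy matrix: 3 genes x 2 cells; the second cell was sequenced twice as deep.
toy_counts = [
    [10, 20],
    [30, 60],
    [60, 120],
]
normalized = log_cpm(toy_counts)
```

Because the second cell has exactly twice the counts of the first, both cells end up with the same normalized values, which is the effect library-size scaling is meant to achieve.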
Filtering algorithms :
- Threshold
- Most highly expressed genes
- Coefficient of variation
- PAGODA [4] (SC specific)
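The coefficient-of-variation filter listed above can be sketched as follows. This is only an illustrative version under assumed conventions: the population standard deviation, the threshold value, and the names `filter_by_cv` and `toy_expression` are hypothetical and not taken from the software.

```python
import statistics

def filter_by_cv(expression, min_cv):
    """Keep genes whose coefficient of variation (std / mean) is at least min_cv.

    `expression` maps a gene name to its expression values across cells.
    """
    kept = {}
    for gene, values in expression.items():
        mean = statistics.fmean(values)
        if mean == 0:
            continue  # gene never detected: the CV is undefined, so drop it
        cv = statistics.pstdev(values) / mean
        if cv >= min_cv:
            kept[gene] = values
    return kept

# Hypothetical toy data: one flat gene, one variable gene, one undetected gene.
toy_expression = {
    "geneA": [5.0, 5.0, 5.0, 5.0],
    "geneB": [0.0, 10.0, 0.0, 10.0],
    "geneC": [0.0, 0.0, 0.0, 0.0],
}
variable_genes = filter_by_cv(toy_expression, min_cv=0.5)
```

Only `geneB` survives here: its mean and standard deviation are both 5, giving a CV of 1, while the flat gene has a CV of 0.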
Pipeline diagram (SCViz Platform) stages: Input Data, Filtering, Normalization, Dimensionality reduction, Clustering, Differential expression, Functional enrichment
Input Data
Input file :
- scRNA-seq read count data
- already normalized matrix
Processing :
- duplicates handling
- gene ID conversion to ENSEMBL and other databases
- robust parsing (missing columns, ...)
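One possible way to handle duplicated gene identifiers during parsing is to sum their counts. The poster does not say which strategy the software uses, so the summing rule and the helper name `merge_duplicate_genes` below are assumptions for illustration only.

```python
def merge_duplicate_genes(rows):
    """Sum the per-cell counts of rows that share the same gene identifier.

    `rows` is an iterable of (gene_id, counts) pairs, in file order.
    """
    merged = {}
    for gene, counts in rows:
        if gene in merged:
            # Duplicate identifier: add counts cell by cell (assumed strategy).
            merged[gene] = [a + b for a, b in zip(merged[gene], counts)]
        else:
            merged[gene] = list(counts)
    return merged

# Hypothetical parsed rows with one duplicated identifier.
toy_rows = [
    ("Actb", [1, 2]),
    ("Gapdh", [3, 4]),
    ("Actb", [10, 20]),
]
deduplicated = merge_duplicate_genes(toy_rows)
```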
Clustering algorithms :
- K-Means with automatic silhouette analysis
- Gaussian Mixture Models (GMM)
- Mean Shift
- Hierarchical Clustering with interactive tree cutting (Ward)
- SC3 [10] (SC specific) consensus clustering
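To illustrate how silhouette analysis can pick the number of clusters, the sketch below computes the mean silhouette width for candidate labelings and keeps the best one. It does not run K-Means itself: the two labelings are hard-coded stand-ins for what K-Means might return at k = 2 and k = 3, and all names and data are hypothetical.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette width over all points, using Euclidean distance.

    Requires at least two distinct cluster labels.
    """
    clusters = set(labels)
    widths = []
    for i, p in enumerate(points):
        # Mean distance from p to each cluster, excluding p itself.
        mean_dist = {}
        for c in clusters:
            members = [q for j, q in enumerate(points) if labels[j] == c and j != i]
            if members:
                mean_dist[c] = sum(math.dist(p, q) for q in members) / len(members)
        own = labels[i]
        if own not in mean_dist:
            widths.append(0.0)  # singleton cluster: width is defined as 0
            continue
        a = mean_dist[own]                                     # cohesion
        b = min(d for c, d in mean_dist.items() if c != own)   # separation
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)

# Hypothetical 2D embedding of six cells forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
candidate_labelings = {
    2: [0, 0, 0, 1, 1, 1],  # stand-in for a K-Means result with k = 2
    3: [0, 0, 1, 2, 2, 2],  # stand-in for k = 3 (splits the first group)
}
scores = {k: silhouette_score(points, lab) for k, lab in candidate_labelings.items()}
best_k = max(scores, key=scores.get)
```

With this toy data the two-cluster labeling scores higher, so `best_k` is 2.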
Functional enrichment
Species-specific functional annotation databases :
- Gene atlas
- Gene ontology (GO)
- KEGG Pathways
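Functional enrichment against a gene set from one of these databases is commonly scored with a one-sided hypergeometric (over-representation) test. The poster does not state which test the software applies, so the sketch below is an assumption, and the function name and numbers are hypothetical.

```python
from math import comb

def enrichment_p_value(n_background, n_in_set, n_selected, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric(n_background, n_in_set, n_selected).

    n_background: genes in the background
    n_in_set:     background genes annotated to the gene set
    n_selected:   genes in the query list (e.g. top DE genes)
    n_overlap:    query genes that belong to the gene set
    """
    upper = min(n_in_set, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical example: 20 background genes, a 5-gene set, 5 selected genes, 4 in the set.
p = enrichment_p_value(n_background=20, n_in_set=5, n_selected=5, n_overlap=4)
```

In this toy case the tail sums to 76 favourable draws out of 15504, giving p of about 0.0049.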
Technical characteristics
User-friendly interface written in HTML5 / JavaScript with WebGL components. Works in every modern browser without any installation.
Server written in Java with bindings for any Python or R script or library. SQL database for enrichment regions and account management. RESTful API for HTTP communication.
Designed for concurrent queries and calculations in a multi-processor, multicore environment.
Applications
- Freely available public service
- Local software for laboratories or whole universities
Figure 1. Screenshots of SCViz using scRNA-seq gene expression count data [11]. Panel A shows the main view with a 2D PCA plot added; here, cells are colored according to the heart gene set from the Gene atlas (EMBL-EBI) database. Panels B and C show PCA plots with cells colored according to SC3 consensus clustering (2D and 3D, respectively). Panel D shows a hierarchical clustering plot with a user-defined tree cut point (in red), generating five clusters. Panel E shows the output of differential expression (Limma) of one cluster versus all other cells. Panel F shows the functional enrichment for the first 10 differentially expressed genes, suggesting that the selected cluster is composed of cardiac cells.
Conclusions
This web-based software can help biologists with little or no computational expertise interpret single-cell gene expression count data. It supports the most common analyses, such as filtering, normalization, clustering, DE calculation, and functional enrichment, in a convenient way. Furthermore, visualization through PCA, t-SNE, or ZIFA is interactive, so researchers can explore a broad range of views as well as the distribution of specific gene or gene set expression patterns among the cells. Finally, the web format allows the application to run on any computer and any system without requiring installation or further updates.
Perspectives
We plan to update the software with up-to-date analysis and visualization tools, as well as additional functionality. Functional annotation databases are ever-growing, fuelled by the efforts of dozens of laboratories, companies, and consortia around the world; integrating these databases into our software is also planned. Additionally, we aim to develop an interface in which most parameters, such as the choice of algorithms or the number of clusters, are automatically optimized depending on the input data.
DE algorithms :
- Limma-Voom [6]
- EdgeR [7]
- DESeq2 [8]
- SCDE [4] (SC specific)
Demonstration version
A demonstration version of the software is available at the following address:
http://www.singlecell.ch
References
[1] Guo et al., 2015
[2] Satija et al., 2015
[3] Finak et al., 2015
[4] Kharchenko et al., 2014
[5] Van der Maaten et al., 2008
[6] Law et al., 2014
[7] Robinson et al., 2010
[8] Love et al., 2014
[9] Pierson et al., 2015
[10] Kiselev et al., 2016
[11] Dueck et al., 2015
Acknowledgements
