
Seeking the best de novo transcriptome assembling in non-model organisms

M. Espigares (1), P. Seoane (1), A. Polonio (2), R. Bautista (3), J. Quintana (4), A. Pérez-García (2), L. Gómez (4), MG Claros (1,3)

1) Departamento de Biología Molecular y Bioquímica (Universidad de Málaga), Campus de Teatinos, 29071 Málaga, Spain
2) Departamento de Microbiología, Facultad de Ciencias, Universidad de Málaga, Campus de Teatinos s/n, 29071 Málaga, Spain
3) Plataforma Andaluza de Bioinformática (Universidad de Málaga), Severo Ochoa, 34, 29590 Málaga, Spain
4) Departamento de Biotecnología, Universidad Politécnica de Madrid, Madrid, Spain

Project references: RTA2013-00068-C03-02, RTA2013-00023-C02

Introduction


In species where no genome sequence is available, the first step is to obtain a de novo
reference transcriptome. The problem is how to evaluate the suitability of such a transcriptome.
The hypothesis tested in this work is that reliable de novo transcriptomes should resemble the
well-characterised transcriptomes of model species. Non-model species are often very important
from an economic or ecological point of view. For example, the European chestnut (Castanea
sativa) is a forest tree with an important impact on the local economy. Another interesting
example is the fungus Podosphaera xanthii, which parasitizes Cucurbitaceae crops.
The objective of the present study is to obtain an automated, reproducible and flexible workflow
that yields the most complete and reliable de novo transcriptome of a non-model species.
It is illustrated with the European chestnut and P. xanthii reference transcriptomes, and
may be extended to other plant species, such as olive tree or grapevine.

Post-assembling tools for validation

BUSCO (Benchmarking Universal Single-Copy Orthologs): percentage of BUSCOs found complete, duplicated and fragmented.
FLN (Full-LengtherNext): N50, N90, number of annotated proteins, number of contigs, number of unigenes, unigenes with/without orthologues, sequences with artifacts, number of gaps, and percentage of indetermination.
Bowtie (mapping tool): overall read alignment rate (percentage) and Illumina-454 read intersection.
PCA (principal component analysis) and HCPC (hierarchical clustering on principal components): radar plot distribution, factor map of strategies and variables, strategy ranking list, cluster dendrogram, hierarchical clustering.

Assembling strategy

[Figure: workflow diagram] Workflow designed for the construction of a transcriptome by combining different assemblers and different assembling strategies. The workflow can be divided into three parts: I) pre-processing of the raw data, II) assembling approaches, and III) validation using a set of metrics for further principal component analyses.
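
Among the validation metrics listed above, N50 and N90 are simple to compute from contig lengths alone. The following is a minimal sketch of the standard Nx definition, not the code used by Full-LengtherNext; the function name `nx` and the toy contig lengths are assumptions made for illustration.

```python
def nx(lengths, fraction):
    """Return the Nx statistic for a list of contig lengths.

    Nx is the length L such that contigs of length >= L together
    contain at least `fraction` of all assembled bases
    (fraction=0.5 gives N50, fraction=0.9 gives N90).
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig to the shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length


# Toy contig lengths (hypothetical values, not taken from the poster).
contig_lengths = [100, 200, 300, 400, 500]
n50 = nx(contig_lengths, 0.5)
n90 = nx(contig_lengths, 0.9)
print(f"N50 = {n50}, N90 = {n90}")
```

A higher N50 or N90 indicates that more of the assembly lies in long contigs, which is why both values are useful as inputs to the later comparison of strategies.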
Transcriptome reference selection

[Figure, panels a and b] Comparison of six plant references according to metrics described in this study. a) PCA analysis of the 6 reference transcriptomes; b) radar distribution of analytical parameters by plant transcriptome. A. thaliana and P. trichocarpa present metrics such as complete contigs and protein annotated outside of the radar circle and indetermination going inside, confirming that these two transcriptomes are better references than the others.

Best transcriptome for Castanea sativa

[Figure, panels a and b] a) Hierarchical clustering of the 90 different assembling strategies obtained from 454 and Illumina single-end reads. b) Ranking list of the 7 best transcriptomes obtained. Results show the best transcriptomes as the reconciliation of Illumina assemblies with 454 reads.

STRATEGY RANKING
Assembly                 Average distance from reference
rs_454read/merge_SOAP    0.5187306
rs_454read/merge_Oase    0.5259034
rs_454read/Oase          0.5435384
rs_454read/Oases_25      0.5486952
rs_454read/SOAP_25       0.5653193
rs_454read/SOAP          0.5885214
rs_454read/RAY_          0.5897747
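
The ranking above orders assemblies by their distance from a reference transcriptome. The poster does not state how the average distance is computed, so the following is only a simplified sketch of the idea: each metric is standardised, and assemblies are sorted by Euclidean distance from the reference in that standardised space, without the PCA step. The function name `rank_by_distance`, the assembly names and every metric value are hypothetical.

```python
import math
import statistics


def rank_by_distance(metrics, reference_name):
    """Rank assemblies by Euclidean distance from a reference.

    `metrics` maps a name to a list of metric values (same order for
    every entry). Each metric is z-scored across all entries so that
    metrics on different scales contribute comparably.
    """
    names = list(metrics)
    columns = list(zip(*(metrics[name] for name in names)))
    means = [statistics.mean(col) for col in columns]
    # Fall back to 1.0 when a metric is constant, to avoid dividing by zero.
    sds = [statistics.pstdev(col) or 1.0 for col in columns]
    scaled = {
        name: [(v - m) / s for v, m, s in zip(metrics[name], means, sds)]
        for name in names
    }
    reference = scaled[reference_name]
    distances = {
        name: math.dist(scaled[name], reference)
        for name in names
        if name != reference_name
    }
    # Smallest distance first: the assembly most similar to the reference.
    return sorted(distances.items(), key=lambda item: item[1])


# Hypothetical metrics: [N50, % complete BUSCOs, % reads mapped].
toy_metrics = {
    "reference": [1500, 95, 98],
    "assembly_A": [1400, 90, 95],
    "assembly_B": [800, 60, 70],
    "assembly_C": [1200, 85, 90],
}
ranking = rank_by_distance(toy_metrics, "reference")
for name, distance in ranking:
    print(f"{name}\t{distance:.4f}")
```

Standardising first keeps a large-valued metric such as N50 from dominating the distance; a PCA-based version would apply the same ranking step to principal component coordinates instead.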

Best transcriptomes for Podosphaera xanthii

454 paired reads from epiphyte body
[Figure, panels a and b] a) Hierarchical clustering of assemblies when sequencing P. xanthii epiphyte body using 454-FLX technology. b) Radar distribution of analytical parameters used for the dendrogram. It is clearly observed that Mira assembling (red) is closer to the reference (orange) than Euler assembling (blue).

Illumina paired reads from haustorium
[Figure, panels a and b] a) Hierarchical clustering of assemblies of Illumina reads of P. xanthii haustorium. b) Radar distribution of analytical parameters. Since more assemblers are now involved, the radar plot becomes more complicated. The dot distribution for Oases appears to be the best option.

Epiphyte + haustorium
[Figure, panels a and b] a) Hierarchical clustering of P. xanthii assemblies combining both epiphyte body and haustorium. b) Ranking list of the 7 best transcriptomes obtained. Reassembling 454 reads and Illumina assemblies is the way to provide the best general transcriptome for P. xanthii. As expected, it is not the same strategy as the one used for chestnut.

INDIVIDUALS RANKING
Assembly                 Average distance from reference
rs_454read/merge_SOAP    0.4945959
rs_454read/RAY_          0.4952736
rs_454read/RAY_25        0.4959462
rs_454read/SOAP_35       0.4983065
rs_454read/SOAP_25       0.5040314
rs_454read/merge_RAY_    0.5053191
rs_454read/RAY_35        0.5095975

Conclusions

A very large number of assembling strategies can be compared in a single run.


Reads, strategy, assemblers and post-assembling tools clearly have an impact on the resulting transcriptome and on the assembling strategy selected.
The workflow is easy to reuse for a wide variety of organisms and sequencing data types.
The workflow shows that a different assembly strategy is selected depending on the sequencing dataset used, and that it can make this selection without human intervention.
