Why Deep Learning Is Changing The Way To Approach NGS Data Processing A Review
IEEE REVIEWS IN BIOMEDICAL ENGINEERING, VOL. 11, 2018
Abstract—Nowadays, big data analytics in genomics is an emerging research topic. In fact, the large amount of genomics data produced by emerging next-generation sequencing (NGS) techniques requires increasingly fast and sophisticated algorithms. In this context, deep learning is re-emerging as a possible approach to speed up the DNA sequencing process. In this review, we specifically discuss this trend. In particular, starting from an analysis of the interest of the Internet community in both NGS and deep learning, we present a taxonomic analysis highlighting the major software solutions based on deep learning algorithms available for each specific NGS application field. We discuss future challenges from the perspective of cloud computing services aimed at deep learning based solutions for NGS.
Index Terms—Big data, biotechnology, deep learning, genomics, next-generation sequencing (NGS).
Manuscript received July 5, 2017; revised December 29, 2017; accepted March 31, 2018. Date of publication April 12, 2018; date of current version July 24, 2018. This work was supported by the Italian Healthcare Ministry funded project “Do severe acquired brain injury patients benefit from telerehabilitation? A cost-effectiveness analysis study,” under Grant GR-2016-02361306. (Corresponding author: Antonio Celesti.)
F. Celesti is with the Department of Biomedical and Dental Sciences and Morphological and Functional Images, University of Messina, Messina 98125, Italy (e-mail: fabrizio.celesti@studenti.unime.it).
A. Celesti is with the MIFT Department, University of Messina, Messina 98166, Italy, with the Alma Digit Research Laboratory, Messina 98166, Italy, and also with the BIG DATA Laboratory, National Interuniversity Consortium for Informatics, Rome 00185, Italy (e-mail: acelesti@unime.it).
J. Wan is with the School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou 510630, China (e-mail: mejwan@scut.edu.cn).
M. Villari is with the MIFT Department, University of Messina, Messina 98166, Italy, with the IRCCS Centro Neurolesi “Bonino Pulejo,” Messina 98124, Italy, and with the Alma Digit Research Laboratory, Messina 98166, Italy (e-mail: mvillari@irccsme.it).
Digital Object Identifier 10.1109/RBME.2018.2825987
1937-3333 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
I. INTRODUCTION
NOWADAYS, big data analytics in genomics is an emerging research topic. Knowing the nucleotide sequence of the genome of an organism is extremely important in various biotechnological research fields. DNA sequencing is a molecular biology process that determines the exact nucleotide sequence of a DNA molecule, which is made up of a succession of four nucleotides: adenine (A), thymine (T), guanine (G), and cytosine (C). Information obtained from the DNA sequencing process is used for basic biological research and in other specific fields, such as comparative genomics, metagenomics, biological systematics, medical diagnosis, early detection of cancer, single nucleotide polymorphisms (SNPs) research, regulation of gene expression, forensic biology, and many others. In recent years, modern high-performance sequencing methods have been developed. In this context, next-generation sequencing (NGS) indicates a number of different modern DNA sequencing techniques that are applied, for example, for genome sequencing, genome resequencing, transcriptome profiling (RNA-Seq), DNA-protein interactions (ChIP-sequencing), epigenome characterization, etc. NGS solutions allow us to speed up DNA sequencing tasks. Even though short fragments of nucleic acids can be analyzed more efficiently, carrying out a large number of parallel sequencing tasks produces a huge amount of genomics data that needs to be stored and processed in a short time. Therefore, the huge amount of genomics data brought by NGS techniques is an example of the well-known “big data” problem [1]–[3]. NGS allows researchers to perform numerous parallel sequencing processes in order to obtain a large number of sequences in a short time and at low cost, compared to traditional Sanger sequencing based on chain-termination methods.
With the advent of NGS, one of the major challenges in bioinformatics is to efficiently transform genomics big data into valuable knowledge. On the one hand, a key issue is the complexity of the errors present in NGS data; on the other hand, processing the huge amount of NGS data requires considerable execution time, which makes older algorithms based on statistical machinery no longer efficient enough. In order to address such issues, both industrial and scientific communities are looking at deep learning solutions. Deep learning is a research field of machine learning and artificial intelligence that is based on different levels of representation, corresponding to a hierarchy of factors in which high-level concepts are defined in terms of lower-level ones. Although it was introduced by Ivakhnenko and Lapa in 1965 [4], it is currently re-emerging due to its potential to solve many big data problems in different application fields, including genomics. In fact, it allows researchers to perform DNA sequencing tasks faster and more efficiently than in the past.
Up to now, several surveys have been proposed regarding the application of deep learning in bioinformatics. In this context, a survey on deep learning in medical image analysis was proposed in [5], whereas a survey of recent advances in deep learning
techniques for electronic health record analysis was proposed in [6]. However, a survey specifically focusing on the adoption of deep learning based software tools in the NGS domain is still missing. In this review, unlike existing surveys, we aim to fill this gap in the literature. In particular, starting from a Google Trends analysis of the interest of the Internet community in both NGS and deep learning and from an analysis of scientific works available in the literature, we present a taxonomic analysis of the current state of the art on deep learning based solutions for NGS big data analytics. For the sake of simplicity, in the rest of the paper, we will refer to this family of software with the term “deep learning NGS” (DLN). In particular, besides highlighting the countries where such a trend is more evident, we highlight the major NGS application fields in which DLN software tools are available.
The remainder of the review is organized as follows. In Section II, we motivate why deep learning is so important in the field of NGS. An analysis of the interest of the Internet community in these topics is provided in Section III. A taxonomic analysis of DLN software tools is discussed in Section IV. A discussion of the current state of the art and future challenges from the perspective of DLN cloud-based services is provided in Section V. Section VI concludes the paper.
II. WHY DOES NGS REQUIRE DEEP LEARNING?
In this section, we motivate why deep learning is so important in the field of NGS in order to introduce the reader to a taxonomic analysis of existing related works and initiatives available in the literature.
DNA sequencing is the process of determining the precise order of A, T, G, and C nucleotides within a DNA molecule. The advent of DNA sequencing methods has greatly accelerated biological and medical research and discovery. Knowledge of DNA sequences is fundamental for basic biological research and in various applied fields, including genome analysis and SNP research (e.g., diagnostic approaches and prevention, structural analysis of mutant proteins), comparative genomics (e.g., metagenomics, ribosomal RNA (rRNA) classification, infectious disease diagnostics), regulation of gene expression (e.g., role of messenger RNA (mRNA) and noncoding RNA), forensic biology, biological systematics, early detection of cancer, virology, etc. Modern DNA sequencing technologies have allowed researchers to efficiently analyze the genomes of various living species, including human, animal, plant, and microbial organisms. Several new methods for DNA sequencing were developed in the 1990s and have been implemented in commercial DNA sequencers since 2000.
NGS, also known as high-throughput sequencing, is a catch-all term used to describe different modern DNA sequencing technologies, including Illumina (Solexa) sequencing, Roche 454 sequencing, Ion Torrent Proton/PGM sequencing, and SOLiD sequencing. NGS technologies allow researchers to sequence DNA much more quickly and cheaply compared to the older Sanger sequencing technology and, as such, have revolutionized the study of genomics and molecular biology.
With the advent of NGS, one of the major challenges in bioinformatics has been to efficiently transform genomics big data into valuable knowledge. On the one hand, the key issue of NGS is the complexity of the errors that are generated and that have to be managed during genomics big data processing. For example, current error rates can vary roughly from less than 1% for germline SNPs to more than 25% for somatic INDELs. On the other hand, the processing of genomics big data coming from NGS requires considerable execution time, which makes older algorithms based on statistical machinery no longer efficient enough. In this regard, both industrial and scientific communities have looked at deep learning solutions that allow NGS data processing tasks to be performed faster than in the past.
Nowadays, deep learning is one of the strategic technology trends reported by Gartner in 2017 [7]. It represents a class of machine-learning (ML) algorithms that use different abstraction levels to give a meaning to data, in which each level takes input from the previous one [8]. Currently, it is used to address problems in various application fields, including, e.g., complex networks [9], finance [10], cancer prediction [11], satellite image processing [12], image classification [13], social networks data analysis [14], real-time video streaming [15], [16], cloud management [17], etc. It was born from artificial neural network (ANN) research in order to improve the poor performance of back-propagation algorithms on networks with many hidden layers [18]. Algorithms can be either supervised or unsupervised. Commonly, deep learning algorithms are used to solve signal processing, graphical modeling, and pattern recognition problems. Typical deep learning methods include the following.
1) Deep neural network (DNN): It is an ANN with multiple hidden layers between the input and output layers, which can model complex nonlinear relationships. Architectures generate compositional models where the object is expressed as a layered composition of primitives.
2) Convolutional neural network (CNN): It is a class of deep, feed-forward ANNs that has successfully been applied to analyze images. CNNs use a variation of multilayer perceptrons (MLPs) designed to require minimal preprocessing. They are also known as shift invariant (or space invariant) ANNs, based on their shared-weight architecture and translation invariance characteristics.
3) Recurrent neural network (RNN): It is a class of ANN in which connections between units form a directed cycle that allows us to analyze dynamic temporal behavior. Unlike feed-forward ANNs, RNNs can use their internal memory to process arbitrary sequences of input.
4) Autoencoder (AE): It is an ANN used for unsupervised learning of efficient codings. Four main variants exist as follows:
a) Sparse autoencoder (SA): By imposing sparsity on the hidden units during the training process (while having a larger number of hidden units than inputs), an AE can learn useful structures in the input data. This allows sparse representations of inputs.
70 IEEE REVIEWS IN BIOMEDICAL ENGINEERING, VOL. 11, 2018
Fig. 1. Google Trends. Interest during the time period ranging from the week of December 18–25, 2011 to the week of December 4–10, 2016.
consistent across various cell types and/or tissues. DECRES [35] is a supervised deep learning solution for the identification of enhancer and promoter regions in the human genome. It combines the following basic and derived deep learning methods: CNN, DA, CA, SDA, stacked contractive autoencoder (SCA), multiclass logistic/softmax regression (MCL/SR), MLPs, restricted Boltzmann machine (RBM), deep belief network (DBN), and stacked restricted Boltzmann machine (SRBM). Flexible Integration of Data with Deep LEarning (FIDDLE) [36] is an open source, flexible, integrative, data-agnostic framework that is able to learn a unified representation by analyzing multiple data types in order to infer another data type. DEEP [37], [38] is a predictive framework using the CNN approach that streamlines the analysis of enhancer properties in various cellular conditions. By using such a solution, it is possible to train many individual classification models that can be combined to classify DNA regions as enhancers or nonenhancers. DEEP uses features that are deduced from histone modification marks or attributes coming from different sequence characteristics. DeepBind [39] is a stand-alone software tool based on CNN that is fully automatic and is able to handle millions of sequences for each experiment. The specificities that are determined by DeepBind are displayed as a weighted position matrix or as a “mutation map,” which indicates how variations affect binding within a specific sequence.
C. Prediction of Splicing Variants of mRNA
DeepSplice [40] is a CNN-based splice junction classification tool employing deep CNNs that:
1) offers better performance than other methods for predicting splice sites;
2) offers high computational efficiency; and
3) can be applied so as to pick out self-defined training data.
D. SNPs in Coding and Noncoding Regions of Genome
DeepSEA [41] is a framework based on CNN algorithms that directly learns a regulatory sequence code from large-scale chromatin-profiling data and enables the prediction of chromatin effects of sequence alterations with single-nucleotide sensitivity. Diet networks [42] is a software tool based on the concept of CNN reparameterization aimed at solving the overfitting problem that arises when the number of input features can be orders of magnitude larger than the number of training examples. In particular, by means of a neural network parameterization, it is able to considerably reduce the number of free parameters. It is based on the idea that it can first learn and provide a distributed representation for each input feature (e.g., for each position in the genome where variations are observed), and then learn by means of another neural network how to map each distributed feature representation into a vector of parameters by means of a classifier neural network in which weights link each value of the feature to a specific hidden unit.
DANN [43] is a software tool aimed at annotating genetic variants, especially noncoding variants, for the purpose of identifying pathogenic ones. It uses the same feature set and training data as combined annotation-dependent depletion (CADD) to train a DNN that can efficiently capture nonlinear relationships among features when there is a large number of samples and features.
E. Protein Structure Prediction
Given sufficiently large protein families and using a global statistical inference approach, it is possible to obtain enough accuracy in protein residue contact predictions to predict the structure of many proteins. However, these approaches do not consider the fact that the contacts in a protein are neither randomly nor independently distributed, but actually follow precise rules governed by the structure of the protein, and thus, are interdependent. Considering such a concept, PconsC2 [44] is a multilayer feed-forward (MLFF) stack of random decision forest learners aimed at identifying protein-like contact patterns in order to improve contact predictions.
TABLE II
RELATED WORK TITLE, AUTHORS, COUNTRY, AND DEEP LEARNING BASED SOLUTION FOR EACH NGS SUBFIELD
F. Research and Monitoring of Biomarkers
Cancer detection from gene expression data is a challenge due to the high dimensionality and complexity of the data considered. After decades of research, there is still uncertainty in the clinical diagnosis of cancer and the identification of tumor-specific markers. The stacked denoising AE (SDAE) solution [45] adopts a deep learning approach aimed at cancer detection from gene expression data, identifying genes that are critical for the diagnosis of breast cancer.
V. DISCUSSION AND FUTURE CHALLENGES
From both Google Trends and the taxonomic analysis of academic initiatives, it is evident that deep learning for the processing of NGS big data is an emerging research topic.
Fig. 5 highlights that the highest percentage of DLN software tools is available in the United States, confirming the interest of the Internet community in this country in deep learning and NGS. From an academic point of view, other relevant initiatives
by future cloud computing services based on deep learning algorithms in terms of both computational resource scalability and DNA fragments data sharing. With this review, we hope we succeeded in stimulating the biotechnology community toward the development of advanced DLN software tools.
REFERENCES
[1] M. Fazio, M. Paone, A. Puliafito, and M. Villari, “Huge amount of heterogeneous sensed data needs the cloud,” in Proc. Int. Multi-Conf. Syst., Signals, Devices, 2012, pp. 1–6.
[2] M. Fazio, A. Celesti, A. Puliafito, and M. Villari, “Big data storage in the cloud for smart environment monitoring,” Procedia Comput. Sci., vol. 52, no. 1, 2015, pp. 500–506.
[3] M. Fazio, A. Celesti, M. Villari, and A. Puliafito, “The need of a hybrid storage approach for IoT in PaaS cloud federation,” in Proc. IEEE 28th Int. Conf. Adv. Inf. Netw. Appl. Workshops, 2014, pp. 779–784.
[4] A. G. Ivakhnenko and V. G. Lapa, Cybernetic Predicting Devices. New York, NY, USA: CCM Information Corporation, 1965.
[5] G. Litjens et al., “A survey on deep learning in medical image analysis,” Med. Image Anal., vol. 42, pp. 60–88, 2017.
[6] B. Shickel, P. Tighe, A. Bihorac, and P. Rashidi, “Deep EHR: A survey of recent advances in deep learning techniques for electronic health record (EHR) analysis,” IEEE J. Biomed. Health Informat., still to be published.
[7] K. Panetta, “Gartner’s top 10 strategic technology trends for 2017: Artificial intelligence, machine learning, and smart things promise an intelligent future,” Gartner, Oct. 2016. [Online]. Available: http://www.gartner.com/smarterwithgartner/gartners-top-10-technology-trends-2017/
[8] L. Deng and D. Yu, “Deep learning: Methods and applications,” Tech. Rep., May 2014. [Online]. Available: https://www.microsoft.com/en-us/research/publication/deep-learning-methods-and-applications/
[9] D. C. Mocanu, E. Mocanu, P. H. Nguyen, M. Gibescu, and A. Liotta, “A topological insight into restricted Boltzmann machines,” Mach. Learn., vol. 104, no. 2, pp. 243–270, Sep. 2016.
[10] S. Sohangir, D. Wang, A. Pomeranets, and T. Khoshgoftaar, “Big data: Deep learning for financial sentiment analysis,” J. Big Data, vol. 5, no. 1, pp. 1–25, 2018.
[11] D. Bychkov et al., “Deep learning based tissue analysis predicts outcome in colorectal cancer,” Sci. Rep., vol. 8, no. 1, 2018, Art. no. 3359.
[12] M. Zou and Y. Zhong, “Transfer learning for classification of optical satellite image,” Sens. Imag., vol. 19, no. 1, 2018.
[13] Y. Zhou, Q. Hu, and Y. Wang, “Deep super-class learning for long-tail distributed image classification,” Pattern Recognit., vol. 80, pp. 118–128, 2018.
[14] M. Bastos and D. Mercea, “Parametrizing Brexit: Mapping twitter political space to parliamentary constituencies,” Inf. Commun. Soc., vol. 21, no. 7, pp. 921–939, 2018.
[15] M. T. Vega, D. C. Mocanu, and A. Liotta, “Unsupervised deep learning for real-time assessment of video streaming services,” Multimedia Tools Appl., vol. 76, no. 21, pp. 22303–22327, Nov. 2017.
[16] M. T. Vega, D. C. Mocanu, J. Famaey, S. Stavrou, and A. Liotta, “Deep learning for quality assessment in live video streaming,” IEEE Signal Process. Lett., vol. 24, no. 6, pp. 736–740, Jun. 2017.
[17] Y. Zhang, J. Yao, and H. Guan, “Intelligent cloud resource management with deep reinforcement learning,” IEEE Cloud Comput., vol. 4, no. 6, pp. 60–69, Nov./Dec. 2018.
[18] D. Yu and L. Deng, “Deep learning and its applications to signal and information processing [exploratory DSP],” IEEE Signal Process. Mag., vol. 28, no. 1, pp. 145–154, Jan. 2011.
[19] G. Hinton et al., “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82–97, Nov. 2012.
[20] “Recent advances in deep learning for speech research at Microsoft,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 2013. [Online]. Available: https://www.microsoft.com/en-us/research/publication/recent-advances-in-deep-learning-for-speech-research-at-microsoft/
[21] N. Majumder, S. Poria, A. Gelbukh, and E. Cambria, “Deep learning-based document modeling for personality detection from text,” IEEE Intell. Syst., vol. 32, no. 2, pp. 74–79, Mar. 2017.
[22] A. S. Becker, M. Marcon, S. Ghafoor, M. C. Wurnig, T. Frauenfelder, and A. Boss, “Deep learning in mammography,” Investigative Radiol., vol. 52, no. 7, pp. 434–440, Feb. 2017.
[23] J. Markoff, “Scientists see promise in deep-learning programs,” New York Times, 2012. [Online]. Available: http://www.nytimes.com/2012/11/24/science/scientists-see-advances-in-deep-learning-a-part-of-artificial-intelligence.html. Accessed: Nov. 2012.
[24] F. Celesti et al., “Big data analytics in genomics: The point on deep learning solutions,” in Proc. IEEE Symp. Comput. Commun., 2017, pp. 306–309.
[25] SCOPUS, Scopus Content at-a-glance. 2017. [Online]. Available: https://www.elsevier.com/solutions/scopus/content. Accessed: Oct. 20, 2017.
[26] D. Ravì et al., “Deep learning for health informatics,” IEEE J. Biomed. Health Informat., vol. 21, no. 1, pp. 4–21, Jan. 2017.
[27] C. Angermueller, T. Pärnamaa, L. Parts, and O. Stegle, “Deep learning for computational biology,” Mol. Syst. Biol., vol. 12, no. 7, 2016, Art. no. 878. [Online]. Available: http://dx.doi.org/10.15252/msb.20156651
[28] M. K. K. Leung, A. Delong, B. Alipanahi, and B. J. Frey, “Machine learning in genomic medicine: A review of computational problems and data sets,” Proc. IEEE, vol. 104, no. 1, pp. 176–197, Jan. 2016.
[29] Y. Wang et al., “Predicting DNA methylation state of CpG dinucleotide using genome topological features and deep networks,” Sci. Rep., vol. 6, pp. 1–15, 2016.
[30] R. Singh, J. Lanchantin, G. Robins, and Y. Qi, “DeepChrome: Deep-learning for predicting gene expression from histone modifications,” Bioinformatics, vol. 32, no. 17, pp. i639–i648, 2016.
[31] D. Kelley, J. Snoek, and J. Rinn, “Basset: Learning the regulatory code of the accessible genome with deep convolutional neural networks,” Genome Res., vol. 26, no. 7, pp. 990–999, 2016.
[32] J. Lanchantin, R. Singh, B. Wang, and Y. Qi, “Deep Motif dashboard: Visualizing and understanding genomic sequences using deep neural networks,” Pacific Symp. Biocomput., Waimea, HI, USA, Jan. 2016, vol. 22, pp. 254–265.
[33] D. Quang and X. Xie, “DanQ: A hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences,” Nucleic Acids Res., vol. 44, no. 11, 2016.
[34] F. Liu, H. Li, C. Ren, X. Bo, and W. Shu, “PEDLA: Predicting enhancers with a deep learning-based algorithmic framework,” Sci. Rep., vol. 6, 2016, Art. no. 28517.
[35] Y. Li, W. Shi, and W. W. Wasserman, “Genome-wide prediction of cis-regulatory regions using supervised deep learning methods,” Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA. [Online]. Available: https://www.biorxiv.org/content/early/2016/02/28/041616
[36] U. Eser and L. S. Churchman, “FIDDLE: An integrative deep learning framework for functional genomic data inference,” Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA. [Online]. Available: http://biorxiv.org/content/early/2016/10/17/081380
[37] D. Kleftogiannis, P. Kalnis, and V. Bajic, “Deep: A general computational framework for predicting enhancers,” Nucleic Acids Res., vol. 43, no. 1, pp. 1–6, 2015.
[38] X. Min, N. Chen, T. Chen, and R. Jiang, “DeepEnhancer: Predicting enhancers by convolutional neural networks,” Proc. IEEE Int. Conf. BioInformat. Biomed., 2016, pp. 637–644.
[39] B. Alipanahi, A. Delong, M. Weirauch, and B. Frey, “Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning,” Nature Biotechnology, vol. 33, no. 8, pp. 831–838, 2015.
[40] Y. Zhang, X. Liu, J. N. MacLeod, and J. Liu, “DeepSplice: Deep classification of novel splice junctions revealed by RNA-Seq,” in Proc. IEEE Int. Conf. BioInformat. Biomed., 2016, pp. 330–333.
[41] J. Zhou and O. Troyanskaya, “Predicting effects of noncoding variants with deep learning-based sequence model,” Nature Methods, vol. 12, no. 10, pp. 931–934, 2015.
[42] A. Romero et al., “Diet networks: Thin parameters for fat genomic,” Int. Conf. Learning Representations 2017 (ICLR2017), Toulon, France, Apr. 2017.
[43] D. Quang, Y. Chen, and X. Xie, “DANN: A deep learning approach for annotating the pathogenicity of genetic variants,” Bioinformatics, vol. 31, no. 5, pp. 761–763, 2015.
[44] M. Skwark, D. Raimondi, M. Michel, and A. Elofsson, “Improved contact predictions using the recognition of protein like contact patterns,” PLoS Comput. Biol., vol. 10, no. 11, 2014, Art. no. e1003889.
[45] P. Danaee, R. Ghaeini, and D. Hendrix, “A deep learning approach for cancer detection and relevant gene identification,” Proc. 22nd Pac. Symp. Biocomput., 2017, pp. 219–229.
[46] G. Di Modica and O. Tomarchio, “Matching the business perspectives of providers and customers in future cloud markets,” Cluster Comput., vol. 18, no. 1, pp. 457–475, 2015.
[47] G. Di Modica, O. Tomarchio, and L. Vita, “Resource and service discovery in SOA: A P2P oriented semantic approach,” Int. J. Appl. Math. Comput. Sci., vol. 21, no. 2, pp. 285–294, 2011.
[48] G. Di Modica and O. Tomarchio, “Semantic security policy matching in service oriented architectures,” in Proc. IEEE World Congr. Serv., Jul. 2011, pp. 399–405.
[49] A. Celesti, M. Fazio, F. Celesti, G. Sannino, S. Campo, and M. Villari, “New trends in biotechnology: The point on NGS cloud computing solutions,” Proc. IEEE Symp. Comput. Commun., 2016, pp. 267–270.
[50] A. Celesti, N. Peditto, F. Verboso, M. Villari, and A. Puliafito, “Draco PaaS: A distributed resilient adaptable cloud oriented platform,” in Proc. IEEE 27th Int. Parallel Distrib. Process. Symp. Workshops PhD Forum, 2013, pp. 1490–1497.
[51] A. Celesti, D. Mulfari, M. Fazio, A. Puliafito, and M. Villari, “Evaluating alternative DAAS solutions in private and public openstack clouds,” Softw. Pract. Experience, vol. 47, no. 9, pp. 1185–1200, 2017.
[52] A. Celesti, F. Celesti, M. Fazio, P. Bramanti, and M. Villari, “Are next-generation sequencing tools ready for the cloud?” Trends Biotechnology, vol. 35, pp. 486–489, 2017.
Fabrizio Celesti received the master’s degree in biotechnology from the University of Messina, Messina, Italy, in 2017. He is currently working toward the second-level master postgraduate course in advanced medical biotechnology for diagnostic in laboratory at the Department of Biomedical and Dental Sciences and Morphofunctional Images, Messina University.
Since 2016, he has collaborated with the Department of Engineering on research activities on big genomics data analytics. Since 2016, he has been a Technical Program Committee member of the IEEE ICT solutions for eHealth (ICTS4eHealth). His main research interests include eHealth, genomics, and big data analytics.
Antonio Celesti (M’16) received the master’s degree in computer science in 2008, and the Ph.D. in advanced technologies for information engineering in 2012, from the University of Messina, Messina, Italy. He is currently an Adjunct Professor in databases with the University of Messina. He has been a collaborator in many national and international projects, such as the EU FP7 Project RESERVOIR—Resources and Services Virtualization Without Barriers, from 2008 to 2011, and the EU FP7 Project VISION CLOUD—Cloud Virtualized Storage Services Foundation for the Future Internet, from 2010 to 2013. From 2012 to 2014, he was an Assistant Researcher with the University of Messina working on the POR FSE 2007–2013 Project SIMONE, focusing on energy management and cloud computing. Since 2012, he has been responsible for the Digital Iconographic Atlas of Numismatics in Antiquity. Since 2014, he has been an Assistant Researcher with the University of Messina working on the EU FP7 project FrontierCities—European Cities Driving the Future Internet. From 2015 to 2016, he was a collaborator in the EU Horizon 2020 Project BEACON, focusing on federated cloud networking. Since 2017, he has been a Principal Investigator of the Horizon 2020 Project FrontierCities2 subgrant CASMOB and responsible for the University of Messina with the Project entitled “Do severe acquired brain injury patients benefit from Telerehabilitation? A cost-effectiveness analysis study,” funded by the Italian healthcare industry. He has co-authored many scientific publications on distributed systems and cloud computing. His main research interests include distributed systems and cloud computing (with particular regard to federation, storage, security, and energy efficiency), and assistive technology.
Jiafu Wan (M’11) has been a Professor with the School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou, China, since September 2015. He has directed 16 research projects, including the National Key Research and Development Project, the National Natural Science Foundation of China, the High-level Talent Project of Guangdong Province, and the Natural Science Foundation of Guangdong Province. His research interests include cyber-physical systems, intelligent manufacturing, big data analytics, Industry 4.0, smart factory, cloud robotics, and Internet of vehicles.
Prof. Wan is an Associate Editor for the IEEE ACCESS (SCI) and on the Editorial Board of PLoS One (SCI), and he is a Managing Editor for the International Journal of Autonomous and Adaptive Communications Systems (Ei Compendex) and International Journal of Applied Research and Technology (Ei Compendex). He has been a Leading Guest Editor for several SCI-indexed journals, such as IEEE SYSTEMS JOURNAL, IEEE ACCESS, Computer Networks (Elsevier), Mobile Networks & Applications (Springer), Computers and Electrical Engineering (Elsevier), and Microprocessors and Microsystems (Elsevier). He is a senior member of both CMES and CCF.
Massimo Villari (M’08) received the Laurea degree in 1999 in electronic engineering, and the Ph.D. in advanced technologies for information engineering in 2002, from the University of Messina, Messina, Italy. He is currently an Associate Professor in computer engineering with the University of Messina. He is actively working as an IT Security and Distributed Systems Analyst in cloud computing, virtualization, and storage. For the EU Project RESERVOIR, he led IT security activities, and for the EU Project VISION-CLOUD, he covered the role of architectural designer for the University of Messina. He is currently working on the EU Project FrontierCities, the Accelerator of FIWARE on Smart Cities—Smart Mobility, where he is responsible for information and communications technology. He is strongly involved in EU Future Internet initiatives, specifically cloud computing and security in distributed systems. He has been a co-author of more than 130 scientific publications and patents on cloud computing (cloud federation), distributed systems, wireless networks, network security, cloud security, and cloud and IoTs. He was a General Chair of ESOCC 2015 and IEEE-ISCC 2016. Since 2011, he has been a fellow of IARIA, recognized as a cloud computing expert, and has also been involved in the activities of the FIArch, the EU Working Group on Future Internet Architecture. In 2014, he was recognized by an independent assessment (IEEE TRANSACTIONS ON CLOUD COMPUTING, April 2014) as one of the worldwide active scientific researchers, top 27 classification, in the cloud computing area. He has been a General Chair of EAI-CN4IoT. He is an Editor-in-Chief of the EAI-endorsed Transactions on Smart Cities. He is currently the scientist responsible for research activities on Cloud and eHealth agreed upon between the University of Messina and IRCCS Centro Neurolesi “Bonino Pulejo” in eHealth: http://healthycloud.irccsme.it/.