Esteban Meneses
Harold Castro
Carlos Jaime Barrios Hernández
Raul Ramos-Pollan (Eds.)
High Performance Computing
5th Latin American Conference, CARLA 2018
Bucaramanga, Colombia, September 26–28, 2018
Revised Selected Papers
Communications in Computer and Information Science 979
Commenced Publication in 2007
Founding and Former Series Editors:
Phoebe Chen, Alfredo Cuzzocrea, Xiaoyong Du, Orhun Kara, Ting Liu, Krishna M. Sivalingam, Dominik Ślęzak, and Xiaokang Yang
Editors
Esteban Meneses, Instituto Tecnológico de Costa Rica, Centro Nacional de Alta Tecnología, Pavas, Costa Rica
Harold Castro, Universidad de los Andes, Bogotá, Colombia
Carlos Jaime Barrios Hernández, Universidad Industrial de Santander, Bucaramanga, Colombia
Raul Ramos-Pollan, Universidad de Antioquia, Medellín, Colombia
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Organization
Steering Committee
Mateo Valero, Barcelona Supercomputing Center, Spain
Gonzalo Hernández, University of Santiago, Chile
Carla Osthoff, National Laboratory for Scientific Computing, Brazil
Philippe Navaux, Federal University of Rio Grande do Sul, Brazil
Isidoro Gitler, Center for Research and Advanced Studies of the National Polytechnic Institute, Mexico
Esteban Mocskos, University of Buenos Aires, Argentina
Nicolas Wolovick, National University of Cordoba, Argentina
Sergio Nesmachnow, University of the Republic, Uruguay
Alvaro de la Ossa Osegueda, University of Costa Rica, Costa Rica
Esteban Meneses, National High Technology Center, Costa Rica
Carlos Jaime Barrios Hernández, Industrial University of Santander, Colombia
Harold Enrique Castro Barrera, University of Los Andes, Colombia
Gilberto Javier Diaz Toro, Industrial University of Santander, Colombia
Luis Alberto Nunez de Villavicencio Martinez, Industrial University of Santander, Colombia
Program Committee
Alvaro Coutinho, Federal University of Rio de Janeiro, Brazil
Bruno Schulze, National Laboratory for Scientific Computing, Brazil
Carla Osthoff, National Laboratory for Scientific Computing, Brazil
Daniel Cordeiro, University of São Paulo, Brazil
Esteban Clua, Federal Fluminense University, Brazil
Lucas Schnorr, Federal University of Rio Grande do Sul, Brazil
Marcio Castro, Federal University of Santa Catarina, Brazil
Pedro Mario Cruz Silva, NVIDIA, Brazil
Roberto Pinto-Souto, National Laboratory for Scientific Computing, Brazil
Luiz Angelo Steffenel, Université de Reims Champagne-Ardenne, France
Luiz Derose, Cray, USA
Ginés Guerrero, University of Chile, Chile
Claudia Jiménez-Guarín, University of the Andes, Colombia
Fabio Martinez Carrillo, National University of Colombia, Colombia
Gilberto Javier Diaz Toro, Industrial University of Santander, Colombia
Idalides Vergara-Laurens, University of Turabo, Colombia
Julian Ernesto Jaramillo, Industrial University of Santander, Colombia
Luis Fernando Castillo, University of Caldas, Colombia
Artificial Intelligence
Accelerators
Fast Marching Method in Seismic Ray Tracing on Parallel GPU Devices
Jorge Monsegny, Jonathan Monsalve, Kareth León, Maria Duarte, Sandra Becerra, William Agudelo, and Henry Arguello
Applications
Performance Evaluation
Cloud Computing
1 Introduction
A critical step in personalized medicine is developing accurate, fast, and energy-efficient artificial intelligence systems for tailoring medical care (e.g., treatment, therapy, and usual doses) to the individual patient and for predicting the length and cost of the hospital stay. In this context, inferring common patient phenotype patterns that can capture disease variations, disease classification, and patient stratification requires massive clinical datasets and computationally intensive models [1,2]. Thus, handling the complex structure, noise, and missing data of large Electronic Health Records (EHRs) has become a core computational task in automated phenotype extraction [3].
In this paper, we describe an unsupervised learning method for mining EHR data and building low-dimensional phenotype representations, using a mini-cluster of 14 Jetson TX2 nodes to distribute training and obtain a patient phenotype representation. This representation can then be used as input to supervised learning algorithms that predict the main purpose of inpatient care.
We present an application framework called DiagnoseNET that provides three high-level features. The first is a processing workflow that selects and extracts the data to be processed, constructs a binary representation, reduces its dimensionality through unsupervised learning, and processes the data through supervised learning. The second is a data resource manager that feeds the clinical dataset to the Jetson TX2 nodes according to their memory capacity, while multiple replicas of the model work to minimize the loss function. The third is an energy-monitoring tool for analyzing the scalability impact of different batch-size factors, so as to minimize the number of epochs needed to converge and to project energy-efficiency measures.
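As a minimal sketch of the second feature, data resource management (all function names here are ours, not DiagnoseNET's actual API), the corpus can be sharded in proportion to each node's memory and then cut into mini-batches:

    import numpy as np

    # Sketch of the data-resource-management idea described above: shard
    # the clinical corpus across Jetson TX2 workers in proportion to each
    # node's available memory, then cut each shard into mini-batches.
    # Function names are illustrative, not DiagnoseNET's real API.

    def shard_by_memory(X, memory_gb):
        """Split the corpus proportionally to each worker's memory."""
        fractions = np.asarray(memory_gb, dtype=float)
        fractions /= fractions.sum()
        cuts = np.cumsum((fractions * len(X)).astype(int))[:-1]
        return np.split(X, cuts)

    def mini_batches(shard, batch_size):
        """Cut one worker's shard into mini-batches that fit GPU memory."""
        for start in range(0, len(shard), batch_size):
            yield shard[start:start + batch_size]

    # Example: 10 records of 4 features, two workers with 8 GB and 4 GB.
    X = np.random.rand(10, 4).astype(np.float32)
    shards = shard_by_memory(X, [8, 4])
    print([s.shape for s in shards])              # [(6, 4), (4, 4)]
    print(len(list(mini_batches(shards[0], 2))))  # 3 batches of size 2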
2 Related Work
Over the past century, health research models were traditionally designed to identify patient patterns for a single target disease: domain experts supervised the definition of feature scales for that particular target and usually worked with small samples collected for research purposes [4,5]. In general, however, clinical data are noisy, irregular, and unlabeled, which makes it difficult to discover the underlying phenotypes directly and limits this approach. Nowadays, computer science facilitates the design and implementation of emerging frameworks and practical approaches that offer different ways to extract valuable information such as phenotypes [6].
To derive patients' phenotypes, it is necessary to extract the occurrences in their medical data (demographics, medical diagnoses, procedures performed, cognitive status, etc.), and ideally also the evolution of this information over time. A common method is a vector-based representation in which, for each medical target, a correlation matrix between patients and groups of medical features is constructed [7]; generating the different vectors generally takes considerable time. Other possibilities include nonnegative tensor factorization approaches [8-10].
Fig. 1. Workflow scheme to automate patient phenotype extractions and apply them
to predict different medical targets.
medical acts in French CCAM codes, and other derived objects such as hospital admission details, represented in codes established by the French agency ATIH and generated by the PMSI system to standardize the information contained in the patient's medical record. The collection functions are:
1. Clinical Document Architecture (CDA): Defines the syntax for exchanging clinical records between the PMSI system and DiagnoseNET, through the new versions generated by the ATIH agency. The CDA schema basically consists of a header and a body:
   - Header: Includes patient information, author, creation date, document type, provider, etc.
   - Body: Includes clinical details, demographic data, diagnoses, procedures, admission details, etc.
2. Features Composition: Serializes each patient record and gets the CDA object, processing all patient attributes into a record object.
3. Vocabulary Composition: Enables a dynamic or custom vocabulary for selecting and crafting the right set of corresponding patient attributes per medical entity.
4. Label Composition: Gets the medical target selected from the CDA schema and builds a one-hot or vector representation of it.
5. Binary Record: Maps the feature values from the record object to the corresponding terms in each feature vocabulary, generating a binary corpus as a term-document matrix (see the sketch after this list).
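A toy illustration of steps 3-5 (our own simplified code, not the framework's implementation) with hypothetical diagnosis codes:

    import numpy as np

    # Toy illustration of vocabulary composition, label composition, and
    # binary record; simplified code of our own, not DiagnoseNET's.
    records = [
        {"diagnoses": {"I10", "E11"}, "target": "cardiology"},
        {"diagnoses": {"E11", "N18"}, "target": "nephrology"},
    ]

    # Vocabulary composition: the terms observed for one medical entity.
    vocabulary = sorted({code for r in records for code in r["diagnoses"]})
    labels = sorted({r["target"] for r in records})

    # Binary record: term-document matrix, one row per patient.
    X = np.array([[1.0 if term in r["diagnoses"] else 0.0
                   for term in vocabulary] for r in records],
                 dtype=np.float32)

    # Label composition: one-hot representation of the medical target.
    Y = np.array([[1.0 if lab == r["target"] else 0.0 for lab in labels]
                  for r in records], dtype=np.float32)

    print(vocabulary)  # ['E11', 'I10', 'N18']
    print(X)           # [[1. 1. 0.], [1. 0. 1.]]
    print(Y)           # [[1. 0.], [0. 1.]]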
Unsupervised Representation Learning
Representation learning has had significant success in encoding audio, images, and text from rich, high-dimensional datasets [23,24,27]. In this work we extend the deep patient approach [13], in which all clinical descriptors are grouped into patient vectors, and each patient can be described either by a single high-dimensional vector or by a sequence of vectors computed over a predefined temporal window.
The vector collection is used to derive the latent representation of the patient via an unsupervised encoder network. At each stage of the network, the coding matrix and the associated lower-dimensional latent representation are obtained simultaneously, as shown in Fig. 2.
This unsupervised encoder network is composed of stacked denoising autoencoders: at each layer, a deterministic mapping takes a partially corrupted input x̃ and produces a hidden feature representation y = f_θ(x̃). Each stacked layer is trained independently to reconstruct the clean input x from its corrupted version through z = g_θ'(y); this approach was introduced in [25].
Each encoder is first pre-trained as a denoising autoencoder to obtain representation parameters θ that yield a robust representation y = f_θ(x̃) from a corrupted input x̃. This is carried out in the following steps:
1. Apply dropout to corrupt the initial input x into x̃ via the stochastic mapping x̃ ∼ q_D(x̃ | x).
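As a minimal sketch of this pre-training step (our illustration of the technique from [25], using the Keras API rather than the paper's actual code), dropout corrupts the input and each layer learns to reconstruct the clean input:

    import numpy as np
    import tensorflow as tf

    # Sketch of layer-wise denoising-autoencoder pre-training (after [25]).
    # Keras Dropout plays the role of the stochastic corruption q_D, and
    # each layer is trained to reconstruct the clean input x from x~.
    def pretrain_dae(X, hidden_units, corruption=0.3, epochs=10):
        inputs = tf.keras.Input(shape=(X.shape[1],))
        corrupted = tf.keras.layers.Dropout(corruption)(inputs)  # x -> x~
        hidden = tf.keras.layers.Dense(
            hidden_units, activation="sigmoid")(corrupted)       # y = f_theta(x~)
        recon = tf.keras.layers.Dense(
            X.shape[1], activation="sigmoid")(hidden)            # z = g_theta'(y)
        dae = tf.keras.Model(inputs, recon)
        dae.compile(optimizer="adam", loss="binary_crossentropy")
        dae.fit(X, X, epochs=epochs, batch_size=32, verbose=0)   # target: clean x
        return tf.keras.Model(inputs, hidden)                    # keep the encoder

    # Stack the encoders: each layer pre-trains on the previous output.
    X = np.random.randint(0, 2, size=(256, 64)).astype(np.float32)
    enc1 = pretrain_dae(X, 32)
    enc2 = pretrain_dae(enc1.predict(X, verbose=0), 16)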
batch size, with GPU SM frequencies of 1071.97 and 1015.49 MHz. To illustrate the impact of the GPU idle status generated by a large data-batch partition, we can observe the same 6-minute window shown in Fig. 5a, extracted from the training of the AE model when it is executed using three different batch partitions.
The Model Dimensionality as a Factor in Generating a Quality Latent Representation
This subsection studies the relationship between model complexity, network convergence time, and the reliability with which a network generates a low-dimensional space as a latent patient phenotype representation.
Specifically, the experiments compare, under a fixed set of hyperparameters, three variations of each network, contrasting sigmoid vs. ReLU activation functions for generating the latent representation and using three configurations of neurons per layer: [4086, 2048, 768], [2000, 1000, 500], and [500, 500, 500].
Accuracy is evaluated by taking the latent representation generated by each network model as input for random-forest classification of the clinical major category over 23 labels, together with each model's energy efficiency.
1. The first network is a traditional fully connected autoencoder with three hidden layers that generate the latent representation.
2. The second is an end-to-end network using three hidden fully connected autoencoder layers to generate the latent representation, which feeds the next four hidden multilayer-perceptron layers.
3. The third is an encoder network using three hidden unsupervised stacked denoising autoencoders to initialize the next three hidden layers, which encode and generate the latent representation (Figs. 6 and 7).
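The evaluation protocol described above amounts to the following sketch, using scikit-learn's RandomForestClassifier on synthetic stand-in data (the real inputs are the latent patient vectors and the 23 clinical-major-category labels):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # Sketch of the evaluation: the latent representation produced by each
    # encoder feeds a random forest that predicts the clinical major
    # category (23 labels). Synthetic stand-in data below.
    rng = np.random.default_rng(0)
    Z = rng.random((1000, 500), dtype=np.float32)   # latent patient vectors
    y = rng.integers(0, 23, size=1000)              # 23 clinical categories

    Z_train, Z_test, y_train, y_test = train_test_split(
        Z, y, test_size=0.2, random_state=0)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(Z_train, y_train)
    print(accuracy_score(y_test, clf.predict(Z_test)))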
Fig. 4. Network convergence using batch partitions of [20000, 1420, 768] records to generate [4, 59, 110] gradient updates per epoch, respectively.
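The caption's numbers are consistent with drop-remainder batching over roughly 84,480 records, a count we infer from the caption rather than from the text:

    # Consistency check for the figure caption above: with drop-remainder
    # batching, updates per epoch = floor(n_records / batch_size). The
    # record count 84,480 is inferred, not stated in the text.
    n_records = 84_480
    print([n_records // b for b in (20_000, 1_420, 768)])  # [4, 59, 110]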
Fig. 5. Impact of the GPU idle status generated by a large data-batch partition: power consumption over a 6-minute window for the previous experiment.
50.6 s on average for processing one epoch on 1 PS and 8 workers; 25.75 s on average for processing one epoch on 1 CPU and 1 GPU Titan X.
Fig. 8. Early-convergence comparison between different groups of workers and task granularities for distributed training with 10,000 records and 11,466 features.
Table 2. Preliminary results for processing the unsupervised patient phenotype representation on the Jetson TX2 mini-cluster.
5 Conclusions
The work carried out so far has allowed us to highlight that the use of a well-
chosen latent representation instead of the initial binary representation could
make it possible to significantly improve processing times (up to 41%) while
maintaining the same precision.
Minimizing the execution time of a multilayer perceptron on a Jetson TX2 cluster, whether training an autoencoder or performing a classification, depends on the application's ability to efficiently distribute the data across the Jetson nodes based on the available SSD space, and then to cut that data into mini-batches based on the memory available on the GPUs.
Using hundreds of gradient updates per epoch with synchronous data parallelism yields efficient distributed DNN training with early convergence, and it minimizes the bottleneck of data transfers from host memory to device memory, reducing GPU idle time.
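As an illustration of the synchronous data-parallel idea in present-day TensorFlow (a sketch only; the experiments reported here used a parameter-server setup across Jetson TX2 nodes, not this single-host strategy):

    import numpy as np
    import tensorflow as tf

    # Sketch of synchronous data parallelism: each replica computes
    # gradients on its mini-batch and the updates are aggregated before
    # being applied. Illustrative only, not the paper's setup.
    strategy = tf.distribute.MirroredStrategy()
    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(64,)),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(23, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy")

    X = np.random.rand(1024, 64).astype(np.float32)
    y = np.random.randint(0, 23, size=1024)
    # Many small batches per epoch -> many synchronous gradient updates.
    model.fit(X, y, batch_size=32, epochs=2, verbose=0)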
Our current work on the platform aims to reinforce these findings by comparing the performance obtained for the other tasks on various platforms.
The first platform is a standard computer with a GPU; the second is a cluster of 24 Jetson TX boards connected by an Ethernet switch; and the third is an array server consisting of 24 Jetson TX boards connected via Gigabit Ethernet through a specialized managed Ethernet switch, marketed by Connect Tech.
Performance here covers the precision that can be obtained, the execution time of the model, and the energy consumption necessary to obtain it.
References
1. Heinzmann, K., Carter, L., Lewis, J.S., Aboagye, E.O.: Multiplexed imaging for
diagnosis and therapy. Nature Biomed. Eng. 1, 09 (2017)
2. Cheng, Y., Wang, F., Zhang, P., Hu, J.: Risk prediction with electronic health records: a deep learning approach (2016)
3. Lasko, T.A., Denny, J.C., Levy, M.A.: Computational Phenotype Discovery Using
Unsupervised Feature Learning over Noisy, Sparse, and Irregular Clinical Data
(2013)
4. Matheny, M.E., et al.: Development of inpatient risk stratification models of acute
kidney injury for use in electronic health records. Med. Decis. Making 30(6), 639–
650 (2010)
5. Kennedy, E.H., Wiitala, W.L., Hayward, R.A., Sussman, J.B.: Improved cardiovas-
cular risk prediction using nonparametric regression and electronic health record
data. Med. Care 51(3), 251–258 (2013)
6. Sheng, Y., et al.: Toward high-throughput phenotyping: unbiased automated fea-
ture extraction and selection from knowledge sources. J. Am. Med. Inform. Assoc.
22(5), 993–1000 (2015)
7. Wang, X., Wang, F., Hu, J.: A multi-task learning framework for joint disease risk
prediction and comorbidity discovery. In: Proceedings of the 2014 22nd Interna-
tional Conference on Pattern Recognition, ICPR 2014, pp. 220–225. IEEE Com-
puter Society, Washington, DC (2014)
8. Ho, J.C., et al.: Limestone: high-throughput candidate phenotype generation via
tensor factorization. J. Biomed. Inform. 52, 199–211 (2014)
9. Perros, I., et al.: SPARTan: scalable PARAFAC2 for large & sparse data. In: Pro-
ceedings of the 23rd ACM SIGKDD International Conference on Knowledge Dis-
covery and Data Mining, Halifax, NS, Canada, 13–17 August, 2017, pp. 375–384
(2017)
10. Perros, I., et al.: SUSTain: scalable unsupervised scoring for tensors and its appli-
cation to phenotyping. CoRR, abs/1803.05473 (2018)
11. Choi, E., Bahadori, M.T., Searles, E., Coffey, C., Sun, J.: Multi-layer representation
learning for medical concepts. CoRR, abs/1602.05568 (2016)
12. Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new
perspectives, April 2014
13. Miotto, R., Li, L., Kidd, B.A., Dudley, J.T.: Deep patient: an unsupervised repre-
sentation to predict the future of patients from the electronic health records. Sci.
Rep. 6, 26094 (2016)
14. Nguyen, P., Tran, T., Wickramasinghe, N., Venkatesh, S.: Deepr: a convolutional
net for medical records. IEEE J. Biomed. Health Inform. 21(1), 22–30 (2017)
15. Choi, E., Bahadori, M.T., Song, L., Stewart, W.F., Sun, J.: GRAM: graph-based
attention model for healthcare representation learning. CoRR, abs/1611.07012
(2016)
16. Dean, J., et al.: Large scale distributed deep networks. In: NIPS (2012)
17. Keuper, J., Pfreundt, F.-J.: Distributed training of deep neural networks: theoretical
and practical limits of parallel scalability. In: Proceedings of the Workshop on
Machine Learning in High Performance Computing Environments, MLHPC 2016,
pp. 19–26, IEEE Press, Piscataway (2016)
18. Zhang, W., Wang, F., Gupta, S.: Model accuracy and runtime tradeoff in dis-
tributed deep learning: a systematic study. In: Proceedings of the Twenty-Sixth
International Joint Conference on Artificial Intelligence, IJCAI 2017, pp. 4854–
4858 (2017)
Parallel and Distributed Processing 17
19. Zhang, L., Ren, Y., Zhang, W., Wang, Y.: Nexus: bringing efficient and scalable
training to deep learning frameworks. In: 25th IEEE International Symposium on
Modeling, Analysis, and Simulation of Computer and Telecommunication Systems,
MASCOTS 2017, Banff, AB, Canada, 20–22 September, 2017 (2017)
20. Dünner, C., Parnell, T.P., Sarigiannis, D., Ioannou, N., Pozidis, H.: Snap Machine
Learning. CoRR, abs/1803.06333 (2018)
21. Jensen, P.B., Jensen, L.J., Brunak, S.: Mining electronic health records: towards better research applications and clinical care. Nature Rev. Genet. 13, 395 (2012)
22. Hripcsak, G., Albers, D.J.: Next-generation phenotyping of electronic health
records. JAMIA 20(1), 117–121 (2013)
23. Bengio, Y.: Deep learning of representations for unsupervised and transfer learning.
In: Proceedings of the 2011 International Conference on Unsupervised and Transfer
Learning Workshop - Volume 27, UTLW 2011, pp. 17–37. JMLR.org (2011)
24. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representa-
tions of words and phrases and their compositionality. In: Proceedings of the 26th
International Conference on Neural Information Processing Systems - Volume 2,
NIPS 2013, pp. 3111–3119. Curran Associates Inc., USA (2013)
25. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.-A.: Stacked denois-
ing autoencoders: learning useful representations in a deep network with a local
denoising criterion (2010)
26. Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous dis-
tributed systems. CoRR, abs/1603.04467 (2016)
27. Srivastava, N., Mansimov, E., Salakhutdinov, R.: Unsupervised learning of video
representations using LSTMs. In: Proceedings of the 32nd International Conference
on Machine Learning (2015)
Evolutionary Algorithms for Convolutional Neural Network Visualisation
1 Introduction
During the past ten years, neural networks (NN) and particularly deep neural
networks (DNN) have come back to the forefront of artificial intelligence (AI)
and machine learning (ML) research with the “deep learning” (DL) craze.
Among them, convolutional neural networks (CNN or convnets) hold a special position for at least two reasons. They can be parallelised particularly well on processors originally conceived for graphics processing (graphics processing units, GPU), and variants of those processors designed for deep-learning use are now relatively common. Moreover, most of the recent improvements in image processing (object recognition and classification, etc.) are actually due to these convnets.
However, even if convnets are more parsimonious than similar-sized fully connected networks, an old criticism of NNs has resurfaced: once a network is trained, it may provide a model that makes accurate predictions, but this model is nonetheless a huge black box. Said otherwise, the computer learned how to perform a task, but we humans are no more advanced when it comes to understanding how it performs it, i.e., what the overall algorithmic process (in a classical, narrow, and deterministic sense) doing the task is. At best, we have created some kind of oracle.
One of the best-known approaches [1] to lifting the veil covering the convnet box is to try to reverse it, from some internal feature of interest back to the input (image) space. Using a large image set, one selects the few pictures that maximise the feature's activation. Going back from this activated feature (the other ones being silenced) then provides a visualisation of the activated feature and hence an idea of what made it react.
We see two main limitations in this approach:
- It is limited by the size of the image set. Even a huge image set is dwarfed by the size of the actual input space. Hence the "maximal activation" seen is actually only the maximal activation possible within this data set.
- Most current convnets have choke points (pooling layers, etc.) where the size of the processing space is reduced (a dimensionality-reduction step), temporarily or not. This necessarily loses information and hinders the reconstruction of a feature back in the input space. Approaches have been developed to bypass this problem (e.g., saving some intermediate space that is then used for the reconstruction), yet the issue remains.
Our contribution consists in using another branch of ML, evolutionary algorithms (EA), to study a CNN's features without the aforementioned limitations: an EA can work directly in the image space and evolve images that increase a feature's activation at each generation. By working in the image space, the reconstruction issue simply does not arise. By using an evolutionary process, we are not constrained by the size of an image set (even if seeding the first generation with an image set's pictures that maximise the activation can give the EA a head start).
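To make this concrete, the following minimal sketch (our toy code; activation stands in for a real convnet feature readout, and the mutation operator is far simpler than what Sect. 4 describes) evolves an image to increase a feature's activation:

    import numpy as np

    # Minimal (1 + lambda) evolutionary loop evolving an image directly in
    # input space to increase a feature's activation. `activation` is a
    # stand-in fitness function (a fixed random linear filter); in the
    # real setting it would read the target feature's value from the CNN.
    rng = np.random.default_rng(0)
    FILTER = rng.standard_normal((32, 32))

    def activation(image):
        return float(np.sum(image * FILTER))

    def evolve(generations=200, offspring=16, sigma=0.05):
        parent = rng.random((32, 32))       # first generation: random image
        best = activation(parent)
        for _ in range(generations):
            for _ in range(offspring):
                child = np.clip(
                    parent + sigma * rng.standard_normal(parent.shape),
                    0.0, 1.0)               # Gaussian mutation, kept in [0, 1]
                score = activation(child)
                if score > best:            # elitist selection
                    parent, best = child, score
        return parent, best

    image, score = evolve()
    print(score)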
This paper combines ideas from different fields, notably neural networks and evolutionary algorithms. In Sect. 2, we briefly present how a convolutional neural network is organised, the challenge of understanding these networks, some existing approaches, and our proposal. This last item leads us to explain what evolutionary algorithms are and how they work. We also discuss in Sect. 3 how they differ from neural networks, and the strengths and weaknesses of each. Then, we describe our strategy and implementation in Sect. 4 before discussing some of the obtained results in Sect. 5. Future research directions are sketched in Sect. 6.