
CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE

Concurrency Computat.: Pract. Exper. 2016; 28:2024–2030


Published online 19 February 2016 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/cpe.3774

EDITORIAL

New advances in High Performance Computing and simulation: parallel and distributed systems, algorithms, and applications

1. INTRODUCTION

Recent developments in research and technological studies have shown that High Performance
Computing (HPC) will indeed lead to advances in several areas of science, engineering, and
technology, permitting the successful completion of more computationally intensive and data-
intensive problems such as those in healthcare, biomedical and biosciences, climate and
environmental changes, multimedia processing, design and manufacturing of advanced materials,
geology, astronomy, chemistry, physics, and even financial systems. However, further research is
required for developing computing infrastructures, models to support newly evolving architectures,
programming paradigms, tools to simulate and evaluate new approaches and solutions, and
programming languages that are appropriate for the new and emerging domains and applications.
The development of the HPC infrastructure has been accelerated by advances in silicon
technology, which permitted the design of complex systems able to incorporate many hardware and
software blocks and cores. More precisely, recent rapid advances in technology and design tools have
enabled engineers to design systems with hundreds of cores, called multi-processor systems-on-chip.
These systems are composed of several processing elements, that is, dedicated hardware and
software components, that are connected by an on-chip interconnect. According to Moore's law,
the number of cores on-chip will double every 18 months; therefore, thousands of cores-on-chip will
be integrated in the next 20 years to meet the power and performance requirements of applications.
Moreover, current trends on the road to exascale are moving toward the integration of more and
more cores into a single chip [1, 2]. For example, accelerators and heterogeneous processing offer
opportunities to greatly increase computational performance and to match increasing
application requirements [3]. Engineering these computing systems is one of the most dynamic
fields in modern science and technology. As such, there will continue to be a growing demand for
more powerful HPC in the upcoming years, not just to tackle mounting basic computing needs but
also to lay the foundations for an HPC market that is potentially becoming larger than the
desktop/laptop computer market. Furthermore, HPC is turning out to be a major source of hope for
future applications that require greater amounts of computing resources in various
modern science domains such as bioengineering, nanotechnology, and energy, where HPC
capabilities are mandatory in order to run simulations and perform visualization tasks.
At the time of writing this editorial, petaflop computing is well established [4]. Several architectures
are making major breakthroughs: commodity systems, accelerators on commodity systems, and special-purpose cores.
All of the Top 500 systems are based on multicore technologies [5]. HPC usage is growing considerably,
especially in industry, and significant efforts toward establishing exascale are underway. At the
same time, several challenges have been identified in creating large-scale computing
systems that meet current and projected application requirements. Most of them are related to
system architectures, algorithms, big data processing, and programming models [6]. However,
energy cost, resilience, Central Processing Unit (CPU) access latency, and memory transfers are key
challenges to address in the era of exascale. Addressing these challenges requires completely new
approaches and technologies, and a shift from the current approaches used for application
development and execution to adaptive ones. Consequently, further research is
required for developing advanced exascale computing infrastructures, models, and paradigms
to support newly emerging architectures, programming models, tools to simulate and evaluate more
elaborate solutions and applications, and programming languages that are appropriate for these new
and emerging domains and challenges.
This special issue is intended to provide an overview of some key topics and the state-of-the-art of recent
advances in subjects relevant to High Performance Computing and simulation. The general objectives are
to address, explore, and exchange information on the challenges and current state-of-the-art in high-
performance and large-scale computing systems, their use in modeling and simulation, their design,
performance, and use, and their impact in various science and engineering domains and applications.

2. THEMES OF THIS SPECIAL ISSUE

This special issue contains research papers addressing the state-of-the-art in high-performance and large-
scale computing systems. A set of carefully selected works was invited based on the original presentations
at the 2013 IEEE International Conference on High Performance Computing and Simulation (HPCS
2013), which was held in Helsinki, Finland, July 01–05, 2013 [7]. The extended works have been
thoroughly reviewed by an international technical reviewing committee, and only thirteen papers
covering a wide range of relevant challenges in HPC were selected for this special issue. The
manuscripts tackle research on different topics including HPC, distributed and Peer-to-Peer (P2P) systems,
data mining, Graphics Processing Unit (GPU) and multicore systems, as well as real-world simulations
related to Computational Fluid Dynamics (CFD), neuroinformatics, bioinformatics, and weather forecasting
performed on large computational infrastructures. The set of accepted papers can be organized under
the following key subjects and subsections, and they are briefly described in the remainder of this section.

2.1. Accelerators, multicore, and special-purpose hybrid architectures


Accelerating compute-intensive applications is another active research direction in the HPC domain. Accelerators
are special-purpose processors designed mainly to speed up compute-intensive
sections of applications and achieve better performance than CPUs for certain workloads [8]. Two
prominent types of accelerators are field-programmable gate arrays (FPGAs) and GPUs. FPGAs are highly
customizable and designed to be configured, while GPUs provide massive parallel execution resources
and high memory bandwidth. The GPU was originally designed to rapidly manipulate and alter memory to accelerate
image-processing applications. Generally, GPUs are easier to program and require fewer hardware
resources, while FPGAs promise the best combination of performance, flexibility, and low overhead.
Hardware acceleration using GPUs or FPGAs can potentially improve run times or enable higher-accuracy simulations.
Werner et al. [9], in their article Accelerated Join Evaluation in Semantic Web
Databases by Using FPGAs, provide different FPGA implementations of the join operation in the
context of Semantic Web databases. The authors develop a flexible FPGA-based hardware accelerator
to improve the performance of query evaluation in a Semantic Web database. They propose an
architecture based on partial reconfiguration to integrate the FPGA into logically
and physically optimized Semantic Web database engines [10] and accelerate their database operations.
The hardware architecture thus implements join algorithms for query execution on the
FPGA. Experimental results are compared with a C software solution running on a general-purpose CPU
and show the efficiency of the hardware-based solution.
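
For readers unfamiliar with the operation being accelerated, the following is a minimal software sketch of an equi-join over RDF-style triples on a shared subject. The Triple layout and the choice of the subject as join variable are illustrative assumptions, not details taken from the paper, whose contribution is the FPGA datapath and its system integration.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// RDF-style triple; the join variable here is the subject.
struct Triple { std::string s, p, o; };

// Minimal software hash join on a shared subject, i.e., the kind of
// operation an FPGA join datapath evaluates in hardware: build a hash
// table over one input, then probe it with the other.
std::vector<std::pair<Triple, Triple>>
join_on_subject(const std::vector<Triple>& left,
                const std::vector<Triple>& right) {
    std::unordered_multimap<std::string, const Triple*> index;
    for (const Triple& t : left) index.emplace(t.s, &t);

    std::vector<std::pair<Triple, Triple>> matches;
    for (const Triple& t : right) {
        auto range = index.equal_range(t.s);
        for (auto it = range.first; it != range.second; ++it)
            matches.push_back({*it->second, t});
    }
    return matches;
}
```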
Another application that could benefit from accelerators like FPGAs and GPUs is the Weather
Research and Forecasting (WRF) model [11], a next-generation mesoscale numerical weather prediction
system designed to serve both atmospheric research and operational forecasting needs and to allow
researchers to generate atmospheric simulations based on real data. However, the
WRF model requires significant execution time and storage space, and porting it to HPC platforms
enables the simulations to run faster. GPUs are designed for computationally intensive applications,
executing a large number of threads on a large number of processing elements. In their article An Analysis
of the Feasibility and Benefits of GPU/Multicore Acceleration of the Weather Research and
Forecasting Model, Vanderbauwhede and Takemi [12] show that porting the numerical weather
prediction model to the GPU outperforms current multicore CPU implementations. First, a simple
study is done to evaluate the possible gains of porting a kernel to the GPU. Then, one kernel is
selected through profiling and ported to the GPU using OpenCL. The performance is then studied both
in isolation and with the kernel integrated into WRF. Porting the code to the GPU greatly improves
the parallelization, which also translates into better scalability for the OpenMP version of the code.
HPC infrastructures can also be used for the development, acceleration, and application of
bioinformatics applications. Jaziri et al. [13], in their article High Performance Computing of
Oligopeptides Complete Backtranslation Applied to DNA Microarray Probe Design, tackle the issue
of large-scale backtranslation of oligopeptides, the step of generating all possible nucleic acid
sequences from a protein sequence, which is needed for the discovery of new organisms. Because
backtranslation is a time-consuming task that can generate very large quantities of data, the authors
propose an efficient distributed algorithm to compute the complete backtranslation of several
hundred oligopeptides for functional deoxyribonucleic acid (DNA) microarrays. The proposed
algorithm was implemented, and simulations were conducted on both simulated and real
biological datasets. The reported results show a significant computing speedup on different
architectures (symmetric multiprocessors, clusters, and grids).
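
The combinatorial core of backtranslation fits in a few lines; the sketch below enumerates all nucleic acid sequences for a short peptide from a deliberately truncated codon table. It illustrates why outputs explode multiplicatively with peptide length; it is not the authors' distributed algorithm, which is designed precisely to tame this explosion at scale.

```cpp
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Deliberately truncated codon table (amino acid -> candidate codons);
// a real table covers all 20 amino acids and their degenerate codons.
const std::map<char, std::vector<std::string>> kCodons = {
    {'M', {"ATG"}},
    {'W', {"TGG"}},
    {'F', {"TTT", "TTC"}},
    {'L', {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}},
};

// Enumerate every nucleic acid sequence that translates back to
// `peptide`. The output grows multiplicatively with codon degeneracy,
// which is what makes full-scale backtranslation compute- and
// data-intensive.
void backtranslate(const std::string& peptide, std::size_t pos,
                   std::string& current, std::vector<std::string>& out) {
    if (pos == peptide.size()) {
        out.push_back(current);
        return;
    }
    for (const std::string& codon : kCodons.at(peptide[pos])) {
        current += codon;
        backtranslate(peptide, pos + 1, current, out);
        current.resize(current.size() - 3);  // backtrack one codon
    }
}

int main() {
    std::vector<std::string> sequences;
    std::string buffer;
    backtranslate("MLF", 0, buffer, sequences);  // 1 * 6 * 2 = 12 sequences
    for (const auto& s : sequences) std::cout << s << '\n';
}
```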

2.2. Software systems, languages, and libraries


Most scientific applications are computationally intensive. Scientists have traditionally attempted to parallelize
their algorithms across HPC infrastructures. However, this task requires significant work and effort
in learning parallel programming. Parallel libraries and high-level, easy-to-use
languages are therefore required to hide the parallelization complexity of programs.
Another important issue that can influence the performance of HPC systems is vectorisation. Many
scientific codes have vectorisation potential that cannot be exploited due to an algorithm-driven
choice of data layouts. In their article Data Layout Inference for Code Vectorisation, Sinkarovs and
Scholz [14] propose an interesting approach for automatically generating efficient vectorised code,
mainly by focusing on the evaluation of a family of data layout transformations. The
authors demonstrate the effectiveness of their approach by applying it to an N-body simulation.
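
A canonical member of this family of transformations is the switch from an array-of-structures to a structure-of-arrays layout. The C++ sketch below is a generic, hypothetical illustration of that idea; the paper infers such layouts automatically in a functional array language rather than asking the programmer to rewrite types by hand.

```cpp
#include <cstddef>
#include <vector>

// Array-of-structures: x, y, z interleaved in memory, so per-component
// loops access memory with stride 3 and vectorise poorly.
struct BodyAoS { float x, y, z; };

// Structure-of-arrays: each component is contiguous, so the loops
// below are unit-stride and auto-vectorise readily.
struct BodiesSoA { std::vector<float> x, y, z; };

void scale(BodiesSoA& b, float f) {
    for (std::size_t i = 0; i < b.x.size(); ++i) b.x[i] *= f;
    for (std::size_t i = 0; i < b.y.size(); ++i) b.y[i] *= f;
    for (std::size_t i = 0; i < b.z.size(); ++i) b.z[i] *= f;
}
```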
Coullon et al. [15], in their paper Implicit Parallelism on 2D Meshes Using SkelGIS, tackle the issue
of overcoming the restrictions on parallelizing scientific simulations that stem from the complexity of
functional concepts and domain-specific features. Parallelizing scientific simulations requires considerable
effort and field-specific knowledge to produce efficient parallel programs. For this reason, the
authors introduce SkelGIS as a solution for abstracted and implicit parallelism. They apply SkelGIS
to solve heat equations and shallow-water equations and compare both the performance and
the programming effort of SkelGIS with their Message Passing Interface (MPI) counterparts.

2.3. High Performance Computing benchmarks and evaluation tools


The DARPA High Productivity Computing Systems program (2002–2011) defined and released
a benchmark suite for measuring performance, portability, programmability, robustness, and
productivity in the HPC domain [3, 16]. This suite is composed of several performance tests that are
required to examine the performance of, and classify, HPC architectures, languages, and libraries
[6]: (i) High-Performance Linpack, for evaluating the floating point rate of execution for solving a
linear system of equations; (ii) Double-precision GEneral Matrix Multiply (DGEMM), for measuring the floating
point rate of execution of double precision real matrix–matrix multiplication; (iii) STREAM, for
sustainable memory bandwidth evaluation; (iv) PTRANS, for testing the total communications
capacity of the network; (v) RandomAccess, for measuring the rate of integer
random updates of memory; (vi) Fast Fourier Transform, for evaluating the floating point rate of
execution of double precision complex one-dimensional Discrete Fourier Transforms; and (vii) b-eff,
for measuring the latency and bandwidth of a number of simultaneous communication patterns.
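
To make one of these tests concrete, here is a minimal sketch of the memory access pattern behind the STREAM triad kernel, a(i) = b(i) + s*c(i); the array size and single-pass timing are simplifying assumptions, not the official benchmark rules.

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

// Minimal sketch of the STREAM "triad" pattern: a(i) = b(i) + s*c(i).
// The official benchmark fixes array sizes, repetition counts, and
// timing rules; this single pass only illustrates what is measured.
int main() {
    const std::size_t n = 1 << 24;  // ~16M doubles per array (assumed size)
    std::vector<double> a(n), b(n, 1.0), c(n, 2.0);
    const double s = 3.0;

    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i) a[i] = b[i] + s * c[i];
    auto t1 = std::chrono::steady_clock::now();

    double secs = std::chrono::duration<double>(t1 - t0).count();
    double bytes = 3.0 * n * sizeof(double);  // two reads + one write
    std::cout << "triad bandwidth: " << bytes / secs / 1e9 << " GB/s\n";
}
```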
Heinecke et al. [17], in their article Data Mining on Vast Datasets as a Cluster System Benchmark,
review the current situation in the benchmarking and procurement of supercomputers and describe the trend
from the Linpack benchmark toward miniapp benchmarks that determine performance under realistic usage. They also
describe the difficulty of optimizing benchmarks for new architectures. They discuss a data mining
application that is compared across different (accelerated) cluster architectures, along with the optimizations
needed to run efficiently on the different platforms. The authors demonstrate such an optimization for
a data mining algorithm that solves regression and classification problems on vast datasets. In other
words, the authors propose a data mining application as a cluster system benchmark, using the overlapping of
computation and communication to hide latency and reduce overhead. Experiments have been
conducted with different datasets and on the SuperMUC machine at the Leibniz-Rechenzentrum, the
local CoolMAX AMD GPU cluster in Munich, the Phi-accelerated Beacon at the University of Tennessee,
and the Tödi Cray XK7 at the Swiss National Supercomputing Centre. The performance results show
that, for strong scaling settings, GPUs and coprocessors suffer from a lack of parallelism and do not
perform as well at large scale. For weak scaling settings, however, they consistently outperform.

2.4. High Performance Computing, cloud, and distributed computing infrastructures


Several challenges, as stated previously, have been identified in creating large-scale computing
systems that meet current application requirements. These computing systems may rely on distributed
computing mechanisms, often implemented as clusters and clouds, to provide continuous access to a
variety of resources, for example, processing cores, large data stores, and information repositories.
For example, a computational grid is a distributed computing infrastructure that can provide globally
available network resources. These environments have the potential and ability to integrate large-
scale computing resources on demand. A user's ability to compute will no longer be limited to the
resources currently at hand or those localized statically on a set of hosts known a priori.
However, the main challenge in large-scale computing is how to program and control these
distributed systems (clouds or exascale systems) with billions of nodes. Evidently, algorithms and simulators
have to be developed to explore these new infrastructures. Distributed peer-to-peer control
mechanisms could be devised and used to simulate new algorithms and computing architectures. In
their article Flexible Replica Placement for Optimized P2P Backup on Heterogeneous, Unreliable
Machines, Skowron and Rzadca [18] tackle the issue of data replication over distributed P2P
systems with unreliable machines. They introduce a P2P backup system architecture using an
optimal replication strategy for storing data over distributed P2P systems.
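
As a toy baseline for the kind of problem the paper optimizes, the sketch below greedily replicates a chunk on the k most-available peers and estimates the chance of losing all replicas under independent failures; the greedy rule and the availability model are illustrative assumptions, not the authors' flexible placement strategy.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// A peer with an estimated availability (fraction of time online).
struct Peer { std::string id; double availability; };

// Toy greedy placement: replicate a chunk on the k most-available
// peers. A real P2P backup system must also balance storage load,
// bandwidth, and fairness, which this sketch ignores.
std::vector<Peer> place_replicas(std::vector<Peer> peers, std::size_t k) {
    std::sort(peers.begin(), peers.end(),
              [](const Peer& a, const Peer& b) {
                  return a.availability > b.availability;
              });
    if (peers.size() > k) peers.resize(k);
    return peers;
}

// Probability that every chosen replica is offline at once, assuming
// independent failures: prod_i (1 - availability_i).
double loss_probability(const std::vector<Peer>& replicas) {
    double p = 1.0;
    for (const Peer& r : replicas) p *= (1.0 - r.availability);
    return p;
}
```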
Furthermore, research works to date have concentrated on static approaches tailored to parallelizing
existing applications on different HPC systems. However, the rapid growth in the size and
complexity of contemporary distributed parallel applications, usually assembled out of a set of
interacting software components executed over distributed and heterogeneous platforms, makes such
approaches unsuitable for these dynamic environments. Therefore, dynamic approaches are required
to allow a system to autonomously adapt its structure and behavior during the course of its
operation. In other words, these approaches allow the system to automatically modify its
configuration according to the settings of its computing environment and the properties of its
workload. These approaches are mainly motivated by the following issues. The high number of
nodes makes the system vulnerable to failures; consequently, its ability to react autonomously to
faults is challenging. For example, the system's nodes should react to the changing environment by
taking over pending tasks from faulty nodes. Static and centralized configurations of these systems
are difficult or even impossible to use for dynamic and large-scale applications. For instance, in a
large system with thousands of nodes, several applications could compete for resources, and
managing these resources at runtime and in a decentralized manner is a challenging task.
Mencagli [19], in his article Adaptive Model Predictive Control of Autonomic Distributed Parallel
Computations with Variable Horizons and Switching Costs, addresses the dynamic reconfiguration
of parallel computations by proposing an automatic method for reconfiguring distributed
parallel computations. An autonomic computing approach is applied: it monitors the behavior of
parallel modules and adjusts their degree of parallelism in order to achieve a global optimization,
while balancing the number of reconfigurations against performance and efficiency. The
approach is based on model predictive control. The paper especially investigates the influence of
different horizon lengths and different models for switching costs. The proposed method was
evaluated using a video-streaming application with a synthesized workload. A model predictive
control-based policy is evaluated with a fixed horizon and a variable horizon, respectively, and the
reported results show the solution's effectiveness in improving the target properties of the adaptation
process. Such autonomic control models, which automate reconfiguration management based on current
system load and application status, could be employed in cloud computing platforms in which decision-
making strategies are required for resource management.
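
For intuition about what such a controller replaces, the fragment below shows a much simpler reactive rule with a dead band that limits reconfigurations; a model predictive controller instead looks ahead over a prediction horizon and weighs predicted gains against switching costs. The thresholds here are invented for illustration.

```cpp
#include <algorithm>

// Toy threshold rule for adapting a module's parallelism degree. This
// is NOT the paper's model predictive controller, which optimizes the
// degree over a prediction horizon with explicit switching costs; the
// thresholds and dead band here are assumptions.
int adapt_degree(int degree, double observed_ms, double target_ms,
                 int max_degree) {
    if (observed_ms > 1.2 * target_ms)           // missing the target:
        return std::min(degree + 1, max_degree); // add a worker
    if (observed_ms < 0.6 * target_ms)           // ample headroom:
        return std::max(degree - 1, 1);          // release a worker
    return degree;  // inside the dead band: avoid the switching cost
}
```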

2.5. Applications and emerging domains


Several compute-intensive and emerging applications have also been the subject of extensive research.
These applications range from life sciences (e.g., medical imaging and gene sequencing), financial
trading, and oil and gas exploration to bioscience, combustion (e.g., complex fluid simulation),
astrophysics (e.g., the formation of stars and the evolution of galaxies), and the environment (e.g., modeling the world
climate). This special issue includes a few such application areas with interesting representative works.

2.5.1. Image processing. Future human brain neuroimaging requires the integration of HPC to
achieve high temporal and spatial resolutions. Salman et al. [20], in their article Concurrency
in Electrical Neuroinformatics: Parallel Computation for Studying the Volume Conduction of Brain
Electrical Fields in Human Head Tissues, highlight the necessity of integrating HPC tools and
techniques in order to have a systematic methodology for analyzing the main factors and studying the
interdependent parameters that affect the accuracy of solutions, especially for large, multi-
dimensional images. Their paper discusses challenges in human brain neuroimaging, particularly
how to achieve high temporal and spatial resolution. They provide two accurate, efficient, and
reliable finite difference method-based forward solvers that are parallelized using OpenMP on shared
memory and CUDA on GPUs, in order to show that advances in neuroimaging science and
engineering will depend significantly on HPC integration.
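
The shared-memory pattern underlying such solvers can be sketched with a single OpenMP-parallel relaxation sweep over a one-dimensional grid; the actual forward solvers operate on 3D head models with tissue-dependent conductivities, so this is only an assumed, simplified illustration.

```cpp
#include <cstddef>
#include <vector>

// One relaxation sweep of a 1D finite difference grid, parallelized
// with OpenMP (compile with -fopenmp). Each interior point is replaced
// by the average of its neighbors; boundary values stay fixed.
void fd_sweep(const std::vector<double>& u, std::vector<double>& u_next) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(u.size());
    #pragma omp parallel for
    for (std::ptrdiff_t i = 1; i < n - 1; ++i)
        u_next[i] = 0.5 * (u[i - 1] + u[i + 1]);
}
```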
In their article A Novel Technique for Detecting Suspicious Lesions in Breast Ultrasound Images,
Karimi and Krzyzak [21] address the important practical problem of the automatic classification of breast
lesion ultrasound images. The problem of automatically classifying suspicious masses in
ultrasound images is of fundamental importance in oncology. The main advantage of ultrasound is
that it is a noninvasive diagnostic tool; its main disadvantage is the heavy presence of acoustic
noise. Any progress in automatic breast cancer classification using ultrasound may have a significant
impact on the early detection and treatment of breast cancer. The authors tackle this issue by
introducing a novel automatic classification technique for suspicious breast lesions in ultrasound
images. The system proposed in the paper is a pipeline consisting of several functional
components. The first component uses a fuzzy logic approach, texture, and morphology for the de-
noising and segmentation of suspicious lesions. The second component deals with feature extraction
and selection; the authors considered geometrical, texture, and morphological features. After
applying sequential forward and backward searches, they selected the best features and passed them
to the third component, which implements a support vector machine classifier categorizing suspicious
lesions into benign and malignant classes. The system is validated by a computer experiment on 80
real images; according to the authors, its performance reached a 98% success rate. It was then
compared with two other methods, which were significantly outperformed by the proposed system.

2.5.2. Pattern recognition. Pattern recognition is another research field, one that focuses on the
recognition of patterns and regularities in data. Most approaches used in pattern recognition employ
classification methods. Support vector machines (SVMs) are considered the most widely used
classification technique in the pattern recognition community. An SVM is a supervised learning model with
associated learning algorithms that are used for the classification and regression analysis needed for
recognizing patterns and analyzing data. In other words, an SVM is mainly a classifier method that
performs classification tasks by constructing hyperplanes in a multidimensional space. Chen et al.
[22], in their article Sparse Support Vector Machine for Pattern Recognition, propose to improve
SVM classification techniques by using sparse SVM classification, and they examine the sparse
SVM's performance on pattern recognition tasks. The authors implement a sparse SVM, based on the
LIBSVM source code, with the RBF kernel and verify its effectiveness on eight datasets from
LIBSVM: SVMGuide4, vowel, SVMGuide3, DNA, Satimage, SVMGuide1,
stage, and yeast. The experimental results reported in the paper show that the proposed SVM is feasible for
practical pattern recognition applications.
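
The decision rule shared by dense and sparse SVMs is f(x) = sign(Σᵢ αᵢyᵢK(xᵢ, x) + b), and sparsity simply means that most αᵢ vanish. The sketch below evaluates this rule with an RBF kernel; training (fitting the αᵢ and b, which is where the paper's contribution lies) is omitted, and the data layout is an assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Gaussian (RBF) kernel: K(a, b) = exp(-gamma * ||a - b||^2).
double rbf(const std::vector<double>& a, const std::vector<double>& b,
           double gamma) {
    double d2 = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        const double diff = a[k] - b[k];
        d2 += diff * diff;
    }
    return std::exp(-gamma * d2);
}

// SVM decision rule: sign( sum_i alpha_i*y_i*K(sv_i, x) + b ).
// In a sparse SVM, most alpha_i are driven to zero, so far fewer
// support vectors survive and classification becomes cheaper.
int classify(const std::vector<std::vector<double>>& svs,
             const std::vector<double>& alpha_y,  // alpha_i * y_i
             double b, double gamma,
             const std::vector<double>& x) {
    double f = b;
    for (std::size_t i = 0; i < svs.size(); ++i)
        f += alpha_y[i] * rbf(svs[i], x, gamma);
    return f >= 0.0 ? +1 : -1;
}
```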


Codreanu et al. [23], in their paper Evaluating Automatically Parallelized Versions of the Support
Vector Machine, deal with the parallelization of the SVM supervised learning algorithm on multi-core
computers and GPUs. They propose a new gradient-ascent-based SVM algorithm combined with a particle swarm
optimization algorithm for automatic parameter tuning. The authors investigated two parallelization
approaches on the GPU, using the GPSME toolkit and OpenACC. The reported results demonstrate an
important speed-up for the proposed approach when compared with the CPU and OpenACC versions.

2.5.3. Complex fluid simulation. Lattice Boltzmann methods are a class of compute-intensive
applications for complex fluid simulation that have attracted interest from researchers in computational
physics. Lattice Boltzmann methods are typical examples of the large class of algorithms used to
simulate different types of flow (e.g., water, oil, and gas) that have demanding resolution and memory
requirements. Therefore, optimizing them on recent platforms and for different application cases has
been researched intensively in the last 10 years. In their article Chip-level and Multi-node Analysis of
Energy-optimized Lattice-Boltzmann CFD Simulations, Wittmann et al. [24] analyze the behavior of
D3Q19 lattice Boltzmann solvers on modern HPC systems. They first present chip-level models for
both performance and energy consumption. The authors also analyze performance aspects
of MPI-parallel runs, showing that their chip-level models are effective tools for
identifying optimal operating points for large-scale simulations. They highlight the importance of single-
core performance and of the choice of the number of cores used per chip in guiding energy optimization.
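
To convey the flavor of such a chip-level trade-off, the sketch below evaluates an assumed energy model of the form E(n) = (P_static + P_core·n)·t(n), where runtime stops improving once n cores saturate the memory bandwidth; every constant is invented for illustration and none comes from the paper.

```cpp
#include <cstdio>

// Assumed chip-level energy model in the spirit of, but not taken
// from, the paper: E(n) = (P_static + P_core*n) * t(n), where runtime
// stops improving once n cores saturate the memory bandwidth.
int main() {
    const double p_static = 25.0;   // W, baseline chip power (assumed)
    const double p_core   = 8.0;    // W per active core (assumed)
    const double t_serial = 100.0;  // s, single-core runtime (assumed)
    const int    sat      = 6;      // cores that saturate bandwidth

    for (int n = 1; n <= 10; ++n) {
        const double t = t_serial / (n < sat ? n : sat);
        const double energy = (p_static + p_core * n) * t;
        std::printf("cores=%2d time=%6.1f s energy=%8.1f J\n", n, t, energy);
    }
}
```

Running the loop shows energy falling up to the saturation point and rising beyond it, which is exactly the kind of optimal operating point such chip-level models expose.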

3. CONCLUSIONS

The articles presented in this special issue provide recent advances in several fields related to High
Performance Computing and simulation. In particular, the manuscripts undertake research on
different topics including HPC, distributed/P2P systems, data mining, and GPU/multicore systems, as
well as real-world simulations related to CFD, neuroinformatics, bioinformatics, and weather
forecasting performed on large computational infrastructures. We hope that readers can benefit
from the perspectives presented in this special issue and will contribute to these strategically
important, exciting, and fast-growing research areas.

ACKNOWLEDGEMENTS
The guest editors of this special issue wish to express their sincere gratitude to all of the authors who
submitted their papers to this special issue. We are also grateful to the Reviewing Committee for their hard
work and the feedback provided to the authors. As guest editors of this special issue, we also wish to express our
gratitude to the Editor-in-Chief Geoffrey C. Fox for the opportunity to edit this special issue, for his assistance
during the special issue preparation, and for giving the authors the opportunity to present their work in the
international journal Concurrency and Computation: Practice and Experience. Lastly, we wish to thank
the Journal's staff for their assistance and suggestions.
We acknowledge the following Reviewing Committee members: Chaker El Amrani (Morocco), Andres
Avila (Chile), Carlos Berderian (Argentina), Massimo Cafaro (Italy), Ron Chiang (USA), Antonio Cono
(Spain), Alessandro D'Anca (Italy), Minh Ngoc Dinh (Australia), Laurent d'Orazio (France), Anders
Eklund (Sweden), Francoise Baude (France), Jaafar Gaber (France), Frederic Gava (France), Ivan Gonzalez
(Spain), William Gropp (USA), Bilel Hadri (USA), Miaoqing Huang (USA), Atman Jbari (Morocco), Chao
Jin (Australia), William Johnston (USA), Abdullah Kayi (USA), Harald Koestler (Germany), Harald Kosch
(Germany), Dieter Kranzlmueller (Germany), Erwin Laure (Sweden), Sergio Lopez (Spain), Nouredine
Melab (France), Mariofanna Milanova (USA), Maria Mirto (Italy), Vikram Narayana (USA),
Christian Obrecht (France), Amanda Peters Randles (USA), Volkmar Schau (Germany),
Olivier Serres (USA), Suboh Suboh (USA), Xiaoping Sun (China), Osamu Tatebe (Japan), Christian Trefftz
(USA), Ventzeslav Valev (Bulgaria), Timothy J. Williams (USA), Ramin Yahyapour (Germany), Chao-
Tung Yang (Taiwan), Mostapha Zbakh (Morocco), and Yong Zhao (USA).

REFERENCES
1. Vetter JS (ed.). Contemporary High Performance Computing: From Petascale toward Exascale. CRC Press: Boca
Raton, Florida, 2015.


2. Limet S, Smari WW, Spalazzi L. High performance computing: to boldly go where no human has gone before. In
Concurrency and Computation: Practice and Experience, Vol. 27. John Wiley & Sons, Ltd., 2015; 3145–3165.
DOI:10.1002/cpe.
3. Shalf J, Dosanjh S, Morrison J. Exascale computing technology challenges. High Performance Computing for Com-
putational Science – VECPAR 2010, JMLM Palma et al. (eds): LNCS 6449, Springer-Verlag, pp. 1–25, 2011.
4. Reed DA, Dongarra J. Exascale computing and big data. Communications of the ACM, July 2015; ACM, 58(7):56–68.
5. Top 500, http://www.top500.org/lists/2015/06/.
6. Geist A, Lucas R. Major computer science challenges at exascale. International Journal of High Performance Computing
Applications, vol. 23, no. 4, ACM Press, 2009, pp. 427–436.
7. Smari WW (ed.). Proceedings of the 2013 International Conference on High Performance Computing and Simulation (HPCS
2013), July 01–05, 2013, Helsinki, Finland, ISBN: 978-1-4799-0836-3, CD ISBN: 978-1-4799-0837-0, IEEE, USA, July
2013. Available from the IEEE Digital Library at http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=6619547.
8. Che S, Li J, Sheaffer JW, Skadron K, Lach J. Accelerating compute-intensive applications with GPUs and
FPGAs. Proceedings of the 2008 Symposium on Application Specific Processors (SASP'08), IEEE Computer Soci-
ety, Washington, DC, USA, 2008, pp. 101–107.
9. Werner S, Heinrich D, Stelzner M, Linnemann V, Pionteck T, Groppe S. Accelerated join evaluation in Semantic
Web databases by using FPGAs. Concurrency and Computation: Practice and Experience 2016; 28(7):2031–2051.
10. Groppe J, Groppe S, Schleifer A, Linnemann V. LuposDate: a semantic web database system. CIKM'09, November
2–6, 2009, Hong Kong, China.
11. Michalakes J, Dudhia J, Gill D, Henderson T, Klemp J, Skamarock W, Wang W. The weather research and
forecast model: software architecture and performance. Proceedings of the 11th ECMWF Workshop on the Use of High
Performance Computing in Meteorology, 25–29 October 2004, Reading, U.K. Ed. George Mozdzynski.
12. Vanderbauwhede W, Takemi T. An analysis of the feasibility and benefits of GPU/multicore acceleration of the Weather
Research and Forecasting model. Concurrency and Computation: Practice and Experience 2016; 28(7):2052–2072.
13. Jaziri F, Peyretaillade E, Peyret P, Hill DRC. High performance computing of oligopeptides complete backtranslation ap-
plied to DNA microarray probe design. Concurrency and Computation: Practice and Experience 2016; 28(7):2073–2091.
14. Sinkarovs A, Scholz S-B. Type-driven data layouts for improved vectorisation. Concurrency and Computation:
Practice and Experience 2016; 28(7):2092–2119.
15. Coullon H, Le M-H, Limet S. The SIPSim implicit parallelism model and the SkelGIS library. Concurrency and
Computation: Practice and Experience 2016; 28(7):2120–2144.
16. Dongarra J et al. DARPA's HPCS Program: History, Models, Tools, Languages. 2008, 94 pages. Available at http://
www.sdsc.edu/pmac/papers/docs/dongarra2008darpa.pdf.
17. Heinecke A, Karlstetter R, Pflüger D, Bungartz H-J. Data mining on vast datasets as a cluster system benchmark.
Concurrency and Computation: Practice and Experience 2016; 28(7):2145–2165.
18. Skowron P, Rzadca K. Flexible replica placement for optimized P2P backup on heterogeneous, unreliable ma-
chines. Concurrency and Computation: Practice and Experience 2016; 28(7):2166–2186.
19. Mencagli G. Adaptive model predictive control of autonomic distributed parallel computations with variable hori-
zons and switching costs. Concurrency and Computation: Practice and Experience 2016; 28(7):2187–2212.
20. Salman A, Malony A, Turovets S, Volkov V, Ozog D, Tucker D. Concurrency in electrical neuroinformatics: parallel
computation for studying the volume conduction of brain electrical fields in human head tissues. Concurrency and
Computation: Practice and Experience 2016; 28(7):2213–2236.
21. Karimi B, Krzyzak A. A novel technique for detecting suspicious lesions in breast ultrasound images. Concurrency
and Computation: Practice and Experience 2016; 28(7):2237–2260.
22. Chen G, Bui TD, Krzyzak A. Sparse support vector machine for pattern recognition. Concurrency and Computation:
Practice and Experience 2016; 28(7):2261–2273.
23. Codreanu V, Droge B, Williams D, Yasar B, Yang P, Liu B, Dong F, Surinta O, Schomaker LRB, Roerdink JBTM,
Wiering MA. Evaluating automatically parallelized versions of the support vector machine. Concurrency and Com-
putation: Practice and Experience 2016; 28(7):2274–2294.
24. Wittmann M, Hager G, Zeiser T, Treibig J, Wellein G. Chip-level and multi-node analysis of energy-optimized lat-
tice Boltzmann CFD simulations. Concurrency and Computation: Practice and Experience 2016; 28(7):2295–2315.

WALEED W. SMARI
Ball Aerospace & Technologies Corp., Fairborn, OH, USA
E-mail: smari@arys.org
MOHAMED BAKHOUYA
International University of Rabat, Parc Technopolis, 11 100 Sala el Jadida, Morocco
SANDRO FIORE
Euro-Mediterranean Center on Climate Change, Lecce, Italy
GIOVANNI ALOISIO
Euro-Mediterranean Center on Climate Change, University of Salento, Lecce, Italy
