

OFF, Open source Finite volume Fluid dynamics code: A free, high-order solver based on parallel, modular, object-oriented Fortran API✩
S. Zaghi 1
CNR-INSEAN, Istituto Nazionale per Studi ed Esperienze di Architettura Navale, Via di Vallerano 139, Rome, 00128, Italy

article info

Article history:
Received 25 February 2013
Received in revised form 3 April 2014
Accepted 5 April 2014
Available online xxxx

Keywords:
CFD
Finite volume scheme
Riemann's Problem solver
WENO
OOP Fortran
MPI
OpenMP

abstract

OFF, an open source (free software) code for performing fluid dynamics simulations, is presented. The aim of OFF is to solve, numerically, the unsteady (and steady) compressible Navier–Stokes equations of fluid dynamics by means of finite volume techniques: the research background is mainly focused on high-order (WENO) schemes for multi-fluid, multi-phase flows over complex geometries. To this purpose a highly modular, object-oriented application program interface (API) has been developed. In particular, the concepts of data encapsulation and inheritance available within the Fortran language (from standard 2003) have been exploited in order to represent each fluid dynamics "entity" (e.g. the conservative variables of a finite volume, its geometry, etc.) by a single object, so that a large variety of computational libraries can be easily (and efficiently) developed upon these objects. The main features of OFF can be summarized as follows:

Programming Language: OFF is written in standard (compliant) Fortran 2003; its design is highly modular in order to enhance simplicity of use and maintenance without compromising efficiency.

Parallel Frameworks Supported: the development of OFF has also been targeted at maximizing computational efficiency: the code is designed to run on shared-memory multi-core workstations and on distributed-memory clusters of shared-memory nodes (supercomputers); the code's parallelization is based on the Open Multiprocessing (OpenMP) and Message Passing Interface (MPI) paradigms.

Usability, Maintenance and Enhancement: in order to improve the usability, maintenance and enhancement of the code, the documentation has also been carefully taken into account; the documentation is built upon comprehensive comments placed directly into the source files (no external documentation files are needed): these comments are parsed by means of the doxygen free software, producing high-quality html and latex documentation pages; the distributed versioning system git has been adopted in order to facilitate the collaborative maintenance and improvement of the code.

Copyrights: OFF is free software that anyone can use, copy, distribute, study, change and improve under the GNU Public License version 3.

The present paper is a manifesto of the OFF code and presents the currently implemented features and ongoing developments. This work is focused on the computational techniques adopted and a detailed description of the main API characteristics is reported. OFF capabilities are demonstrated by means of one- and two-dimensional examples and a three-dimensional real application.

Program summary

Program title: OFF


Catalogue identifier: AESV_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AESV_v1_0.html
Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland

✩ This paper and its associated computer program are available via the Computer Physics Communication homepage on ScienceDirect (http://www.sciencedirect.com/
science/journal/00104655).
E-mail addresses: stefano.zaghi@gmail.com, stefano.zaghi@cnr.it.
1 Ph.D., Aerospace Engineer, Research Scientist at Department of Computational Hydrodynamics at CNR-INSEAN.

http://dx.doi.org/10.1016/j.cpc.2014.04.005
0010-4655/© 2014 Elsevier B.V. All rights reserved.

Licensing provisions: GNU General Public Licence, version 3


No. of lines in distributed program, including test data, etc.: 60466
No. of bytes in distributed program, including test data, etc.: 595575
Distribution format: tar.gz
Programming language: Fortran (standard 2003 or newer); developed and tested with the Intel Fortran Compiler v. 12.x or newer.
Computer: Designed for shared-memory multi-core workstations and for hybrid distributed/shared-memory supercomputers, but any computer system with a Fortran (2003+) compiler is suitable.
Operating system: Designed for POSIX architectures and tested on GNU/Linux.
Has the code been vectorized or parallelized?: Hybrid parallelization by means of the MPI library and the OpenMP paradigm, tested on up to 256 processors.
RAM: [1 MB; 1 GB] per core, simulation-dependent
Classification: 4.3, 4.10, 12.
External routines: The proprietary library [1] must be linked for producing binary outputs in Tecplot Inc. format; the MPI library [2] must be linked for running on distributed-memory parallel systems
Nature of problem:
Numerical solution of the compressible Navier–Stokes equations for multi-fluid, multi-phase flows in complex geometries
Solution method:
Fully-conservative Finite Volume scheme based on a very high-order WENO Positivity-Preserving reconstruction technique and Strong Stability Preserving high-order Runge–Kutta time integration. Structured multi-block, general curvilinear, body-fitted grids constitute the underlying numerical grids. The MPI and OpenMP paradigms are used for parallel computations. A pre-processing tool to deal with commercial meshing software is provided, as well as a post-processing one for visualization of the numerical results
Restrictions:
At present, OFF is validated for simulating inviscid and single-phase flows (viscous fluxes computation and fully coupled Eulerian/Lagrangian schemes have still to be validated, but they are already implemented); a modern Fortran compiler is mandatory
Unusual features:
OFF is a complex CFD software strongly based on the Object-Oriented Programming paradigm by means of the modern Fortran standard (2003 or higher)
Additional comments:
The OFF project adopts Git [3], a free and open source distributed version control system. A public repository dedicated to the OFF project [4] has been created on github, a web-based hosting service for software development projects using the git versioning system. Finally, comprehensive documentation [5] is provided by parsing the source code comments by means of the doxygen software [6]
Running time:
The running time depends on the available computational resources, the complexity of the problem and the selected accuracy of the numerical scheme. Simple one-dimensional problems require a few minutes on personal workstation-like systems whereas complex three-dimensional problems needing high accuracy (e.g. DNS) could require weeks on supercomputers. OFF has proven to have good scalability on small clusters (e.g. CASPUR facilities, namely on Matrix, a GNU/Linux cluster composed of 320 nodes, each one equipped with a dual quad-core Opteron at 2.1 GHz and 16/32 GB of RAM)

References:

[1] TecIO Library, Tecplot Inc. proprietary library for I/O binary files in Tecplot format, http://www.
tecplot.com/downloads/tecio-library/.
[2] The Message Passing Interface (MPI) standard, a library specification for message-passing, proposed
as a standard by a broadly based committee of vendors, implementors, and users, http://www.mcs.
anl.gov/research/projects/mpi/.
[3] Git, a free and open source distributed version control system, http://git-scm.com/.
[4] Github, a web-based hosting service for software development projects using git versioning system,
https://github.com.
[5] Official OFF documentation, http://szaghi.github.com/OFF/index.html.
[6] Doxygen, a documentation system for many programming languages, http://www.stack.nl/~dimitri/doxygen.

1. Introduction

1.1. Background

In the fluid dynamics research area there have historically been two main approaches, the experimental and the theoretical one. Nevertheless, the high availability of computational resources (e.g. the increased accessibility of supercomputers) and the development of new, accurate numerical schemes have driven a third important approach, Computational Fluid Dynamics (CFD), see Anderson [1]. Since the pioneering work of von Neumann [2], CFD has become an essential tool for understanding complex physical phenomena and for predicting complex scenarios, e.g. see [3,4]. As a matter of fact, modern scientific methods of investigation are based on a complementary experimental/theoretical/numerical approach [5] and, in many cases, the numerical one is the only practicable one. This is common when experimental tests are too expensive and the theory is too complex to obtain practical results. CFD is at present a fundamental research and design tool. Historically, the diffusion of CFD was pushed by aerospace/aeronautics applications, but it is currently used in a large variety of different applications ranging from hydrodynamics to meteorology. As an example, the automotive industry is increasingly investing in CFD as a fundamental design tool for improving the performance of engines, see [6], reducing the emission of pollutants, see [7], and optimizing aerodynamics, see [8]. Moreover, the development of the next generation of power plants based on renewable energies is progressively relying on the CFD approach, e.g. for the optimization of large wind farms, see [9].
The increasing use of CFD approaches has promoted the development of a large variety of CFD codes. It is possible to distinguish proprietary, closed software from free software. The OFF project belongs to the free software category. The free software paradigm permits anyone to inspect, study and improve the code, thus encouraging the spread of scientific knowledge among researchers. Furthermore, free software often has higher quality than commercial software, especially for scientific applications. On the other hand, free research codes are generally very difficult to use and not well documented.

1.2. Related works

A thorough description of the available free CFD codes is beyond the scope of the present paper, but a good starting point is the CFD-online database, reachable at http://www.cfd-online.com/Wiki/Codes. Five interesting projects are mentioned in the following.
OpenFOAM. The state of the art of free, open source CFD software can be recognized in OpenFOAM,2 see Weller et al. [10]: it is an object-oriented programming (OOP) C++ library for CFD numerical simulations providing second-order accurate finite volume discretization, second-order discretization in time, efficient linear system solvers and support for parallel computing. OpenFOAM development started in the late 1980s with the aim of overcoming the limitations of the CFD codes of the time. That software was written in the Fortran programming language, a de facto standard in computer science, which was not as flexible as the C++ programming language. In those years, C++ was at its outset: it introduced a modern OOP language combining the flexibility of class abstraction with the computational efficiency of static-typed compilation. The aim of the authors of OpenFOAM (Prof. H. Jasak and Dr. H. Weller are currently the main developers of the project) was to exploit the OOP paradigm to create data types that mimic, as much as possible, those of continuum mechanics in order to facilitate the translation of mathematical formulae into operative computer programs.
Gerris. Gerris was developed from scratch by Popinet [11]. Gerris is a free software program3 aimed at the numerical solution of the time-dependent incompressible variable-density Euler, Stokes, Navier–Stokes or linear and nonlinear shallow-water equations. Its algorithms are second-order in space and time and its main feature is the implementation of an Adaptive Mesh Refinement (AMR) method combined with an embedded boundary technique. It is written in C and can run on distributed memory clusters by means of the MPI library.
PETSc-FEM. PETSc-FEM4 is a Finite Element Method (FEM, oriented to unstructured meshes) program for CFD and it is distributed as free software. It is based on PETSc5 and it is both a library allowing users to develop their own FEM programs over PETSc-FEM and a suite of application programs. The underlying PETSc library has demonstrated great scalability, see [12], thus PETSc-FEM should also have high parallel performance. However, it has currently been tested only on Beowulf clusters. It is written in the C++ language with an OOP paradigm.
Clawpack. The development of Clawpack6 was started (and is currently driven) by Randall J. LeVeque, Professor of Applied Mathematics at the University of Washington, who has played a relevant role in the modern evolution of CFD, see [13–15]. Clawpack (meaning Conservation Laws Package) is an open source project for solving hyperbolic partial differential equations (systems of conservation laws) using the finite volume method, in particular by means of high-resolution Godunov-like methods. Some recent efforts extend Clawpack to parabolic systems. Clawpack is written in Fortran (fixed form) using a procedural programming paradigm.
Racoon II. Racoon II,7 standing for Refined Adaptive Computations with Object-Oriented Numerics, is a framework written in C++ for numerically solving time-dependent phenomena by means of AMR structured grids, see [16]. The mesh-adaptive method is used to solve systems of hyperbolic conservation laws, such as the compressible Euler equations of gas dynamics or Magneto-Hydrodynamics (MHD). Finite differences as well as shock-capturing central schemes can be used, while Runge–Kutta type integrators are used for time integration. The design and implementation of Racoon II rely on the OOP paradigm. Presently, only a full-featured two-dimensional implementation is available and it is applied to problems in solar plasma physics.

2 OpenFOAM is obtainable from http://www.openfoam.com/. It is worth noting that at present OpenFOAM is also maintained by means of git versioning system and it has
an official github repository https://github.com/OpenFOAM.
3 Gerris is obtainable from http://gfs.sourceforge.net/wiki/index.php/Main_Page.
4 PETSc-FEM is obtainable from http://www.cimec.org.ar/twiki/bin/view/Cimec/PETScFEM.
5 PETSc is a suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by partial differential equations.
6 Clawpack is obtainable from http://depts.washington.edu/clawpack/.
7 Racoon II is obtainable from http://www.tp1.ruhr-uni-bochum.de/~jd/racoon/.

1.3. OFF: motivations and aims

All the aforementioned free software packages (and many others omitted) are valuable projects. Nevertheless, none of them has all of the following features together:
• it is written in modern Fortran (standard 2003 or newer);
• it is written by means of OOP paradigm;
• it has the capability to run on High Performance Computing (HPC) architectures based on hybrid shared/distributed memory clusters;
• it is well documented;
• it allows easy maintenance and enhancement into a collaborative framework.
OFF, meaning Open Finite volume Fluid dynamics code, has been developed with the aim of satisfying all of the above specifications. The present paper is its first comprehensive presentation. In the following Sections 1.3.1–1.3.4 a brief explanation of the above specifications is reported.

1.3.1. Modern Fortran standard and OOP


Presently, the old limitations of Fortran (standard 77) have been (partially) overcome by Fortran itself (standard 2003 or newer), see [17]. OFF has been designed to take advantage of the main Fortran capabilities:
Maturity: Fortran is the de facto standard in scientific programming, being one of the oldest programming languages [18]; as a consequence, many Fortran codes (tested and strongly optimized) for solving a wide range of different problems are freely available;
Simplicity: Fortran has been designed with only one aim: it is a Formula Translating System, thus the implementation of scientific algorithms is greatly simplified with respect to other, general purpose languages;
Efficiency: Fortran has been designed with performance in mind (limiting its flexibility) and a lot of strongly optimized compilers are available even for massively parallel architectures;
OOP: modern Fortran standards (2003/2008) allow (partially) the OOP paradigm (dynamic dispatch, encapsulation, polymorphism, inheritance and recursion), providing an easy way to maintain and improve the code.
The choice of Fortran clearly identifies the kind of OFF users (and developers): OFF is suited to scientific researchers that are interested (and skilled) in the physical aspects of the numerical simulations rather than to computer technicians. As a matter of fact, in many applications of computer science, such as CFD, the problem of translating mathematical and numerical models into operative software is crucial and has a strong impact on the global performance (meaning a compromise between accuracy, efficiency, simplicity, etc.). Modern standards of the Fortran programming language, being more specialized (and inflexible) than a general purpose language like C++, allow the effective (and fast) development of efficient software based on complex scientific algorithms. Indeed, the interoperability between Fortran and C is currently easier8 and their application fields are less distinguished. Nonetheless, a pure Fortran code is simpler than a mixed one for a non computer technician.
Regarding Fortran OOP capabilities and their maturity, an important remark must be highlighted: it is well known that Fortran has strongly optimized compilers and is a de facto standard in HPC applications (due to its maturity), however the OOP paradigm has been introduced only in recent years (from standard 2003/2008; actually there is no compiler that is fully compliant with these standards). One of the main concerns with the Fortran OOP implementation that has emerged during OFF development is the performance of overloaded operators: at least for the compilers tested (GNU gfortran 4.7.x and Intel Fortran Compiler 12.x) the non-overloaded version of OFF is remarkably faster than the overloaded one. Because performance has driven OFF design, the choice of Fortran could seem contradictory: it has been chosen for its maturity and performance, whereas its OOP implementation is immature. Indeed, this is the result of a compromise: Fortran maximizes the performance/easiness ratio when operator overloading is avoided (while allowing other OOP paradigm features) and this finally drives the programming language choice.

1.3.2. HPC capabilities


At present, the CFD approach is applied to very complex problems. As an example, in the aerospace research field it is common to perform heavy numerical simulations with millions to hundreds of millions of finite volumes: a parallel software is mandatory. As a consequence, the OFF project has been driven by performance optimization.
Many parallel architectures are currently available. Graphics Processing Unit (GPU) based clusters are very promising, but they still lack maturity. Therefore, OFF is tailored for Central Processing Unit (CPU) based clusters. Among these, two main architectures are available: shared memory and distributed memory. OFF can run over both these architectures (and on mixed ones as well). The Open Multiprocessing9 (OpenMP) paradigm is used to run on shared memory multi-core architectures while the Message Passing Interface10 (MPI) library is used for distributed memory clusters. Heavy simulations computed by means of OFF are commonly performed on clusters (distributed memory) of nodes in which each node is constituted by a multi-core (shared-memory) machine. Currently the scalability of OFF has been proven only on small clusters (up to 64 nodes with 8 cores per node), but scalability tests on huge clusters are already scheduled. A minimal sketch of this hybrid MPI/OpenMP strategy is given below.
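The following Fortran sketch is a hypothetical illustration (not code extracted from the OFF sources; all names are assumptions) of the hybrid strategy just described: MPI distributes the work (e.g. the grid blocks) among processes, while OpenMP parallelizes the loops over the cells handled by each process.

  ! Hypothetical sketch of the hybrid MPI/OpenMP strategy described above.
  program hybrid_sketch
  use mpi
  implicit none
  integer :: ierr, myrank, nproc, b, i
  integer, parameter :: Nblocks_local = 4, Ncells = 1000
  real :: residual(Ncells)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
  if (myrank == 0) print *, 'running on', nproc, 'MPI processes'

  do b = 1, Nblocks_local ! blocks assigned to this MPI process
    !$OMP PARALLEL DO
    do i = 1, Ncells      ! cells of the block updated by OpenMP threads
      residual(i) = 0.0   ! placeholder for the actual space operator
    end do
    !$OMP END PARALLEL DO
  end do

  call MPI_FINALIZE(ierr)
  end program hybrid_sketch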

8 With the Fortran standard 2003 [17] the interoperability with C has been greatly improved by the introduction of the intrinsic module ISO_C_BINDING that provides
named constants, types and procedures useful in a mixed-language framework.
9 The OpenMP Application Program Interface (API) supports multi-platform shared-memory parallel programming in C/C++ and Fortran; it is a portable, scalable model that
gives shared-memory parallel programmers a simple and flexible interface for developing parallel applications for platforms ranging from the desktop to the supercomputer,
see http://openmp.org/wp.
10 MPI is a library specification for message-passing, proposed as a standard by a broadly based committee of vendors, implementors, and users. MPI was designed for high
performance on both massively parallel machines and on workstation clusters, see http://www.mcs.anl.gov/research/projects/mpi.

Fig. 1. Example of OFF html documentation: description of the Runge–Kutta module highlighting the integrated latex formula.

1.3.3. Documentation
Free, open source softwares have often poor documentation. This lack compromises the diffusion of this kind of codes and makes their
usage very difficult. OFF has also a didactic purpose, thus its easiness is taken into account. OFF documentation is based on comprehensive
comments placed directly into the source code files without the necessity of other external files. This choice keeps the tree of the project
clean and lightweight. These documentation comments are parsed by means of doxygen11 free software producing high quality html and
latex documentation pages. In particular, the in-code comments allow keeping together the theory (latex-style formula are allowed inside
comments) and practice greatly simplifying the code understanding, as an example see Fig. 1.
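The following Fortran fragment is a hypothetical example (not taken from the OFF sources) of the documentation style just described: a doxygen comment embedding a latex-style formula directly above the documented procedure.

  !> @brief Hypothetical example of a doxygen comment embedding a latex-style
  !> formula, in the style used throughout the OFF documentation.
  !> The cell mean value is defined as
  !> \f$ \overline{U} = \frac{1}{V}\int_{V} U \, dV \f$.
  pure function cell_mean(U, V) result(mean)
  implicit none
  real, intent(in) :: U(:) ! cell-integrated values
  real, intent(in) :: V    ! cell volume
  real             :: mean
  mean = sum(U)/V
  end function cell_mean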

1.3.4. Collaborative framework


Nowadays, one of the keys to the success of a free software project is its capability to be easily maintained and improved. Hence, the possibility of a collaborative development of OFF has also been considered. Git12 has been adopted as the distributed versioning system and a public repository dedicated to the OFF project has been created on github.13 Git facilitates the tracking of each modification while, by means of the github repository, worldwide collaboration is possible (and effortless).

1.4. OFF: general description

OFF is a standalone program (rather than a library package, like OpenFOAM or PETSc, which are more flexible) and it has been initially developed for aerospace applications (in particular for Solid Rocket Motors simulations, see Section 5.3) where compressibility effects are relevant.14 It is devoted to the numerical solution of the compressible Navier–Stokes equations (that is, a system of Partial Differential Equations, PDEs, see Section 2), but it is worth noting that the OOP paradigm adopted can greatly simplify the extension of the resolved PDE system from the Navier–Stokes one to a different one. The core of the OFF flow solver is based on WENO-like schemes providing a formal order of space integration up to 7th, in conjunction with Runge–Kutta-like schemes providing a formal order of time integration up to 4th. It can simulate multi-fluid flows, whereas a multi-phase scheme has already been implemented, but not yet validated.

11 Doxygen is a documentation system for many programming languages (among which there is the Fortran one) freely available at http://www.stack.nl/~dimitri/doxygen.
12 Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency, see http://git-
scm.com/.
13 Github is a web-based hosting service for software development projects using git versioning system and it is free for open source projects, see https://github.com.
14 Due to its modularity, OFF can be easily modified to simulate also incompressible flows extending the application field to hydrodynamics problems.

Fig. 2. UML diagram of a typical work-flow based on OFF project.

An important design target of OFF is the capability to simulate flows in complex geometries, e.g. see [19–21]. OFF is currently based on a general curvilinear representation of the space: the space integration is done in a truly finite volume integral approach, allowing great flexibility on the geometry simulated. The underlying mesh representation is founded on structured, body-fitted multi-blocks of hexahedrons: this choice allows higher quality meshes than unstructured ones, but prevents automatic mesh generation.15 Regarding the mesh, in order to accelerate the setup of the simulations (i.e. building the numerical grids with initial and boundary conditions), a pre-processing tool, referred to as IBM (standing for Initial and Boundary conditions, Mesh generator), has been developed within OFF. IBM can process grids made up of:
• Ansys ICEM CFD: it is a commercial mesh generator able to deal directly with CAD geometry;
• Pointwise Gridgen: it is another commercial mesh generator also able to deal directly with CAD geometry;
• files containing description of blocks: the initial and boundary descriptions as well as the geometry are directly described by means of
simple ascii files (this input is suited only for Cartesian geometries).
IBM is distributed within OFF, sharing most of its source files.
The post-processing phase of the simulation work-flow has also been taken into account. A post-processing code, referred to as POG (standing for Post-processing Output Generator), has been developed. POG processes the output of OFF generating three different outputs (ready to be visualized and analyzed) according to the user's options:
• Tecplot, Inc. output: Tecplot is a widely used (commercial) visualization tool; POG can produce both ascii and binary files in the Tecplot standard;
• VTK output: the Visualization Toolkit (VTK) is an open-source, freely available software system for 3D computer graphics, image processing and visualization; many visualization tools support the VTK standard and among those tools Paraview seems to be one of the most complete; POG can produce both ascii and binary files in the VTK standard by means of Lib_VTK_IO,16 another free, open-source Fortran code developed by the authors;
• GNUplot ASCII output: GNUplot is a free and portable command-line driven graphing utility; it is well suited for the visualization of small grids.
Fig. 2 shows the Unified Modeling Language (UML) diagram of a typical work-flow based on OFF codes.
At present, many developments are being implemented into OFF: the diffusive fluxes computation is under validation within a Large Eddy Simulation (LES) model; similarly, a Lagrangian multi-phase scheme for simulating particles dispersed into mixtures of gases is under validation; a cell-based AMR scheme is being implemented: a new hierarchical data structure (complementary to the multi-block one) has already been implemented by means of a hash table, whereas the modification of the fluid dynamics methods implementation is not yet completed; an overlapping grids technique has been implemented (but not yet tested) for taking into account the relative motion of bodies.

1.4.1. Copyrights
OFF is a free software. The authors encourage anyone to use, copy, distribute, study, change and improve the codes. OFF is distributed
under the GNU Public License version 3. It is available at https://github.com/szaghi/OFF while the documentation can be found at
http://szaghi.github.com/OFF/index.html. Researchers are kindly requested to cite the present paper when publishing results obtained
by means of OFF.
The remainder of the present paper is organized as follows: a brief description of the currently implemented mathematical and
numerical models is presented in Section 2, the API of OFF is described in Section 3, an example of OFF API extension is described in
Section 4, some test cases are reported in Section 5, the most relevant ongoing development activities are presented in Section 6 and some
concluding remarks are summarized in Section 7.

2. Mathematical and numerical models

2.1. Mathematical models

The turbulent motion of a compressible viscous fluid is described by the Navier–Stokes equations, based on the original theory of Navier [22] and Stokes [23], see [24,25], that read, in integral, non-dimensional form:

$$\frac{\partial}{\partial t}\int_{V} \vec{U}\,\mathrm{d}V + \oint_{S(V)} \left(\overline{\overline{F_c}} + \overline{\overline{F_d}}\right)\cdot\vec{n}\,\mathrm{d}S = \int_{V} \vec{Q}\,\mathrm{d}V \tag{1}$$

$V$ being the control volume, $S(V)$ its boundary, $\vec{n}$ the outward unit normal of $S(V)$, and where $\vec{U}$ is the vector of conservative variables, $\overline{\overline{F_c}}$ is the convective part of the fluxes tensor, $\overline{\overline{F_d}}$ is the diffusive part of the fluxes tensor and $\vec{Q}$ is the source terms vector. For a fluid subjected

15 There are two ongoing activities related to the numerical grids, the first devoted to introduce Adaptive Mesh Refinement (AMR), see Section 6.3, and the second to extend
the integration to moving grids, see Section 6.4.
16 Lib_VTK_IO is a Fortran library to write and read data conforming the VTK standard; it is free available at https://github.com/szaghi/Lib_VTK_IO.

to external forces and heat sources the above vectors and tensors can be written as:

$$\vec{U} = \begin{bmatrix} \rho \\ \rho\vec{v} \\ \rho E \end{bmatrix}\qquad
\overline{\overline{F_c}} + \overline{\overline{F_d}} = \begin{bmatrix} \rho\vec{v} \\[4pt] \rho\vec{v}\vec{v} + p\,\overline{\overline{I}} - \dfrac{\overline{\overline{\tau}}}{Re} \\[8pt] \left(\rho E + p\right)\vec{v} - \dfrac{\overline{\overline{\tau}}\cdot\vec{v}}{Re} - \dfrac{k\nabla T}{Pr\,Re} \end{bmatrix}\qquad
\vec{Q} = \begin{bmatrix} 0 \\[4pt] \dfrac{\rho \vec{f_e}}{Fr^2} \\[8pt] \dfrac{\rho \vec{f_e}\cdot\vec{v}}{Fr^2} + \rho q_h \dfrac{q_0 L_0}{E_0 v_0} \end{bmatrix} \tag{2}$$

The conservative vector $\vec{U}$, the fluxes tensor $\overline{\overline{F}} = \overline{\overline{F_c}} + \overline{\overline{F_d}}$ and the sources vector $\vec{Q}$ are non-dimensional quantities, where $\rho$ is the density, $\vec{v}$ is the velocity vector, $E$ is the specific total energy, $T$ is the absolute temperature, $p$ is the static, isotropic pressure ($\overline{\overline{I}}$ being the identity tensor), $\overline{\overline{\tau}}$ is the viscous shear stress tensor, $k$ is the thermal conductivity coefficient ($-k\nabla T$ being Fourier's law of heat conduction), $\vec{f_e}$ is the external specific volume forces vector and $q_h$ is the specific heat source other than conduction. The non-dimensional form is obtained through the non-dimensional numbers of Reynolds, Prandtl and Froude, namely $Re = \frac{\rho_0 v_0 L_0}{\mu_0}$, $Pr = \frac{\mu_0 c_{p0}}{k_0}$ and $Fr = \frac{v_0^2}{f_0 L_0}$, where the quantities noted with the subscript "0" are the dimensional reference values.
The viscous shear stress tensor, in the general case, is defined as:

$$\overline{\overline{\tau}}_{ij} = \mu\left(\partial_i v_j + \partial_j v_i\right) + \lambda\left(\nabla\cdot\vec{v}\right)\delta_{ij} \tag{3}$$

where $\mu$ is the dynamic viscosity of the fluid, $\lambda$ is the second viscosity and $\delta_{ij}$ is the Kronecker delta, $\delta_{ij} = \begin{cases} 0, & \text{if } i \neq j \\ 1, & \text{if } i = j \end{cases}$. For a Newtonian fluid the dynamic viscosity and the second viscosity are related by $\lambda = -\frac{2}{3}\mu$. This is valid in local thermodynamic equilibrium (except at very high temperature or pressure ranges). This is the Stokes hypothesis.
The above system of partial differential equations must be completed by the constitutive equation of state. For a perfect gas, i.e. a thermally and calorically perfect fluid, the equation of state is $p = \rho R T$, where $R$ is the fluid constant. This constant is related to the specific heats at constant volume and constant pressure (and their ratio): $\gamma = \frac{c_p}{c_v}$, $R = c_p - c_v$, $c_p = \frac{\gamma R}{\gamma - 1}$, $c_v = \frac{R}{\gamma - 1}$. The other equations of state are: $E = c_v T + \frac{|\vec{v}|^2}{2} = \frac{p}{\rho(\gamma - 1)} + \frac{|\vec{v}|^2}{2}$, which is the definition of the total specific energy, and $a = \sqrt{\frac{\gamma p}{\rho}}$, which is the speed of sound (acoustic velocity).
The system (1) is made of mixed hyperbolic (convective terms) and parabolic (diffusive terms) partial differential equations (PDE) of second order. Due to the integral nature of the system (1), it also admits discontinuous solutions (e.g. shocks, contact discontinuities, etc.).
One of the OFF specifications is to simulate flows with a mixture of different fluids having different physical features. To this purpose OFF currently adopts the so-called mass fraction model,17 see [26]. Each sth fluid species of the mixture has its own specific heats at constant volume and constant pressure (and their ratio), $c_{p,s}$, $c_{v,s}$ and $\gamma_s = \frac{c_{p,s}}{c_{v,s}}$, respectively. Consequently, each species has its own specific internal energy and pressure: $e_s = c_{v,s} T$ and $p_s = (\gamma_s - 1)\rho_s e_s$, $\rho_s$ being the single species density. Applying Dalton's [27] law of partial pressures, $p = \sum_s p_s$ yields $\rho e = \sum_s \rho_s e_s = \rho c_v T$, where $c_v = \sum_s Y_s c_{v,s}$, $Y_s = \frac{\rho_s}{\rho}$ being the mass fraction of the sth species. Based on the mass fraction model, the equation of mass conservation is substituted by $N_S$ (number of species composing the mixture) equations of species mass conservation:

$$\frac{\partial}{\partial t}\int_{V} \rho_s\,\mathrm{d}V + \oint_{S(V)} \rho_s\,\vec{v}\cdot\vec{n}\,\mathrm{d}S = 0 \qquad s = 1, \dots, N_S. \tag{4}$$

After Eq. (4) has been resolved, the properties of the mixture can be computed, allowing the solution of the conservation equations of momentum and energy. A minimal sketch of this mixture-properties computation is given below.
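The following Fortran sketch is a hypothetical illustration (not extracted from the OFF sources; variable names are assumptions) of how the mixture properties implied by the mass fraction model can be evaluated once the species densities of Eq. (4) are known.

  ! Hypothetical sketch of the mass fraction model: mixture density, mass
  ! fractions, mixture specific heats and pressure via Dalton's law.
  pure subroutine mixture_properties(rho_s, cv_s, cp_s, T, rho, p)
  implicit none
  real, intent(in)  :: rho_s(:)         ! species densities
  real, intent(in)  :: cv_s(:), cp_s(:) ! species specific heats
  real, intent(in)  :: T                ! temperature
  real, intent(out) :: rho, p           ! mixture density and pressure
  real :: Y(size(rho_s)), cv, cp, gam

  rho = sum(rho_s)           ! mixture density
  Y   = rho_s/rho            ! mass fractions Y_s = rho_s/rho
  cv  = sum(Y*cv_s)          ! cv = sum_s Y_s cv_s
  cp  = sum(Y*cp_s)
  gam = cp/cv                ! mixture specific heats ratio
  p   = (gam - 1.0)*rho*cv*T ! p = sum_s p_s = (gamma - 1) rho cv T
  end subroutine mixture_properties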

2.2. Numerical models

As stated before, the system (1) is constituted by PDEs that also admit discontinuous solutions. Many approaches are possible and a concise classification can distinguish between:
Shock fitting methods: the governing equations are integrated in divergence form and any discontinuity is governed by its own algebraic equations, requiring a special treatment;
Shock capturing methods: the governing equations are integrated in conservation form and any discontinuity is computed as part of the solution without any special treatment.
The shock fitting approach has a physical foundation, meaning that different flow solutions are computed through different formulae according to their physical nature (continuous or discontinuous), see [28,29]. The shock fitting approach is generally very complex to implement, especially for three dimensional problems. On the contrary, the shock capturing approach has a mathematical foundation in the proven legitimacy of weak solutions of the integral conservative form of the PDE system. Shock capturing schemes are conceptually simple: the weak solution is computed in the whole domain by means of the same operations, without special logic for tracking discontinuities. This greatly simplifies the computation of discontinuous solutions even in complex geometries. The most relevant drawback of the shock capturing approach, at least as far as the first order scheme originally formulated by Godunov [30] is concerned, is the high numerical diffusion

17 It is well known that the mass fraction model can fail in some circumstances, therefore different models are planned to be implemented in the next future.

(smearing discontinuities) that is due to the truncation error acting as an artificial viscosity. Many approaches have been proposed to increase the order of accuracy of shock capturing schemes: the PPM method of Colella and Woodward [31], the MUSCL scheme of van Leer [32], the ENO approach of Harten et al. [33] and the more recent WENO of Liu et al. [34]. OFF is based on a shock capturing scheme: it is a finite volume Godunov-like scheme in which the conservative variables are collocated at the cell-center and their volume-averaged values are integrated in conservative form. OFF implements modern WENO schemes that are proven to preserve the positivity of pressure and density also for very challenging flow conditions. In this section the most relevant numerical schemes building up OFF are briefly described.

2.2.1. Finite volume scheme


The fluid domain $D$ is decomposed into $N_b$ structured blocks $D^b$, each subdivided into $N_i \times N_j \times N_k$ disjoint hexahedrons $D^b_{ijk}$ such that $\bigcup_{ijk} D^b_{ijk} = D^b$. Conservation laws are then applied to each finite volume:

$$\frac{\partial}{\partial t}\int_{V_{ijk}} \vec{U}\,\mathrm{d}V + \sum_{s=1}^{6}\int_{S_s} \left(\overline{\overline{F_c}} + \overline{\overline{F_d}}\right)\cdot\vec{n}\,\mathrm{d}S = \int_{V_{ijk}} \vec{Q}\,\mathrm{d}V \tag{5}$$

where Ss is the sth face of the finite volume Dijk whose measure is Vijk . In order to resolve System (5) an approximation of volume and
surface quantities must be provided and then the semi-discrete system can be integrated in time by means of a Runge–Kutta scheme, see
Section 2.2.4.
An approximation of the volume (cell) quantities is quite straightforward to obtain. Following the shock-capturing finite volume approach, cell quantities are collocated at the cell-center and are approximated by some reconstruction. Assuming we know the solution $\vec{U}$ in the whole domain at a time $t$, the idea is to average this solution over the finite volume support, $\vec{U}_{ijk} = \frac{1}{V_{ijk}}\int_{V_{ijk}} \vec{U}\,\mathrm{d}V$. The approximate solution $\vec{U}_{ijk}$ obtained from the reconstruction step is used to evaluate the surface fluxes in order to compute the space operator of System (5). The simplest reconstruction is the one originally proposed by Godunov [30], in which a piecewise constant reconstruction is adopted (leading to a formal first order space accuracy). Once the approximation of cell quantities is provided, the interface fluxes are approximated ensuring the conservative behavior of the integrated variables.
The algorithm for computing the approximation of the interface fluxes must distinguish between diffusive and convective terms. The diffusive fluxes can be easily computed, e.g. by means of a standard second order central difference scheme.
More attention must be paid to the computation of the convective fluxes, due to their hyperbolic nature (wave-like equations). In particular, the direction of wave propagation must be accurately captured. In a finite volume approach the computation of the convective fluxes reduces to the solution of a Riemann's problem [35] at the faces surrounding the finite volume. Once the cell quantities approximation is provided, there are many schemes to resolve the Riemann's problem at the cell interfaces. In Section 2.2.3 some of the methods implemented in OFF are described.
The difficulty in computing the convective fluxes grows when a high order reconstruction is adopted. As a matter of fact, a reconstruction of order higher than one (piecewise constant) leads to non-physical oscillatory solutions if no limiting device is adopted, i.e. the Gibbs' phenomenon, see [36]. During the last decades many schemes have been proposed to perform oscillation-free reconstruction of discontinuous solutions. Among the others, the Weighted Essentially Non-Oscillatory (WENO) scheme is one of the most effective for the reconstruction of the solution of hyperbolic equations. In the next subsection the WENO scheme implemented in OFF is described.

2.2.2. WENO-positivity-preserving scheme


As anticipated above, to obtain a high order (resolution) finite volume Godunov-like scheme, a possibility is to replace the piecewise constant approximation of cell quantities with a higher order reconstruction of interface values. To overcome the origination of non-physical oscillatory solutions (dispersion of the Gibbs' phenomenon) a non-linear algorithm is necessary. The idea is to use the dissipation of the first order reconstruction to limit the dispersion of the high order ones. Many methods are possible. Among them the Essentially Non-Oscillatory schemes, ENO, and their modifications are the most effective, see [33,37,34,38].
The Weighted ENO, WENO, scheme is a modification of the ENO reconstruction scheme, see [34]. The key idea of the WENO scheme is to use a convex combination of all candidate stencils (instead of using only the smoothest one) for achieving high order reconstruction. The first order piecewise constant approximation of interface values in each cell is replaced by means of a high order reconstruction based on the information contained in $S$ stencils of $S-1$ finite volumes surrounding the interface. Let us suppose that the domain is discretized by $N$ finite volumes. Considering the ith cell (in a 1D framework), we are interested in computing the reconstruction of the solution at its interfaces $i \pm \frac{1}{2}$. The candidate stencils are $S_k = \{i + k - (S-1), \dots, i + k\}$, $k = 0, \dots, S-1$. Over the stencil $k$ the interface values are approximated by the following linear reconstruction:

$$U^{linear}_{k,i\pm1/2} = \sum_{l=0}^{S-1} a^{\pm1/2}_{k,l}\, U_{i+k-l} \qquad k = 0, \dots, S-1. \tag{6}$$
The linear coefficients $a^{\pm1/2}_{k,l}$ are reported in Table A.3a in Appendix A. The convex non-linear combination of all $S$ linear reconstructions provides a $2S-1$ order scheme. Nonlinearity, necessary for the ENO property, is introduced by non-linear weights in computing the convex combination. The weighted convex combination, the WENO reconstruction, is the following:

$$U^{WENO}_{i\pm1/2} = \sum_{k=0}^{S-1} \omega^{\pm1/2}_k\, U^{linear}_{k,i\pm1/2} \qquad \sum_{k=0}^{S-1}\omega^{\pm1/2}_k = 1 \qquad \omega^{\pm1/2}_k \geq 0\ \ \forall k \in \{0, \dots, S-1\}. \tag{7}$$

The non-linear weights $\omega^{\pm1/2}_k$ must be able to adapt to the relative smoothness of each candidate stencil, so that a zero weight is assigned to any discontinuous stencil, see [38]. In smooth regions the weights distribution must be as close as possible to the optimal upwind weights $C^{\pm1/2}_k$ (see Table A.3b in Appendix A). Thus the non-linear weights can be defined as follows:

$$\omega^{\pm1/2}_k = \frac{\alpha^{\pm1/2}_k}{\sum_{l=0}^{S-1}\alpha^{\pm1/2}_l} \qquad \alpha^{\pm1/2}_k = \frac{C^{\pm1/2}_k}{\left(\varepsilon + IS_k\right)^p} \qquad k = 0, \dots, S-1. \tag{8}$$
The quantity $IS_k$ is the Smoothness Indicator of the kth stencil: it introduces the necessary non-linearity. Following Jiang and Shu [38] it is computed as:

$$IS_k = \sum_{l=0}^{S-1}\sum_{m=0}^{l} \sigma_{k,l,m}\, U_{i+k-l}\, U_{i+k-m} \tag{9}$$

where the coefficients σk,l,m are reported in Table A.4 in Appendix A.
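To make the reconstruction of Eqs. (6)–(9) concrete, the following Fortran sketch (a simplified, hypothetical illustration, not code extracted from OFF) computes the non-linear weights and the WENO interface value for a given set of linear reconstructions; the optimal weights, the smoothness indicators and the exponent p are assumed to be provided (e.g. from the tables in Appendix A).

  ! Hypothetical sketch of the WENO convex combination (Eqs. 6-9):
  ! given the S linear reconstructions and their smoothness indicators,
  ! compute the non-linear weights and the WENO interface value.
  pure function weno_combine(U_linear, C, IS, p_exp) result(U_weno)
  implicit none
  real, intent(in)    :: U_linear(:) ! linear reconstructions of the S stencils, Eq. (6)
  real, intent(in)    :: C(:)        ! optimal (upwind) linear weights
  real, intent(in)    :: IS(:)       ! smoothness indicators, Eq. (9)
  integer, intent(in) :: p_exp       ! exponent p of Eq. (8)
  real :: U_weno
  real :: alpha(size(C)), omega(size(C))
  real, parameter :: eps = 1.0e-6    ! small parameter avoiding division by zero

  alpha  = C/(eps + IS)**p_exp       ! Eq. (8), un-normalized weights
  omega  = alpha/sum(alpha)          ! Eq. (8), normalized weights (sum to 1)
  U_weno = sum(omega*U_linear)       ! Eq. (7), convex combination
  end function weno_combine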


The WENO scheme as described above can be applied directly to the conservative or primitive variables. However, it is well known that for hyperbolic systems better results are obtained when the reconstruction is performed on the local-characteristic variables projection. OFF can use, according to the user's selection, all three sets of variables mentioned above.
Even if this scheme has been proven to be very robust and accurate, in some particular conditions, like the highly under-expanded jet transient, the ENO property is not sufficient to ensure the positivity of density and pressure due to non-physical oscillatory solutions. A modification of the WENO scheme (introducing a Positivity-Preserving limiter) has been implemented. This limiter is based mainly on the studies of Zhang and Shu [39,40]. The aim is to preserve the positivity of pressure and density without destroying the high resolution of the WENO reconstruction. Let $v = \rho$ or $p$ and $v^{WENO}_{i\pm1/2}$ the high resolution reconstructions at the two interfaces of the cell $i$. The key idea is to limit the reconstructions with a limiter of the form:

$$\hat{v}^{WENO}_{i\pm1/2} = \phi\left(v^{WENO}_{i\pm1/2} - v_i\right) + v_i \tag{10}$$

where $v_i$ is the mean value of $v$ in the cell and $\phi$ is a scaling factor that must be able to preserve the positivity of $\hat{v}^{WENO}_{i\pm1/2}$. The following scaling factor is used:

$$\phi = \min\left(\frac{v_i}{v_i - v_{i,min} + \varepsilon}, 1\right) \tag{11}$$

where $\varepsilon$ is a small positive number introduced to avoid division by zero in the smooth regions (the tiniest value representable in the computational architecture used is adopted, i.e. about $10^{-300}$) and $v_{i,min}$ is defined as the smallest reconstructed value:

$$v_{i,min} = \min\left(v^{WENO}_{i-1/2},\, v^{WENO}_{i+1/2},\, v^*_i\right). \tag{12}$$
The value $v_{i,min}$ has been introduced because, for the positivity-preserving property, it is not sufficient to consider only the reconstructed extrema, see [39,40], but also the underlying reconstruction polynomial. For the WENO scheme used there is not a simple polynomial reconstruction, but a convex non-linear weighted combination of different polynomials. Thus, without an explicit polynomial, following Zhang and Shu [39], a Gauss–Lobatto quadrature rule is used in the domain $\left[x_{i-1/2}, x_{i+1/2}\right]$ with the $N_q$ quadrature points $Q_i = \left\{x_{i-1/2} = x^1_i, x^2_i, \dots, x^{N_q}_i = x_{i+1/2}\right\}$. The number of quadrature points $N_q$ is the smallest integer satisfying $2N_q - 3 > S$ such that:

$$v_i = \frac{1}{\Delta x}\int_{x_{i-1/2}}^{x_{i+1/2}} p_i(x)\,\mathrm{d}x = \sum_{l=2}^{N_q-1}\vartheta_l\, p_i\left(x^l_i\right) + \vartheta_1 v_{i-1/2} + \vartheta_{N_q} v_{i+1/2} \tag{13}$$

where $\vartheta_l$ is the lth Gauss–Lobatto quadrature weight and $p_i(x)$ the underlying reconstruction polynomial. A sufficient (but not necessary) condition for preserving the positivity of $v$ is that $p_i\left(x^l_i\right) > 0\ \forall l = 1, 2, \dots, N_q$. By the mean value theorem, see [40], there exists some $x^*_i$ such that $p_i\left(x^*_i\right) = \frac{1}{1 - 2\vartheta_1}\sum_{l=2}^{N_q-1}\vartheta_l\, p_i\left(x^l_i\right)$. Thus, from (13):

$$v^*_i = \frac{v_i - \vartheta_1 v_{i-1/2} - \vartheta_{N_q} v_{i+1/2}}{1 - 2\vartheta_1}. \tag{14}$$

The scheme made up of (10)–(12) and (14) preserves the positivity without destroying the high resolution of the WENO reconstruction. The Gauss–Lobatto quadrature points are the same as in Zhang and Shu [40].
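A minimal Fortran sketch of the limiting step of Eqs. (10)–(12) follows; it is a hypothetical illustration (not the OFF implementation), assuming that the cell mean, the two reconstructed interface values and the value $v^*_i$ of Eq. (14) are already available.

  ! Hypothetical sketch of the positivity-preserving limiter, Eqs. (10)-(12):
  ! rescale the reconstructed interface values towards the cell mean so that
  ! the smallest reconstructed value stays positive.
  pure subroutine limit_positivity(v_mean, v_star, v_left, v_right)
  implicit none
  real, intent(in)    :: v_mean          ! cell mean value v_i (rho or p)
  real, intent(in)    :: v_star          ! v_i^* of Eq. (14)
  real, intent(inout) :: v_left, v_right ! reconstructed values at i-1/2 and i+1/2
  real :: v_min, phi
  real, parameter :: eps = tiny(1.0)     ! smallest representable positive value

  v_min   = min(v_left, v_right, v_star)            ! Eq. (12)
  phi     = min(v_mean/(v_mean - v_min + eps), 1.0) ! Eq. (11)
  v_left  = phi*(v_left  - v_mean) + v_mean         ! Eq. (10)
  v_right = phi*(v_right - v_mean) + v_mean         ! Eq. (10)
  end subroutine limit_positivity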
OFF currently accepts 4 values for the order of the space reconstruction, namely 1st, 3rd, 5th and 7th order, using 1, 2, 3 and 4 WENO stencils respectively. It is worth noting that the formal accuracy order can be easily increased by providing the necessary WENO coefficient tables.

2.2.3. Riemann’s problem solvers


Once the reconstruction of the interface values has been performed, the convective fluxes can be computed by means of a proper solution of a Riemann's Problem, first defined in [35]. The concept of the Riemann's Problem (RP) can be applied to a wide range of different fields, but here it is adopted for computing the convective fluxes, which are related to a subset of the Navier–Stokes equations, namely the Euler equations, [41–43]. In general, solving a RP consists in finding the time evolution, according to the conservation laws, of some discontinuity in the initial conditions. In computing the convective fluxes the necessity to solve a RP arises because each cell interface constitutes a discontinuity separating the left and right cell states. The structure of the solution of the RP for the Euler equations is composed of three waves. The middle wave (traveling with speed $u$, $u$ being the velocity component normal to the RP interface) is always a contact discontinuity, while the left and right waves (traveling with speeds $u \pm a$, $a = \sqrt{\frac{\gamma p}{\rho}}$ being the speed of sound) are the non-linear acoustic waves and they can be either a shock or a rarefaction. Therefore, there are four possible wave patterns. The two states separated by the contact discontinuity constitute the star region. These two states have the same velocity and pressure, while density (and internal energy) jump across the discontinuity.
There is a great number of different solvers, each with its own pros and cons. There is no general best solver for all applications, see [44]. In OFF more than ten different RP solvers have presently been implemented. The detailed description of all of them is outside the scope of the present paper. In the source file Lib_Riemann.f90 some documentation is available for each solver implemented. Here, some characteristics of the most important solvers implemented are briefly described.
The first solver mentioned belongs to the "exact iterative" family. It is based on Newton's iterative method, first devised by Newton [45] and Raphson [46], see also Newton [47]. Briefly, using the isentropic relations and the Rankine–Hugoniot jump conditions it is possible to relate the unknown states to the initial left and right ones. The system so constructed is non-linear, thus it can be solved iteratively. This procedure is very expensive and in some cases performs badly. For more details see Toro [48]. The use of this solver is discouraged if coupled with high order WENO: it is too expensive and the absence of diffusion terms can lead to unstable (oscillatory) reconstructions.
The second solver adopted is the so-called HLLC due to Toro [49]. It is a modification of the HLL solver of Harten et al. [50]. The HLL solver assumes that the four possible wave patterns are approximated by a single one in which the contact discontinuity is absent and "a-priori" estimates of the two upper bounds of the wave speeds are available. The HLLC solver is based on the same assumption, but reintroduces the contact discontinuity. Thus HLLC assumes three constant states separated by three waves with a fixed pattern. Such a scheme provides in a closed, non-iterative form the interface fluxes, resulting in a very efficient and accurate approach with a lower computational cost than the exact iterative one. With respect to the exact solver it is also more robust, keeping a similar accuracy. However, the HLLC scheme becomes operative only after the three "a-priori" estimations of the wave speeds (indicated as $S_L$, $S$ and $S_R$ in the following) are provided. A lot of different algorithms have been proposed. A modification of the algorithm proposed by Toro [49] is implemented in OFF.
The algorithm is based on three very simple approximations of the star states. First, it is possible to provide a very simple evaluation by means of a linearization of the RP in terms of primitive variables, see [51]:

$$u^*_{pvl} = \frac{1}{2}\left(u_L + u_R\right) + \frac{1}{2}\left(p_L - p_R\right)/\left(\rho\, a\right)$$
$$p^*_{pvl} = \frac{1}{2}\left(p_L + p_R\right) + \frac{1}{2}\left(u_L - u_R\right)\left(\rho\, a\right) \tag{15}$$
where $\rho$ and $a$ are the mean density and speed of sound. Here, the arithmetic mean is used. This approximation performs very badly, especially for large pressure ratios with strong shocks and rarefactions, but it is useful for an evaluation of the velocity of the star states. A second evaluation can be provided assuming that both non-linear waves are rarefactions; thus, using the isentropic relations, it is possible to find a closed form of the star states:

$$u^*_{tr} = \frac{P_{LR}\, u_L/a_L + u_R/a_R + 2\left(P_{LR} - 1\right)/\left(\gamma - 1\right)}{P_{LR}/a_L + 1/a_R} \qquad P_{LR} = \left(\frac{p_L}{p_R}\right)^{z} \qquad z = \frac{\gamma - 1}{2\gamma}$$
$$p^*_{tr} = \frac{1}{2}\left[p_L\left(1 + \frac{\gamma - 1}{2 a_L}\left(u_L - u^*\right)\right)^{1/z} + p_R\left(1 + \frac{\gamma - 1}{2 a_R}\left(u^* - u_R\right)\right)^{1/z}\right] \tag{16}$$

This approximation is exact for the two-rarefactions wave pattern, but it introduces strong errors (and entropy violations) for patterns with shocks. Finally, assuming a two-shocks wave pattern, a third, semi-closed form of the star states is as follows [52]:

$$p^*_{ts} = \frac{g_L(p_0)\, p_L + g_R(p_0)\, p_R - \left(u_R - u_L\right)}{g_L(p_0) + g_R(p_0)} \qquad g_k(p) = \sqrt{\frac{A_k}{p + B_k}},\ k = L, R \qquad A_k = \frac{2}{\left(\gamma + 1\right)\rho_k} \qquad B_k = \frac{\gamma - 1}{\gamma + 1}\, p_k$$
$$u^*_{ts} = \frac{1}{2}\left(u_L + u_R\right) + \frac{1}{2}\left[\left(p^* - p_R\right) g_R(p_0) - \left(p^* - p_L\right) g_L(p_0)\right] \tag{17}$$

where $p_0$ is the guess estimate for the star pressure. Here, $\max\left(0, p^*_{pvl}\right)$ is used as the first estimate. This approximation is not closed because it is based on the Rankine–Hugoniot jump relations. It is a very good approximation for the wave patterns with shocks. Using the above three star state approximations, the proposed algorithm for computing the upper bounds $S_L$ and $S_R$ of the left and right wave speeds is the following:

1. define $p_{min} = \min(p_L, p_R)$ and $p_{max} = \max(p_L, p_R)$;
2. compute $u^*_{pvl}$ and $p^*_{pvl}$;
3. apply the following algorithm:

   if $\left(p_{max}/p_{min} > 2\right)$ OR $\left(p_{min} > p^*_{pvl}\right)$ OR $\left(p^*_{pvl} > p_{max}\right)$ then
     if $p^*_{pvl} < p_{min}$ then use $u^* = u^*_{tr}$, $p^* = p^*_{tr}$
     else use $u^* = u^*_{ts}$, $p^* = p^*_{ts}$
   else use $u^* = u^*_{pvl}$, $p^* = p^*_{pvl}$

4. with the estimates of $u^*$ and $p^*$, the speeds of the left and right waves are computed as:

   Left wave:
   if $u^* < u_L$ then (shock wave) $S_L = u_L - a_L\sqrt{1 + \frac{\gamma_L + 1}{2\gamma_L}\left(p^*/p_L - 1\right)}$
   else (rarefaction wave) $S_L = u_L - a_L$

   Right wave:
   if $u^* > u_R$ then (shock wave) $S_R = u_R + a_R\sqrt{1 + \frac{\gamma_R + 1}{2\gamma_R}\left(p^*/p_R - 1\right)}$
   else (rarefaction wave) $S_R = u_R + a_R$

5. provided $S_L$ and $S_R$, the wave speed $S$ of the contact discontinuity is computed as:

$$S = \frac{\rho_R u_R\left(S_R - u_R\right) - \rho_L u_L\left(S_L - u_L\right) + p_L - p_R}{\rho_R\left(S_R - u_R\right) - \rho_L\left(S_L - u_L\right)}.$$
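As an illustration, a compact Fortran sketch of the wave-speed estimation algorithm above (steps 1–5, with the star-state approximations of Eqs. (15)–(17) evaluated inline) follows. It is a hypothetical rendition, not the OFF routine, and uses a single specific heats ratio on both sides for brevity, whereas OFF handles different ratios per side.

  ! Hypothetical sketch of the wave-speed estimation used by the HLLC/LLF solvers.
  subroutine wave_speeds(gam, rhoL, uL, pL, rhoR, uR, pR, SL, S, SR)
  implicit none
  real, intent(in)  :: gam, rhoL, uL, pL, rhoR, uR, pR
  real, intent(out) :: SL, S, SR
  real :: aL, aR, rm, am, us, ps, z, PLR, gL, gR, p0, pmin, pmax

  aL = sqrt(gam*pL/rhoL); aR = sqrt(gam*pR/rhoR)
  rm = 0.5*(rhoL + rhoR); am = 0.5*(aL + aR)               ! arithmetic means
  pmin = min(pL, pR); pmax = max(pL, pR)                   ! step 1
  us = 0.5*(uL + uR) + 0.5*(pL - pR)/(rm*am)               ! step 2, Eq. (15)
  ps = 0.5*(pL + pR) + 0.5*(uL - uR)*(rm*am)
  if (pmax/pmin > 2.0 .or. pmin > ps .or. ps > pmax) then  ! step 3
    if (ps < pmin) then                                    ! two rarefactions, Eq. (16)
      z   = (gam - 1.0)/(2.0*gam)
      PLR = (pL/pR)**z
      us  = (PLR*uL/aL + uR/aR + 2.0*(PLR - 1.0)/(gam - 1.0))/(PLR/aL + 1.0/aR)
      ps  = 0.5*(pL*(1.0 + (gam - 1.0)/(2.0*aL)*(uL - us))**(1.0/z) + &
                 pR*(1.0 + (gam - 1.0)/(2.0*aR)*(us - uR))**(1.0/z))
    else                                                   ! two shocks, Eq. (17)
      p0 = max(0.0, ps)
      gL = sqrt((2.0/((gam + 1.0)*rhoL))/(p0 + (gam - 1.0)/(gam + 1.0)*pL))
      gR = sqrt((2.0/((gam + 1.0)*rhoR))/(p0 + (gam - 1.0)/(gam + 1.0)*pR))
      ps = (gL*pL + gR*pR - (uR - uL))/(gL + gR)
      us = 0.5*(uL + uR) + 0.5*((ps - pR)*gR - (ps - pL)*gL)
    end if
  end if
  if (us < uL) then                                        ! step 4, left wave
    SL = uL - aL*sqrt(1.0 + (gam + 1.0)/(2.0*gam)*(ps/pL - 1.0))
  else
    SL = uL - aL
  end if
  if (us > uR) then                                        ! right wave
    SR = uR + aR*sqrt(1.0 + (gam + 1.0)/(2.0*gam)*(ps/pR - 1.0))
  else
    SR = uR + aR
  end if
  ! step 5: contact discontinuity speed
  S = (rhoR*uR*(SR - uR) - rhoL*uL*(SL - uL) + pL - pR)/ &
      (rhoR*(SR - uR) - rhoL*(SL - uL))
  end subroutine wave_speeds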
The third RP solver considered is less accurate than the above solvers, but it is the least expensive. It is referred to as the local Lax–Friedrichs (LLF) solver, also known as Rusanov's solver [53]. It is derived from a linearization of the conservation laws. The contact discontinuity wave is omitted and the possible wave patterns are assumed to be composed of a single constant state, similarly to the HLL approach. The wave speeds are assumed to be bounded by two "a-priori" estimations ($S_L$ and $S_R$) that must be provided to enable the closed, non-iterative fluxes computation. For this purpose, in OFF the estimates of $S_L$ and $S_R$ are provided by the same algorithm adopted for the HLLC solver (without step 5, which is necessary only for the HLLC solver). The LLF solver is a more dissipative scheme than the others, especially for the capture of contact discontinuities; nevertheless, it is very inexpensive and, if coupled with high resolution schemes, it can provide accurate solutions, comparable with the accuracy of the other solvers: for highly costly (huge numerical grids) computations the LLF solver/WENO high order reconstruction is a well-balanced compromise between accuracy and computational cost.
Besides the three solvers just described, other RP solvers have been implemented in OFF. Basically, most of the solvers described in [48] have been implemented. The most relevant solvers are: Roe's solver, the Two Rarefactions (TR) and Two Shocks (TS) ones, the Primitive Variables (PV) linearized solver and some hybrids like the adaptive TR–TS–PV and TR–TS–HLL ones.
It is worth noting that each RP solver implemented takes into account the multi-fluid nature of the mathematical models adopted: the states constituting the initial discontinuity have, in general, different specific heats (ratios).

2.2.4. Strong stability preserving Runge–Kutta scheme


Using the Method of Lines [54], after the WENO reconstruction and the fluxes computation, the system of equations (5) constitutes a system of Ordinary Differential Equations (ODE) that can be solved by means of a multi-stage Runge–Kutta method, originally developed by Runge [55] and Kutta [56]. An explicit scheme belonging to the Strong-Stability-Preserving family, SSP, see [57], is used. In Butcher's form, see [58], the scheme reads:

$$U^{n+1}_i = U^n_i + \sum_{k=1}^{N_s} b_k\, K_{i,k} \tag{18}$$

where $N_s$ is the number of Runge–Kutta stages used and $K_{i,k}$ is the kth stage, defined as:

$$K_{i,k} = \Delta t \cdot R\left(U^n_i + \sum_{l=1}^{k-1} a_{k,l}\, K_{i,l}\right) \tag{19}$$

OFF implements a 5-stage, 4th-order SSP Runge–Kutta scheme to be used with the 5th- and 7th-order WENO schemes, and an optimal 3-stage, 3rd-order one to be used with the 3rd-order WENO scheme. For the 1st-order scheme (piecewise constant reconstruction) an explicit first-order Euler scheme is available. The Butcher tables of the coefficients $a_{k,l}$ and $b_k$ used are reported in Appendix B. A schematic Fortran sketch of this update is given below.
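The following Fortran sketch (a hypothetical illustration, not the OFF time integrator) shows how the explicit update of Eqs. (18)–(19) can be organized for a single cell value; the residual function R and the Butcher coefficients are assumed to be provided (e.g. from the tables in Appendix B).

  ! Hypothetical sketch of the explicit SSP Runge-Kutta update, Eqs. (18)-(19),
  ! for a single cell value U; R(.) is the space operator (residual) of System (5).
  subroutine rk_step(Ns, a, b, dt, U, R)
  implicit none
  integer, intent(in) :: Ns        ! number of stages
  real, intent(in)    :: a(Ns,Ns)  ! Butcher coefficients a(k,l)
  real, intent(in)    :: b(Ns)     ! Butcher coefficients b(k)
  real, intent(in)    :: dt        ! time step
  real, intent(inout) :: U         ! cell value, updated in place
  interface
    function R(U) result(res)      ! space operator
      real, intent(in) :: U
      real :: res
    end function R
  end interface
  real :: K(Ns)
  integer :: k

  do k = 1, Ns
    ! stage value, Eq. (19): K_k = dt * R(U^n + sum_{l<k} a(k,l) K_l)
    K(k) = dt*R(U + sum(a(k,1:k-1)*K(1:k-1)))
  end do
  U = U + sum(b*K)  ! final update, Eq. (18)
  end subroutine rk_step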
The system of PDEs solved is strongly non-linear. As a consequence, there is no general stability criterion ensuring the convergence of the numerical scheme when consistency is also achieved. Nevertheless, it is usual to adopt a heuristic approach in which the results of linear stability analysis are applied to the non-linear case. A CFL-like condition, see [59], is used in order to provide a domain-of-dependence criterion used as an upper bound for a stability coefficient. In particular, in a three dimensional frame of reference the following sufficient, local (cell) condition is adopted:

$$CFL_i = \max_{s=1}^{6}\left[\left(\left|\vec{V}_s\cdot\vec{n}_s\right| + a_i\right) S_s\right]\frac{\Delta t_i}{V_i} \tag{20}$$

where $\vec{V}_s$, $\vec{n}_s$ and $S_s$ are the velocity vector, the unit normal vector and the area of the sth interface of the ith finite volume, and where $a_i$, $\Delta t_i$ and $V_i$ are the local speed of sound, the local time step and the cell volume, respectively. In an unsteady simulation all the transitory phenomena must be accurately captured, therefore a global limit is computed as the minimum of the above local conditions:

$$CFL = \min_i\left(CFL_i\right). \tag{21}$$
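A small Fortran sketch of the local condition of Eq. (20) follows; it is a hypothetical illustration with assumed variable names, not the OFF routine (in practice the relation is typically inverted to compute the local time step allowed by a prescribed CFL value).

  ! Hypothetical sketch of the local CFL evaluation of Eq. (20) for a cell,
  ! given the face velocities, unit normals and areas of its six faces.
  pure function local_cfl(vel, nrm, area, a_i, dt_i, vol_i) result(cfl_i)
  implicit none
  real, intent(in) :: vel(3,6)  ! velocity vectors at the six faces
  real, intent(in) :: nrm(3,6)  ! outward unit normals of the six faces
  real, intent(in) :: area(6)   ! face areas S_s
  real, intent(in) :: a_i       ! local speed of sound
  real, intent(in) :: dt_i      ! local time step
  real, intent(in) :: vol_i     ! cell volume
  real :: cfl_i, spectral
  integer :: s

  spectral = 0.0
  do s = 1, 6
    ! (|V_s . n_s| + a_i) S_s, maximized over the faces
    spectral = max(spectral, (abs(dot_product(vel(:,s), nrm(:,s))) + a_i)*area(s))
  end do
  cfl_i = spectral*dt_i/vol_i
  end function local_cfl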

Fig. 3. UML diagram of sources hierarchy.

3. OFF Application Program Interface

In this section the Application Program Interface (API) of the OFF code is described. An exhaustive explanation of the whole API is beyond the scope of the present paper: for a complete description refer to the OFF documentation. Here, only the key features of OFF are briefly described. Particular relevance is given to the base objects constituting the foundation of the code. To the author's knowledge there are no free Fortran (standard 2003/2008) CFD codes other than OFF developed with the Object Oriented Programming (OOP) paradigm as a design specification. Some coding rules are also highlighted because they have driven some algorithmic choices and because they facilitate the use, understanding and improvement of the code itself.

3.1. General coding rules

The tree of the project is simple. In the root of the project there are five directories and one (make)file:
1. doc: a directory containing the documentation pages that are created by means of doxygen directly from the source files comments;
2. examples: a directory containing the bash scripts for running some simulations tests;
3. inputs-template: a directory containing the template of ascii input files;
4. src: a directory containing the source files (without any other nested directories);
5. util: a directory containing some utilities;
6. makefile: the makefile for compiling IBM, OFF and POG.
In order to keep the project tree clean and lightweight, some rules of programming style, which could seem somewhat inflexible, have been adopted. First of all, the implicit typing capability of Fortran is avoided: each snippet of code (program, module or procedure) has the implicit none statement, thus each variable must be explicitly declared. In this way the programming is more verbose, but the code is clearer and less prone to errors. Another styling rule concerns the naming of the source files: the name of a source file must clearly identify the kind of its contents. Three main kinds of contents have been recognized:
1. data type modules: the name of this kind of file starts with Data_Type_ and indicates that they contain the (complete) definition of a base object (in the context of OOP); this means that this kind of file provides the definition of a Fortran derived type and all the necessary procedures to deal with it, building up a fairly complete object that is easy to use and extend;
2. library modules: the name of this kind of file starts with Lib_, indicating that they contain procedures to complete some specific task (e.g. space integration over a stencil); these libraries can (or not) be built upon Data_Type_ modules;
3. main codes: the name of this kind of file has no standard prefix; these files contain the main programs that drive the completion of the numerical simulation (or other tasks, e.g. post-processing of the numerical results) using the previously defined library modules and base objects; at present there are only three: IBM, OFF and POG.
These coding rules define a hierarchy: the data type modules are the bricks over which libraries are built and, finally, the main codes use the libraries (and data type objects) to complete their tasks. One exception to the above described rules is the module IR_Precision. OFF has been designed to be portable, meaning that it must produce the same results on all the supported architectures. In order to ensure the portability of a code, the precision of the number representation18 must be carefully taken into account. In modern Fortran standards (since standard 90/95) many useful intrinsic procedures are available for selecting the precision of integer and real variables. The module IR_Precision just contains the necessary named constants (and some useful procedures) to parametrize the precision of integers and reals in a truly portable way. As a consequence, this module constitutes the first building block of every other module: all source files use the IR_Precision module as their first use statement. Fig. 3 shows the UML diagram of the sources hierarchy.
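A minimal module skeleton respecting these rules could look as follows (an illustrative sketch, not an actual OFF source file):

module Lib_Example
  ! illustrative skeleton (not an OFF source file) of the module layout implied by the
  ! coding rules: IR_Precision is the first use statement, implicit typing is disabled
  ! and module encapsulation provides an automatic explicit interface for do_task
  use IR_Precision
  implicit none
  private
  public :: do_task
contains
  pure subroutine do_task(x)
    real(R8P), intent(INOUT) :: x
    x = 2._R8P*x
  endsubroutine do_task
endmodule Lib_Example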
Both objects and libraries must be encapsulated into a module. This rule has two pros and one con (at least). Using module encapsulation, an automatic interface for each procedure is created. This greatly reduces procedure calling errors, e.g. argument type mismatch, facilitates the optimization and improves re-usability.
18 In computer science numbers are represented by a finite data model. The Fortran standard uses a finite number of bits to represent numbers (integers and reals): the accuracy of the representation is directly related to the number of bits used, e.g. a model for an integer number is $I = s \times \sum_{k=1}^{q} w_k \times 2^{k-1}$, where $s$ is the sign bit ($-1$ or $+1$), $w_k$ is 0 or 1 and $q$ is the number of bits used. The number $q$ is related to the kind specification of Fortran's variable declarations: if the kind selection is not carefully done, Fortran's intrinsic selection is architecture-dependent and, as a consequence, not portable.

Secondly, it makes available the dynamic dispatch (enhancing the portability in conjunction with the parametrized selection of number kind precision), polymorphism and inheritance capabilities (facilitating object extensions) necessary for OOP. One con is that module encapsulation complicates the compilation of the source code: before compiling a source file, each module it uses must already be compiled, originating a compilation hierarchy. This hierarchy can rapidly become very complex (inter-module dependency). Many commercial Integrated Development Environments (IDE) and some free ones can automatically resolve this hierarchical dependency. Besides, there are some free build systems, e.g. scons,19 able to automatically build such a complex project. Unfortunately, these solutions are not portable due to their low availability on many supercomputer architectures.
On these architectures it is common to use the make build tool. In particular, GNU Make20 is a mature tool and, consequently, OFF uses it. GNU Make is not able to resolve module dependencies. Therefore, a (partial) solution to this problem is provided by means of a small bash-based script: fmake. It is a simple script, developed by the author and freely distributed,21 that analyzes the source files and creates the makefile with compiling rules having the correct hierarchical dependency. A more reliable solution to the module dependencies can be obtained by means of CMake, available in many HPC facilities: CMake22 is a cross-platform, open-source build system designed to build, test and package software. It controls the compilation process using simple platform- and compiler-independent configuration files, generating native makefiles and workspaces that can be used in the compiler environment of your choice. However, no CMake support is currently provided within the OFF project.
Another important coding rule adopted concerns the conditional compilation of source files. In order to maximize the efficiency, many of the user options are implemented by means of conditional compilation directives rather than intrinsic Fortran conditional logic: different optimized executables are produced according to the user options. As an example, if the user chooses to perform a two dimensional simulation (rather than a three dimensional one), a possible implementation is to place, where necessary, a control like:
select case(dimensions)
case(1)
  ! perform one dimensional stuff
case(2)
  ! perform two dimensional stuff
case(3)
  ! perform three dimensional stuff
endselect

Listing 1: Fortran intrinsic logical controls.

On the contrary, it has been chosen to avoid unnecessary (inefficient) control logic by means of conditional compilation directives:
#ifdef ONED
! perform one dimensional stuff
#endif
#ifdef TWOD
! perform two dimensional stuff
#endif
#ifdef THREED
! perform three dimensional stuff
#endif

Listing 2: Pre-processing conditional compiling directives.

The latter implementation produces less clear code, but it enhances the computational speed. A Fortran syntax for pre-processing directives is not yet available, even though, within the development of modern Fortran standards, the discussion on a future Fortran pre-processor has started. As a consequence, the OFF project adopts the C pre-processing syntax. The use of conditional compilation is somewhat abused (presently there are 16 pre-processing options), making the compilation complex. Nevertheless, a well-documented makefile is provided within OFF that, coupled with the fmake script, mitigates the problem.
Fig. 4 shows the current OFF23 UML diagram of the main algorithms. This is a simplified view where only the key algorithms are present. The most complicated algorithms are devoted to the space operator, while the Runge–Kutta time integrator is quite simple. In particular, the high order reconstruction and the fluxes computations are the most relevant parts. Therefore, many of the base objects and libraries concern the space operator.
In the next subsections a brief description of the main data type modules, libraries and main codes is given.

3.2. Data type modules: base objects

As previously stated, many of the data types designed for OFF have been developed according to OOP specifications. As a consequence, besides the declaration of the derived type components (also referred to as members in the following), the declaration of type bound procedures is commonly present. Moreover, many non type bound procedures are encapsulated into the data type modules. Ergo, a data type module can be a very long file: presently the OFF Data_Type_ modules have 13913 lines of code (9427 without doxygen documentation). In the following subsections only the derived type implementations are discussed and only a brief description of the main type bound procedures is provided.

19 SCons is an Open Source software construction tool that is a next-generation build tool, http://www.scons.org/.
20 GNU Make is a tool which controls the generation of executables and other non-source files of a program from the program’s source files, http://www.gnu.org/software/
make.
21 fmake is a small script for easy creation of makefile for Fortran (standard 90 or higher) projects. The aim is the easy creation of the compiling rules with the correct
hierarchy for inter-dependent modules, https://github.com/szaghi/fmake.
22 It is obtainable from http://www.cmake.org/.
23 IBM and POG codes share most of the base objects used for building up OFF, thus they will not be described in details.

Fig. 4. UML diagram of OFF.

The implementation of the derived types constituting the base objects follows two coding rules. The first rule is that the internal components are public. Even if the module objects are designed with encapsulated type bound procedures (for manipulating the objects), in the framework of High Performance Computing (HPC) it is useful to have direct access to the internal members for performing highly efficient, non standard computations; as stated in the introduction section, profiling OFF it has emerged that some overloaded operators of Type_Conservative (see Section 3.2.4) are less efficient than directly using the Type_Conservative components. The second rule is that, wherever possible, each derived type component must be initialized; otherwise a suitable type bound procedure (commonly named init or alloc) must be provided. The latter is the case, for example, for the members that are allocatable arrays.

3.2.1. IR_Precision
The module IR_Precision makes available some portable kind-parameters and some useful procedures to deal with them. It also provides variables that contain the minimum and maximum representable values, the smallest real values and the smallest representable difference between numbers on the running architecture. The most important named constants are:
integer, parameter :: R16P = selected_real_kind(33,4931)
integer, parameter :: R8P  = selected_real_kind(15,307)
integer, parameter :: R4P  = selected_real_kind(6,37)
integer, parameter :: I8P  = selected_int_kind(18)
integer, parameter :: I4P  = selected_int_kind(9)
integer, parameter :: I2P  = selected_int_kind(4)
integer, parameter :: I1P  = selected_int_kind(2)
Listing 3: Portable Kind-Parameters.

These are portable parameters defining 3 kind precisions for reals and 4 for integers. The above selected kinds allow the following
representable numbers:
1. R16P: real with at least 33 digits of decimal precision and a decimal exponent range of at least 4931, i.e. $[10^{-4931}, 10^{+4931} - 1]$;
2. R8P: real with at least 15 digits of decimal precision and a decimal exponent range of at least 307, i.e. $[10^{-307}, 10^{+307} - 1]$;
3. R4P: real with at least 6 digits of decimal precision and a decimal exponent range of at least 37, i.e. $[10^{-37}, 10^{+37} - 1]$;
4. I8P: integer with a decimal exponent range of at least 18, i.e. $[-2^{63}, 2^{63} - 1]$;
5. I4P: integer with a decimal exponent range of at least 9, i.e. $[-2^{31}, 2^{31} - 1]$;
6. I2P: integer with a decimal exponent range of at least 4, i.e. $[-2^{15}, 2^{15} - 1]$;
7. I1P: integer with a decimal exponent range of at least 2, i.e. $[-2^{7}, 2^{7} - 1]$.
The real kinds correspond to the old, non portable definitions of quadruple, double and single precision of Fortran standard 77 (and before). It is worth noting that the R16P kind is not supported by all architectures, thus a conditional compiling option is used for it. The numbers in the constant names (16, 8, 4, 2 and 1) indicate (approximately) the number of bytes used to represent the corresponding numbers. This is only an estimate because the number of bytes used is architecture-dependent, while the precision of the representation is truly portable.
The module IR_Precision provides some useful procedures for type-casting (converting) numbers to strings and vice versa. The public procedures str and strz cast numbers (integers and reals) to strings (of proper length), while cton converts strings to numbers. Moreover, IR_Precision implements check_endian, a procedure for checking the endianism (the bit ordering, i.e. little or big endian order) of the running architecture. Finally, the bit_size intrinsic function is overloaded in order to compute the number of bits of real variables as well.
The above named constants parametrize all source files. Using the IR_Precision module, it is easy to select the precision of real and integer variables according to their specific application. This preserves the accuracy of the computation, while allowing at the same time to save memory.
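As a purely illustrative usage sketch (not taken from the OFF sources), a variable declaration parametrized by the IR_Precision kinds reads:

program precision_demo
  ! hypothetical usage sketch of the portable kind parameters of IR_Precision
  use IR_Precision, only: R8P, I4P
  implicit none
  real(R8P)    :: density = 1.013e5_R8P ! "double-like" real, portable across architectures
  integer(I4P) :: n_cells = 100_I4P     ! "32-bit-like" integer
  print*, density, n_cells
endprogram precision_demo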

3.2.2. Type_Vector
The Data_Type_Vector module contains the definition of Type_Vector a derived type useful for manipulating vectors in 3D space:
type, public :: Type_Vector
  real(R8P) :: x = 0._R8P
  real(R8P) :: y = 0._R8P
  real(R8P) :: z = 0._R8P
  contains
    procedure, non_overridable :: set
    procedure, non_overridable :: pprint
    procedure, non_overridable :: sq_norm
    procedure, non_overridable :: normL2
    procedure, non_overridable :: normalize
endtype Type_Vector
Listing 4: Type_Vector definition.
The components of Type_Vector are reals with R8P kind, as defined by the IR_Precision module, and are initialized to 0. They represent the vector components and are defined in a three-dimensional Cartesian frame of reference. The operators of assignment (=), multiplication (*), division (/), sum (+) and subtraction (−) have been overloaded (taking into account all the parametrized precisions). Furthermore, the dot and cross products have been defined. Therefore, this module provides a nearly complete algebra based on the Type_Vector derived type. This algebra simplifies the vectorial operations involved in solving PDE systems. The end user can implement vectorial computations with a more natural and higher-level abstraction. During OFF profiling, it has been verified that the use of the overloaded operators does not degrade the performance. Indeed, the 3 components of Type_Vector are equivalent, in terms of performance, to a standard 3-element static array: the use of the derived type introduces a negligible overhead while the advantages of OOP remain.

The module Data_Type_Vector provides three useful named constants: $\vec{e}_x$, $\vec{e}_y$ and $\vec{e}_z$, the unit vectors of the x, y and z axes, respectively.
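A hedged usage sketch of this algebra is reported below (it is not taken from the OFF sources; in particular, the keyword arguments of set and the function form of normL2 are assumptions):

program vector_sketch
  ! hypothetical usage sketch of the Type_Vector algebra
  use IR_Precision,     only: R8P
  use Data_Type_Vector, only: Type_Vector
  implicit none
  type(Type_Vector) :: a, b, c
  call a%set(x=1._R8P, y=0._R8P, z=0._R8P) ! assumed keyword arguments
  call b%set(x=0._R8P, y=2._R8P, z=0._R8P)
  c = a + 0.5_R8P*b                        ! overloaded sum and real-scalar multiplication
  print*, c%normL2()                       ! type bound L2 norm (assumed to be a function)
endprogram vector_sketch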

3.2.3. Type_Tensor
The Data_Type_Tensor module contains the definition of Type_Tensor a derived type useful for manipulating second order tensors in
3D space:
type, public :: Type_Tensor
  type(Type_Vector) :: x
  type(Type_Vector) :: y
  type(Type_Vector) :: z
  contains
    procedure, non_overridable :: set
    procedure, non_overridable :: sq_norm
    procedure, non_overridable :: normL2
    procedure, non_overridable :: normalize
    procedure, non_overridable :: transpose
    procedure, non_overridable :: determinant
    procedure, non_overridable :: invert
    procedure, non_overridable :: invertible
    procedure, non_overridable :: rotox
    procedure, non_overridable :: rotoy
    procedure, non_overridable :: rotoz
    procedure, non_overridable :: rotou
endtype Type_Tensor
Listing 5: Type_Tensor definition.
It is built upon Type_Vector. The components of Type_Tensor are of derived type Type_Vector (intrinsically initialized). The components are defined in a three-dimensional Cartesian frame of reference. As for Type_Vector, a nearly complete algebra based on Type_Tensor has been developed, simplifying the tensorial operations involved in solving PDE systems without introducing significant overheads.
−
→ −
→ −
→
The module Data_Type_Tensor provides a useful named constant: unity, the unity (identity) tensor, defined as I = ex , ey , ez .

3.2.4. Type_Conservative
The Data_Type_Conservative module contains the definition of Type_Conservative a derived type useful for manipulating conservative
variables:
type, public :: Type_Conservative
  real(R8P), allocatable :: rs(:)
  type(Type_Vector)      :: rv
  real(R8P)              :: re = 0._R8P
  contains
    procedure, non_overridable :: init
    procedure, non_overridable :: free
    procedure, non_overridable :: cons2array
    procedure, non_overridable :: array2cons
    procedure, non_overridable :: pprint
endtype Type_Conservative
Listing 6: Type_Conservative definition.
It has 3 components:
1. rs(:): allocatable array of reals containing the partial densities of single species; at runtime it is dimensioned as [1 : NS ], NS being the
number of species;

2. rv: Type_Vector scalar containing the momentum vector of the mixture;
3. re: real scalar containing the specific total energy times the density of the mixture.
These quantities are conserved during the Navier–Stokes PDE integration, thus they are of crucial relevance for conservative schemes. The partial densities array is allocated at runtime when the number of species constituting the initial mixture is known from the input files. There is a type bound procedure for its initialization, init, and the partial densities can be set either to zero or to a user-provided value. Similarly, there is a type bound procedure, free, for freeing its memory. In some particular circumstances (e.g. in parallel communications using external libraries such as MPI, which do not know the Type_Conservative definition) it is useful to cast a Type_Conservative variable to a simple array of reals and vice versa. To this aim, the type bound procedures cons2array and array2cons are provided.
The module Data_Type_Conservative provides a nearly complete algebra based on Type_Conservative by means of a set of overloaded and specific operators. However, code profiling has highlighted a performance degradation due to these operators (to complete the fifth test of Kurganov and Tadmor [60], reported in Fig. 9C, the overloading-enabled code takes about 7% extra CPU time with respect to the non-overloaded code). This is essentially due to the presence of allocatable array members, which introduces some overhead and negatively impacts the compiler optimization work. This lack of performance is probably due to the immaturity of the OOP paradigm implementation in current compilers (there is currently no compiler fully compliant with the Fortran 2003/2008 standard). As a consequence, in some critical algorithms of OFF the overloaded operators of Type_Conservative have been avoided, preferring a direct manipulation of its components.
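The following sketch (not taken from the OFF sources) contrasts the two styles for a generic conservative update; it assumes that multiplication by a real scalar is among the overloaded operators:

subroutine update_sketch(U, K, Dt)
  ! hedged sketch contrasting the overloaded Type_Conservative algebra with the direct
  ! component manipulation preferred in performance-critical kernels; the two forms
  ! below are alternatives performing the same update
  use IR_Precision,           only: R8P
  use Data_Type_Conservative, only: Type_Conservative
  implicit none
  type(Type_Conservative), intent(INOUT) :: U
  type(Type_Conservative), intent(IN)    :: K
  real(R8P),               intent(IN)    :: Dt
  ! high-level form, relying on the overloaded operators (slower, see the 7% figure above):
  ! U = U + Dt*K
  ! equivalent low-level form, accessing the public components directly:
  U%rs = U%rs + Dt*K%rs ! partial densities (allocatable array)
  U%rv = U%rv + Dt*K%rv ! momentum (Type_Vector algebra, negligible overhead)
  U%re = U%re + Dt*K%re ! total energy density
endsubroutine update_sketch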

3.2.5. Type_Primitive
The Data_Type_Primitive module contains the definition of Type_Primitive a derived type useful for manipulating primitive variables:
type, public :: Type_Primitive
  real(R8P), allocatable :: r(:)
  type(Type_Vector)      :: v
  real(R8P)              :: p = 0._R8P
  real(R8P)              :: d = 0._R8P
  real(R8P)              :: g = 0._R8P
  contains
    procedure, non_overridable :: init
    procedure, non_overridable :: free
    procedure, non_overridable :: prim2array
    procedure, non_overridable :: array2prim
    procedure, non_overridable :: pprint
endtype Type_Primitive

Listing 7: Type_Primitive definition.

It has 5 components:
1. r(:): allocatable array of reals containing the partial densities of single species; at runtime it is dimensioned as [1 : NS ], NS being the
number of species;
2. v: Type_Vector containing the velocity vector of the mixture;
3. p: real scalar containing the pressure of the mixture;
4. d: real scalar containing the density of the mixture;
5. g: real scalar containing the specific heats ratio of the mixture.
The first three quantities are commonly referred to as primitive variables and are useful, for example, for imposing the boundary conditions or for computing the interface fluxes. As for Type_Conservative, the partial densities array is allocated at runtime when the number of initial species is known from the input files. The type bound procedures init and free have the same aims as the Type_Conservative ones. Again, the type bound procedures prim2array and array2prim cast Type_Primitive to a simple array of reals and vice versa.
The module Lib_Fluidynamic, see Section 3.3.3, provides procedures for converting primitive variables to conservative ones and vice versa.
As for Data_Type_Conservative, Data_Type_Primitive also provides a nearly complete algebra based on Type_Primitive. The same considerations on the performance of overloaded operators made in the previous subsection are valid for Type_Primitive.

3.2.6. Type_BC
The Data_Type_BC module contains the definition of Type_BC, a derived type that is a container rather than an object. It is used for storing the boundary conditions information:
type, public :: Type_BC
  integer(I1P)                :: tp = bc_ext
  integer(I4P),   allocatable :: inf
  type(Type_Adj), allocatable :: adj
  contains
    procedure, non_overridable :: init
    procedure, non_overridable :: free
    procedure, non_overridable :: set
    procedure, non_overridable :: str2id
    procedure, non_overridable :: id2str
endtype Type_BC

Listing 8: Type_BC definition.



It has 3 components:
1. tp: integer scalar of one byte (for saving memory) indicating the type of the boundary conditions (at present only 7 different boundary
conditions types have been implemented);
2. inf: integer scalar used for different purposes accordingly to the type of BC;
3. adj: Type_Adj scalar used in the case of blocks-adjacency BC.
It is worth noting that the last two components are allocatable variables and are allocated according to the type of BC in order to save memory.
The Data_Type_BC module defines the types of BC available and it associates to each type an integer named constant. The BC currently
implemented are:
1. bc_ref: reflective BC;
2. bc_ext: extrapolation BC;
3. bc_per: periodic BC;
4. bc_adj: blocks-adjacent BC;
5. bc_wll: no-slip wall BC (experimental, see Section 6.1);
6. bc_in1: inflow 1 BC (steady supersonic conditions);
7. bc_in2: inflow 2 BC (unsteady supersonic conditions).
According to the BC type, the meaning of the inf component is:
1. tp=bc_in1: the inflow 1 BC (expressed in primitive variables) are stored into the array in1 contained in Type_Global (see Section 3.2.9) for each BC of type bc_in1; inf is the index of that array: the BC data can be accessed by in1(inf);
2. tp=bc_in2: the inflow 2 BC (expressed in primitive variables) are stored into the array in2 contained in Type_Global (see Section 3.2.9) for each BC of type bc_in2; inf is the index of that array: the BC data can be accessed by in2(n, inf), n being the time step counter.
The type Type_Adj is defined as the following:
type, public :: Type_Adj
  integer(I4P) :: b = 0_I4P
  integer(I4P) :: i = 0_I4P
  integer(I4P) :: j = 0_I4P
  integer(I4P) :: k = 0_I4P
endtype Type_Adj

Listing 9: Type_Adj definition.

It contains the index information of the adjacent cell. The imposition of BC is achieved by means of the ghost cells technique. As a matter of fact, a frame of ghost cells is clustered around the inner (domain) cells: the boundaries are constituted by the interfaces separating the inner cells from the ghost ones. The ghost cell variables (in particular the primitive ones) are computed by an algorithm ensuring that the fluxes on inner/ghost interfaces respect the BC. This technique greatly simplifies the high-order reconstruction: the WENO algorithm always operates on standard stencils without the need of special controls for boundary fluxes, strongly improving the simplicity and performance of the space operator. On the other hand, the ghost cells technique has inefficient memory management: for numerical grids with a huge number of blocks the ghost cells frame can rapidly become a bottleneck due to its memory usage.

3.2.7. Type_Cell
The Data_Type_Cell module contains the definition of Type_Cell a derived type used as a container for all quantities associated to the
cell center location:
type, public :: Type_Cell
  real(R8P)                            :: V = 0._R8P
  type(Type_Vector)                    :: cent
  real(R8P)                            :: Dt = 0._R8P
  type(Type_Primitive)                 :: P
  type(Type_Conservative)              :: U
  type(Type_Conservative), allocatable :: KS(:)
  contains
    procedure, non_overridable :: init
    procedure, non_overridable :: free
endtype Type_Cell

Listing 10: Type_Cell definition.

It has 6 components:
1. V: real scalar containing the value of the cell volume;
2. cent: Type_Vector scalar containing cell center coordinates;
3. Dt: real scalar with the local (cell) time step value;
4. P: Type_Primitive scalar defining the cell primitive variables;
5. U: Type_Conservative scalar defining the cell conservative variables;
6. KS(:): allocatable array of Type_Conservative containing the Runge–Kutta stages (the PDE residuals of each stage); at runtime it is dimensioned as [1 : rk_ord], rk_ord being the number of Runge–Kutta stages.
Type_Cell, being a container rather than an object, has no overloaded algebra. There are two type bound procedures for initializing and finalizing the allocatable variables, init and free, respectively.

3.2.8. Type_Face
The Data_Type_Face module contains the definition of Type_Face a derived type used as a container for all quantities associated to the
face center location:
type, public :: Type_Face
  type(Type_Vector) :: N
  real(R8P)         :: S = 0._R8P
  type(Type_BC)     :: BC
  contains
    procedure, non_overridable :: init
    procedure, non_overridable :: free
endtype Type_Face

Listing 11: Type_Face definition.

It has 3 components:
1. N: Type_Vector defining the unit, normal vector to the face;
2. S: real containing the face area;
3. BC: Type_BC scalar defining the necessary BC information if the face is a boundary.
Similarly to Type_Cell, Type_Face is a container rather than an object, so no overloaded algebra is provided. There are two type bound procedures for initializing and finalizing the allocatable variables, init and free, respectively.

3.2.9. Type_Global
The Data_Type_Global module contains the definition of Type_Global a derived type used as a container for all variables of global
interest:
type, public :: Type_Global
  integer(I_P)    :: myrank = 0_I_P
  type(Type_File) :: file
  ! mesh data
  integer(I_P) :: Nl     = 1_I_P
  integer(I_P) :: Nb     = 0_I_P
  integer(I_P) :: Nb_tot = 0_I_P
  integer(I1P) :: gco    = 1_I_P
  ! boundary conditions data
  integer(I_P)                      :: Nin1 = 0_I_P
  type(Type_Primitive), allocatable :: in1(:)
  integer(I_P)                      :: Nin2 = 0_I_P
  type(Type_Primitive), allocatable :: in2(:,:)
  ! fluid dynamic data
  integer(I8P) :: n  = 0_I8P
  real(R_P)    :: t  = 0._R_P
  integer(I_P) :: Ns = 1_I_P
  integer(I_P) :: Np = 7_I_P
  integer(I_P) :: Nc = 5_I_P
  logical      :: inviscid = .true.
  logical      :: unsteady = .true.
  integer(I8P) :: Nmax = 0_I8P
  real(R_P)    :: Tmax = 0._R_P
  integer(I1P) :: sp_ord = 1_I_P
  integer(I1P) :: rk_ord = 1_I_P
  real(R_P)    :: CFL = 0.3_R_P
  real(R_P)    :: residual_toll = 0.01_R_P
  logical      :: residual_stop = .false.
  real(R_P), allocatable  :: cp0(:)
  real(R_P), allocatable  :: cv0(:)
  type(Type_Adimensional) :: adim
  contains
    procedure, non_overridable :: alloc_bc
    procedure, non_overridable :: load_bc_in1
    procedure, non_overridable :: load_bc_in2
    procedure, non_overridable :: load_fluid_soption
    procedure, non_overridable :: load_fluid_Ns
    procedure, non_overridable :: load_fluid_0species
    procedure, non_overridable :: alloc_fluid
endtype Type_Global

Listing 12: Type_Global definition.

It has 27 components:
1. myrank: integer scalar containing the rank ID of the process which data belongs to (used for MPI communication);
2. file: Type_File scalar containing variables for I/O tasks;
3. Nl: integer scalar containing the number of grid refinements levels used for multi-grids convergence acceleration scheme;
4. Nb: integer scalar containing the number of structured blocks composing the numerical grids; it indicates only the process-local blocks
that can be lower than whole number for parallel MPI simulations;

5. Nb_tot: integer scalar containing the total number of structured blocks composing the numerical grids; it indicates the whole number of blocks, which can be greater than Nb for parallel MPI simulations, i.e. $Nb\_tot = \sum_{p=0}^{N_{procs}-1} Nb_p$, $N_{procs}$ being the number of MPI processes;
6. gco: integer scalar containing the number of ghost cells necessary to achieve the space reconstruction order selected; depends on
sp_ord value;
7. Nin1: integer scalar containing the number of inflow 1 BC;
8. in1(:): allocatable array of Type_Primitive containing the inflow 1 BC; it is dimensioned at runtime as [1 : Nin1] when Nin1 is read
from input files;
9. Nin2: integer scalar containing the number of inflow 2 BC;
10. in2(:, :): allocatable array of Type_Primitive containing the inflow 2 BC; it is dimensioned at runtime as [1 : Ntmax, 1 : Nin2] when Nin2 is read from the input files, Ntmax being the maximum number of time steps by which the BC are discretized;
11. n: integer scalar containing the current value of the time steps counter;
12. t: real scalar containing the current integration time;
13. Ns: integer scalar containing the number of single species composing the mixture of fluids;
14. Np: integer scalar containing the number of primitive variables (depends on Ns);
15. Nc: integer scalar containing the number of conservative variables (depends on Ns);
16. inviscid: boolean scalar used for switching between inviscid PDE (Euler’s equations) and viscid PDE (Navier–Stokes equations);
17. unsteady: boolean scalar used for switching between steady and unsteady simulations;
18. Nmax: integer scalar containing the maximum number of time steps; it is used as a simulation-stop sentinel;
19. Tmax: real scalar containing the maximum integration time; it is used as a simulation-stop sentinel;
20. sp_ord: integer scalar used for selecting the formal accuracy order of the space reconstruction; presently admissible values are 1, 3, 5, 7; its value affects the number of ghost cells used, which is, respectively, 1, 2, 3, 4;
21. rk_ord: integer scalar used for selecting the formal accuracy order of the time integration; presently admissible values are 1, 2, 3, 4, 5; its value affects the number of Runge–Kutta stages used;
22. CFL: real scalar containing the Courant–Friedrichs–Lewy stability coefficient, see [59];
23. residual_toll: real scalar containing the tolerance for residuals vanishing evaluation;
24. residual_stop: boolean scalar used as a sentinel for stopping a steady simulation when the residuals vanish;
25. cp0(:): real allocatable array containing the initial specific heat cp for each species; at runtime is dimensioned as [1 : Ns ].
26. cv0(:): real allocatable array containing the initial specific heat cv for each species; at runtime is dimensioned as [1 : Ns ].
27. adim: Type_Adimensional scalar containing non-dimensional numbers.
OFF allocates only one (global) variable of Type_Global with the target specification. Each structured block of Type_SBlock (see Section 3.2.10) has a pointer linked to the global Type_Global variable. This implementation saves memory and simplifies the calling signatures because there is no need to explicitly pass the global data even for procedures dealing with block (local) data.
The definition of Type_Global is based on two other derived types, Type_File and Type_Adimensional. These types are simple containers and their definitions are omitted. Data_Type_Global also provides useful procedures for I/O operations. Type_Global also has some type bound procedures for loading global BC information and initial species data.

3.2.10. Type_SBlock
The Data_Type_SBlock module contains the definition of Type_SBlock a derived type used as a container for all variables associated to
a structured block of cells:
type, public :: Type_SBlock
  type(Type_Global), pointer :: global
  integer(I1P) :: gc(1:6) = (/1, 1, 1, 1, 1, 1/)
  integer(I4P) :: Ni = 0
  integer(I4P) :: Nj = 0
  integer(I4P) :: Nk = 0
  type(Type_Vector), allocatable :: node(:,:,:)
  type(Type_Face),   allocatable :: Fi(:,:,:)
  type(Type_Face),   allocatable :: Fj(:,:,:)
  type(Type_Face),   allocatable :: Fk(:,:,:)
  type(Type_Cell),   allocatable :: C(:,:,:)
  contains
    procedure, non_overridable :: alloc
    procedure, non_overridable :: free
    procedure, non_overridable :: save_mesh
    procedure, non_overridable :: load_mesh_dims
    procedure, non_overridable :: load_mesh
    procedure, non_overridable :: print_info_mesh
    procedure, non_overridable :: save_bc
    procedure, non_overridable :: load_bc
    procedure, non_overridable :: save_fluid
    procedure, non_overridable :: load_fluid
    procedure, non_overridable :: print_info_fluid
    procedure, non_overridable :: metrics
    procedure, non_overridable :: metrics_correction
    procedure, non_overridable :: node2center
endtype Type_SBlock

Listing 13: Type_SBlock definition.



Type_SBlock is one of the fundamental bricks upon which the OFF project is built. It has 10 components:
1. global: Type_Global pointer (see Section 3.2.9) containing data of global interest, meaning data each blocks are concerned with;
2. gc(1:6): one byte integer array of 6 elements (one for each face of cell hexahedron) containing the number of ghost cells used for
boundary conditions imposition;
3. Ni: integer scalar indicating the number of cells in ‘i’ direction;
4. Nj: integer scalar indicating the number of cells in ‘j’ direction;
5. Nk: integer scalar indicating the number of cells in ‘k’ direction;
6. node(:, :, :): Type_Vector allocatable array containing the nodes (vertices of cells) coordinates; at runtime it is dimensioned as
[0 − gc (1) : Ni + gc (2), 0 − gc (3) : Nj + gc (4), 0 − gc (5) : Nk + gc (6)];
7. Fi(:, :, :): Type_Face allocatable array containing the ‘i’ faces data; at runtime it is dimensioned as [0 − gc (1) : Ni + gc (2), 1 − gc (3) :
Nj + gc (4), 1 − gc (5) : Nk + gc (6)];
8. Fj(:, :, :): Type_Face allocatable array containing the ‘j’ faces data; at runtime it is dimensioned as [1 − gc (1) : Ni + gc (2), 0 − gc (3) :
Nj + gc (4), 1 − gc (5) : Nk + gc (6)];
9. Fk(:, :, :): Type_Face allocatable array containing the ‘k’ faces data; at runtime it is dimensioned as [1 − gc (1) : Ni + gc (2), 1 − gc (3) :
Nj + gc (4), 0 − gc (5) : Nk + gc (6)];
10. C(:, :, :): Type_Cell allocatable array containing the cells’ data; at runtime it is dimensioned as [1 − gc (1) : Ni + gc (2), 1 − gc (3) :
Nj + gc (4), 1 − gc (5) : Nk + gc (6)].
The allocatable arrays of Type_SBlock are allocated at runtime when the dimensions of the grids are known from the input files. Type_SBlock has some type bound procedures for I/O operations of the BC, mesh and initial conditions data. It also has procedures for computing the metrics of the block, namely the procedures metrics and metrics_correction.
The main data of OFF is allocated into an array of Type_SBlock. This greatly simplifies the calling signatures of the procedures contained in the libraries. Presently, the main array is dimensioned as [1 : NB, 1 : NL], NB being the number of blocks (local to each process) and NL the number of grid refinement levels used by the multi-grid scheme (at present the OFF multi-grid scheme implementation is still not validated).
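A minimal sketch of this layout (not the actual OFF code) is reported below, where the single target variable of Type_Global is linked to every block:

program alloc_sketch
  ! hedged sketch of the main data layout: a single target Type_Global variable and a
  ! [1:Nb,1:Nl] array of Type_SBlock whose global pointers all reference it
  use Data_Type_Global, only: Type_Global
  use Data_Type_SBlock, only: Type_SBlock
  implicit none
  type(Type_Global), target      :: global
  type(Type_SBlock), allocatable :: block(:,:)
  integer :: b, l
  global%Nb = 4 ; global%Nl = 1               ! normally read from the input files
  allocate(block(1:global%Nb, 1:global%Nl))
  do l=1, global%Nl
    do b=1, global%Nb
      block(b,l)%global => global             ! every block shares the same global data
    enddo
  enddo
endprogram alloc_sketch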

3.3. Library modules: libraries of procedures (built upon base objects)

The 3 main codes currently developed, namely IBM, OFF and POG, use more than 10 libraries. A comprehensive description of all the libraries is out of the scope of the present paper. Only the most interesting libraries, aimed at the numerical solution of PDE systems, are described.

3.3.1. Lib_WENO
The module Lib_WENO contains the definition of the procedures for computing the Weighted Essentially Non-Oscillatory (WENO) reconstruction with a user-defined Pth formal order. As aforementioned, the WENO scheme is a modification of the ENO reconstruction scheme [34,38] aimed at computing a high order (>1) reconstruction of a (scalar) variable at the left and right of an interface using the variable values over some stencils. The key idea of the WENO scheme is to use a convex combination of all the candidate stencils (instead of using only the smoothest one) for achieving a high order reconstruction, see Section 2.2.2.
The library is designed to be a low level back-end without the use of any base object. The focus is on the non-linear weighting of the stencils used for the reconstruction of a generic real scalar variable. As a consequence, the procedures provided by Lib_WENO can be used for reconstructing any kind of variable, e.g. primitive, conservative and local-characteristic ones.
Lib_WENO has two main public procedures:
1. weno_init: for initializing private members of Lib_WENO;
2. weno: for performing WENO reconstruction.
The first procedure, weno_init, is used as the library front-end for initializing the private members of Lib_WENO according to the user-selected options. In particular, the WENO coefficients (see Appendix A) are all private members of Lib_WENO and must be initialized when the formal order of space reconstruction (i.e. the number of stencils to be used) has been selected. At present, the WENO coefficients have been tabulated only for the 3rd, 5th and 7th orders, see [38,61]. Moreover, only upwind-biased schemes have been implemented, whereas central algorithms are currently being implemented.
The second, most important, procedure provided is weno:
pure subroutine weno(S, V, VR)
  implicit none
  integer(I_P), intent(IN)  :: S
  real(R_P),    intent(IN)  :: V(1:, 1-S:)
  real(R_P),    intent(OUT) :: VR(1:)

Listing 14: weno calling signature.

It is a pure procedure (for avoiding the side effects and enhancing the optimization) where:
1. S: is the number of stencils used;
2. V : is a real array, dimensioned as [1 : 2, 1 − S : −1 + S ], containing the cell centered values of the variable to be reconstructed;
3. VR: is a real array, dimensioned as [1 : 2], containing the left (1) and right (2) reconstructed values of V .
Using the upwind-biased algorithm, the V input is organized in left (1) and right (2) components. It is worth noting that no geometrical data are passed: the smoothness of each stencil is evaluated by means of undivided differences, supposing that the grid is uniform. This is a simplification that can be too strong if the numerical grids have non-smooth, sudden variations.
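A hedged usage sketch (not taken from the OFF sources) of a scalar reconstruction with S = 2 stencils is reported below; the arguments of weno_init are omitted because they are implementation-specific:

program weno_sketch
  ! hypothetical sketch of a 3rd order (S=2 stencils) WENO reconstruction of a scalar
  use IR_Precision, only: R_P, I_P
  use Lib_WENO,     only: weno_init, weno
  implicit none
  integer(I_P), parameter :: S = 2
  real(R_P) :: V(1:2, 1-S:-1+S) ! left/right biased cell-centered values
  real(R_P) :: VR(1:2)          ! reconstructed values at the left/right of the interface
  ! call weno_init(...)         ! initialize the (private) WENO coefficients, see the text
  V(1,:) = [0.9_R_P, 1.0_R_P, 1.1_R_P]
  V(2,:) = [1.0_R_P, 1.1_R_P, 1.2_R_P]
  call weno(S=S, V=V, VR=VR)
  print*, VR
endprogram weno_sketch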

The procedure weno calls weno_weights, a private member of Lib_WENO aimed at computing the non-linear weights of the stencils. This is a crucial procedure. Its behavior can be modified using two conditional compilation options, namely WENOZ and WENOM. These options are mutually exclusive and activate two different modifications of the weights computation:
1. WENOZ: the algorithm of Borges et al. [62] is used for limiting the dissipation of the original smoothness indicators;
2. WENOM: the Mapped WENO scheme of Henrick et al. [63] is used for achieving the optimal order near critical points.
Lib_WENO also contains some experimental procedures for computing central WENO reconstructions (instead of upwind-biased ones) and for the implementation of hybrid schemes (where WENO and optimal central compact schemes are hybridized). Presently, these procedures are not validated.

3.3.2. Lib_Riemann
The module library Lib_Riemann contains the procedures for computing the solution of the Riemann's Problem, presently only for the Euler's conservation laws, see Section 2.2.3. Similarly to Lib_WENO, Lib_Riemann is a low level back-end library, thus base objects are not used.
The developed RP solvers provide a solution for the convective fluxes $F^c$ normal to the interface direction. All the solvers provided have the same, unique API. They take as input the primitive variables in the left (state 1) and right (state 4) cells with respect to the interface and provide the convective fluxes as output. Listing 15 shows the calling signature shared by all the RP solvers provided.
elemental subroutine Riem_Solver(p1, r1, u1, g1, p4, r4, u4, g4, F_r, F_u, F_E)
  implicit none
  real(R_P), intent(IN)  :: p1
  real(R_P), intent(IN)  :: r1
  real(R_P), intent(IN)  :: u1
  real(R_P), intent(IN)  :: g1
  real(R_P), intent(IN)  :: p4
  real(R_P), intent(IN)  :: r4
  real(R_P), intent(IN)  :: u4
  real(R_P), intent(IN)  :: g4
  real(R_P), intent(OUT) :: F_r
  real(R_P), intent(OUT) :: F_u
  real(R_P), intent(OUT) :: F_E
Listing 15: Riemann’s Problem solver calling signature.

where $p_1$, $r_1$, $u_1$, $g_1$ define the left state, $p_4$, $r_4$, $u_4$, $g_4$ define the right one and $F_r$, $F_u$ and $F_E$ are the convective fluxes of the mass, momentum and energy conservation, respectively. The left and right states must be provided by the reconstruction algorithm, i.e. the WENO reconstruction for high-order schemes. The unique API shared by all solvers is used in conjunction with Fortran's aliasing to optimize the efficiency: by means of a conditional compilation directive, namely RSU, the actual solver name is aliased to Riem_Solver, resolving the actual solver name at compile time.
It is worth mentioning that every geometric consideration, e.g. vector projections, must be handled outside the RP solver. This simplifies the implementation of more sophisticated algorithms, e.g. Rotated Riemann's Problem solvers, see [64–66].
As stated in Section 2.2.3, more than 10 solvers are currently implemented. A complete list is present in the official API documentation. It is worth mentioning that all the public procedures of Lib_Riemann are pure or elemental. This avoids possible side effects and greatly enhances the optimization.
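As a hedged usage sketch (not taken from the OFF sources), a call to the aliased solver for Sod-like left/right states could look as follows:

program riemann_sketch
  ! hypothetical sketch of a Riemann Problem solver call through the unique API of
  ! Lib_Riemann; Riem_Solver is the generic name to which the user-selected solver is
  ! aliased at compile time by means of the RSU directive
  use IR_Precision, only: R_P
  use Lib_Riemann
  implicit none
  real(R_P) :: F_r, F_u, F_E
  call Riem_Solver(p1=1.0_R_P, r1=1.0_R_P,   u1=0._R_P, g1=1.4_R_P, &
                   p4=0.1_R_P, r4=0.125_R_P, u4=0._R_P, g4=1.4_R_P, &
                   F_r=F_r, F_u=F_u, F_E=F_E)
  print*, F_r, F_u, F_E
endprogram riemann_sketch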

3.3.3. Lib_Fluidynamic
Lib_Fluidynamic contains the most important procedures of fluid dynamic interest, i.e. the high level procedures for completing the space and time integrations. Many of these procedures are among the most computationally expensive, thus they have been parallelized by means of the OpenMP and MPI paradigms. This library takes advantage of all the base objects defined above.
There are six public procedures. The first two, namely prim2cons and cons2prim, are aimed at transforming the variables set from primitive to conservative and vice versa. Listings 16 and 17 show the calling signatures of these two procedures. They operate on scalar objects of Type_Primitive and Type_Conservative that are alternatively an input or an output. The procedure cons2prim also needs, as input, the values of the specific heats of the initial species. In many circumstances it is necessary to transform the variables set of all the cells of a block at once (and in parallel). For this purpose, and to avoid rewriting the same operations many times (in other words, to improve the modularity), two higher-level (of abstraction) procedures are provided, namely primitive2conservative and conservative2primitive, whose calling signatures are reported in Listings 18 and 19, respectively. These latter procedures are optimized to efficiently transform the variables set of all the cells of a block, being parallelized by means of the OpenMP paradigm.
pure subroutine prim2cons(prim, cons)
  implicit none
  type(Type_Primitive),    intent(IN)    :: prim
  type(Type_Conservative), intent(INOUT) :: cons
Listing 16: prim2cons calling signature.

pure subroutine cons2prim(cp0, cv0, cons, prim)
  implicit none
  real(R_P),               intent(IN)    :: cp0(:)
  real(R_P),               intent(IN)    :: cv0(:)
  type(Type_Conservative), intent(IN)    :: cons
  type(Type_Primitive),    intent(INOUT) :: prim
Listing 17: cons2prim calling signature.

subroutine primitive2conservative(block)
  implicit none
  type(Type_SBlock), intent(INOUT) :: block

Listing 18: primitive2conservative calling signature.

subroutine conservative2primitive(block)
  implicit none
  type(Type_SBlock), intent(INOUT) :: block

Listing 19: conservative2primitive calling signature.

The fifth public procedure, boundary_conditions, is aimed at the computation of the correct values of the primitive variables of the ghost cells according to the imposed BC. It is a high level procedure and all the blocks data (of the current process) must be passed, namely the array block. Its calling signature is reported in Listing 20. It performs, if necessary, the MPI inter-process communications for exchanging the data of the boundaries. It is important to note that the BC are imposed in terms of primitive variables: the conservative ones, being the variables truly integrated, remain unchanged after the execution of the boundary_conditions procedure. This algorithmic choice is exploited by the Runge–Kutta stages calculation, where only the primitive variables are recomputed. The input integer l indicates the grid level currently processed and is used by the multi-grid algorithm (currently disabled).
subroutine boundary_conditions(l, block)
  implicit none
  integer(I_P),      intent(IN)    :: l
  type(Type_SBlock), intent(INOUT) :: block(1:)

Listing 20: boundary_conditions calling signature.

The last public procedure is solve_grl, the highest level procedure provided by Lib_Fluidynamic. The aim of this procedure is to perform one time step integration of the conservation equations for grid level l, i.e. it computes the space and time operators for the whole grids of level l for one time step. It also completes some auxiliary tasks: (1) collecting the minimum local time step of each block (of each process) for computing the global minimum (used for unsteady simulations), (2) updating the standard output shell messages and (3) saving the output files. Its calling signature is reported in Listing 21.
subroutine solve_grl(l, block)
  implicit none
  integer(I_P),      intent(IN)    :: l
  type(Type_SBlock), intent(INOUT) :: block(1:)

Listing 21: solve_grl calling signature.

Among the private procedures of Lib_Fluidynamic a fundamental one is residuals, whose calling signature is reported in Listing 22. This procedure computes the residuals, i.e. the right-hand-side, of the PDE system being approximated. In particular, it computes the space operator of only one block at a time. The computed residuals are stored into the s1th Runge–Kutta stage, corresponding to the s1th element of the block%RK array. As aforementioned, the computation of the residuals, i.e. the computation of the space operator, is based on the primitive variables, therefore the user must ensure that the primitive variables are updated before calling residuals, e.g. that the correct values of the ghost cells have been computed according to the imposed BC. This is done in a loop where all Runge–Kutta stages are sequentially computed, as shown in Listing 23.
subroutine residuals(s1, block)
  implicit none
  integer(I_P),      intent(IN)    :: s1
  type(Type_SBlock), intent(INOUT) :: block

Listing 22: residuals calling signature.

do s1=1, global%rk_ord
  if (s1>1) then
    ! summing the stages up to s1-1: Ks1 = R(U^n + sum_{s2=1}^{s1-1} Dt*rkc2(s1,s2)*Ks2)
    ! and updating the primitive variables: P = conservative2primitive(Ks1)
    do b=1, global%Nb
      call rk_stages_sum(s1=s1, block=block(b))
    enddo
    ! imposing the boundary conditions
    call boundary_conditions(l=l, block=block)
  endif
  ! computing the s1th Runge-Kutta stage
  do b=1, global%Nb
    call residuals(s1=s1, block=block(b))
  enddo
enddo

Listing 23: Runge–Kutta stages computing.

The computation of the residuals is local (one block at a time) because each inter-process communication is performed by boundary_conditions, if necessary. As a consequence, residuals is parallelized by means of OpenMP only. In particular, residuals calls the procedures fluxes_convective and fluxes_diffusive for computing the inter-cell fluxes. The memory access of these calls can be inefficient.

Fortran adopts a column-major memory organization of arrays: each non column-major access constitutes a possible cache miss. Consequently, this inefficiency can occur for the computation of fluxes along the ''y'' and ''z'' directions, the three dimensional arrays being organized as array(x, y, z). One possible cure for this lack of memory-access performance is to organize the three dimensional data in a structure that preserves spatial locality better than a column-major array. The implementation of Adaptive Mesh Refinement algorithms into OFF, see Section 6.3, involves a complete re-design of the memory structures, which are also suited to preserving the data spatial locality. These new memory structures will also enhance the memory efficiency of the fluxes computations. However, the hierarchical AMR data structure will not substitute the structured block data; instead they will be complementary: the user will always be allowed to perform a standard simulation or an AMR one. As a consequence, the API and performance of the standard data structures described above will not be influenced by the AMR implementation.
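The cache behavior mentioned above can be visualized with a generic (non-OFF) illustration of Fortran's column-major storage:

program colmajor_sketch
  ! generic illustration (not OFF code) of column-major storage: the first index is
  ! contiguous in memory, so the innermost loop should sweep it; fluxes computed along
  ! the 'y' and 'z' directions break this pattern and may cause cache misses
  use IR_Precision, only: R8P
  implicit none
  integer, parameter :: Ni=64, Nj=64, Nk=64
  real(R8P) :: f(Ni,Nj,Nk)
  integer :: i, j, k
  do k=1, Nk
    do j=1, Nj
      do i=1, Ni ! cache-friendly traversal order: innermost loop on the first index
        f(i,j,k) = 0._R8P
      enddo
    enddo
  enddo
endprogram colmajor_sketch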
For the computation of the convective and diffusive fluxes, two libraries have been developed. The diffusive fluxes computation is quite simple, involving standard central differentiation algorithms (which are currently in the validation phase and are not presented in this paper, see Section 6.1). The computation of the convective fluxes is more complex due to their hyperbolic nature. The procedure fluxes_convective is aimed at their computation and is contained in the module library Lib_Fluxes_Convective, see Section 3.3.4.
Finally, the procedure compute_time is the last, fundamental, private procedure contained in Lib_Fluidynamic. This procedure computes the local (cell) time steps according to the pseudo CFL condition, see Section 2.2.4. It is a local procedure working with one block at a time, thus it is parallelized by means of OpenMP only. Besides the cell-local time steps, a local minimum is also computed.
After compute_time has been applied to each block, residuals computes a global minimum for unsteady simulations by means of the loop
reported in listing 24.
do b=1, global%Nb
  call compute_time(block=block(b), Dtmin=Dtmin(b))
enddo
DtminL = minval(Dtmin)
#ifdef MPI2
! for multi-processes simulations all processes must exchange their DtminL for computing the global minimum
call MPI_ALLREDUCE(DtminL, gDtmin, 1, MPI_REAL8, MPI_MIN, MPI_COMM_WORLD, err)
#else
! for a single process DtminL is already the global minimum
gDtmin = DtminL
#endif
if (global%unsteady) then ! for an unsteady accurate simulation each cell is updated by means of the global minimum time step
  ! control for the last iterate
  if (global%Nmax<=0) then
    if ((global%t+gDtmin)>global%Tmax) then
      ! the global minimum time step is so high that the last iteration would go over Tmax
      ! it is decreased in order to achieve exactly Tmax
      gDtmin = abs(global%Tmax - global%t)
    endif
  endif
  global%t = global%t + gDtmin
  do b=1, global%Nb
    block(b)%C%Dt = gDtmin
  enddo
endif

Listing 24: Global minimum time step computation.

3.3.4. Lib_Fluxes_Convective
Lib_Fluxes_Convective has only one public procedure, fluxes_convective, devoted to the computation of the convective fluxes. Its calling
signature is reported in listing 25. It operates on a one dimensional stencil whose length is [0 − gcu : Np + gcu], where gcu is the number
of ghost cells used (gc = gcu) and N = Np is the number of inner cells of the stencil, p = i, j, k being the direction actually considered.
As stated above, this one dimensional stencil computation can lead to cache missing problems. Basically, its algorithm consists in (1) per-
forming high order reconstruction of primitive variables at left and right of each interface and (2) using these reconstructed values for
computing the RP solution at each interface. The reconstruction phase is performed by means of Lib_WENO, as explained in Section 3.3.1,
whereas the RP solution is done by means of Lib_Riemann procedures, see Section 3.3.2. Nevertheless, before the calling of WENO recon-
struction procedures, a projection from three dimensional Cartesian space vectors into a local reference must be performed. In particular,
the RPs are solved in a local-face frame of reference, i.e. along the normal and tangent vectors to the (inter)faces. Moreover, for highly
compressible simulations it is better to perform WENO reconstruction in local-characteristic variables rather than primitive (or conserva-
tive) ones. Finally, in some circumstances (e.g. highly under expanded jets transient), the WENO reconstruction must coupled with some
algorithms for preserving positivity of pressure and density of reconstructed interfaces values.
subroutine fluxes_convective(gc, N, Ns, cp0, cv0, F, C, Fl)
  implicit none
  integer(I1P),            intent(IN)    :: gc
  integer(I_P),            intent(IN)    :: N
  integer(I_P),            intent(IN)    :: Ns
  real(R_P),               intent(IN)    :: cp0(1:Ns)
  real(R_P),               intent(IN)    :: cv0(1:Ns)
  type(Type_Face),         intent(IN)    :: F(0-gc:)
  type(Type_Cell),         intent(IN)    :: C(1-gc:)
  type(Type_Conservative), intent(INOUT) :: Fl(0:)

Listing 25: fluxes_convective calling signature for computing convective fluxes.



The complete algorithm of fluxes_convective for computing the convective fluxes at each interface is the following:
1. project the 3D Cartesian variables into the (inter)face frame of reference;
2. transform the primitive variables to the local-characteristic ones, if necessary;
3. perform the high order WENO reconstruction of left and right interfaces values;
4. transform the reconstructed local-characteristic variables to the primitive ones, if necessary;
5. apply the positivity preserving limiter to the reconstructed interfaces values, if necessary;
6. solve the Riemann Problem constituted by the reconstructed interfaces values.

3.4. Main codes

Three main codes have been actually developed: IBM, OFF and POG, for initializing, performing and post-processing CFD simulations,
respectively. All these codes share the same environment, e.g. the same base objects and libraries.
IBM code is aimed to facilitate the setup of the simulations. Starting from the geometry description inputs and initial and boundary
conditions, IBM creates the input files for OFF. As said before it can deal with numerical grids made by means of Ansys ICEM CFD and
Pointwise Gridgen as well as a simple ascii descriptions files. IBM needs one command line argument being the name of the file containing
its main options. A template of this file is contained into the directory inputs-template in the root of the project and the details of its syntax
can be found into the OFF documentation. The output of IBM consists of three kinds of files:
1. files with .itc extension: contain the initial condition in terms of primitive variables and have the identical syntax of the fluid dynamic
solution file (in order to make simple the re-start of simulations);
2. files with .bco extension: contain the boundary conditions of each (inter)face;
3. files with .geo extension: contain the numerical grids, i.e. the nodes coordinates vectors.
All these files are designed to contain more than one block. Nevertheless, in order to simplify the multi-process load balancing, they
currently contain only one block. As a consequence, IBM creates as many files as the number of blocks (for each of the above three
kinds). They are named sequentially with the suffix .b###. Presently, the load balancing algorithm is being strongly revisited (due to the
introduction of AMR algorithms, see Section 6.3), thus the one-block/one-file scheme will change in the next version. IBM does not perform
heavy computations and so it is a serial code. The only issue is the memory consumption: in order to save memory, IBM deals with one
block at a time rather than with the whole grid.
Even if the OFF code is the core of the project, it is, indeed, very simple. Like IBM, it needs only one command line argument, namely
the name of the file containing its options. Also for this file there is a template in the directory inputs-template and comprehensive
documentation. OFF essentially acts as a driver calling the fluid dynamics libraries to complete the simulations. It has only auxiliary tasks:
(1) create the output directory structure, (2) make a copy of the input files (for log-checking purposes), (3) allocate the main memory and (4)
perform the temporal loop, checking whether some end conditions are reached. It also initializes and finalizes the MPI environment if necessary.
The output of OFF consists essentially of the fluid dynamics solution files, which typically have the extension .sol and have the same format as the initial
conditions files. When performing simulations with AMR or moving grids (their implementations are ongoing activities, see Section 6) the
output of OFF also includes updated mesh files.
Finally, POG is designed to speed up the analysis of the numerical results obtained. It is aimed at post-processing the OFF outputs in
order to translate the results into visualization formats. Unlike IBM and OFF, POG does not use a main input file of options. The
post-processing is completely driven by a few command line switches and arguments (a complete list is given in the OFF documentation).
POG can post-process the mesh only or both the mesh and the fluid dynamics solution. As stated in the introduction, it can produce Tecplot Inc.
and VTK formatted output, both ASCII and binary files. The post-processed fluid dynamics fields (primitive variables) can be located at the
cell centers (preserving the finite volume approach) or can be interpolated at the mesh nodes. Moreover, the user can save only the inner
cells or also the boundary ones (the ghost cells). Some of the procedures used by POG (mainly contained in the Lib_Postprocessing module)
have been parallelized by means of the OpenMP paradigm to speed up the post-processing. Similarly to IBM, POG adopts a block-wise strategy,
post-processing only one block at a time in order to limit the memory consumption.
The three main codes share the same source directory and can be compiled using the single makefile provided. In order to avoid
problems during the linking phase of the compilation, a filter rule in the makefile is used to remove the already compiled objects of the main
codes other than the one currently being linked, e.g. for linking OFF the filter is @rm -f $(filter-out $(DOBJ)off.o,$(EXESOBJ)). More details on the
provided makefile can be found in the OFF documentation.

4. Extending OFF : example of MHD implementation

One of the OFF design specifications is the ease of implementing extensions. To demonstrate how to improve and extend OFF, a
possible approach for implementing the Magnetohydrodynamics (MHD) equations is described here by means of pseudo code snippets.
Magnetohydrodynamics studies the dynamics of electrically conducting fluids such as plasmas, electrolytes, liquid metals, etc. It is of
particular relevance in astrophysics. For the sake of simplicity, let us consider the ideal conservative form of the MHD equations, see [67]:
$$
\begin{aligned}
&\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \vec{v}) = 0\\
&\frac{\partial (\rho \vec{v})}{\partial t} + \nabla \cdot \left[\rho \vec{v}\vec{v} + p I - \frac{1}{\mu_0}\vec{B}\vec{B}\right] = 0\\
&\frac{\partial (\rho E)}{\partial t} + \nabla \cdot \left[(\rho E + p)\,\vec{v} - \frac{1}{\mu_0}\left(\vec{v}\cdot\vec{B}\right)\vec{B}\right] = 0\\
&\frac{\partial \vec{B}}{\partial t} + \nabla \cdot \left[\vec{B}\vec{v} - \vec{v}\vec{B}\right] = 0
\end{aligned}
\qquad (22)
$$

where ρ is the density, $\vec{v}$ is the velocity vector, p is the pressure, I is the identity tensor, $\mu_0$ is the permeability, $\vec{B}$ is the magnetic
induction vector and E is the total specific internal energy. System (22) is quite similar to the Euler's one, the conservative variables
and the corresponding fluxes being extended:

$$
\vec{U}_{MHD} = \begin{bmatrix} \rho \\ \rho \vec{v} \\ \rho E \\ \vec{B} \end{bmatrix} \qquad (23)
$$

$$
F_{MHD} = \begin{bmatrix} \rho \vec{v} \\ \rho \vec{v}\vec{v} + p I - \dfrac{1}{\mu_0} \vec{B}\vec{B} \\ (\rho E + p)\,\vec{v} - \dfrac{1}{\mu_0}\left(\vec{v} \cdot \vec{B}\right)\vec{B} \\ \vec{B}\vec{v} - \vec{v}\vec{B} \end{bmatrix}. \qquad (24)
$$
As a consequence, OFF can be easily extended to solve system (22). The first step is to modify Type_Conservative. In particular, the type
bound procedures must be defined as ''overridable'' (i.e. the ''non_overridable'' specification must be eliminated):
type, public :: Type_Conservative
  real(R8P), allocatable :: rs(:)
  type(Type_Vector)      :: rv
  real(R8P)              :: re = 0._R8P
  contains
    procedure :: init
    procedure :: free
    procedure :: cons2array
    procedure :: array2cons
    procedure :: pprint
endtype Type_Conservative
Listing 26: Type_Conservative modification.

Secondly, Type_Conservative must be extended by introducing the magnetic induction vector (a new module Data_Type_ConservativeMHD
should be created):
type, extends(Type_Conservative), public :: Type_ConservativeMHD
  type(Type_Vector) :: B
  contains
    procedure :: init
    procedure :: free
    procedure :: cons2array
    procedure :: array2cons
    procedure :: pprint
endtype Type_ConservativeMHD
Listing 27: Type_ConservativeMHD definition.

Now, the new Type_ConservativeMHD extends Type_Conservative adding the $\vec{B}$ member. All the original Type_Conservative methods
must be extended with the proper logic for handling the new $\vec{B}$ member, as sketched below. The same extension must be done for Type_Primitive (a new
module Data_Type_PrimitiveMHD should be created).
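As a minimal, hypothetical sketch (not the actual OFF implementation), the overridden cons2array method could simply append the magnetic induction components after the original conservative entries; the array layout and the Type_Vector component names (x, y, z) are assumptions made for illustration.

pure function cons2array(cons) result(array)
  ! pack the MHD conservative set into a plain real array: partial densities,
  ! momentum, total energy and, appended at the end, the magnetic induction
  class(Type_ConservativeMHD), intent(in) :: cons
  real(R8P), allocatable                  :: array(:)
  integer                                 :: Ns
  Ns = size(cons%rs)
  allocate(array(1:Ns+7))
  array(1:Ns)      = cons%rs                           ! partial densities
  array(Ns+1:Ns+3) = [cons%rv%x, cons%rv%y, cons%rv%z] ! momentum components
  array(Ns+4)      = cons%re                           ! total energy
  array(Ns+5:Ns+7) = [cons%B%x, cons%B%y, cons%B%z]    ! magnetic induction components
endfunction cons2array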
Once the base objects have been created, the API of OFF must be modified accordingly. A distinction must be made between the high and low
levels of abstraction. For the high-level modules, which use the MHD objects directly, the API modification should be quite simple thanks to
the new OOP Fortran features: the Euler's and MHD objects can be handled by means of a polymorphic definition of the
conservative variables and fluxes, in particular through the substitution of the definition-statements ''type(Type_Conservative/Primitive)''
with ''class(Type_Conservative/Primitive)'' and the addition of the proper logic (e.g. the ''select type()'' construct). As an example
of such a modification, let us consider Lib_Fluxes_Convective. A possible pseudo code of the calling signature of the subroutine
fluxes_convective, extended to the MHD system, is reported in listing 28.
subroutine fluxes_convective(gc, N, Ns, cp0, cv0, F, C, Fl)
implicit none
integer(I1P),             intent(IN)    :: gc
integer(I_P),             intent(IN)    :: N
integer(I_P),             intent(IN)    :: Ns
real(R_P),                intent(IN)    :: cp0(1:Ns)
real(R_P),                intent(IN)    :: cv0(1:Ns)
type(Type_Face),          intent(IN)    :: F(0-gc:)
type(Type_Cell),          intent(IN)    :: C(1-gc:)
class(Type_Conservative), intent(INOUT) :: Fl(0:)
Listing 28: fluxes_convective calling signature for computing convective fluxes; extended MHD version.

The array of output fluxes, Fl, is now defined as class(Type_Conservative), thus it has a polymorphic type: it can be either type
(Type_Conservative) or type(Type_ConservativeMHD). This is an elegant and effective polymorphic API ensuring easy maintenance. After
the variable definition statements, the algorithm for the fluxes computation can be easily branched to take into account the new MHD
fluxes. Listing 29 reports a pseudo code of the branching logic for extending the computation of the convective fluxes. In particular, the
select type construct dispatches on the dynamic type of Fl and acts as a selecting (branching) test: if Fl is type(Type_Conservative) the Euler's

algorithm is executed, whereas if Fl is type(Type_ConservativeMHD) the MHD algorithm is executed. The details on how to implement
the MHD algorithm for the computation of the convective fluxes are beyond the scope of the present paper. However, one important remark
is necessary: during the implementation of the computation of the MHD fluxes, particular attention must be paid to the projection onto the
local-characteristic variables (which is mandatory for enabling the high-order reconstruction). As a matter of fact, the Euler's and MHD systems
have different eigenvalues and, consequently, different local-characteristic pseudo-invariants. More in depth, the functions LR and RP, which
are locally defined in the subroutine fluxes_convective and have not been discussed in the present paper, must be carefully extended.
!...
! variables definition-statements
select type(Fl)
type is(Type_Conservative)
  ! execute the Euler's algorithm
type is(Type_ConservativeMHD)
  ! execute the MHD algorithm
endselect
return
!...
Listing 29: Branching of the computation of the convective fluxes.

Unlike the high-level modules, the low-level ones, which operate on standard Fortran types, do not need to be modified. Only the
calling statements of the low-level procedures must be properly modified. As an example, let us consider the WENO reconstruction for
the high-order computation of the convective fluxes. The WENO reconstruction is used within a Recursive Order Reduction (ROR) loop as
reported in the pseudo code of listing 30. This listing is a snippet of preconstruct_n, a subroutine locally defined inside the
subroutine fluxes_convective and not discussed in the present paper.
!...
ROR_check: do or=gc,2_I1P,-1_I1P
  ! computing WENO reconstruction
  do v=1,Ns+2
    call weno(S=or, V=CA(v,1:2,1-or:-1+or), VR=CR(v,1:2))
  enddo
  !...
enddo ROR_check
!...
Listing 30: ROR WENO loop.

In listing 30 weno is the subroutine computing the WENO reconstruction, S = or is the order of the reconstruction (currently used in
the ROR loop), V = CA are the local-characteristic variables to be reconstructed and VR = CR are the high-order reconstructed values (at the
left and right sides of the interface). All the IO variables of the weno subroutine are standard Fortran types (namely, one integer and two
reals), thus there is no need to modify this subroutine for handling the MHD system, whereas its actual arguments (i.e. CA and CR) must be
extended. In particular, the number of reconstructed local-characteristic variables, Ns + 2, and their meaning should be different for the
MHD system with respect to the Euler's one. However, such a modification belongs to the high-level procedure fluxes_convective and, to
be more precise, to the implementation of the MHD algorithm generically mentioned in listing 29.
One exception is the low-level module Lib_Riemann. This module contains the subroutines computing the solution of the Riemann's
problem for the Euler's system. Therefore, these subroutines are strongly related to the system being integrated or, equivalently, to the
conservative quantities, i.e. the OOP conservative objects. However, the module Lib_Riemann has not yet been modified to exploit the
new OOP features of modern Fortran: its subroutines operate on low level, standard Fortran types whereas, due to its numerical meaning,
it should belong to the high level family. Consequently, the extension of Lib_Riemann must be carefully accomplished. At present, the
polymorphic extension suggested for the other high level procedures is not viable, while the modification of the passed arguments in the
calling statements of the low level procedures is not enough. Because the MHD system has different eigenvalues with respect to the Euler's one (thus
it also has different wave patterns), a new family of Riemann's solvers must be implemented. A possible calling signature of such a new
family is suggested in listing 31, where b1 and b4 are the magnetic induction of the left and right cells and F_B is the corresponding new
flux. It is relevant that the standard family of Riemann's solvers and the new one have different calling signatures: a new pre-processing
option, e.g. RSU=RSMHD, should be introduced for selecting the new MHD solvers family. Many Riemann's solvers have been developed
for the MHD wave patterns. Among others, the approximate solvers of Roe and the HLL one have been extended to the MHD system. The
implementation details of such solvers are beyond the scope of the present paper, for more details see [68–70].
elemental subroutine Riem_SolverMHD(p1, r1, u1, b1, g1, p4, r4, u4, b4, g4, F_r, F_u, F_E, F_B)
implicit none
real(R_P), intent(IN)  :: p1
real(R_P), intent(IN)  :: r1
real(R_P), intent(IN)  :: u1
real(R_P), intent(IN)  :: b1
real(R_P), intent(IN)  :: g1
real(R_P), intent(IN)  :: p4
real(R_P), intent(IN)  :: r4
real(R_P), intent(IN)  :: u4
real(R_P), intent(IN)  :: b4
real(R_P), intent(IN)  :: g4
real(R_P), intent(OUT) :: F_r
real(R_P), intent(OUT) :: F_u
real(R_P), intent(OUT) :: F_E
real(R_P), intent(OUT) :: F_B
Listing 31: MHD Riemann’s Problem solver calling signature.

Fig. 5. Sod's Problem: comparison of the Exact, HLLC and LLF solvers with the exact solution (t = 0.2); panels: (a) 1st order, (b) 3rd order, (c) 5th order, (d) 7th order.

5. Applications

In order to prove OFF capabilities and its reliability we present some illustrative applications. In particular, we present one and
two dimensional simulations of purely theoretical problems, whereas a real application is reported as an example of a three dimensional
simulation. In the last subsection a partial analysis of the OFF scalability is reported.

5.1. One dimensional simulations

Two Riemann’s Problems have been simulated as an example of one dimensional applications.
The first Riemann’s Problem considered is the Sod’s one, see [71]. The space domain considered is [0, 1] and it is discretized with a
uniform grid of N = 100 finite volumes. The left and right boundary conditions are set as non-reflective. The initial discontinuity is placed
at x = 0.5. The initial conditions (in terms of primitive variables) at left and right with respect to the discontinuity are:

$$
P_L = \begin{bmatrix} \rho \\ u \\ p \end{bmatrix} = \begin{bmatrix} 1.000 \\ 0.000 \\ 1.000 \end{bmatrix} \qquad
P_R = \begin{bmatrix} \rho \\ u \\ p \end{bmatrix} = \begin{bmatrix} 0.125 \\ 0.000 \\ 0.100 \end{bmatrix} \qquad (25)
$$

where the specific heats ratio is $\gamma = c_p/c_v = 1.4$. After the rupture of the initial discontinuity a shock moving to the right is generated. A
rarefaction fan originates at x = 0.5 and moves to the left. A contact discontinuity, moving to the right, separates the left and right states. The
1st, 3rd, 5th and 7th order solutions, computed by means of the Exact (iterative), HLLC and Local-Lax–Friedrichs approximate solvers,
are compared with the exact solution in Fig. 5. The (non-dimensional) time unit considered is t = 0.2. For all accuracy orders the solutions
of the Exact and HLLC solvers are very close, while the solution of the Local-Lax–Friedrichs solver is more dissipative (especially at
low order). Nevertheless, at high order (O ≥ 5th) the LLF solution is comparable with the Exact and HLLC ones.
This is clearer in Fig. 6a, where the LLF solutions at all four orders are compared with the exact solution; as the order of accuracy
increases, the excessive dissipation of LLF is mitigated by the richer information of the enlarged reconstruction stencil. The same
behavior holds for the HLLC solutions (Fig. 6b), but the differences between low and high order are smaller than for the LLF solutions. This test
therefore highlights an interesting feature of the LLF solver: coupled with a high order scheme (O ≥ 5th), the LLF solver can give accurate results comparable
with the solutions of more expensive solvers like the Exact and HLLC ones, but at lower computational cost. However, this is not true in general:
for more discriminating tests the dissipation of the LLF solver introduces strong errors that discourage its use, especially when the flow
field has strong discontinuities (e.g. in the two interacting blast waves test that follows). This is more important in the 1D case, while in the
multidimensional case the drawback of the LLF solver is less evident.

Fig. 6. Sod's Problem: solutions with different accuracy orders (t = 0.2); panels: (a) LLF solutions, (b) HLLC solutions.

Fig. 7. Two interacting blast waves (t = 0.038); panels: (a) Exact solver solutions with different accuracy orders, (b) HLLC solutions with different accuracy orders, (c) LLF solutions with different accuracy orders, (d) comparison of the 7th-order solutions of the different Riemann solvers.

The second Riemann’s Problem considered is the interaction of two blast waves, firstly presented by Woodward and Colella [72]. The
space domain is always [0, 1], but the initial conditions are different from the standard shock-tube problem of Sod:
$$
P(x, t = 0) = P_0(x) = \begin{cases} P_L & 0.0 < x < 0.1\\ P_M & 0.1 < x < 0.9\\ P_R & 0.9 < x < 1.0 \end{cases} \qquad (26)
$$

$$
P_L = \begin{bmatrix} \rho \\ u \\ p \end{bmatrix} = \begin{bmatrix} 1.00 \\ 0.00 \\ 1000.00 \end{bmatrix} \qquad
P_M = \begin{bmatrix} \rho \\ u \\ p \end{bmatrix} = \begin{bmatrix} 1.00 \\ 0.00 \\ 0.01 \end{bmatrix} \qquad
P_R = \begin{bmatrix} \rho \\ u \\ p \end{bmatrix} = \begin{bmatrix} 1.00 \\ 0.00 \\ 100.00 \end{bmatrix} \qquad (27)
$$

where the specific heats ratio is $\gamma = c_p/c_v = 1.4$. For this test the space domain has been discretized with a uniform grid of N = 200
finite volumes and reflective boundary conditions have been used at both the left and right sides. The (non-dimensional) time unit considered
is t = 0.038. This test is more discriminating than the Sod's one. Two strong blast waves develop and collide, producing a new contact
discontinuity. Because the interaction takes place in a small volume, this problem is difficult to solve on a uniform Eulerian grid. Fig. 7 shows
the computed solutions with the Exact, HLLC and LLF solvers for all accuracy orders. The generated contact discontinuity at about x = 0.725

Fig. 8. Two dimensional domain of 2D Riemann Problems simulations.

Table 1
Initial conditions of the considered configurations; pressure (p), density (ρ) and velocity Cartesian components (u, v) are reported in non-dimensional form for each sub-quadrant
(sq_i, i = 1, 2, 3, 4).

        Conf. 3                           Conf. 4                            Conf. 5
        p      ρ       u      v           p      ρ       u       v          p    ρ    u      v
sq1     1.500  1.5000  0.000  −0.000      1.100  1.1000  0.0000  0.0000     1.0  1.0  −0.75  −0.5
sq2     0.300  0.5323  1.206  −0.000      0.350  0.5065  0.8939  0.0000     1.0  2.0  −0.75   0.5
sq3     0.029  0.1380  1.206  −1.206      1.100  1.1000  0.8939  0.8939     1.0  1.0   0.75   0.5
sq4     0.300  0.5323  0.000  −1.206      0.350  0.5065  0.0000  0.8939     1.0  3.0   0.75  −0.5

        Conf. 6                           Conf. 12                           Conf. 17
        p    ρ    u      v                p    ρ       u       v            p    ρ       u    v
sq1     1.0  1.0   0.75  −0.5             0.4  0.5313  0.0000  0.0000       1.0  1.0000  0.0  −0.4000
sq2     1.0  2.0   0.75   0.5             1.0  1.0000  0.7276  0.0000       1.0  2.0000  0.0  −0.3000
sq3     1.0  1.0  −0.75   0.5             1.0  0.8000  0.0000  0.0000       0.4  1.0625  0.0   0.2145
sq4     1.0  3.0  −0.75  −0.5             1.0  1.0000  0.0000  0.7276       0.4  0.5197  0.0  −1.1259

is not well resolved for order up to 3rd. Higher orders (O ≥ 5th) are able to resolve the contact discontinuity, but the LLF resolution is very
poor if compared with the Exact and HLLC ones (Fig. 7d). Besides the LLF solver dissipates too much the strength of the blast waves. This
result confirms that even when coupled with very high order schemes LLF solver could introduce too much dissipation. This is of particular
relevance in one dimensional case. Nevertheless, the excessive dissipation of LLF solver is less crucial in multidimensional case where the
numerical diffusion of dimensional split schemes is relevant.
It is worth noting that within OFF project a set of bash scripts for reproducing the above simulations are also provided. The scripts are
placed into the directory examples into the root of the project. For more details see OFF documentation.

5.2. Two dimensional simulations

Two sets of two dimensional examples are reported: the first set is based on completely structured geometry, whereas the second
example demonstrates how to simulate ‘‘unstructured’’ macro geometry by means of structured multi-blocks grids.

5.2.1. Two dimensional Riemann’s problems


Two dimensional Riemann’s problems are presented as an example of two dimensional simulations of structured geometry. The initial
conditions of the Riemann Problems are the same as the ones reported by Kurganov and Tadmor [60]. A 2D quadrant [0, 1] × [0, 1] is
subdivided into 4 sub-quadrants as shown in Fig. 8. The initial conditions are, in general, different in each sub-quadrant. In particular, the
initial conditions can be chosen in order to reproduce particular flows. The initial conditions are imposed in terms of primitive variables
as reported in Eq. (28).

$$
\bar{P} = \begin{bmatrix} p \\ \rho \\ u \\ v \end{bmatrix} =
\begin{cases}
\bar{P}_1 & \text{if } x, y \in [0.5, 1] \times [0.5, 1]\\
\bar{P}_2 & \text{if } x, y \in [0, 0.5] \times [0.5, 1]\\
\bar{P}_3 & \text{if } x, y \in [0, 0.5] \times [0, 0.5]\\
\bar{P}_4 & \text{if } x, y \in [0.5, 1] \times [0, 0.5]
\end{cases} \qquad (28)
$$

Here we consider only 6 of the 19 configurations presented by Kurganov and Tadmor. In particular, we consider configurations
3, 4, 5, 6, 12 and 17, following the numbering of Kurganov and Tadmor. These configurations are reported in Table 1.
The domain has been discretized by a uniform grid composed of 4 blocks, one for each sub-quadrant. Each block is constituted by
200 × 200 two-dimensional cells, therefore the whole domain is covered by 400 × 400 finite volumes, providing a spatial resolution of
1/400. For all the simulations a 3rd order WENO reconstruction has been used, coupled with a three-stage, 3rd order SSP-RK time integration.
The CFL coefficient has been set to 0.475 for all simulations.
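For reference, the standard three-stage, 3rd order SSP Runge–Kutta scheme (presumably the one referred to here) advances the vector of conservative variables $U$ with the spatial residual operator $L$ as:

$$
\begin{aligned}
U^{(1)} &= U^n + \Delta t\, L(U^n)\\
U^{(2)} &= \tfrac{3}{4}\, U^n + \tfrac{1}{4}\left[U^{(1)} + \Delta t\, L(U^{(1)})\right]\\
U^{n+1} &= \tfrac{1}{3}\, U^n + \tfrac{2}{3}\left[U^{(2)} + \Delta t\, L(U^{(2)})\right].
\end{aligned}
$$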

Fig. 9. Density fields of the two dimensional Riemann Problems simulations; panels: (a) Conf. 3, t = 0.3, (b) Conf. 4, t = 0.25, (c) Conf. 5, t = 0.23, (d) Conf. 6, t = 0.3, (e) Conf. 12, t = 0.25, (f) Conf. 17, t = 0.3.

Fig. 9 shows the density field of each solution of the considered configurations. Fig. 9(a) is an enlarged view of the sub-domain 0.15 × 0.55,
whereas the other figures report the whole domain. The final integration time is different for each simulation and is reported in the
figure captions.
The results of Fig. 9 are comparable in accuracy with the 3rd order results of Kurganov and Tadmor [60]. Both entropy
and acoustic waves are well resolved, i.e. shock and contact discontinuities show sharp profiles without relevant spurious oscillations. This
proves the accuracy and robustness of the WENO reconstruction. Moreover, the results exhibit a great level of detail, with small structures
(vortices, wave interactions, etc.) well resolved. An analysis of asymptotic convergence (verification of resolution independence) has
not been performed because it is beyond the scope of the present paper.
Within the OFF project a set of bash scripts for reproducing the above simulations is also provided, similarly to the one dimensional
Riemann's Problems of the previous subsection.

5.2.2. Shock diffraction


A shock wave traveling along a wall with a convex corner undergoes diffraction [73]. This test is considered a representative two
dimensional benchmark for flows involving shock waves: as the flow passes the corner, a Prandtl–Meyer rarefaction fan (centered at the
corner) originates. The incident plane shock interacts with the rarefaction fan and starts to bend. As a matter of fact, the rarefaction fan
decelerates the shock front bounded by the fan itself. From analytical and experimental results it is expected that for a strong incident Mach
number and a 90° corner the diffraction develops a kinked shock front. This kind of flow interaction is similar to that of the under-expanded
jet transient; at the very early start of the transient the bow shock crossing the orifice undergoes a diffraction as mentioned above.
In the absence of viscosity and heat flux the diffraction is a self-similar flow.
The space domain $(x, y) \in [0, 13] \times [0, 11]$ has been discretized with a uniform grid with a resolution of $\frac{1}{32}$. The corner is located at
$x = 1$, $y = 6$. The upstream initial conditions are $\rho = 1.4$, $p = 1.0$ and a relative Mach number of $M_r = \frac{w}{a} = 5.0$, being $a = \sqrt{\frac{\gamma p}{\rho}} = 1.0$
the speed of sound and $w = 5.0$ the velocity of the shock; the downstream conditions are $\rho = 7.0000$, $p = 29.0000$ and a relative Mach
number of $M_r = \frac{w - v}{a} = 0.4152$, being $a = \sqrt{\frac{\gamma p}{\rho}} = 2.4083$ the speed of sound. The non-dimensional time unit considered is $t = 2.35$.
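For reference, the downstream state quoted above follows from the standard normal-shock (Rankine–Hugoniot) relations for $M_1 = 5$ and $\gamma = 1.4$:

$$
\frac{\rho_2}{\rho_1} = \frac{(\gamma + 1) M_1^2}{(\gamma - 1) M_1^2 + 2} = \frac{2.4 \cdot 25}{0.4 \cdot 25 + 2} = 5 \;\Rightarrow\; \rho_2 = 1.4 \cdot 5 = 7,
\qquad
\frac{p_2}{p_1} = 1 + \frac{2\gamma}{\gamma + 1}\left(M_1^2 - 1\right) = 1 + \frac{2.8}{2.4}\cdot 24 = 29.
$$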
Fig. 10 reports the computed density, absolute Mach number and pseudo Schlieren flow fields. The 5th formal order scheme has been
used. Considering the density and Mach number fields, Fig. 10a and b, the results show no positivity violation. The main flow structures are
captured with good agreement with respect to the open literature. This is clearer from Fig. 10c, showing the pseudo Schlieren flow field.
Compare this figure with Figs. 26d and 27c on pages 293 and 294 (respectively) of Bazhenova et al. [73]. As predicted theoretically and

Fig. 10. Shock wave diffraction down a step: (a) density, (b) Mach number and (c) pseudo Schlieren flow fields; 5th order solution; the sonic line is the red one; flow structures: (a)
slipstream, (b) stagnation wave, (c) separation line, (d) kinked shock front and triple point, (e) rarefaction fan head. (For interpretation of the references to colour in this
figure legend, the reader is referred to the web version of this article.)

Fig. 11. SRM main components.

observed experimentally, the diffracted shock has a kinked triple point (region (d) in Fig. 10(c)); the flow is separated, showing the separation
line (c); the slipstream (a) and the stagnation wave (b) are correctly captured. It is interesting to note that the separation region and the
kinked triple point are similar to those of the under-expanded jet transient studied above.

5.3. Three dimensional simulations

Currently there is no simple 3D simulation in the examples directory of the OFF project. Here we present some three dimensional results
concerning a real, complex application. Indeed, these results were computed in 2007/2008, when OFF was at its earliest
development stage. In 2008 the code was referred to as MUG (Multidimensional Unsteady Gas dynamics code) and its API was slightly different
from that of the current OFF. However, the library Lib_Fluidynamic is essentially the same as that of the MUG code, meaning that OFF is an evolution
of MUG. OFF is aimed at more general purposes than MUG.
The numerical simulations presented in this subsection concern the study of the ignition transient phase of a Solid Rocket Motor
(SRM). These results have already been published in [20]. Here we report only some illustrative results proving OFF reliability.
A SRM is a chemical thermal rocket where the thermal energy due to the exothermic reactions of the propellant combustion is
transformed into kinetic energy by the propulsive nozzle. A SRM has the fuel and the oxidizer mixed together in a single
plastic grain, see Fig. 11. The geometry of the combustion chamber is defined by the grain. A seal diaphragm is placed in the nozzle
throat in order to isolate the chamber (filled by an inert pressurizing gas, commonly nitrogen) from the external ambient. The propellant
grain cannot burn by itself: it has to be ignited by an external source of energy, namely the hot (supersonic) jets produced by the igniter.
This originates a transient phase during which the hot jets of the igniter flow into the combustion chamber, pressurizing the chamber
itself. Although the ignition transient is a very short phase (of the order of 5 · 10⁻² s) compared to the overall combustion time of a SRM, the
occurrence of several unsteady phenomena makes the ignition transient a very critical operative phase.
The motor simulated is the Zefiro 16, a prototype demonstrator of the Zefiro SRM family developed for the new European launcher
Vega.24 Due to its very innovative features, the qualification of the Zefiro 16 motor has been very interesting: it shows a peculiar ignition

24 The Italian industry AVIO S.p.A. is the leading designer/builder/developer of the Vega launcher for ESA, the European Space Agency. AVIO S.p.A. has supported this
research providing the necessary SRM designs and the experimental data.

Fig. 12. Overview of the Zefiro 16 numerical grids: (a) combustion chamber elements, (b) selected longitudinal slices. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 13. Time sequence of the specific heats ratio field in the longitudinal slices at θ = 40°, 60°: (a) t = 0.0094 s, θ = 40°, (b) t = 0.0094 s, θ = 60°, (c) t = 0.0135 s, θ = 40°, (d) t = 0.0135 s, θ = 60°, (e) t = 0.0209 s, θ = 40°, (f) t = 0.0209 s, θ = 60°.

transient behavior characterized by anomalous and potentially dangerous pressure oscillations. The reported numerical simulations have
proven that the stronger the compressibility discontinuity between the igniter jets and the pressurizing gas, the more the generation of
pressure oscillations is promoted; for more details see [20].
Fig. 12a shows an overview of the mesh used for the simulations of the Zefiro 16 ignition transient. The numerical grid is composed of
structured blocks in cylindrical coordinates. Due to the combustion chamber symmetry, only 120° of the whole cross section has been
discretized, using periodic boundary conditions at θ = 0°, 120°. In order to facilitate the analysis of the flow field, two longitudinal slices
have been selected: the first is taken at θ = 40°, corresponding to the first propellant grain root (red slice in Fig. 12b); the second is taken at
θ = 60°, corresponding to the center of the igniter nozzle in the middle of the propellant grain star tip (green slice in Fig. 12b).
Fig. 13 reports the time sequence of the specific heats ratio in the selected longitudinal slices. The visualization of the specific heats
ratio is useful for recognizing the igniter jet interface, i.e. the contact discontinuity (material wave) that separates the hot igniter gas from
the cold pressurizing gas (nitrogen). This surface has an unsteady, unstable behavior.
Fig. 14 reports a time sequence of the 2D stream traces in the impingement region over the longitudinal slice at θ = 60°; in
order to identify the igniter jet, the stream traces have been plotted over the specific heats ratio field, highlighting the generation of two
recirculating bubbles.
Fig. 15 shows the three dimensional behavior of the igniter jet in the impingement region. This figure reports the three dimensional
stream traces obtained from a source line placed at the right of the igniter jet core. The jet core can be identified by the green surface: this
is a specific heats ratio isosurface taken at 1.22, a value that well represents the boundary of the jet front discontinuity.
As aforementioned, the presented numerical results are part of a more complex study, see [74,75,21]. This study has demonstrated
that the use of a pressurizing gas having higher compressibility (higher specific heats ratio) than nitrogen (e.g. helium) limits the generation
of pressure oscillations during the ignition transient phase of the SRM operative life. This also proves that even if OFF is a young project
(and many important features are being validated and/or implemented) it is already an effective and reliable numerical tool also for real,
complex applications. It is worth noting that the first Vega lift off occurred on 13 February 2012 and conducted a flawless qualification
flight from Europe's Spaceport in French Guiana.

Fig. 14. Time sequence of the stream traces close to the impingement region in the longitudinal slice at θ = 60°: (a) t = 0.0057 s, (b) t = 0.0094 s, (c) t = 0.0135 s, (d) t = 0.0179 s.

Fig. 15. Three dimensional behavior of the impingement region at time t = 0.0127 s: (a) longitudinal-side orthogonal view, (b) back-side perspective view. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

5.4. Assessment of OFF parallel scalability

For the (partial) assessment of OFF performance let us consider the fifth test of Kurganov and Tadmor [60] reported above. Four different
grid refinement levels have been considered for the whole [0, 1]×[0, 1] two dimensional domain ranging from 100×100 = 10000 (grid-1)
to 800 × 800 = 640,000 (grid-4) finite volumes. Each grid level is four times finer than the previous one.
Two different architectures have been used: (1) a shared memory, multi-core workstation and (2) a distributed memory cluster built up of
shared memory multi-core nodes. The shared memory workstation has two hexa-core (12 cores in total) Intel(R) Xeon(R) X5650 at 2.67 GHz with
24 GB of DDR3 RAM at 1333 MHz, whereas the cluster nodes have two quad-core (8 cores in total) Intel(R) Xeon(R) X5462 at 2.8 GHz with 8 GB
of DDR3 RAM at 800 MHz.

Fig. 16. OFF scalability for the fifth test of [60] for grid levels 1, 2, 3 and 4: (a) speedup for grid-1, (b) speedup for grid-2, (c) speedup for grid-3, (d) speedup for grid-4; serial (SER), pure OpenMP (OMP), pure MPI (MPI) and hybrid OpenMP/MPI performances.
The numerical grid of Section 5.2.1 has been modified: each original block has been split into 4 new ones, obtaining a 16-block grid.
This allows using up to 16 MPI processes, enlarging the test matrix.
The assessment of OFF scalability has been performed by comparing the performance of the serial-compiled code with that of the
parallel versions. In particular, denoting by $CPU_{SER}$ the CPU time used for the serial runs, the speedup is defined as $\frac{CPU}{CPU_{SER}}$. As a consequence,
the ideal speedup is a linear function of the number of cores (processors, indicated as #CORE hereafter) used, ranging from 1 (serial case) to
$\frac{1}{\#CORE}$ (best case): for example, a reported speedup of 15.7% on 8 cores means that the parallel run takes 15.7% of the serial CPU time (about 6.4
times faster), the ideal value being 1/8 = 12.5%. In both architectures, three different parallel compilations have been considered: (1) pure OpenMP, (2) pure MPI
and (3) hybrid OpenMP/MPI. All serial and parallel compilations have been performed using the Intel Fortran compiler v. 13.1.1 and OpenMPI
v. 1.6.4 (compiled with the Intel Parallel suite, v. 13.1.1). Moreover, the -O3 (entailing vectorization) and -ipo optimization flags have been
used.

5.4.1. Shared memory multi-core workstation


Table 2(a) summarizes the test matrix performed on the shared memory architecture and the computed speedup. The same results
are plotted in Fig. 16. For the pure OpenMP runs all 11 possible configurations have been tested, with the number of OpenMP threads
ranging in [2, 12]. For the pure MPI and hybrid runs some unbalanced runs have also been performed. In particular, the tests with 3, 6 and 12
MPI processes are not balanced (having 16 blocks, the processes have different loads in these cases).
OFF shows a good scalability on the shared memory architecture available. The pure MPI version exhibits a quasi-linear speedup for the
balanced configurations, whereas the OpenMP performance is lower. After a deep profiling (not reported, being outside the scope of the
present paper) some OpenMP issues have emerged: the current OpenMP (3.1) support for advanced Fortran 2003 features still lags behind
and the most complex code sections had to be manually refactored to be parallelized.25 This clearly limits the performance which can be
obtained, especially when compared to the pure MPI version. It is worth noting that, due to the saturation of the memory bandwidth, the
intra-node scaling is strongly limited beyond 6 cores. As a consequence, all the benchmarks using more than 6 cores show a modest
speedup. As expected after the code profiling, the hybrid version scales better than the pure OpenMP one.

5.4.2. Distributed memory cluster


The available distributed memory cluster has a total of 20 nodes, but only 8 nodes (for a total of 64 cores) have been available for
these benchmarks. The numerical grid used has 16 blocks, thus a maximum of 16 MPI processes can be executed in parallel. Due to the
node RAM limitation (8 GB), the serial run with the finest grid (grid-4) uses swap memory, thus invalidating the reference performance. As a
consequence, for the cluster benchmarks the finest grid used is the third. Table 2(b) summarizes the test matrix performed on the cluster
architecture and the computed speedup. The same results are plotted in Fig. 17, only for the grid levels 1 and 3. OFF shows good scalability
also on the distributed memory architecture. The inter-node performance is satisfactory, as well as the intra-node scalability previously
assessed on the shared memory workstation. The pure MPI and hybrid OpenMP/MPI codes are more efficient than the pure OpenMP
version because of the same overheads aforementioned. As expected, comparing the tests with a different number of nodes and the number of
25 Some compute-intensive procedures call a back-end procedure (at lower level of abstraction) specifically designed for OpenMP, thus introducing some overheads with
respect to the non OpenMP callings.

Table 2
Parallel performance.

(a) Shared memory workstation.
Code Version        #MPI  #OMP  #CORE  SpeedUp grids-1,2,3,4 [%]
Serial              /     /     1      100.0, 100.0, 100.0, 100.0
pure OpenMP         /     2     2      58.9, 63.0, 58.6, 29.0
                    /     3     3      48.4, 49.2, 47.8, 29.3
                    /     4     4      42.8, 41.7, 40.4, 29.2
                    /     5     5      36.4, 36.8, 37.0, 60.9
                    /     6     6      37.0, 34.7, 34.0, 46.7
                    /     7     7      35.2, 33.1, 32.3, 40.1
                    /     8     8      32.8, 31.8, 30.8, 37.1
                    /     9     9      31.6, 30.9, 29.9, 34.5
                    /     10    10     33.2, 29.5, 29.6, 32.4
                    /     11    11     30.6, 30.5, 28.5, 30.8
                    /     12    12     31.1, 29.7, 30.1, 29.9
pure MPI            2     /     2      49.2, 50.9, 50.3, 50.2
                    3     /     3      37.8, 37.1, 37.7, 37.9
                    4     /     4      26.0, 26.3, 28.0, 26.5
                    6     /     6      21.1, 21.8, 21.6, 20.9
                    8     /     8      15.6, 15.9, 15.7, 15.5
                    12    /     12     15.9, 15.8, 17.0, 16.4
hybrid OpenMP/MPI   2     2     4      32.5, 29.9, 32.0, 32.2
                    2     3     6      26.5, 26.3, 24.8, 25.8
                    2     4     8      23.3, 22.5, 21.3, 21.5
                    2     5     10     20.9, 19.7, 19.9, 20.3
                    2     6     12     20.3, 20.7, 18.2, 19.2
                    3     2     6      25.4, 24.5, 23.7, 24.9
                    3     3     9      20.5, 19.4, 19.1, 19.0
                    3     4     12     17.4, 17.7, 17.0, 17.5
                    4     2     8      17.9, 17.1, 17.0, 17.5
                    4     3     12     14.5, 14.6, 13.5, 14.6
                    6     2     12     14.3, 14.9, 14.2, 14.1

(b) Distributed memory cluster.
Code Version        #MPI  #OMP  #CORE  #NODE  SpeedUp grids-1,2,3 [%]
Serial              /     /     1      1      100.0, 100.0, 100.0
pure OpenMP         /     2     2      1      70.7, 68.4, 67.4
                    /     4     4      1      55.3, 52.5, 49.5
                    /     8     8      1      47.1, 44.6, 41.6
pure MPI            2     /     2      1      52.6, 52.0, 51.8
                    2     /     2      2      52.3, 51.9, 52.1
                    4     /     4      1      29.1, 29.8, 31.4
                    4     /     4      2      28.5, 28.9, 30.1
                    4     /     4      4      28.8, 27.3, 27.7
                    8     /     8      1      22.8, 21.3, 21.6
                    8     /     8      2      19.6, 18.9, 20.8
                    8     /     8      4      20.1, 17.0, 16.0
                    8     /     8      8      15.8, 14.9, 15.2
                    16    /     16     2      16.5, 15.7, 16.9
                    16    /     16     4      13.7, 10.8, 12.1
                    16    /     16     8      12.8, 9.2, 9.7
hybrid OpenMP/MPI   2     2     4      1      38.4, 36.0, 36.3
                    2     2     4      2      39.7, 36.3, 35.7
                    2     4     8      1      30.1, 28.7, 27.7
                    2     4     8      2      29.9, 28.4, 27.6
                    2     8     16     2      25.5, 24.9, 24.0
                    4     2     8      1      22.6, 21.9, 23.5
                    4     2     8      4      21.3, 19.5, 19.5
                    4     4     16     4      18.6, 15.5, 15.0
                    4     8     32     4      16.0, 13.4, 13.4
                    8     2     16     8      12.3, 10.7, 10.8
                    8     4     32     8      15.8, 8.8, 9.2
                    8     8     64     8      15.2, 7.9, 7.6

#MPI is the number of MPI processes, #OMP is the number of OpenMP threads, #CORE is the number of cores used, #NODE is the number of nodes used; SpeedUp is defined as CPU/CPU_SER [%] for grid levels 1, 2, 3 and 4.

cores used being equal, the best performance is obtained with the largest number of nodes because the saturation of the node memory
bandwidth is minimized, e.g. comparing the MPI tests for grid-3 using 16 processes (with 2, 4 or 8 nodes), the best speedup is achieved
with 8 nodes (9.7%) while the 2-node test is the worst (16.9%).
It is worth noting that the reported benchmark analysis constitutes only a partial assessment of the OFF parallel scalability. In
particular, tests on more powerful architectures (GIGA/PETA scale) with thousands to hundreds of thousands of cores are necessary. On
both architectures used, the intra-node memory bandwidth limit has affected the parallel scalability more strongly than the
communication overheads. Indeed, this is a typical behavior of CFD codes, where the computations are orders of magnitude heavier than
the communications.

Fig. 17. OFF scalability for the fifth test of [60] for grid levels 1 and 3: (a) speedup for grid-1, (b) speedup for grid-3; serial (SER), pure OpenMP (OMP), pure MPI (MPI) and hybrid OpenMP/MPI performances.

A last remark concerns the operator overloading performance. As stated in the API description, a deep profiling of the code (the
analysis of which is not reported here, being outside the scope of the present paper) has pointed out a performance drop when the object
Type_Conservative uses its own operators. In particular, using grid-4 the overloading-enabled code uses about 7% more CPU time than
the overloading-disabled one (serially compiled with optimization enabled). The main reason for the performance drop can be attributed to
the presence of allocatable array members, which introduces some overhead and negatively impacts the compiler optimization.
This is probably due to the young (immature) implementation of the OOP paradigm in modern Fortran compilers.

6. Ongoing development activities

Several improvements are presently being implemented in the OFF project. Many of these activities (AMR and overset grids) involve a
refactoring of the codes. In this section some of the most interesting ongoing improvements are described.

6.1. Validation of diffusive fluxes

Since its origins, OFF has been mainly developed for gas dynamics applications where the inviscid phenomena dominate over
the viscous ones. Consequently, the implementation of the algorithms for the computation of the viscous terms has been delayed with respect
to the convective ones. Currently the algorithms for the computation of the diffusive fluxes have been implemented, but they are still
under careful validation.
In order to perform simulations where not all turbulence scales are resolved, a turbulence model must be adopted. At present, the OFF
project has an experimental branch focused on the implementation of turbulence models. We are mainly implementing a Large Eddy Simulation
(LES) model, see [76], using the Smagorinsky [77] closure. It is also planned to perform experiments with the Detached Eddy Simulation (DES)
model, see [78,79].

6.2. Validation of the multi-phase model

One of the original aims of OFF is to simulate multi-phase flows, in particular mixtures of gas and dispersed solid particles. Currently,
one multi-phase model is under validation in an experimental branch of the OFF project. The model adopted consists of a fully coupled
Eulerian–Lagrangian method: the fluid phase is solved by means of the above described models, whereas the solid phase is described
by a finite number of Lagrangian particles driven by Newton's second law. The model is fully coupled: the fluid phase and the solid
one interact by means of a non-linear inter-phase force model. After the validation of this multi-phase model, the
introduction of a level-set algorithm, see [80,81], for capturing the interface of multi-fluid/multi-phase flows is planned.

6.3. Implementation of adaptive mesh refinement algorithms

Background. Adaptive Mesh Refinement (AMR) algorithms for fluid dynamics applications were originally presented by Berger and
Oliger [82], see also [83,84]. The aim of AMR strategy is to use efficiently the computational resources in order to accurately resolve
all time/space scales by means of limited (controlled) numerical grid resolution. The key idea is to locally adapt the grid resolution to the
actual solution scale by means of a non-linear, dynamic algorithm, i.e. to use fine grids only where (and when) high resolution is necessary.
An example of two dimensional AMR grids (quadtree) is depicted in Fig. 18 where a discontinuity front is captured by means of cells of
locally varying resolution.
The basis of the AMR algorithm is the construction of a (dynamic) hierarchy of grids having different refinement levels. An important
design choice is the arrangement of the hierarchy grids. Many approaches have been proposed and they can be divided into two main
algorithms: (1) block-based refinement, in which a sequence of nested structured blocks at different levels are overlapped with or patched
onto each other, and (2) cell-based refinement, in which each single cell can be refined separately from the others. Both approaches have
pros and cons. The block-structured strategy is easy to integrate with solvers originally developed for static grids: each AMR block can be
separately solved by the static-grid solver and then some interpolation algorithms are used for the inter-block computations. On the contrary, the
block-structured approach is inflexible, thus the cost/accuracy ratio is not maximized. The cell-based algorithm is instead very flexible. The
very granular (atomic) refinement capability allows maximizing the cost/accuracy ratio. However, the integration of this kind of algorithms
with traditional solvers is not straightforward: the traditional static-grid solvers must be modified in order to handle neighboring cells.
In particular, the cell-based approach leads to a memory structure, a tree of data, more complex than the standard list (array) of structured
blocks. Moreover, the parallelization of such an approach is very difficult: the design of an efficient memory structure is a great challenge.

Fig. 18. Schematic representation of Adaptive Mesh Refinements Strategy (quadtree) applied to Interface Capturing.

Related works. At present, many free, open source projects are devoted to AMR. PARAMESH,26 Parallel Adaptive Mesh Refinement, released
under the NASA-wide Open-Source software license, is a package for building a hierarchy of block-based Cartesian grids. BoxLib27 is
free software for block-based AMR with parallel (OpenMP/MPI) capabilities. libMesh is an open source library providing a framework
for the numerical simulation of PDEs using arbitrary unstructured discretizations with AMR capabilities. For the ongoing implementation
of AMR algorithms into OFF the authors have chosen to avoid external libraries, mainly because the most mature, freely available AMR
packages are written in C/C++ and because they are block-based.
OFF implementation. The AMR approach being implemented belongs to the cell-based algorithms. The new memory structure being
implemented is more complex than the Type_SBlock described in Section 3.2.10. As a matter of fact, each cell now constitutes a tree,
in particular an octree (eight children), of cells. A cell can be a leaf of the tree if it has no children, or it can be a parent otherwise: the cells
are organized in parents, children, siblings (cells having the same parent) and neighbors. The coarsest cell is the root (level zero) and the
grid refinement level of a child cell of the tree corresponds to the number of its ancestors from the root. The new memory structure must
provide the following features: flexibility, allowing efficient insertion, search and deletion of cell data during the refinement/coarsening
phase; parallelism, allowing parallel computations; and lightness, minimizing the memory consumption.
During the last decades many data structures have been proposed to efficiently represent trees addressing the above specifications. It
is possible to recognize two main approaches: (1) pointer-based structures and (2) pointer-less ones. The pointer-based approach is relatively
easy to implement, but the connectivity of the pointers (parent, children and neighboring links) requires a heavy overhead to be recomputed
whenever the grids are refined/coarsened. Besides, pointer-based structures are very difficult to parallelize. Finally, they have inefficient
memory access (searching algorithm). As a consequence, a pointer-less data structure is being implemented into OFF.
The selected memory structure relies on a linearized hash table. Hash tables are well established techniques to efficiently manage huge
amounts of data, for a non exhaustive review see Knuth [85], Larson [86], Fagin et al. [87], Litwin [88] and Griebel and Zumbusch [89].
Compared to linked lists, the hash table structure is proven to possess a constant memory access overhead, independent of the tree dimension,
namely O(1) in the ideal case. The main idea is to map each node of the tree to a key which is used as an address in the hash table. The mapping
is performed by means of a hashing function h(k), k being the key. The space of all potentially attainable keys is the universe of keys, which is,
in general, larger than the dimension of the hash table. Consequently, the deterministic hash function is not injective, i.e. different nodes
could be mapped to the same address (bucket hereinafter), originating collisions that must be resolved. The chaining technique [85] has
been adopted: each hash table bucket consists of a linked list (chain) such that, given k and b = h(k), a linear search is performed on
the list of bucket b in order to find the key k. The efficiency of the linear search in the bucket is O(Nb), Nb being the dimension of the bucket
chain. In order to maximize the hash table efficiency, the bucket chain dimension must be minimized, ideally Nb → 1 ∀b. Hence the hash
table dimensions (number of buckets and chain lengths) are crucial for the memory efficiency. In the OFF implementation the modulo function
is used for the key mapping and, consequently, the number of buckets is chosen as a prime number proportional to the number of root cells
and grid refinement levels used. In Fig. 19 a schematic representation of a hash table using the chaining technique for collision resolution is
shown.
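A minimal sketch of the key-to-bucket mapping just described is given below, assuming the modulo function and a prime number of buckets; the procedure name and kinds are illustrative only and do not reproduce the actual OFF implementation.

pure function hash_bucket(key, nbuckets) result(b)
  ! map a linearized cell key to a bucket index using the modulo function
  use, intrinsic :: iso_fortran_env, only: int32, int64
  integer(int64), intent(in) :: key      ! linearized (e.g. Morton) key of the cell
  integer(int32), intent(in) :: nbuckets ! hash table length, chosen as a prime number
  integer(int32)             :: b        ! bucket index in [0, nbuckets-1]
  b = int(mod(key, int(nbuckets, int64)), int32)
endfunction hash_bucket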
At present, OFF has three Data_Type_ modules devoted to the implementation of the hash tables for node (indicated as vertex hereinafter
in order to avoid confusion with the tree nodes), face and cell data, namely Type_HashTNode, Type_HashTFace and Type_HashTCell. Listing 32
reports the Type_HashTCell implementation. It is a derived type that constitutes the cell data hash table.
type, public :: Type_HashTCell
  type(Type_SLL), allocatable :: ht(:)
  integer(I4P)                :: leng = 0_I4P
  contains
    procedure, non_overridable :: init
    procedure, non_overridable :: put
    procedure, non_overridable :: get
    procedure, non_overridable :: del
    procedure, non_overridable :: free
endtype Type_HashTCell

Listing 32: Type_HashTCell: cell data hash table.

26 PARAMESH is available at http://www.physics.drexel.edu/~olson/paramesh-doc/Users_manual/amr.html.


27 BoxLib is available at https://ccse.lbl.gov/BoxLib/index.html.

Fig. 19. Hash table mapping and chaining collision resolution.

It has two components: (1) ht, an allocatable array whose elements are the hash table buckets, and (2) leng, a scalar integer containing the
actual length of the hash table. Type_HashTCell has five type bound procedures:
1. init: for initializing the hash table;
2. put: for inserting a node into the table;
3. get: for getting a node from the table;
4. del: for deleting a node from the table;
5. free: for freeing the hash table.
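A hypothetical usage sketch of this API is reported below; the argument lists (e.g. leng, ID, d) and the initial table length are assumptions made for illustration and are not the actual OFF signatures.

type(Type_HashTCell) :: table ! cell data hash table
type(Type_HashID)    :: ID    ! key addressing one cell of the octree
type(Type_Cell)      :: cell  ! cell data
call table%init(leng=101)     ! initialize with a prime number of buckets (illustrative value)
call table%put(ID=ID, d=cell) ! insert the cell data addressed by ID
call table%get(ID=ID, d=cell) ! retrieve the cell data addressed by ID
call table%del(ID=ID)         ! delete the node addressed by ID
call table%free               ! destroy the whole table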
The buckets array consists of Type_SLL elements. Type_SLL is a derived type implementing the Singly Linked List (SLL) used for the chaining
resolution of collisions, see listing 33. Type_SLL constitutes a single node of the linked list. Such a type is identical for vertex, face and cell
data except that the component containing the node data is different in the three cases, namely Type_Vector, Type_Face and Type_Cell. Type_SLL
has three members: (1) next, a Type_SLL pointer linked to the next node of the list, if any, (2) ID, a Type_HashID containing the (unique)
identifier of the node and (3) d, a Type_Cell containing the node data.
type, public :: Type_SLL
  type(Type_SLL), pointer        :: next => NULL()
  type(Type_HashID), allocatable :: ID
  type(Type_Cell), allocatable   :: d
  contains
    procedure, non_overridable :: put
    procedure, non_overridable :: get
    procedure, non_overridable :: del
    procedure, non_overridable :: free
endtype Type_SLL

Listing 33: Type_SLL: Single Linked List for chaining algorithm.

Type_SLL has four type bound procedures:


1. put: for inserting a node into the list (chain);
2. get: for getting a node from the list (chain);
3. del: for deleting a node from the list (chain);
4. free: for freeing the list (chain).
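A minimal, hypothetical sketch of the chaining search is given below: the bucket chain is walked until a node whose ID matches the requested key is found. The subroutine signature and the equality comparison of Type_HashID are assumptions made for illustration, not the actual get implementation.

subroutine chain_get(head, ID, d, found)
  ! walk the bucket chain searching for the node addressed by ID
  type(Type_SLL), target, intent(in)  :: head  ! first node of the bucket chain
  type(Type_HashID),      intent(in)  :: ID    ! key to be searched
  type(Type_Cell),        intent(out) :: d     ! cell data associated with ID, if found
  logical,                intent(out) :: found ! search outcome
  type(Type_SLL), pointer :: node
  found = .false.
  node => head
  do while (associated(node))
    if (allocated(node%ID)) then
      if (node%ID == ID) then ! assumes an equality operator defined for Type_HashID
        d     = node%d
        found = .true.
        return
      endif
    endif
    node => node%next
  enddo
endsubroutine chain_get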
The other two types of hash tables, i.e. Type_HashTNode and Type_HashTFace, are similar to Type_HashTCell, therefore they are omitted.
The identification of each node of the tree relies on a key defined as a Type_HashID. The construction of this key takes into account the
octree connectivity. In particular, the key of each node is computed as a locational code allowing a linearization of the tree. The locational
code approach has manifold advantages: (1) the spatial locality of the nodes (cells) is preserved for both static and AMR grids, (2) pointer-based
connectivity is avoided and (3) the load balancing and grid partitioning are greatly simplified (due to the linearization). The locational code
computation is performed by means of Space Filling Curves (SFC).28 The interest in SFC is old, see Peano [90] and Hilbert [91], and their
original application was a purely mathematical speculation. During the last decades SFC have been applied to several problems [92–94].
OFF presently adopts a SFC algorithm based on Morton's order (or Z-order), see [95], as shown in Fig. 20 for a two dimensional case. The
mapping is based on the encoding of the indexes i, j, k, l, addressing the location and level of the cells inside the octree, into a single key, the
HashID. Such an encoding consists of bit interleaving operations and logical manipulations that are very inexpensive, see [96–98].
The encoded ID (key) is an integer having a number of bits equal to the sum of the bits of i, j, k, l. For the sake of simplicity, suppose that
the indexes i, j, k, l have only two significant bits:

$$
i \equiv i_1 i_0 \qquad j \equiv j_1 j_0 \qquad k \equiv k_1 k_0 \qquad l \equiv l_1 l_0 \qquad (29)
$$

28 SFC are a class of locality-preserving mappings from D-dimensional space to one-dimensional space, i.e. $\mathbb{N}^D \leftrightarrow \mathbb{N}^1$. This class of mappings is injective, i.e. each node in $\mathbb{N}^D$
corresponds to a unique node in $\mathbb{N}^1$ and vice versa. The computation of SFC is inexpensive, allowing practical use for locational code construction.

Fig. 20. Morton’s space filling curve used for mapping (numbering) quadtree cells.

where $x_1, x_0$ indicate the two bits ($x = i, j, k, l$). The Morton's key is an eight-bit integer constructed as follows:

$$
m(l, k, j, i) = l_1 k_1 j_1 i_1\, l_0 k_0 j_0 i_0. \qquad (30)
$$
In order to perform this kind of encoding, the library Lib_Morton is being developed in the AMR branch of OFF. As stated in [98],
interleaving consists of dilating two or more integers by placing null bits between their significant bits, shifting them so that the
significant bits of each integer match the inserted zeros of the other integers and then ORing them together. Applying such a dilating
function to the i, j, k, l indexes one obtains:

$$
\begin{aligned}
dil_4(i) &= i_1 0000 i_0\\
dil_4(j) &= j_1 0000 j_0\\
dil_4(k) &= k_1 0000 k_0\\
dil_4(l) &= l_1 0000 l_0
\end{aligned} \qquad (31)
$$

where 4 null bits have been interleaved between the significant original bits. Using the bits left shift, $\ll$, and the logical XOR/AND operators, $\oplus$/$\&$,
the dilating function can be efficiently constructed:

$$
dil_4(x) = \left[x \oplus (x \ll 4)\right] \,\&\, F0F. \qquad (32)
$$

Finally, the Morton's key is constructed by ORing the shifted dilated indexes:

$$
m(l, k, j, i) = (dil_4(l) \ll 3) \,|\, (dil_4(k) \ll 2) \,|\, (dil_4(j) \ll 1) \,|\, dil_4(i). \qquad (33)
$$
Lib_Morton provides two procedures, morton and demorton, for encoding and decoding, respectively, up to 4 integers into one key. Once the
cells of the grids are numbered by means of the above described SFC technique, the hash tables can be constructed.
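As an illustration, a minimal sketch of a two-index Morton encoding by bit dilation is reported below; it follows the classical magic-number dilation technique for 16-bit indexes and is not the actual Lib_Morton implementation (which handles up to four indexes). The procedure names and masks are assumptions made for illustration; both procedures are assumed to be module procedures.

elemental function dilate(x) result(d)
  ! dilate a 16-bit index by inserting one null bit between each pair of significant bits
  use, intrinsic :: iso_fortran_env, only: int32
  integer(int32), intent(in) :: x ! index with at most 16 significant bits
  integer(int32)             :: d ! dilated integer
  d = x
  d = iand(ieor(d, ishft(d, 8)), int(z'00FF00FF', int32))
  d = iand(ieor(d, ishft(d, 4)), int(z'0F0F0F0F', int32))
  d = iand(ieor(d, ishft(d, 2)), int(z'33333333', int32))
  d = iand(ieor(d, ishft(d, 1)), int(z'55555555', int32))
endfunction dilate

elemental function morton2D(i, j) result(key)
  ! interleave the bits of i and j producing the Z-order (Morton) key
  use, intrinsic :: iso_fortran_env, only: int32
  integer(int32), intent(in) :: i, j
  integer(int32)             :: key
  key = ior(ishft(dilate(j), 1), dilate(i))
endfunction morton2D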
At present, the OFF AMR implementation is not complete. The hash table memory structures (subsumed into the Type_AMRBlock derived type)
are being validated, whereas the fluid dynamics libraries have not yet been modified to deal with AMR grids. An interesting perspective
of the AMR structures under testing is to develop an OOP object devoted to octree element manipulation, similar to the Treelm of Roller
et al. [99].
Refactoring of data structures: impact on the performance. It is worth noting that the implementation of the AMR scheme will not influence the
performance of non-AMR simulations. The hierarchical AMR data structure and the structured multi-block Type_SBlock are complementary
and the user is allowed to choose between them. Moreover, the modifications of Lib_Fluidynamic will not influence the performance of non-AMR
simulations, because new procedures devoted to AMR will be implemented rather than modifying the non-AMR procedures already
available. As a consequence, the features and performance presented above must be attributed to the current, stable (mature) OFF
version and will not be influenced by the AMR implementation.

6.4. Extension to moving grids

There are several applications where the relative motion of bodies is of great interest, e.g. turbines, propellers, maneuvering vehicles,
etc. In a structured grids (static or adaptive) environment, such as the one of OFF, handling the motion of bodies is complex. One effective
approach is to allow partial overlapping of different grids. This technique is referred to as the Chimera or overset grid approach and it is also
used for very complex geometries where structured or unstructured multi-block grids are not flexible enough. During the last decades
overset techniques have been intensively investigated. In [100] some of the earliest composite grid approaches used in aerodynamic
applications have been reviewed. A fully conservative algorithm for overlapped grids has been developed by Wang [101].
At present, it is planned to implement the chimera method originally proposed and developed by [102], in which a body-force-like
approach for moving grids has been presented. This technique has been successfully applied by the OFF authors in many applications, see [5,
103,104]; thus a branch of OFF, tagged as chimera, has been created for chimera testing.
A chimera grids system consists of geometrically simple overlapping grids that are eventually free to move. Fig. 21 reports an example
of a chimera mesh made for the simulation of a maneuvering submarine [105]: the relative motion of the horizontal control appendage of

(a) Blocks topology. (b) Body-fitted grids example.

(c) Body-fitted overlapping blocks. (d) Donor (active) cells inside the overlapping region.

Fig. 21. Dynamic Chimera Mesh of a Complex Moving Surface.

the submarine stern is simulated by means of dynamic overset block-structured body-fitted grids. In the following a brief description of
the method is reported.
Chimera grids require a modification of both the boundary conditions and internal point treatment of the overlapping regions.
In particular, a reliable interpolation mechanism for the overlapped grids is mandatory. In the overlapping zones the donor (of the
interpolation values) and receiver (of the interpolated values) cells must be identified. Donor/receiver identification is performed
according to the resolution of the cells: a donor cell is more refined than the receiver cell inside the overlapped region. The crucial aspect
of the chimera approach is the algorithm used for the donor/receiver interpolation. An effective technique is to enforce the interpolated
solution on the receiver cells by means of a body-force-like term:

U^{n+1}_receiver = U^n_receiver − ∆t [ R^n + (k/δ) (U^n_receiver − U^n_interpolated) ]                (34)

where U is the vector of state variables, R is the vector of the residuals, k = O(10) is a parameter chosen through numerical tests, and δ
is the minimum between the cell diameter and the physical time step. The last term is essentially a body-force term driving the solution
toward the interpolated value U^n_interpolated.
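A minimal Fortran sketch of the receiver-cell update of Eq. (34) is reported below; the variable and procedure names are illustrative assumptions and the routine does not reproduce the actual chimera branch of OFF.

pure function chimera_update(u_n, r_n, u_interp, dt, diam, dt_phys) result(u_np1)
  ! Sketch of Eq. (34): the residual is augmented by a body-force-like term
  ! driving the receiver state toward the donor-interpolated value.
  use iso_fortran_env, only: real64
  real(real64), intent(in) :: u_n(:)      ! receiver state U^n
  real(real64), intent(in) :: r_n(:)      ! residuals R^n
  real(real64), intent(in) :: u_interp(:) ! donor-interpolated state U^n_interpolated
  real(real64), intent(in) :: dt          ! time step
  real(real64), intent(in) :: diam        ! cell diameter
  real(real64), intent(in) :: dt_phys     ! physical time step
  real(real64) :: u_np1(size(u_n))
  real(real64), parameter :: k = 10.0_real64 ! O(10), chosen through numerical tests
  real(real64) :: delta
  delta = min(diam, dt_phys)              ! delta = min(cell diameter, physical time step)
  u_np1 = u_n - dt*(r_n + (k/delta)*(u_n - u_interp))
end function chimera_update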

7. Concluding remarks and perspectives

The present paper reports the first manifesto of OFF, a new open source (free software) Computational Fluid Dynamics code based
on reliable mathematical and numerical models. In particular, the numerical integration of the compressible Navier–Stokes equations
is performed by means of high-order (WENO) reconstruction algorithms (up to 7th formal order) within a finite volume scheme for multi-fluid,
multi-phase flows using body-fitted structured multi-block grids. By means of a thorough presentation, a deep insight into the OFF API has been
given. Moreover, a description of the most important algorithmic choices has been provided, explaining the coding rules adopted and the
reasons behind them. To the author's knowledge there is no other free Fortran (standard 2003 or newer) CFD code developed with the Object
Oriented Programming (OOP) paradigm as a design specification, and this highlights the novelty of the OFF project.

7.1. Key features

OFF is developed by a team of research scientists rather than computer technicians, meaning that OFF is designed to be as simple as
possible without degrading performance. In particular, OFF is aimed at promoting the rapid, efficient and easy implementation of
complex physical models into effective software suited for real applications. To these purposes the modern Fortran programming language
has been chosen due to its simplicity (it is, de facto, the best Formula Translator) and efficacy (many powerful compilers and optimizers
are freely available).
In order to perform heavy computations, the parallelism of the code has been carefully designed. The MPI paradigm has been adopted
for multi-process parallel computations on distributed memory clusters, while the OpenMP paradigm has been chosen for multi-threaded

Table A.3
WENO linear and optimal coefficients.

(a) WENO linear coefficients a^{±1/2}_{k,l} appearing in Eq. (6); each block serves both the i−1/2 and the i+1/2 interface.

S = 2          k = −1    k = 0    k = 1
  l = 0          3/2      1/2     −1/2
  l = 1         −1/2      1/2      3/2

S = 3          k = −1    k = 0    k = 1    k = 2
  l = 0         11/6      1/3     −1/6      1/3
  l = 1         −7/6      5/6      5/6     −7/6
  l = 2          1/3     −1/6      1/3     11/6

S = 4          k = −1    k = 0    k = 1    k = 2    k = 3
  l = 0         25/12     1/4    −1/12     1/12     −1/4
  l = 1        −23/12    13/12    7/12    −5/12     13/12
  l = 2         13/12    −5/12    7/12    13/12    −23/12
  l = 3         −1/4      1/12   −1/12     1/4      25/12

(b) WENO optimal coefficients C^{±1/2}_k appearing in Eq. (8).

S = 2          k = −1    k = 0    k = 1
                2/3       1/3      2/3

S = 3          k = −1    k = 0    k = 1    k = 2
                3/10      1/10     6/10     3/10

S = 4          k = −1    k = 0    k = 1    k = 2    k = 3
                4/35      1/35    12/35    18/35     4/35

parallel computations on shared memory workstations. The mixed implementation of both MPI and OpenMP parallel paradigms allows
OFF to run on supercomputers consisting of clusters of distributed-memory nodes, each node being a shared-memory multi-core machine.
The usability, maintenance and enhancement of the OFF project have also been taken into account. A comprehensive documentation is provided
by means of exhaustive source-file comments that are parsed through the doxygen free software, producing high quality html and latex
documentation pages. To support the collaborative development of the project the git distributed versioning system is adopted. Moreover,
the dedicated github web repository, i.e. https://github.com/szaghi/OFF, promotes international collaboration.
OFF can already interoperate with other software that is crucial for an effective work-flow. OFF can read (through the IBM pre-processor) complex
numerical grids constructed by means of the Ansys ICEM CFD and Pointwise Gridgen software. OFF can produce (through the POG post-processor)
outputs that are easily analyzed by means of Tecplot, Inc. and Paraview (or other VTK-based visualizers). Moreover, the free
nature of OFF makes it easy to support other software.

7.2. Known issues

The Fortran support for OOP still lags behind. OFF profiling has pointed out that overloaded operators can, in some circumstances,
degrade the performance. This lack of performance is probably due to the immaturity of the OOP paradigm implementation of the compilers
(currently there is no fully compliant compiler with respect to the Fortran standard 2003 or newer); thus, in some critical algorithms, the
overloaded operators have been avoided, preferring a direct manipulation of the objects' components.
Similarly, the OpenMP support for the new Fortran OOP features is still incomplete. In particular, the OpenMP (standard 3.1) paradigm seems to
produce false sharing and memory leaks when the OOP objects are parallelized directly. To overcome this issue, some critical procedures have
been refactored in order to call (if OpenMP is activated) specific OpenMP-safe back-end procedures that operate on object components
rather than on the object itself. The back-end calls allow the use of OpenMP, but introduce some overhead.
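The following sketch illustrates the back-end refactoring pattern just described; the type and procedure names are hypothetical and do not reproduce OFF's actual API.

module openmp_backend_sketch
  ! Hypothetical illustration of the OpenMP-safe back-end pattern: the
  ! type-bound front-end delegates the threaded loop to a plain procedure
  ! operating on the object's components.
  use iso_fortran_env, only: real64
  implicit none
  type :: block_sketch
    real(real64), allocatable :: cons(:,:) ! conservative variables (Nc x Ncells)
    real(real64), allocatable :: res(:,:)  ! residuals
  contains
    procedure :: residuals
  end type block_sketch
contains
  subroutine residuals(self)
    ! Front-end: no OpenMP directives are applied to the polymorphic object.
    class(block_sketch), intent(inout) :: self
    call residuals_backend(self%cons, self%res)
  end subroutine residuals

  subroutine residuals_backend(cons, res)
    ! Back-end: the parallel loop works on intrinsic arrays only.
    real(real64), intent(in)    :: cons(:,:)
    real(real64), intent(inout) :: res(:,:)
    integer :: i
    !$omp parallel do
    do i = 1, size(cons, dim=2)
      res(:, i) = -cons(:, i) ! placeholder for the actual flux balance
    end do
    !$omp end parallel do
  end subroutine residuals_backend
end module openmp_backend_sketch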

7.3. Perspectives

The reported examples of applications have demonstrated that OFF is a mature project able to provide accurate results for many
real applications (e.g. complex gas dynamics applications). However, OFF is a relatively young project and many important improve-
ments are ongoing. Moreover, some important implemented algorithms must still be accurately validated, i.e. the viscous fluxes computation
and the turbulence models. Nevertheless, the design specifications of OFF described above constitute a solid basis. The perspectives of OFF are
very interesting: the AMR capability will maximize the accuracy/cost ratio, allowing a deep insight into complex physical phenomena
without requiring the computational power of supercomputers. The implementation of dynamic overlapping grids will allow the easy and accurate
simulation of complex scenarios where bodies are in relative motion.

Table A.4
WENO IS coefficients σ_{k,l,m} appearing in Eq. (9).

(a) Coefficients for S = 2, 3.

S = 2    k    l    m = 0    m = 1
         0    0      1       −2
              1      0        1
         1    0      1       −2
              1      0        1

S = 3    k    l    m = 0    m = 1    m = 2
         0    0    10/3     −31/3    11/3
              1      0       25/3   −19/3
              2      0        0       4/3
         1    0     4/3     −13/3     5/3
              1      0       13/3   −13/3
              2      0        0       4/3
         2    0     4/3     −19/3    11/3
              1      0       25/3   −31/3
              2      0        0      10/3

(b) Coefficients for S = 4.

S = 4    k    l    m = 0    m = 1     m = 2     m = 3
         0    0    2107     −9402     7042     −1854
              1      0      11003    −17246     4642
              2      0        0       7043     −3882
              3      0        0         0        547
         1    0     547     −2522     1922      −494
              1      0       3443    −5966      1602
              2      0        0       2843     −1642
              3      0        0         0        267
         2    0     267     −1642     1602      −494
              1      0       2843    −5966      1922
              2      0        0       3443     −2522
              3      0        0         0        547
         3    0     547     −3882     4642     −1854
              1      0       7043    −17246     7042
              2      0        0      11003     −9402
              3      0        0         0       2107

Table B.5
Butcher’s table of 3rd order 3 stages RK-SSP.

1 1
1/2 1/4 1/4
1/6 1/6 2/3

Table B.6
Butcher’s table of 4th order 5 stages RK-SSP.

0.39175222700392 0.39175222700392
0.58607968896779 0.21766909633821 0.36841059262959
0.47454236302687 0.08269208670950 0.13995850206999 0.25189177424738
0.93501063100924 0.06796628370320 0.11503469844438 0.20703489864929 0.54497475021237
0.14681187618661 0.24848290924556 0.10425883036650 0.27443890091960 0.22600748319395

Acknowledgments

The author would like to thank AVIO S.p.A. (Colleferro factory), which kindly provided all the motor data used for the
simulations reported in Section 5.3. It is worth noting that the OFF project began as a Ph.D. Thesis supported by AVIO.
Special thanks are due to Professor Bernardo Favini for his support. His help shaped the author's knowledge. The author is also grateful
to Dr. Andrea Di Mascio: without his remarkable help the OFF project would never have started.
The author thanks CASPUR, Consorzio interuniversitario per le Applicazioni di Supercalcolo per Università e Ricerca, now part of the
CINECA supercomputing facilities, for its support: special thanks are due to Dr. Francesco Salvadore for his hints on the scalability assessment
and on the parallel profiling of the codes.

Appendix A. WENO tables of coefficients

In this appendix the complete set of WENO coefficients (linear and ideal weights and smoothness indicator coefficients) is reported.
The coefficients appearing in Eqs. (6), (8) and (9) were given by Jiang and Shu [38] for S = 2, 3 and by Balsara and Shu [61] for S = 4, 5, 6.
Here we report in Tables A.3a, A.3b and A.4 the coefficients only for S = 2, 3, 4.
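As a usage hint for Table A.4, the sketch below evaluates a smoothness indicator from the tabulated σ coefficients. The indexing convention assumed here, β_k = Σ_{l≤m} σ_{k,l,m} ū_{i+k−l} ū_{i+k−m}, is an inference consistent with the tabulated values for S = 3 and S = 4 rather than a transcription of Eq. (9); the array and argument names are illustrative only.

pure function smoothness_indicator(sigma, ubar, i, k, s) result(beta)
  ! Quadratic form built from the sigma(k, l, m) coefficients of Table A.4
  ! and the cell averages ubar; beta is the IS of stencil k of cell i.
  use iso_fortran_env, only: real64
  real(real64), intent(in) :: sigma(0:, 0:, 0:) ! sigma(k, l, m)
  real(real64), intent(in) :: ubar(:)           ! cell averages
  integer,      intent(in) :: i, k, s           ! cell index, stencil index, order S
  real(real64) :: beta
  integer :: l, m
  beta = 0.0_real64
  do l = 0, s - 1
    do m = l, s - 1
      beta = beta + sigma(k, l, m)*ubar(i + k - l)*ubar(i + k - m)
    end do
  end do
end function smoothness_indicator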

Appendix B. Runge–Kutta-SSP tables of coefficients

This appendix reports the coefficients a_l and b_k appearing in Eqs. (18) and (19), respectively. The coefficients are reported in Butcher
tableau form:

c_2       a_{2,1}
c_3       a_{3,1}    a_{3,2}
...       ...        ...
c_{Ns}    a_{Ns,1}   a_{Ns,2}   ...   a_{Ns,Ns−1}
          b_1        b_2        ...   b_{Ns−1}      b_{Ns}
The optimal third order three stages RK-SSP scheme has been proposed by Shu and Osher [37]. The fourth order five stages RK-SSP
scheme (low storage) is due to Spiteri and Ruuth [106] (see Tables B.5 and B.6).
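As an illustration of how the tabulated coefficients enter the time integration, the following self-contained sketch advances a semi-discrete system by one step of the 3rd order, 3 stages RK-SSP scheme of Table B.5, written in conventional Butcher form; the procedure names are assumptions and the sketch does not reproduce the (low-storage) implementation used in OFF.

function rk33_step(u, t, dt, rhs) result(u_new)
  ! One step of the 3rd order, 3 stages RK-SSP scheme (Table B.5):
  ! c = (0, 1, 1/2), a21 = 1, a31 = a32 = 1/4, b = (1/6, 1/6, 2/3).
  use iso_fortran_env, only: real64
  real(real64), intent(in) :: u(:) ! state at time t
  real(real64), intent(in) :: t, dt
  interface
    function rhs(t, u) result(r)   ! semi-discrete right-hand side
      import :: real64
      real(real64), intent(in) :: t, u(:)
      real(real64) :: r(size(u))
    end function rhs
  end interface
  real(real64) :: u_new(size(u))
  real(real64) :: k1(size(u)), k2(size(u)), k3(size(u))
  k1 = rhs(t,                 u)
  k2 = rhs(t + dt,            u + dt*k1)
  k3 = rhs(t + 0.5_real64*dt, u + dt*0.25_real64*(k1 + k2))
  u_new = u + dt*(k1 + k2 + 4.0_real64*k3)/6.0_real64
end function rk33_step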

References

[1] J. Anderson, Computational Fluid Dynamics: The Basics with Applications, Science/Engineering/Math, McGraw-Hill, 1995.
[2] J. von Neumann, AMG-IAS 1 (1944).
[3] R. Rotunno, Annu. Rev. Fluid Mech. 45 (2013) pp. 59.
[4] J. Yoo, Annu. Rev. Fluid Mech. 45 (2013) pp. 495.
[5] S. Zaghi, R. Broglia, A. Di Mascio, Ocean Eng. 38 (2011) pp. 2110.
[6] A. Raj, J. Mallikarjuna, V. Ganesan, Appl. Energy 102 (2013) pp. 347.
[7] M. Jia, M. Xie, T. Wang, Z. Peng, Appl. Energy 88 (2011) pp. 2967.
[8] J. Katz, Annu. Rev. Fluid Mech. 38 (2006) pp. 27.
[9] J. Sørensen, Annu. Rev. Fluid Mech. 43 (2011) pp. 427.
[10] H. Weller, G. Tabor, H. Jasak, C. Fureby, Comput. Phys. 12 (1998) pp. 620.
[11] S. Popinet, J. Comput. Phys. 190 (2003) 572–600.
[12] C. Burstedde, O. Ghattas, M. Gurnis, G. Stadler, E. Tan, T. Tu, L. Wilcox, S. Zhong, Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, IEEE Press, 2008,
p. 62.
[13] R.J. LeVeque, J. Comput. Phys. 131 (1997) 327–353.
[14] M. Berger, R. LeVeque, SIAM J. Numer. Anal. 35 (1998) 2298–2316.
[15] R. LeVeque, Finite Volume Methods for Hyperbolic Problems, vol. 31, Cambridge University Press, 2002.
[16] J. Dreher, R. Grauer, Parallel Comput. 31 (2005) 913–932.
[17] ISO/IEC DIS 1539-1:2004(E), Draft International Standard for Fortran 2003, Published by the International Fortran standards committee, ISO/IEC JTC1/SC22, 2004.
[18] J.W. Backus, R.J. Beeber, S. Best, R. Goldberg, L.M. Haibt, H.L. Herrick, R.A. Nelson, D. Sayre, P.B. Sheridan, H. Stern, I. Ziller, R.A. Hughes, R. Nutt, Papers Presented at the
February 26–28, 1957, Western Joint Computer Conference: Techniques for Reliability, in: IRE-AIEE-ACM ’57 (Western), ACM, New York, NY, USA, 1957, pp. 188–198.
[19] B. Favini, S. Zaghi, F. Serraglia, M. Di Giacinto, 42nd AIAA/ASME/SAE/ASEE Joint Propulsion Conference, Sacramento, California, AIAA 2006-4953, AIAA, 2006.
[20] B. Favini, S. Zaghi, M. Di Giacinto, F. Serraglia, 43rd AIAA/ASME/SAE/ASEE Joint Propulsion Conference, Cincinnati, Ohio, AIAA 2007-5781, AIAA, 2007.
[21] S. Zaghi, B. Favini, M. Di Giacinto, F. Serraglia, 44th AIAA/ASME/SAE/ASEE Joint Propulsion Conference, Hartford, Connecticut, AIAA-2008-4606, AIAA, 2008.
[22] C. Navier, Mém. Acad. Roy. Sci. Inst. France 6 (1822) pp. 389.
[23] G. Stokes, British Assoc. Adv. Sci. 1 (1846) pp. 1.
[24] A. Craik, Annu. Rev. Fluid Mech. 36 (2004) pp. 1.
[25] A. Craik, Annu. Rev. Fluid Mech. 37 (2005) pp. 23.
[26] R. Abgrall, S. Karni, J. Comput. Phys. 169 (2001) pp. 594.
[27] J. Dalton, Journal of Natural Philosophy, Chemistry, and The Arts 5 (1801) pp. 241.
[28] G. Moretti, Comput. & Fluids 7 (1979) 191–205.
[29] G. Moretti, Annu. Rev. Fluid Mech. 19 (1987) 313–337.
[30] S. Godunov, Mat. Sb. 47 (1959) 271.
[31] P. Colella, P. Woodward, J. Comput. Phys. 54 (1984) pp. 174.
[32] B. van Leer, J. Comput. Phys. 32 (1979) 101–136.
[33] A. Harten, B. Einfeldt, S. Osher, S. Chakravarthy, J. Comput. Phys. 71 (1987) pp. 231.
[34] X. Liu, S. Osher, T. Chan, J. Comput. Phys. 115 (1994) pp. 200.
[35] B. Riemann, Aus dem achten Bande der Abhandlungen der Königlichen Gesellschaft der Wissenschaften zu Göttingen 8 (1860).
[36] J. Gibbs, The Scientific Papers of J. Willard Gibbs, Longmans, Green and Co., NY, 1906.
[37] C. Shu, S. Osher, J. Comput. Phys. 77 (1988) pp. 439.
[38] G. Jiang, C. Shu, J. Comput. Phys. 126 (1996) pp. 202.
[39] X. Zhang, C. Shu, J. Comput. Phys. 229 (2010) pp. 8918.
[40] X. Zhang, C. Shu, J. Comput. Phys. 229 (2010) pp. 3091.
[41] L. Euler, Mém. Acad. Sci. Berlin 11 (1757) pp. 274.
[42] L. Euler, Mém. Acad. Sci. Berlin 11 (1757) pp. 316.
[43] L. Euler, Novi Comm. Acad. Petropolitanæ 6 (1761) pp. 271.
[44] J. Quirk, Internat. J. Numer. Methods Fluids 18 (1994) pp. 555.
[45] I. Newton, The Royal Society, London 1711 (1669).
[46] J. Raphson, Analysis Æquationum Universalis, Royal Society, Cambridge, 1702.
[47] I. Newton, The Method of Fluxions and Infinite Series: With Its Application to the Geometry of Curve-lines, Royal Society, Printed by Henry Woodfall and sold by John
Nourse, 1736, translated from the Author’s Latin Original by John Colson.
[48] E. Toro, Riemann Solvers and Numerical Methods for Fluid Dynamics, A Practical Introduction, third ed., Springer, 2009.
[49] E. Toro, Shock Waves 4 (1994) pp. 25.
[50] A. Harten, P. Lax, B. van Leer, SIAM Rev. 25 (1983) pp. 35.
[51] E. Toro, Proc. R. Soc. Lond. Ser. A Math. Phys. Sci. 434 (1991) pp. 683.
[52] E. Toro, Shock Waves 5 (1995) pp. 75.
[53] V.V. Rusanov, J. Comput. Math. Phys. 1 (1961) pp. 267.
[54] V. Lebedev, Vestnik Moskov. Gosudarstvennogo Univ. 10 (1955) pp. 47.
[55] C. Runge, Math. Ann. 46 (1895) pp. 167.
[56] W. Kutta, Z. Angew. Math. Phys. 46 (1901) pp. 435.
[57] S. Gottlieb, C. Shu, E. Tadmor, SIAM Rev. 43 (2001) pp. 89.
[58] J. Butcher, Numerical Methods for Ordinary Differential Equations, second ed., Wiley, 2008.
[59] R. Courant, K. Friedrichs, H. Lewy, Math. Ann. 100 (1928) pp. 32.
[60] A. Kurganov, E. Tadmor, Numer. Methods Partial Differential Equations 18 (2002) pp. 584.
[61] D. Balsara, C. Shu, J. Comput. Phys. 160 (2000) pp. 405.
[62] R. Borges, M. Carmona, B. Costa, W. Don, J. Comput. Phys. 227 (2008) pp. 3191.
[63] A. Henrick, T. Aslam, J. Powers, J. Comput. Phys. 207 (2005) pp. 542.
[64] H. Nishikawa, K. Kitamura, J. Comput. Phys. 227 (2008) pp. 2560.

[65] Y. Ren, Comput. & Fluids 32 (2003) pp. 1379.


[66] D. Levy, K. Powell, B. van Leer, J. Comput. Phys. 106 (1993) pp. 201.
[67] J. Bittencourt, Fundamentals of Plasma Physics, Springer, 2004.
[68] D.S. Balsara, Astrophys. J. Suppl. Ser. 116 (1998) pp. 119.
[69] D.S. Balsara, J. Comput. Phys. 228 (2009) pp. 5040.
[70] D.S. Balsara, J. Comput. Phys. 229 (2010) 1970–1993.
[71] G. Sod, J. Comput. Phys. 27 (1978) pp. 1.
[72] P. Woodward, P. Colella, J. Comput. Phys. 54 (1984) pp. 115.
[73] T. Bazhenova, L. Gvozdeva, M. Nettleton, Prog. Aerospace Sci. 21 (1984) pp. 249.
[74] S. Zaghi, SRM Igniter Jet Transient, Multidimensional Unsteady Gasdynamics, Ph.D. thesis, Scuola di Ingegneria Aerospaziale, Università degli Studi di Roma Sapienza,
2009.
[75] S. Zaghi, B. Favini, M. Di Giacinto, F. Serraglia, 2nd International Symposium on Propulsion for Space Transportation, Heraklion, Creta, 2008b.
[76] R. Rogallo, P. Moin, Annu. Rev. Fluid Mech. 16 (1984) pp. 99.
[77] J. Smagorinsky, Mon. Weather Rev. 91 (1963) pp. 99.
[78] P. Spalart, W. Jou, S. Allmaras, Advances in DNS/LES, in: G. Press (Ed.), First AFOSR International Conference on DNS/LES, Louisiana Tech University, Ruston, Louisiana,
USA, 1997, pp. 137–47.
[79] P. Spalart, Annu. Rev. Fluid Mech. 41 (2009) pp. 181.
[80] S. Osher, J. Sethian, J. Comput. Phys. 79 (1988) pp. 12.
[81] J.A. Sethian, P. Smereka, Annu. Rev. Fluid Mech. 35 (2003) pp. 341.
[82] M.J. Berger, J. Oliger, J. Comput. Phys. 53 (1984) pp. 484.
[83] M. Berger, P. Colella, J. Comput. Phys. 82 (1989) pp. 64.
[84] M. Berger, R. Leveque, 9th AIAA Computational Fluid Dynamics Conference, Buffalo, New York, 1989.
[85] D. Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching, second ed., vol. 3, Addison Wesley Longman Publishing Co., Inc., Redwood City, CA,
USA, 1998.
[86] P. Larson, BIT Numerical Mathematics 18 (1978) pp. 184.
[87] R. Fagin, J. Nievergelt, N. Pippenger, H. Strong, ACM Trans. Database Syst. 4 (1979) pp. 315.
[88] W. Litwin, Proceedings of the Sixth International Conference on Very Large Data Bases, vol. 6, IEEE Computer Society, 1980, pp. 212–223.
[89] M. Griebel, G. Zumbusch, Parallel Comput. 25 (1999) pp. 827.
[90] G. Peano, Math. Ann. 36 (1890) pp. 157.
[91] D. Hilbert, Math. Ann. 38 (1891) pp. 459.
[92] H. Sagan, Space-Filling Curves, Springer-Verlag, New York, 1994.
[93] K. Lorton, D. Wise, SIGARCH Comput. Architect. News 35 (2007) pp. 6.
[94] P. Gottschling, D. Wise, M. Adams, Proceedings of the 21st Annual International Conference on Supercomputing, ACM, Seattle, Washington, USA, 2007, 116–125.
[95] G.M. Morton, IBM Germany Scientific Symposium Series, International Business Machines Company, 1966.
[96] P. van Oosterom, T. Vijlbrief, 7th International Symposium on Spatial Data Handling, Delft, The Netherlands, 1996.
[97] D. Wise, in: A. Bode, T. Ludwig, W. Karl, R. Wismüller (Eds.), Euro-Par 2000 Parallel Processing, in: Lecture Notes in Computer Science, vol. 1900, Springer, Berlin
Heidelberg, 2000, pp. 774–783.
[98] L. Stocco, G. Schrack, IEEE Trans. Comput. 58 (2009) pp. 424.
[99] S. Roller, J. Bernsdorf, H. Klimach, M. Hasert, D. Harlacher, M. Cakircali, S. Zimny, K. Masilamani, L. Didinger, J. Zudrop, in: M. Resch, X. Wang, W. Bez, E. Focht,
H. Kobayashi, S. Roller (Eds.), High Performance Computing on Vector Systems 2011, Springer, Berlin, Heidelberg, 2012, pp. 93–105.
[100] J. Steger, J. Benek, Comput. Methods Appl. Mech. Eng. 64 (1987) pp. 301.
[101] Z. Wang, J. Comput. Phys. 122 (1995) pp. 96.
[102] R. Muscari, R. Broglia, A. Di Mascio, 16th International Offshore and Polar Engineering Conference Proceedings, ISOPE, The International Society of Offshore and Polar Engineers, Cupertino, CA, 95015-0189, USA, San Francisco, California, USA, 2006, pp. 243–248.
[103] R. Broglia, S. Zaghi, A. Di Mascio, J. Marine Sci. Technol. 16 (2011) pp. 254.
[104] S. Zaghi, R. Broglia, A. Di Mascio, J. Hydrodynamics, Ser. B 22 (2010) pp. 545.
[105] S. Zaghi, G. Dubbioso, R. Broglia, 10th International Conference on Hydrodynamics, St. Petersburg, Russia, 2012.
[106] R. Spiteri, S. Ruuth, SIAM J. Numer. Anal. (2003) pp. 469.
