
IPTC-18383-MS

General Parallel Reservoir Simulation


B. L. Beckner, K. B. Haugen, S. Maliassov, V. Dyadechko, and K. D. Wiegand, ExxonMobil

Copyright 2015, International Petroleum Technology Conference

This paper was prepared for presentation at the International Petroleum Technology Conference held in Doha, Qatar, 6–9 December 2015.

This paper was selected for presentation by an IPTC Programme Committee following review of information contained in an abstract submitted by the author(s).
Contents of the paper, as presented, have not been reviewed by the International Petroleum Technology Conference and are subject to correction by the author(s).
The material, as presented, does not necessarily reflect any position of the International Petroleum Technology Conference, its officers, or members. Papers
presented at IPTC are subject to publication review by Sponsor Society Committees of IPTC. Electronic reproduction, distribution, or storage of any part of this paper
for commercial purposes without the written consent of the International Petroleum Technology Conference is prohibited. Permission to reproduce in print is restricted
to an abstract of not more than 300 words; illustrations may not be copied. The abstract must contain conspicuous acknowledgment of where and by whom the paper
was presented. Write Librarian, IPTC, P.O. Box 833836, Richardson, TX 75083-3836, U.S.A., fax +1-972-952-9435

Abstract
Faster reservoir simulation turnaround time continues to be a major industry priority. Simultaneously, model sizes are reaching a billion cells, and the recovery mechanisms and reservoir management processes to be modeled are rapidly changing and becoming computationally more expensive. A new reservoir modeling solution has been developed to quickly solve the largest and most complex modeling studies within ExxonMobil.
With the singular goal of reducing study turnaround time, ExxonMobil has developed its fifth-generation reservoir simulator. This latest-generation reservoir simulator has been designed from the ground up, drawing on 60 years of internal reservoir simulator development experience. Some of the key learnings incorporated into the new reservoir simulator include a requirement for a flexible and modular software framework, a general flow formulation that is decoupled from a very fast and highly accurate phase behavior engine, and a distributed architecture for unstructured grids. The simulator is optimized for massive distributed memory parallelism to take advantage of ExxonMobil's world-class supercomputer, Discovery.
A new fluid-agnostic formulation has been developed based on general material balance. A highly optimized and highly accurate fluid library supports liquid-liquid-vapor calculations and is the only differentiation between the black-oil and compositional options. These considerations are critical for maintaining computationally efficient software under heavy development by a large team. A general formulation framework, based on material balance and generic gradients, was implemented to ensure minimal overlap between fluid models and primary variable selections while maintaining the flexibility to allow any component-phase interaction. Assuming Darcy flow and no diffusive flux or chemical reactions, the differential expression for material balance can be easily applied to different types of formulations. This may require the addition of a constraint equation as well as the selection of different primary variables.
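A common way to write such a component balance (a sketch of the standard compositional form, not necessarily the exact expression used in this simulator) is, for each component c,

\[
\frac{\partial}{\partial t}\Big(\phi \sum_{p} \rho_p\, S_p\, x_{c,p}\Big)
+ \nabla\cdot\Big(\sum_{p} \rho_p\, x_{c,p}\, \mathbf{u}_p\Big) = q_c,
\qquad
\mathbf{u}_p = -\frac{k_{rp}}{\mu_p}\,\mathbf{K}\,\big(\nabla p_p - \rho_p\, g\, \nabla z\big),
\]

where \(\phi\) is porosity, \(S_p\) and \(\rho_p\) are the saturation and density of phase p, \(x_{c,p}\) is the fraction of component c in phase p, \(\mathbf{K}\) and \(k_{rp}\) are the absolute and relative permeabilities, \(\mu_p\) is the phase viscosity, and \(q_c\) is a source term. Closure requires the saturation constraint \(\sum_p S_p = 1\) and the phase-equilibrium relations supplied by the fluid library, which is why the black-oil and compositional options differ only in the property evaluations and the choice of primary variables, not in the balance equations themselves.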
Previous in-house simulators focused primarily on delivering new modeling functionality, such as full-field compositional modeling or the use of unstructured grids. Those simulators tracked the performance improvements of the hardware, so execution speed increases were tied to rising computing clock rates from the 1980s to the 2000s. The new simulator's distributed memory parallelism has been tested and shown to be very efficient on the Discovery platform. Strong scalability tests have been run to 16,000 cores with good parallel performance down to 6,000 unknowns per core. These are some of the largest
core counts for parallel reservoir simulation with unstructured grids seen in the industry, and they reduce model run times from days to minutes. These drastically reduced run times have allowed the new simulator to include heavy computational methods, such as nonlinear finite volumes or implicit reactions, for practical use, as well as to support very large models in excess of 100 million cells.
Flexible well management control is provided through Python scripts. This allows users to customize asset-specific control strategies via a well-known and straightforward scripting language. Alternatively, an internal optimization engine can manage the field subject to high-level constraints provided by the user. Well and well-group control strategies follow three options. The first is a suite of programmed well definitions, constraints, and remedies that can be applied to a single well or an arbitrary group of wells. The second option is the use of a non-linear optimization engine to determine rate allocation. The third option is a user-programmable method for well control via the Python programming language.
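As a minimal illustration of the third option, a user-programmable control could look like the following Python sketch. The object and attribute names (Well, water_rate, control, and so on) are hypothetical placeholders, not the simulator's actual scripting interface:

from dataclasses import dataclass

# Hypothetical stand-ins for the simulator's well objects. The real
# scripting API is not documented in this paper, so names are made up.
@dataclass
class Well:
    name: str
    water_rate: float      # m3/day
    liquid_rate: float     # m3/day
    control: str = "RATE"  # "RATE" or "BHP"

def control_step(wells, max_water_cut=0.8):
    """User-defined control applied once per time step: switch any
    producer whose water cut exceeds the limit to pressure control."""
    for w in wells:
        wc = w.water_rate / w.liquid_rate if w.liquid_rate > 0 else 0.0
        if w.control == "RATE" and wc > max_water_cut:
            w.control = "BHP"
            print(f"{w.name}: water cut {wc:.2f} exceeds {max_water_cut}, "
                  f"switching to pressure control")

# Example usage with made-up rates
wells = [Well("P-1", 90.0, 100.0), Well("P-2", 20.0, 100.0)]
control_step(wells)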
At the bottom of the operational stack of the programmed well controls lies a core of highly efficient, parallelized routines that monitor and adjust wellhead and bottom-hole pressures, phase and component flow rates, and phase fractions based on user-defined limits. Appropriate actions, such as reducing flow rates or switching wells from rate control to pressure control, are performed automatically at every time step. The next layer extends the automated well control to groups of wells that are not necessarily confined to subtrees of the physical flow hierarchy, but may represent geographic location, reservoir connectivity, or material balance groups, as examples. Well groups can have a multitude of constraints linked to them, which the software evaluates in a pre-defined order, taking into account the dependencies between individual constraints. For instance, the field may have limits for total liquid and gas production. Depending on which constraint is violated and to what extent the production exceeds a given limit, different well-sorting and reduction strategies need to be applied. The user can specify preferences, or use defaults, for what course to take, e.g. reduce high water-cut wells first, followed by wells with high gas-oil ratio. To avoid compromising parallel efficiency, constraints for well groups are processed according to a user-specified schedule rather than at each time step.
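The ordered evaluation of group constraints and the user-selectable sorting strategies described above can be pictured with the schematic Python sketch below. All names, limits, and rates are illustrative, not the simulator's implementation:

# Schematic only: evaluate group constraints in a pre-defined order and
# curtail the "worst" wells first according to a user-chosen sort key.
def enforce_group_limits(wells, limits, sort_key):
    """wells: list of dicts of phase rates; limits: ordered list of
    (phase, max_rate); sort_key: e.g. water cut, highest curtailed first."""
    for phase, max_rate in limits:                 # fixed evaluation order
        excess = sum(w[phase] for w in wells) - max_rate
        if excess <= 0:
            continue
        for w in sorted(wells, key=sort_key, reverse=True):
            cut = min(w[phase], excess)
            w[phase] -= cut
            excess -= cut
            if excess <= 0:
                break

# Illustrative field limits and wells (all numbers are made up)
wells = [
    {"name": "P-1", "oil": 500.0, "gas": 2.0e5, "water": 400.0},
    {"name": "P-2", "oil": 800.0, "gas": 0.5e5, "water": 100.0},
]
limits = [("gas", 2.0e5), ("water", 450.0)]
water_cut = lambda w: w["water"] / (w["water"] + w["oil"])
enforce_group_limits(wells, limits, water_cut)
print(wells)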
Although the built-in automated well control provides flexible solutions for a variety of standard well
and field management problems, it cannot possibly solve every problem that a user of the simulator may
pose. Consider the following trivial example: A field study requires an accurate prediction of gas
production and re-injection, taking into account flare restrictions, compressor availability, and fuel
requirements. Although the engine already provides built-in support for compressor constraints and flare
limits, the required accuracy for the particular study demands an exact calculation of the fuel costs based
on the type of compressor(s) used and the amount of gas that is re-injected. If the engine exposes the necessary data, a small Python script can solve the problem, which is a better solution from a software development point of view than trying to anticipate every possible problem up front and adding the corresponding variables and functions to the simulator code. Real-world well and field management problems are of course more complicated, and hence a carefully designed support infrastructure for Python scripting has been developed for this next-generation reservoir simulator.
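In the spirit of the gas re-injection example above, a user script might combine the quantities exposed by the engine along the following lines. The function name, fuel model, and constants are illustrative assumptions used only to show the scripting pattern, not quantities or APIs taken from the simulator:

# Hypothetical end-of-timestep script: compute compressor fuel gas and
# decide how much produced gas is available for re-injection. The fuel
# model and the constants below are illustrative assumptions only.
FLARE_LIMIT = 1.0e4          # sm3/day
FUEL_FRACTION = 0.02         # fraction of compressed gas burned as fuel

def gas_balance(produced_gas, compressors_online, compressor_capacity):
    """Return (reinjected, fuel, flared) volumes in sm3/day."""
    capacity = compressors_online * compressor_capacity
    compressed = min(produced_gas, capacity)
    fuel = FUEL_FRACTION * compressed
    reinjected = compressed - fuel
    flared = min(produced_gas - compressed, FLARE_LIMIT)
    # Any remainder above the flare limit would, in a real study, trigger
    # production curtailment; that logic is omitted here.
    return reinjected, fuel, flared

reinj, fuel, flare = gas_balance(
    produced_gas=5.0e5, compressors_online=2, compressor_capacity=2.0e5)
print(f"re-injected {reinj:.3g}, fuel {fuel:.3g}, flared {flare:.3g} sm3/day")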
Parallel scalability tests were conducted on ExxonMobil's Discovery computer. Discovery comprises over 300,000 cores on over 14,000 Cray XE6 and XC30 nodes. The machine was specifically designed with a higher ratio of memory and bandwidth to total computational capability in order to successfully solve industrial HPC applications such as reservoir simulation. With a peak performance of 5.1 petaflops, Discovery is one of the most powerful massively parallel computers designed for industrial research and applications. The following strong scalability tests exclude partitioning time, which can be large for large partitions and is performed once at the beginning of the run.
Fig. 1 shows strong scalability experiments using the SPE10 Comparative Solution Project data set. With 1.12 million cells, this model is sufficiently large for strong scalability testing. The model was run in various combinations of cores and nodes. The first configuration was on 8 nodes, each using one core, denoted 8x1. This configuration has no intra-node, core-to-core communication; all communication
is node-to-node. Additional runs, increasing the number of cores used per node, were made to test the parallel scalability of the simulator. A series of five runs was made with increasing numbers of cores used per node: 1, 2, 4, 6, and 8. The blue curve in Fig. 1 shows the strong scalability result for 8 nodes with total core counts from 8 (8x1) to 64 (8x8). Parallel speedups are compared to a single-core simulation run. Excellent parallel speedup is seen for this test case. The runs with one or two cores per node showed linear speedup from 4 (4x1) to 512 (128x4) cores. In some cases superlinear speedup (256 cores in the 128x2 configuration) is seen due to cache effects.
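The speedups plotted in Fig. 1 follow the usual strong-scaling definitions; a small helper like the one below (not part of the simulator, with made-up timings) shows how such curves are computed from measured wall-clock times relative to the single-core run:

def strong_scaling(wall_times, ref_cores=1):
    """wall_times: {core_count: seconds}. Returns {core_count:
    (speedup, parallel_efficiency)} relative to the ref_cores run."""
    t_ref = wall_times[ref_cores]
    return {n: (t_ref / t, (t_ref / t) * ref_cores / n)
            for n, t in wall_times.items()}

# Illustrative (made-up) timings for an 8x1 .. 8x8 node/core series
times = {1: 6400.0, 8: 810.0, 16: 420.0, 32: 230.0, 64: 140.0}
for n, (s, e) in strong_scaling(times).items():
    print(f"{n:5d} cores: speedup {s:6.1f}, efficiency {e:5.1%}")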

Figure 1—SPE 10 Strong Scalability Test

When more than 2 cores per node are utilized, parallel speedup generally degrades from the ideal. This is due to the memory-bound nature of reservoir simulation calculations. In these experiments the compute nodes contain dual four-core Xeon processors (8 cores per node) but only 4 memory channels. Because reservoir simulation calculations require significant information passing between cores, the lack of additional memory channels becomes a bottleneck to further parallel speedup. When increasing the number of cores used per node from 4 to 8, only modest additional speedups are possible, because only a limited portion of the calculations is independent and embarrassingly parallel (such as grid property calculations). Provided sufficient memory channels, excellent parallel efficiency is obtained up to 512 cores. Runs on 128x4 (512 cores) and 64x4 (256 cores) showed nearly perfect speedups. Comparing runs on 128x4 and 64x8, both 512 cores, we see a marked reduction in parallel efficiency in the 64x8 configuration. The 64x8 configuration has half as many memory channels as the 128x4 run, resulting in lower memory throughput and reduced parallel efficiency.
The runs on 128 nodes show an effective limit of around 2000 cells per core. At these small grid cell
to core ratios, interprocessor communication reduces parallel efficiency with additional cores. This limit
of 2000 cells per core provides useful guidance to users when configuring a massively parallel reservoir
simulation run.
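That guidance translates into a simple sizing rule when configuring a run. The sketch below merely restates the 2000 cells-per-core limit from the experiments above; it is not a simulator utility, and the 8-cores-per-node default is an assumption taken from the hardware described earlier:

MIN_CELLS_PER_CORE = 2000   # empirical guideline from the SPE10 experiments

def max_useful_cores(n_cells, cores_per_node=8):
    """Largest core count (rounded down to whole nodes) that keeps at
    least MIN_CELLS_PER_CORE cells on every core."""
    cores = n_cells // MIN_CELLS_PER_CORE
    return (cores // cores_per_node) * cores_per_node

print(max_useful_cores(1_120_000))    # SPE10-sized model     -> 560
print(max_useful_cores(14_000_000))   # 14 million cell model -> 7000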
A second test of strong scalability, run on a 14 million cell field model, is shown in Fig. 2. For this scalability plot the reference time was that of the 2-core run, as the model could not fit in the memory available to a single core. There is almost perfect parallel scalability from 2 to 2,000 cores. At 16,000 cores the parallel efficiency is over 50%. At this level of massive parallelism, the 16,000-core run is almost 5,000
times faster than the 2-core run. The 2-core run took 29 hours to complete, while the 16,000-core run completed in 21 seconds.
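A quick check, using only the numbers reported above, confirms these figures are internally consistent:

\[
S = \frac{29 \times 3600\ \text{s}}{21\ \text{s}} \approx 4{,}970,
\qquad
E = \frac{S}{16{,}000 / 2} \approx 0.62,
\]

i.e. roughly 62% parallel efficiency at 16,000 cores relative to the 2-core reference, in line with the "over 50%" figure quoted above.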

Figure 2—Strong Scalability Test to 16000 Cores

Reservoir simulation on massively parallel computers can reduce simulation times by over three orders of magnitude. Achieving these speed increases requires careful construction of the simulation calculation kernels to minimize interprocessor communication. Attention to modularized and generic calculations reduces code complexity and facilitates parallel efficiency. Strong scalability experiments show parallel efficiency of over 50% when the grid-cell-to-core ratio is above 2000 cells per core. Linear and super-linear speedups were observed over a wide range of processor configurations up to 1024 cores.
The magnitude of the real time speedup is impressive with massively parallel computing. With run
times capable of being reduced from days to seconds, tremendous gains in the reduction of reservoir
management study times are possible. It is now also possible to consider many more alternative reservoir
management scenarios than previously possible within the same time frame. This clearly opens the door
for practical use of automated scenario calculation technologies like inverse modeling and optimization.
The capability to reduce simulation turnaround times by over 1,000 times via strong scalability should alter the science that is routinely applied to reservoirs through simulation. Simulation technologies such as advanced discretizations, integrated reactions, and coupled geomechanics, to name a few, are not currently practical for routine field application due to the onerous run times these more computationally advanced techniques produce at field scale. Strongly scalable parallelism offers the solution to make computationally intensive methods practical for routine field studies.
