
URTEC-208344-MS

3D Seismic Facies Classification on CPU and GPU HPC Clusters

Sergio Botelho, RocketML Inc; Vishal Das, Shell Global Solutions, US Inc; Davide Vanzo, Microsoft Corporation;
Pandu Devarakota, Shell Global Solutions, US Inc; Vinay Rao and Santi Adavani, RocketML Inc

Copyright 2021, Unconventional Resources Technology Conference (URTeC) DOI 10.15530/AP-URTEC-2021-208344

This paper was prepared for presentation at the SPE/AAPG/SEG Asia Pacific Unconventional Resources Technology Conference to be held virtually on 16–18
November, 2021.

The URTeC Technical Program Committee accepted this presentation on the basis of information contained in an abstract submitted by the author(s). The contents
of this paper have not been reviewed by URTeC and URTeC does not warrant the accuracy, reliability, or timeliness of any information herein. All information is the
responsibility of, and, is subject to corrections by the author(s). Any person or entity that relies on any information obtained from this paper does so at their own
risk. The information herein does not necessarily reflect any position of URTeC. Any reproduction, distribution, or storage of any part of this paper without the written
consent of URTeC is prohibited.

Abstract
The common approach to 3D seismic facies classification, which stitches together predictions from 2D cross-sections, can produce unrealistic discontinuities in geological features. Depending on the direction
in which the 2D cross-sections are taken, some features such as depositional geomorphology, channel
boundaries, and faults might not be fully visible, resulting in misleading labeled data and incorrect
interpretations. Hence, in this work, we propose the application of 3D machine learning models to solve
the problem of seismic facies classification. This introduces a two-fold challenge: first, using 3D models
substantially increases the memory requirements of the computational framework; second, neural network
design becomes increasingly challenging due to the higher number of parameters in the model and its
larger training time. We utilize distributed deep learning techniques in order to address these challenges,
and efficiently train 3D deep learning models for seismic facies classification on Microsoft Azure High-
Performance Computing (HPC) clusters. Using those techniques, we were able to train 3D networks with
millions of trainable parameters within a span of 3 hours, enabling rapid hyperparameter tuning and
different network architecture evaluation. We found that the networks performed better when the 3D seismic
input cuboids (and their corresponding labels) were longer along the depth dimension compared to the
X and Y axes. Data augmentation through the non-uniform overlap of the training cuboids (with more
overlap in areas of greater geological heterogeneities) was also shown to improve training performance.
Overall, domain knowledge of the problem along with distributed computing techniques helped improve
the efficiency and performance of deep learning-based 3D seismic facies classification.

Introduction
Seismic data plays a key role in understanding the subsurface, especially for the exploration and production industry. Apart from helping identify structural features, seismic data is also used to identify stratigraphic features that are potential targets for drilling. The identification of different geological features from subsurface seismic images is known as seismic facies classification. The identified seismic facies are used to build a geological model of the subsurface, which aids the exploration of minerals and hydrocarbons. In traditional workflows, specialized seismic interpreters process the 3D data to identify
seismic facies. In recent times, large 3D seismic datasets that cover several hundred km² have become commonly available. Interpreting such large datasets manually is challenging, time-consuming, and subjective, as it relies heavily on the expertise of the interpreter.
With the advancements in machine learning, several researchers have attempted to solve the seismic
facies classification problem as a semantic segmentation task using machine-learning algorithms (Ulku
2020). Semantic segmentation is a fundamental problem in the fields of computer vision and machine
learning (Minaee 2020, Zhou Z. 2020, Rippel 2020). Convolutional neural networks (CNNs) proved

superior for the problem of seismic facies classification over traditional machine-learning algorithms
(Waldeland 2018, Di 2018), and have since been applied to the seismic facies classification problem (T.
Zhao 2018, Dramsch 2018, Wrona 2018, Liu 2020, Y. M. Alaudah 2019). However, most of the applications
of CNNs use 2D cross-sections and 2D network architectures due to Graphics Processing Unit (GPU)
device memory limitations. Although the results show high accuracy, the networks inherently suffer from
the lack of geologic continuity and consistency in the third dimension. This becomes crucial when certain
geological features such as faults and channels are only visible when the 2D cross-sections are taken along a
preferred direction, as shown in Figure 1. In addition, predictions from 2D networks, when stitched together
to produce the entire 3D seismic volume, might lead to unrealistic geological features due to boundary
effects, which are then post-processed using techniques such as Gaussian smoothing. These problems can be
circumvented by taking cross-sections along specific directions based on prior information or using random
cross-section orientations in the training data. Nevertheless, the ultimate remedy for such issues is to use
3D neural networks that directly consume 3D seismic data.

Figure 1—A channel cut in the subsurface is a potential target during exploration (left).
Channel cuts are visible in 2D slices that are perpendicular to the channel axis (middle) but
cannot be identified in slices taken along the channel axis (right). Source: (McHargue 2011)

Recently, a few studies have been published that used 3D CNNs for geophysical problems, and some
of these are specifically applied for seismic facies classification (Liu 2020, Pradhan 2020). In these works,
the entire seismic cube is divided into small sub-cubes with maximum size 64x64x64 (262,144 pixels in total) to overcome the computational challenges. However, the authors mention the need for larger input
sizes, which would provide a large enough receptive field and contextual information to distinguish between
distinct types of facies. In cases where the cube size is larger, the computational time to run 100 epochs of
training using a machine with 128 GB RAM and two 32GB Tesla V100 GPUs is around 48 hours (Pradhan
2020).
In this work, we have used distributed deep learning to address both challenges: large memory
requirements and long compute times. In the remainder of the paper, we will discuss the problems of
3D semantic segmentation, 3D seismic facies classification, and provide an overview of the 3D U-Net
network architecture. We will elaborate on its performance and memory footprint and discuss data-parallel
distributed deep-learning strategies and implementation details on CPU and GPU-based HPC clusters.
Finally, we will provide functional specifications of Intel, AMD, and NVIDIA HPC clusters on Microsoft
Azure, and report strong scaling and accuracy metrics for experiments performed on these clusters.

Statement of Theory and Definitions


Semantic Segmentation
Semantic segmentation is a fundamental problem in computer vision and machine learning, where the

goal is to assign a class label to each pixel of the input image. Practical applications of semantic
segmentation include detecting lesions from magnetic resonance imaging (MRI) scans (Ronnenberger
2015), identifying geological features from subsurface seismic images (Waldeland 2018), and detecting
pedestrians, cars and cyclists for self-driving cars (Sagar 2020). On these visual recognition tasks, deep-
learning methods have outperformed traditional methods such as k-means clustering, spectral clustering,
Markov Random Fields, and Conditional Random Fields on several popular benchmarks (Ulku 2020,
Minaee 2020, Liang-Chieh 2017). There have been hundreds of deep learning-based segmentation methods
proposed in the literature where the key difference is the choice of network architecture used for training.
The most widely used network architectures include fully convolutional networks (FCNs) (Long 2015),
encoder-decoder based models (U-Nets) (Ronnenberger 2015), and multi-scale pyramid network-based
models (PSP-Net) (H. a. Zhao 2017).
In several applications, even though the data is inherently 3D, deep-learning training is done using 2D
slices. The 2D slices are often generated by selecting a particular slicing direction as the first dimension,
while the Z-axis, which might represent either the time or depth axis in seismic data, is chosen as the second
dimension. This is done mainly because training 3D neural networks often has a memory footprint that far
exceeds the available device memory found on state-of-the-art GPUs. However, 2D convolutions fail to
leverage all available 3D information from adjacent slices, which can be critical for model performance.
Due to these GPU memory limitations, data scientists end up reducing either the model size, the batch
size, or the size of the 3D input samples in order to fit into GPU memory. This hyper-parameter tuning
process can increase turn-around time and often degrades model performance. To overcome these memory
limitations, data-parallel distributed deep-learning strategies are often used, where multiple replicas of
a model are simultaneously trained to optimize a single objective function. Typically, universities and
government research labs use either on-premises HPC clusters or supercomputers like Summit, Bridges2,
Frontera and Stampede2. While large-scale 3D semantic segmentation scaling studies are very sparse in
the literature, there has been recent work on exa-scale 3D segmentation studies on Summit (Houston 2018,
Laanait 2019), which is a world-class GPU-based HPC cluster with 27360 GPUs. On the other hand, AI
startups and enterprises of all sizes use on-demand HPC clusters of different configurations on public cloud
providers like Microsoft Azure, Amazon Web Services (AWS), and Google Cloud Platform (GCP). Even
though cloud computing is the de facto infrastructure for most of the commercial deep-learning workloads,
to the best of our knowledge, large scale studies of 3D U-Net training on cloud do not exist at the time
of this writing.

3D U-Net performance analysis


CNNs are the most widely used network architectures for computer vision tasks. They consist mainly of
three types of layers: (1) convolutional layers, where a kernel of weights is convolved with the image to
extract features; (2) activation layers to apply non-linear transformations on feature maps; and (3) pooling
layers. Encoder-decoder networks are a class of models that learn to map inputs to outputs through a two-
stage process, where the encoder first compresses the input into a latent-space, and the decoder predicts the
output from that latent space representation.
U-Nets are encoder-decoder networks that also belong to a broader category of fully convolutional
networks, or FCNs (Long 2015, Dumoulin 2018). They were first proposed by Ronnenberger et al. to tackle
the problem of medical image segmentation and were shown to perform well with few training images
(Ronnenberger 2015). The key idea was to add convolutional layers to the up-sampling branch, making the
decoder section symmetric to the encoder, which gives the network its "U"-shape. In addition, features from
the decoder were combined with those from the encoder via skip connections, as shown in Figure 2, which
was shown to improve localization.
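
The paper does not publish an implementation, but the architecture is straightforward to sketch. Below is a minimal depth-2 3D U-Net in PyTorch, assuming a PyTorch-style training stack; the channel widths, kernel sizes, and class count are illustrative placeholders, not the authors' configuration.

# Minimal depth-2 3D U-Net sketch (illustrative; not the authors' code).
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3x3 convolutions with ReLU: the basic encoder/decoder unit.
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    )

class UNet3D(nn.Module):
    def __init__(self, in_ch=1, n_classes=6, base=16):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.enc2 = conv_block(base, base * 2)
        self.pool = nn.MaxPool3d(2)
        self.bottleneck = conv_block(base * 2, base * 4)
        self.up2 = nn.ConvTranspose3d(base * 4, base * 2, kernel_size=2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)   # input doubled by concatenation
        self.up1 = nn.ConvTranspose3d(base * 2, base, kernel_size=2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv3d(base, n_classes, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        # Skip connections: concatenate encoder features with upsampled decoder features.
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)  # per-voxel class scores

Adding a symmetric encoder/decoder level (one more pool, bottleneck width, and transposed convolution) increases the depth in the sense used in the memory analysis below.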

Figure 2—Typical 3D U-Net architecture

Memory utilization during training is driven by 3 factors: (1) number of model parameters in the network;
(2) mini-batch size; and (3) sizes of temporary tensors created during forward and back-propagation. In
the case of U-Nets, assuming the feature maps have fixed sizes, the number of training parameters in the
model will be closely related to the network depth, defined here as the number of "levels" with symmetric
encoder/decoder blocks linked by skip connections (Figure 2). As their name suggests, skip connections
feed output from early layers in the network into deeper layers, thus skipping over certain operations. This
has been shown to significantly alleviate issues of accuracy degradation and gradient vanishing in deep
neural networks (He 2015).
Figure 3a shows how peak memory utilization increases with network depth while training U-Nets using
images of size 256x256x256 pixels (the batch size was fixed at one). It also shows the wall-clock time (in
seconds) to process one sample of size 256x256x256 pixels as a function of network depth. On the other
hand, Figure 3b shows peak memory and per-epoch wall-clock time during training of a depth-4 U-Net for
the same dataset as functions of input sample size (the batch size was fixed at 4 samples). Notice how peak
memory exceeds the capacity of the 24GB Tesla RTX6000 GPU for sample size 128x128x128 pixels, and
far exceeds even the capacity of the state-of-the-art 32GB Tesla V100 GPU for input size 256x256x256
pixels.
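
Peak-memory curves like those in Figure 3 can be reproduced with a few lines of instrumentation. The following sketch measures peak GPU memory for one forward/backward pass in PyTorch; it assumes the UNet3D sketch above and a CUDA device, and its numbers will not match the paper's exact measurements.

# Hedged sketch: measure peak training memory for one sample (illustrative).
import torch

def peak_training_memory_gb(model, sample_shape, device="cuda"):
    """Run one forward/backward pass and report peak allocated GPU memory in GB."""
    torch.cuda.reset_peak_memory_stats(device)
    model = model.to(device)
    x = torch.randn(sample_shape, device=device)   # e.g. (1, 1, 256, 256, 256)
    y = model(x)
    y.sum().backward()                             # backward pass allocates gradients
    return torch.cuda.max_memory_allocated(device) / 1024**3

# Example usage (assumes the UNet3D sketch above):
# print(peak_training_memory_gb(UNet3D(), (1, 1, 128, 128, 128)))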
Figure 3—(a) Peak memory utilization, number of parameters and wall-clock time per sample during training
of multiple U-Net models of different depths with 256³ 3D images. (b) Peak memory utilization and per-epoch wall-
clock time during training of a depth-4 U-Net model as functions of the input image size (the batch size is fixed at 4).

The results above demonstrate how infeasible it can be to run large-scale 3D segmentation problems on
single-GPU nodes, or even on small multi-GPU clusters, due to their limited device memory. In the next
section, we describe the distributed deep-learning solution we developed to address this challenge and show
how it enables massively parallel deep-learning jobs on HPC clusters with hundreds or thousands of CPU
or GPU cores.
Distributed Deep-Learning Solution. Our solution consists of several software components that
work together to allow seamless and efficient execution of parallel deep-learning workloads on cloud
infrastructure.
One of the most widely used techniques for performing distributed deep-learning training is the so-called
data-parallel strategy, in which identical copies of the model are simultaneously trained by independent
workers to minimize a common objective function (Ben-nun 2018). Such functions (e.g., root-mean-square
error, binary cross-entropy, etc.) compute an error between the network output and the expected (target)
data. For this to be possible, the training data samples (and their corresponding labels, in case of supervised
learning) must be equally split among the workers. Since stochastic optimization-based training already
entails splitting the data into mini-batches, this means one must further split the mini-batches into local mini-
batches, which are then asynchronously processed via forward and back-propagation steps. Local gradients
are computed by each worker and collectively averaged via an all-reduce operation. Once each worker
possesses the global gradient vector, they invoke the optimizer to update their local network parameters,
which are now in-sync with every other worker. This process is illustrated in Figure 4.
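
A minimal sketch of this update step, written against torch.distributed collectives, is shown below. Production code would more likely wrap the model in DistributedDataParallel; the explicit all-reduce here only illustrates the averaging described in the text, and it assumes the process group has already been initialized (e.g., via init_process_group).

# Illustrative data-parallel training step with an explicit gradient all-reduce.
import torch
import torch.distributed as dist

def data_parallel_step(model, loss_fn, optimizer, local_x, local_y):
    optimizer.zero_grad()
    loss = loss_fn(model(local_x), local_y)   # forward pass on the local mini-batch
    loss.backward()                           # local gradients
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            # All-reduce sums gradients across workers; dividing by the number
            # of workers leaves the average global gradient on every rank.
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size
    optimizer.step()                          # identical update on every worker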
Figure 4—Data-parallel distributed deep learning, where each worker
processes a local subset of the global mini-batch during training.

Care must be taken, however, to ensure that results are independent of the number of workers utilized,
which is an important tenet of high-performance computing. To accomplish that, we start by augmenting
the dataset to make the total number of training samples Ns divisible by the number of workers p. Then,
each global mini-batch of size bs is divided into p equal parts, which become the local mini-batches to be
dispatched to the p workers, as shown in Figure 5. This ensures that the union of the local mini-batches
across all workers will be identical to the (global) mini-batch of the corresponding single-processor run.
Modulo rounding errors during gradient communication, the above scheme thus guarantees that the solution
will be independent of the number of workers. It also follows from the arithmetic that, for any global mini-
batch size bs chosen, the local mini-batches processed by workers at any given time will have the same size,
thus optimizing load balance.
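
The splitting scheme can be expressed in a few lines of index arithmetic. The sketch below follows the text's notation (Ns samples, global mini-batch size bs, p workers); the shared random seed, used so every worker computes the same shuffle, is an assumption about implementation detail.

# Worker-independent data splitting sketch (illustrative).
import numpy as np

def local_batches(Ns, bs, p, rank, seed=0):
    """Yield this rank's local mini-batch indices; the union across all
    p ranks reproduces the global mini-batches of a single-processor run."""
    assert bs % p == 0, "global mini-batch must split evenly across workers"
    rng = np.random.default_rng(seed)   # same seed -> identical shuffle on all workers
    idx = list(rng.permutation(Ns))
    idx += idx[: (-Ns) % p]             # augment dataset so Ns is divisible by p
    for start in range(0, len(idx), bs):
        batch = idx[start:start + bs]   # the global mini-batch
        share = len(batch) // p         # equal local size, even for a final short batch
        yield batch[rank * share:(rank + 1) * share]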

Figure 5—Data splitting across workers in a parallel run: local mini-batches are guaranteed
to always have identical sizes at any given time, promoting optimal load balance.

Our data distribution system contains several other features that have been shown to improve training
performance or accuracy, some of the most important being:

• Shuffling: the training dataset is randomized before being split into local mini-batches and
distributed to workers. A-priori data shuffling has been shown to improve convergence time and
model generalization for Stochastic Gradient Descent (SGD)-based training.
• Pre-loading: optionally, the local mini-batches can be read upfront from disk and cached into
memory for the duration of the training process. The memory footprint of caching a subset of the
data is usually small on modern machines compared to the memory required during training, while
the runtime benefits of I/O-free execution can be significant.
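
An illustrative sketch of these two features follows, assuming a PyTorch-style Dataset and .npy files on disk (the authors' actual I/O format is not specified):

# Shuffling + pre-loading sketch (file layout and np.load are assumptions).
import numpy as np
from torch.utils.data import Dataset

class PreloadedCuboids(Dataset):
    def __init__(self, paths, seed=0, preload=True):
        rng = np.random.default_rng(seed)
        self.paths = list(rng.permutation(paths))   # a-priori shuffling
        # Pre-loading: read every sample once and keep it in RAM so that
        # subsequent training epochs run I/O-free.
        self.cache = [np.load(p) for p in self.paths] if preload else None

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        return self.cache[i] if self.cache is not None else np.load(self.paths[i])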
Hybrid Parallel Distribution Engine. Our parallelization strategy leverages both distributed-memory
Message Passing Interface (MPI)-based communication primitives, which handle data transfer across
processes, as well as shared-memory Open Multi-processing (OpenMP) or Compute Unified Data
Architecture (CUDA)-based multi-threading, which exploits parallelism within a node. This combination
of shared-memory and message-passing paradigms within the same application is known as hybrid
programming (Duy 2012), and is illustrated in Figure 6. In the specific case of our deep-learning software,
MPI collective all-reduce calls are invoked to handle gradient communication and averaging across workers.

They make use of the ring all-reduce algorithm (Nouamane 2019), which has a complexity of O(Nw +
log(p)), where Nw is the number of model parameters. Since Nw >> p, we expect the communication
complexity to be almost independent of the cluster size. On the other hand, the engines we use internally
to execute forward and back-propagation can spawn their own OpenMP or CUDA threads, which
communicate only with other threads within the same MPI process. Since MPI communication only happens
outside critical multi-threaded regions, our parallelization strategy can be said to model the process-to-
process hybrid paradigm. The number of processes launched per node and the maximum number of threads
spawned by each process will depend on the specs of the cluster and details of the experiment, and are
chosen in such a way as to maximize resource utilization, minimize communication overhead and fulfill
memory requirements.
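
A minimal sketch of the hybrid paradigm using mpi4py is shown below: MPI averages a flat gradient buffer across processes, while the underlying math library threads within each process (thread counts are typically controlled by the launcher, e.g., via OMP_NUM_THREADS). The flat numpy buffer is an illustrative assumption, not the authors' data layout.

# Hybrid MPI + intra-node threading sketch (illustrative).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
p = comm.Get_size()

def average_gradients(local_grad: np.ndarray) -> np.ndarray:
    """All-reduce local gradients and return their average across the p workers."""
    global_grad = np.empty_like(local_grad)
    # For large buffers MPI implementations typically use a ring algorithm,
    # costing ~O(Nw) communication nearly independent of cluster size.
    comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
    return global_grad / p

# Launched as, e.g. (OpenMPI syntax, an assumption; adapt to your launcher):
#   mpirun -np 16 -x OMP_NUM_THREADS=11 python train.py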

Results and Discussion


In this section, we demonstrate how our data-parallel distributed deep-learning strategy on cloud HPC
clusters resolves memory constraints during large-scale training and reduces turn-around times that would
otherwise discourage hyper-parameter tuning. We also highlight the relevance of CPU-based clusters as a
compelling alternative when GPU device memory is insufficient to accommodate even a single data sample
during forward and backward propagation.

Cluster Specifications
HPC-based ML/AI workflows demand premier computational performance, scalability and cost efficiency.
Microsoft Azure provides supercomputing-grade infrastructure built to optimize time to solution and
minimize costs by leveraging cloud-leading flexibility and scalability. Azure HPC Virtual Machines (VMs)
can often match or even outperform on-premises cluster capabilities due to their adoption of leading-
edge HPC technologies, e.g., performance optimized hardware, NVIDIA GPU acceleration and InfiniBand
networking. We have performed our 3D semantic segmentation studies using the following Intel, AMD and
NVIDIA-based HPC clusters as discussed below.

Intel HPC Clusters


Intel-based clusters are built with HC-series VMs optimized for compute-bound applications. HC44rs VMs
provide 44 non-hyperthreaded Intel Xeon Platinum 8168 processor cores, 8 GB of RAM per CPU core,
and a maximum memory bandwidth of 190 GB/s. High-performance interconnection is provided by 100
Gb/sec Mellanox InfiniBand fabric with non-blocking fat tree topology.

AMD HPC Clusters


For AMD-powered clusters we employed HBv2-series VMs optimized for memory-bound applications
thanks to a maximum memory bandwidth of 340 GB/s. HB120rs_v2 VMs provide 120 AMD EPYC 7742
processor cores with no hyperthreading and 4 GB of RAM per CPU core. High-performance interconnection
is provided by 200 Gb/sec Mellanox HDR InfiniBand fabric with non-blocking fat tree topology.
NVIDIA HPC Clusters


While Azure provides many GPU accelerated solutions, NDv2-series VMs are specifically designed to
support scale-up and scale-out deep-learning training applications. The ND40rs_v2 VMs are powered by 8
NVIDIA Tesla V100 GPUs interconnected with NVLink (NVIDIA's proprietary inter-GPU communication
protocol) and with 32 GB of GPU memory each. The host is equipped with 40 non-hyperthreaded Intel
Xeon Platinum 8168 cores and 16.8 GB of RAM per CPU core. Maximum scale-out performance is ensured
by 100 Gb/s InfiniBand fabric in a non-blocking fat tree topology.

Strong Scaling Studies
We have carried out two major sets of strong scaling studies: (1) training a depth-4 3D U-Net model on a
large NVIDIA GPU-based cluster, and (2) training a large depth-7 3D U-Net model with 1.23B parameters
using seismic data on Intel and AMD HPC clusters. Both studies were performed on Microsoft Azure cloud
infrastructure.

3D U-Net model training on GPU cluster


The first set of experiments was performed on a GPU cluster of NDv2-series VMs, each containing 8
NVIDIA Tesla V100 GPUs with 32GB of memory per device. The training dataset consisted of 5000 blocks
of size 240 × 320 × 32. We trained a depth-4 3D U-Net with 13.6M parameters using the SGD-based Adam
optimizer (Kingma 2015). The local mini-batch size was fixed at 4 to maximize resource utilization (each
sample required ~8GB during training).
Figure 7 shows the wall-clock time per epoch, as well as the corresponding speedup, for cluster sizes
going from one all the way to 512 GPUs on 64 nodes with 8 GPUs/node. It demonstrates the ability of our
distributed deep-learning solution to scale almost linearly all the way to 512 GPUs, reducing the runtime
per epoch from 1 h 40 min to only 19 s (a ~312x speedup).

Figure 6—Process-to-process hybrid distribution paradigm: processes
communicate via MPI and spawn local threads that exploit intra-node parallelism.
Figure 7—Strong scaling for training a depth-4 3D U-Net model on a cluster of Tesla V100 GPUs.
The labels above the bars indicate the number of nodes and the number of processes per node.

Massive 3D model training on CPU clusters


We also performed strong scaling studies on CPU-based HPC clusters. The main advantage of CPUs over
GPUs is their vastly larger RAM, which allows one to train deeper networks with larger datasets.
In fact, in the extreme case where processing a single data sample (e.g., one block of seismic data) already
requires more memory than what is available on a GPU, clusters of CPUs might be the only way to go. Our
training data consisted of 5000 blocks of size 64 × 64 × 256 pixels obtained by splitting a single volume
of seismic data into multiple blocks with varying degrees of overlap. We trained a large depth-7 3D U-Net
with 1.23B parameters using the SGD-based Adam optimizer. The local mini-batch size was fixed at 32,
which required 205 GB of memory during training.
The Intel-based cluster we used consisted of HC44rs-series VMs with 44 Intel Xeon Platinum cores and
352GB total memory. The AMD-based cluster had HB120rs-series VMs with 120 EPYC-7742 cores and
480GB total memory. Figure 8 shows strong scaling results for training on those two platforms for clusters
of size one all the way to 64 nodes (to a total of 2816 Intel CPU cores and 7680 AMD CPU cores). Speedup
is also excellent all the way to 128 processes on both CPU clusters. A more detailed comparison reveals that
epoch wall-clock times are 2-3X lower on the Intel-based cluster, due to optimizations of MKLDNN (Math
Kernel Library for Deep Neural Networks) for Intel instruction sets. However, scaling is slightly better on
the AMD system at large cluster sizes, due to the superior memory bandwidth of the EPYC architecture and
the 200Gb/sec InfiniBand network speed on those machines.
Figure 8—Strong scaling results for training a 3D U-Net model with 1.23B parameters on CPU clusters: Intel (left)
and AMD (right). The labels above the bars indicate the number of nodes and the number of processes per node.

Datasets and deep learning hyperparameters


Our experiments were performed on the labelled Parihaka seismic dataset introduced during the 2020
Machine Learning and Interpretation Workshop of the Society of Exploration Geophysicists (SEG). We used
asymmetric input cuboids of sizes 64 × 64 × 256 pixels and 64 × 64 × 512 pixels to account for z-direction
variations in geological properties during training (Figure 9). Through data augmentation, we generated
training samples from a single volume using non-uniform overlaps along the depth direction. We posed this
problem as a multi-class classification task where each label corresponds to a different geological feature.
The Negative Log-Likelihood (NLL) loss function was used with and without class weights to address class
imbalance. For all our experiments, we trained a depth-4 3D U-Net architecture with 13.6M parameters
using a mini-batch size of 64 for 50 epochs, using the Adam optimizer with a learning rate of 10^-4 on a
4-node (16-process) HC-series CPU cluster. Since 3D U-Net is a fully convolutional network, even though
the training is done on smaller cuboids, inference can be done on the entire 3D test volume. In Table I we
summarize our choices of loss function, sample size, and network architecture used for training.
Figure 9—3D seismic volume labeled with seismic facies information. Each color represents a
distinct seismic facies type. The black outline shows a sub-cube extracted from the main volume.
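
Carving overlapping training cuboids out of a single labeled volume, as depicted in Figure 9, can be sketched as follows. The stride values are illustrative: for 256-pixel-deep cuboids, a constant 90% z-overlap corresponds to a z-stride of roughly 26 pixels, and a non-uniform schedule simply varies the stride across the volume.

# Illustrative extraction of overlapping sub-cubes from a labeled volume.
import numpy as np

def extract_cuboids(volume, labels, size=(64, 64, 256), stride=(32, 32, 26)):
    # stride < size produces overlap; a smaller z-stride means more z-overlap.
    sx, sy, sz = size
    dx, dy, dz = stride
    X, Y, Z = volume.shape
    samples = []
    for x in range(0, X - sx + 1, dx):
        for y in range(0, Y - sy + 1, dy):
            for z in range(0, Z - sz + 1, dz):
                samples.append((volume[x:x+sx, y:y+sy, z:z+sz],
                                labels[x:x+sx, y:y+sy, z:z+sz]))
    return samples

# Example usage: samples = extract_cuboids(seismic_volume, facies_labels)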

Table I—Multiple choices of loss function, data augmentation strategy, input sample size, and
network architecture used for training a depth-4 3D U-Net with the Parihaka seismic dataset.

Knobs                        Values
Loss functions               NLL, Dice, Jaccard, Focal, Lovasz
Training data generation     Increasing z-overlap, Constant 90% z-overlap, Random overlap
Sample size (in pixels)      (64 × 64 × 256), (128 × 32 × 256), (128 × 32 × 512), (256 × 16 × 512)
Number of training samples   1288 to 4160
Network architecture         3D U-Net with different depths; 3D U-Net with different backbones (Resnet, EfficientNet)
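
As an illustration of the class-weighted NLL loss listed in Table I, the sketch below applies the loss to per-voxel log-probabilities with optional class weights; the weight values are placeholders, not the authors' numbers.

# Hedged sketch of NLL loss with class weights for per-voxel classification.
import torch
import torch.nn as nn

# Up-weight the under-represented facies classes (4 and 5); illustrative values.
class_weights = torch.tensor([1.0, 1.0, 1.0, 1.0, 5.0, 2.0])
loss_fn = nn.NLLLoss(weight=class_weights)

def seg_loss(logits, target):
    # logits: (N, C, D, H, W) raw scores; target: (N, D, H, W) integer labels.
    log_probs = nn.functional.log_softmax(logits, dim=1)
    return loss_fn(log_probs, target)

def make_optimizer(model):
    # Matches the text: Adam with a learning rate of 1e-4.
    return torch.optim.Adam(model.parameters(), lr=1e-4)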

In Table II, we show the resulting precision, recall, and F1-score (Dice coefficient) for the different classes,
achieved by training on 4160 samples of size 64 × 64 × 256 pixels using the NLL loss function and a 3D U-Net
architecture with a Resnet backbone. In Figure 10, we show the confusion matrix for the test volume using
the model trained with 64 × 64 × 256 pixel input cuboids. Inference was performed on the entire 1006 × 78
× 590 test block and took 50 seconds on a single HC-series VM. Due to the high class imbalance in the labeled
data, we observe higher accuracy for classes 0-3 than for classes 4 and 5. In Figure 11, we show that both
majority and minority classes are predicted well for a slice at x = 723 in the test volume.
Figure 10—Confusion matrix for the test dataset in seismic facies classification. The first four seismic facies
types have higher accuracy compared to the last two types, which are underrepresented in the dataset.

Figure 11—True vs. predicted seismic facies corresponding to cross-sections at x = 732 in the test volume

Table II—Highest precision, recall, and F1-score (Dice coefficient) results, achieved by training a 3D U-Net architecture with
a Resnet backbone on 4160 samples of size 64 × 64 × 256 with constant 90% overlap in the z-direction, using the NLL loss function.

Class Number    Precision    Recall    F1-score    Support
0               0.94         0.84      0.89        3464363
1               0.89         0.93      0.91        13806442
2               0.91         0.90      0.91        1012693
3               0.97         0.96      0.97        12163243
4               0.80         0.69      0.74        537698
5               0.96         0.93      0.94        5584649
Accuracy                               0.93        36569088
Macro Avg       0.91         0.88      0.89        36569088
Weighted Avg    0.93         0.93      0.93        36569088

Conclusions

Designing an optimum 3D CNN that solves the 3D seismic facies classification problem with high accuracy
requires significant compute time and memory. In this work, we used distributed deep learning to
significantly reduce the compute time and memory requirements of designing an optimum 3D network.
This enabled rigorous hyper-parameter tuning to obtain best-in-class results for the seismic facies
classification problem using 3D CNNs. The design of experiments for optimizing the subset of 3D CNN
hyper-parameters with the highest influence on the results was guided by hyper-parameter tuning results
from comparable 2D CNNs. Distributed deep learning also significantly reduced the inference time for the
prediction of results. The optimized 3D network obtained an accuracy of 93% on the test dataset. Several
problems in modeling and understanding unconventional reservoirs require the analysis of 3D data using
deep-learning techniques. In particular, identification of subtle stratigraphic changes in unconventional oil
shale and shale gas reservoirs would require experimentation with different FCNs, such as Pyramid Scene
Parsing Networks (PSP-Net), that can have hundreds of millions of model parameters. In such cases, the
distributed deep-learning solution described in this work becomes essential for running large-scale
experiments to perform hyper-parameter search and arrive at an optimal 3D deep-learning model. As future
work, we will experiment with other datasets (including unconventional reservoirs) and new FCN
architectures, and compare the results with state-of-the-art 2D performance.

References
Alaudah, Y., Michałowicz, P., Alfarraj, M., and AlRegib, G. 2019. "A machine-learning benchmark for facies
classification." Interpretation 7(3) SE175–SE187.
Ben-nun, T. and Hoefler, T. 2018. "Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency
Analysis." arXiv preprint arXiv:1802.09941v2.
Di, H., Wang, Z., and AlRegib, G. 2018. "Why using CNN for seismic interpretation? An investigation." In SEG Technical
Program Expanded Abstracts 2018, 2216–2220. Society of Exploration Geophysicists.
Dramsch, J.S. and Lüthje, M. 2018. "Deep-learning seismic facies on state-of-the-art CNN architectures." Society of
Exploration Geophysicists. 2036–2040.
Dumoulin, V. and Visin, F. 2018. "A guide to convolution arithmetic for deep learning." arXiv:1603.07285v2.
Duy, T., Yamazaki, K., Ikegami, K., and Oyanagi, S. 2012. "Hybrid MPI-OpenMP Paradigm on SMP clusters: MPEG-2
Encoder and n-body Simulation." arXiv preprint arXiv:1211.2292.
He, K., Zhang, X., Ren, S., Sun, J. 2015. "Deep Residual Learning for Image Recognition." arXiv:1512.03385.
Houston, T.K., Treichler, S., Romero, J. et al. 2018. "Exascale Deep Learning for Climate Analytics." arXiv 1810.01993.
Kingma, D. P. and Ba, J. 2015. "Adam: A method for stochastic optimization." Proc. Int. Conf. Learning Representations
(ICLR), 2015.
Laanait, N., Romero, J., Yin, J. et al. 2019. "Exascale Deep Learning for Scientific Inverse Problems." arXiv preprint
arXiv:1909.11150v1.
Liang-Chieh, C., Papandreou, G., Schroff, F., Adam, H. 2017. "Rethinking Atrous Convolution for Semantic Image
Segmentation." arXiv:1706.05587.
Liu, Mingliang and Jervis, Michael and Li, Weichang and Nivlet, Philippe. 2020. "Seismic facies classification using
supervised convolutional neural networks and semi-supervised generative adversarial networks." Geophysics 85(4)
O47–O58.
Long, Jonathan and Shelhamer, Evan and Darrell, Trevor. 2015. "Fully Convolutional Networks for Semantic
Segmentation." arXiv preprint arXiv:1411.4038v2.
McHargue, T., Pyrcz, M.J., Sullivan, M.D., Clark, J.D., Fildani, A., Romans, B.W., Covault, J.A., Levy, M., Posamentier,
H.W. and Drinkwater, N.J. 2011. "Architecture of turbidite channel systems on the continental slope: patterns and
predictions." Marine and Petroleum Geology, 28(3) 728–743.
Minaee, S., Boykov, Y., Porikli, F. et al. 2020. "Image segmentation using deep learning: A survey." arXiv preprint
arXiv:2001.05566v5, 2020.
Nouamane, L., Romero, J., Yin, J. et al. 2019. "Exascale Deep Learning for Scientific Inverse Problems." arXiv preprint
arXiv:1909.11150v1.
Pradhan, A. and Mukerji, T. 2020. "Seismic inversion for reservoir facies under geologically realistic prior uncertainty
with 3D convolutional neural networks." SEG Technical Program Expanded Abstracts 2020. Society of Exploration
Geophysicists. 1516–1520.
Rippel, O., Weninger, L., and Merhof D. 2020. "Auto-ML segmentation for 3d medical image data: Contribution to the
MSD challenge 2018." arXiv preprint arXiv:2005.09978v1, 2020.
Ronnenberger, O., Fischer, P., and Brox, T. 2015. "U-Net: Convolutional Networks for Biomedical Image Segmentation."
arXiv preprint arXiv:1505.04597v1.
Sagar, A. and Soundrapandiyan, R. 2020. "Semantic Segmentation With Multi Scale Spatial Attention For Self Driving
Cars." arXiv preprint arXiv:2007.12685.
Ulku, I. and Akagunduz, E. 2020. "A survey on deep learning-based architectures for semantic segmentation on 2d
images." arXiv preprint arXiv:1912.10230v2, 2020.
Waldeland, A., Jensen, A.C., Gelius, L., and Solberg, A. 2018. "Convolutional neural networks for automated seismic
interpretation." The Leading Edge 37(7) 529–537.
Wrona, T., Pan, I., Gawthorpe, R., and Fossen, H. 2018. "Seismic facies analysis using machine learning." Geophysics
83(5) O83–O95.
Zhao, Hengshuang and Shi, Jianping and Qi, Xiaojuan and Wang, Xiaogang and Jia, Jiaya. 2017. "Pyramid Scene Parsing
Network." arXiv preprint arXiv:1612.01105.
Zhao, Tao. 2018. "Seismic facies classification using different deep convolutional neural networks." SEG Technical
Program Expanded Abstracts 2018. Society of Exploration Geophysicists. 2046–2050.
Zhou, Z., Rahman, M., Tajbakhsh, N., and Liang, J. 2020. "Unet++: Redesigning skip connections to exploit multiscale
features in image segmentation." arXiv preprint arXiv:1912.05074v2, 2020.
