Seismic Data Management and beyond
on Amazon Web Services (AWS)
Osokey
Published: 15th September 2019
Updated: 30th October 2019
Contact: James Selvage (james@osokey.com)
Visit: https://osokey.com
1 / 24
Copyright 2019 Osokey Ltd. All Rights Reserved.
Contents
Abstract 3
1. Introduction 4
a. What is seismic data? 5
3. Implementation 15
4. Performance 16
6. Conclusion 22
Abstract
SEG-Y and SEG-D are oil & gas industry standard file formats for seismic data. This
whitepaper describes a serverless solution for cloud-based management of seismic
data that enables a lift and shift of the SEG-Y or SEG-D format data into Amazon
Simple Storage Service (S3). The event-driven architecture can ingest seismic data
at any scale and automatically generates a file inventory that can be searched using
Amazon Athena. Each seismic data file progresses through custom code running on
AWS Lambda that automatically captures metadata and stores it in Amazon
DynamoDB. For each SEG-Y file, a trace level index is created to enable the
architecture to be extended beyond data management, e.g. viewing seismic sections
and gathers or transforming seismic data into a streaming format for on-premise
geoscience applications. Raw read performance from multiple AWS Lambda
functions reading the same 1TB SEG-Y file achieved an aggregate read performance
of 42 GB/s. The architecture enables on-demand compression of seismic data using
parallel AWS Lambda functions to perform read, compress and write operations,
achieving a rate of 2.8 GB/s. It is shown that Amazon S3 Batch Operations provides
a cost-effective way to bulk process files, and it is used to perform duplicate
detection of 592,921 SEG-D files, with a total AWS cost of less than $15 USD.
1. Introduction
SEG-Y and SEG-D are oil & gas industry standard file formats. A major oil & gas
company is likely to have seismic data spanning decades and consuming petabytes
(PB) of storage. Furthermore, this seismic data will be stored across a variety of
different storage media depending on operational requirements. Table 1 shows
the typical storage media that are used:
Storage medium Operational requirement
Figure 1 - The architecture described in this whitepaper connects different AWS
Services to create a serverless seismic data management solution.
a. Amazon S3
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers
industry-leading scalability, data availability, security, and performance. In this
section we will see that S3 is more than just storage and provides many benefits for
seismic data management.
Upload
Each SEG-Y or SEG-D file is stored as an object on S3 within an S3 Bucket. Seismic
data can be added to the S3 bucket via the internet or by using an AWS Snowball.
Standard company folder structures can be mirrored to help organise files within the
S3 Bucket. For example, by storing SEG-Y data with the following prefixes:
<country>/<survey name>/<attribute>/filename.segy
It is possible to drill-down to files by prefix directly from the AWS Management
Console for S3. Figure 2 shows an example of public domain seismic data from the
Equinor Volve Dataset stored in an S3 Bucket, 9f7f65067d31-oso-segy, under the Key
prefix:
nor/equinor/ST0202vsST10010_4D/Stacks/<filename>.segy
The combination of bucket and key define a unique URL for a given object, e.g.
https://9f7f65067d31-oso-segy.s3-eu-west-1.amazonaws.com/nor/equinor/S
T0202vsST10010_4D/Stacks/ST0202ZDC12-PZ-PSDM-KIRCH-FULL-D.MIG_FI
N.POST_STACK.3D.JS-017534.segy
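As a sketch, the prefix convention above can be captured in a small helper. The function name is hypothetical (not part of any AWS SDK); the bucket, region and key components are the ones from the example:

```python
# Hypothetical helper illustrating the <country>/<survey name>/<attribute>/
# prefix convention described above.
def s3_object_url(bucket, region, *prefix_parts):
    """Join prefix components into an S3 key and the matching object URL."""
    key = "/".join(prefix_parts)
    return key, f"https://{bucket}.s3-{region}.amazonaws.com/{key}"

key, url = s3_object_url(
    "9f7f65067d31-oso-segy", "eu-west-1",
    "nor", "equinor", "ST0202vsST10010_4D", "Stacks",
    "ST0202ZDC12-PZ-PSDM-KIRCH-FULL-D.MIG_FIN.POST_STACK.3D.JS-017534.segy",
)
```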
Figure 2 - Example from AWS Management Console of S3 showing SEG-Y files. These
files are stored in the S3 Glacier Storage Class.
The S3 Bucket, 9f7f65067d31-oso-segy, is not publicly accessible so following the
above link will result in an Access Denied Error. In this architecture seismic data
within this bucket is kept secure by limiting access with AWS Identity and Access
Management Permissions, which will be described in more detail in the Security,
Permissions & Activity Logging and Implementation sections. The objects are also
encrypted at rest and in transit.
Multipart objects
Amazon S3 supports multipart uploads of SEG-Y files, which means that large
objects are stored in many smaller parts. The number of parts can be determined
from the object’s ETag:
For multipart uploads the ETag is the MD5 hexdigest of each part’s MD5 digest
concatenated together, followed by the number of parts separated by a dash.
For example, a 119 MB
SEG-Y file in our S3 Bucket has the ETag a8abbeb338a3e0f689186ef78f95e904-8; the
trailing -8 indicates that this object is made up of 8 parts. A 1TB SEG-Y file in our S3
Bucket is made up of 8097 parts (ETag 29014b75882070511aa863b2f90b2e37-8097).
This is handled transparently by Amazon S3: the files appear as single objects, and it
means that the SEG-Y is effectively “bricked” automatically. Therefore, Amazon S3 natively
supports multiple parallel reads of SEG-Y format objects. This will be shown in more
detail in the Extending the Architecture - Transforming SEG-Y data and Performance
sections.
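The part count can be recovered with a few lines of Python; this follows the ETag description above (a plain MD5 hex digest with no dash means a single-part upload):

```python
def multipart_count(etag):
    """Return the number of parts encoded in an S3 ETag.

    Multipart ETags end in '-<parts>'; a plain MD5 hex digest
    (no dash) means the object was uploaded as a single part."""
    etag = etag.strip('"')  # ETags are often quoted in API responses
    if "-" in etag:
        return int(etag.rsplit("-", 1)[1])
    return 1

multipart_count("a8abbeb338a3e0f689186ef78f95e904-8")     # 8
multipart_count("29014b75882070511aa863b2f90b2e37-8097")  # 8097
```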
Search
To enhance the search capabilities of the S3 Bucket, the architecture utilises daily S3
Inventory reports that can be queried using Amazon Athena. This means that individual
SEG-Y files can be found based on keywords in the object’s key, size and storage
class using SQL Queries. The search results can be downloaded as a CSV file.
Archiving
The storage class of an individual object can be changed depending on usage
requirements. In this architecture a mixture of S3 Standard, S3 Glacier and S3 Glacier
Deep Archive are utilised to accommodate the different operational requirements of
seismic data (Table 2). Using Amazon S3 in this way eliminates the need for
magnetic tape storage.
Figure 3 - Key-value tags can be added to objects. The osoArchive tag is used by a
Lifecycle Rule to transition the object to the Glacier Storage Class.
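A minimal sketch of such a tag-based Lifecycle Rule, expressed as the dictionary a boto3 `put_bucket_lifecycle_configuration` call would accept; the osoArchive tag name comes from Figure 3, while the rule ID and the immediate (0-day) transition are assumptions:

```python
# Sketch of a tag-based lifecycle rule; the osoArchive tag name is from
# the whitepaper, the rule ID and 0-day transition are assumptions.
lifecycle_rule = {
    "ID": "osoArchive-to-glacier",
    "Status": "Enabled",
    "Filter": {"Tag": {"Key": "osoArchive", "Value": "true"}},
    "Transitions": [{"Days": 0, "StorageClass": "GLACIER"}],
}
# Applied with, e.g.:
# s3.put_bucket_lifecycle_configuration(
#     Bucket="9f7f65067d31-oso-segy",
#     LifecycleConfiguration={"Rules": [lifecycle_rule]})
```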
In this architecture, IAM roles with policies that enable access
to the bucket are used to enable the Osokey AWS Account to read seismic data in a
customer’s AWS Account. This approach is described in the Implementation section.
Durability
Amazon S3 helps to ensure data durability by synchronously storing your data
across multiple facilities. In Figure 2, the bucket is located in the AWS Region,
eu-west-1. This region has three isolated availability zones and the seismic data in
the S3 Standard or S3 Glacier Storage Classes is redundantly stored within each
zone. This benefit is included in the cost of Amazon S3 per GB pricing.
S3 Batch Operations
Amazon S3 Batch Operations provides a way to bulk process objects stored on S3.
For example, you can copy each object to another bucket, set tags on
each object, restore each object from Glacier or invoke an AWS Lambda function on
each object. The latter operation can be used to perform a consistent data operation
on a seismic file. Osokey recently performed duplicate detection across 592,921
SEG-D files using Batch Operations. This is described in the Extending the
Architecture - Duplicate Detection section.
Events
The Amazon S3 notification feature enables you to take actions whenever specific
events happen on your buckets. In this architecture the events in Figure 4 are used to
trigger a Lambda function whenever objects with extensions .segy, .SEGY, .sgy or
.SGY are created in the bucket. The Lambda functions are in the Osokey AWS
Account and run custom Python code to automatically ingest the new SEG-Y files.
This includes the automatic extraction of pertinent metadata and creating a trace
level index.
Figure 4 - The S3 notification feature is used to trigger a Lambda function whenever a
new SEG-Y file is added to the S3 bucket.
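A minimal sketch of the event-triggered entry point, assuming the standard S3 notification event shape; the handler body is illustrative only, not Osokey's actual ingestion code:

```python
from urllib.parse import unquote_plus

SEGY_EXTENSIONS = (".segy", ".SEGY", ".sgy", ".SGY")

def lambda_handler(event, context):
    """Collect (bucket, key) pairs for newly created SEG-Y objects
    from an S3 notification event."""
    ingested = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        if key.endswith(SEGY_EXTENSIONS):
            # The real pipeline would start metadata extraction and
            # trace indexing here.
            ingested.append((bucket, key))
    return ingested
```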
b. Amazon Athena
Amazon Athena is a serverless, interactive query service that can analyse data in
Amazon S3 using standard SQL. In this architecture Amazon Athena is used to enable
searching of the daily S3 Inventory reports. For example, the results in Figure 5,
formatted for display, come from an Athena query for object keys containing
“P000”, performed with the SQL query:
> SELECT * FROM <Athena Table> WHERE key LIKE '%P000%'
Figure 5 - The results from an Athena query based on a keyword search of seismic
filenames. The .CSV file has been formatted for display in a web browser.
Figure 6 shows a CSV file downloaded from the result of the Athena query:
> SELECT storage_class, count(*) FROM <Athena Table> GROUP BY
storage_class
This provides a way to audit how many objects are in a given storage class.
Figure 6 - A CSV downloaded from an Amazon Athena query that shows data by Amazon
S3 Storage Class.
c. AWS Lambda
AWS Lambda lets you run code without provisioning or managing servers. In this
architecture AWS Lambda functions run custom Python code and are triggered by
events, e.g. when a new seismic file is uploaded to the AWS S3 Buckets (Figure 7).
Multiple AWS Lambda functions are chained together in the ingestion pipeline shown
below. Multiple SEG-Y files are processed in parallel.
This provides a highly scalable and automated metadata extraction approach. It
creates the required metadata to start utilising the ingested seismic data. For
example, Osokey recently ingested over 500,000 SEG-D files uploaded using an AWS
Snowball with this architecture. These SEG-D files can be searched using AWS
Athena and transitioned to the Amazon S3 Glacier or Deep Glacier Storage Classes.
These high levels of automation enable Osokey to offer this seismic ingestion
service on a pay-as-you-go basis, starting at a cost of 0.24 USD per GB.
Figure 7 - Scalable SEG-Y ingestion service using AWS Lambda to run custom code.
Results from the Lambda functions are stored in the customer’s AWS Account.
d. Amazon DynamoDB
Amazon DynamoDB is a key-value and document database that delivers single-digit
millisecond performance at any scale. In this architecture Amazon DynamoDB
provides a flexible metadata store for ingesting seismic data. It is used for both
transient metadata created during ingestion and for the permanent storage of
metadata that enhances search capabilities and enables trace level indexing of a
SEG-Y or a SEG-D file. Like Amazon S3, all data that is stored in DynamoDB is
encrypted at-rest.
Figure 8 - Early metadata added to the seismic ingestion DynamoDB table. This
metadata is updated as the seismic data progresses through the ingestion pipeline.
Figure 9 - As a SEG-Y or SEG-D file progresses through the ingestion process,
additional metadata is captured in DynamoDB. Metadata captured from the Binary
Header is shown above.
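As an illustration of the kind of item shown in Figures 8 and 9, a metadata record might look like the following; the attribute names and values are assumptions, not Osokey's actual schema (boto3 represents DynamoDB numbers as `Decimal`):

```python
from decimal import Decimal  # boto3 stores DynamoDB numbers as Decimal

# Illustrative metadata item; attribute names and values are assumed.
item = {
    "Bucket": "9f7f65067d31-oso-segy",
    "Key": "nor/equinor/ST0202vsST10010_4D/Stacks/example.segy",
    "IngestTimestamp": "2019-09-15T00:00:00Z",
    # Values below would be read from the SEG-Y Binary Header:
    "SampleInterval": Decimal(4000),   # microseconds (4 ms)
    "SamplesPerTrace": Decimal(1001),
    "DataFormatCode": Decimal(5),      # 4-byte IEEE floating point
}
# table.put_item(Item=item) would write this during ingestion.
```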
To support different types of queries across metadata in the DynamoDB table, Global
Secondary Indexes (GSIs) are used. A GSI can contain a selection of attributes from
the main table, but organised by a different primary key. Up to 20 global secondary
indexes (default limit) can be created per table.
In Figure 10 a GSI is created based on the Bucket and Key attributes and projects a
subset of attributes from the table. This enables a query for SEG-Y files that begin
with “aus/Gippsland/”.
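Such a query can be sketched with DynamoDB's low-level expression syntax; the table name, index name and attribute names here are assumptions:

```python
# Sketch of the Figure 10 query; index and attribute names are assumed.
# "Key" is the GSI sort key, so begins_with() can match a key prefix.
query_kwargs = {
    "TableName": "seismic-metadata",
    "IndexName": "Bucket-Key-index",
    "KeyConditionExpression": "#b = :bucket AND begins_with(#k, :prefix)",
    "ExpressionAttributeNames": {"#b": "Bucket", "#k": "Key"},
    "ExpressionAttributeValues": {
        ":bucket": {"S": "9f7f65067d31-oso-segy"},
        ":prefix": {"S": "aus/Gippsland/"},
    },
}
# response = dynamodb.query(**query_kwargs)
```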
Figure 10 - Global Secondary Indexes (GSIs) are used to enable different types of
queries.
In Figure 11 a seismic data management table is formatted based on another GSI
query for SEG-Y data ingested between 15th December 2018 and 22nd December
2018. This GSI has a primary key and sort key based on the bucket and timestamp of
when a seismic data file was added.
Figure 11 - Global Secondary Index (GSI) query results can be formatted to produce
useful tables for seismic data management.
3. Implementation
The implementation of the architecture separates Osokey's code from our
customers' seismic data. This enables Osokey to update our code for all customers
and enables each customer to retain control of their data in their own AWS Account
(Figure 12).
Figure 12 - The architecture separates Osokey’s code from customers’ data by
connecting separate AWS Accounts using IAM permissions.
This permissions model is enabled through the use of AWS Identity and Access
Management (IAM). In each customer cloud account IAM roles are used to grant
cross-account access to Osokey. These roles have policies that limit access, by
Osokey, to the minimum permissions needed for Osokey to provide the seismic
ingestion service, i.e. the AWS Services and data that Osokey code can access.
Whenever Osokey’s code is invoked by a customer’s cloud account the appropriate
role is adopted to service the request and return metadata to the customer’s
Amazon S3 buckets and Amazon DynamoDB tables.
Figure 13 shows a summary from the IAM console of the Admin role that can be
adopted by Osokey to perform operations on the customer’s AWS Account. The Last
Accessed column is shown, along with the IAM policies that grant access to a given
AWS Service, e.g. Amazon DynamoDB.
Figure 13 - Customers can use the IAM Access Advisor for visibility on when the
Osokey AWS Account is accessing AWS Services in their AWS Account.
4. Performance
The architecture enables SEG-Y files to be read in parallel by multiple AWS Lambda
functions. For example, the 1TB SEG-Y file made up of 8097 parts was read with 740
Lambda functions. At peak, 500 Lambda functions were concurrently reading from
the SEG-Y object and achieved a peak aggregate read performance of 42 GB/s
(gigabytes per second).
If the submission of the Lambda functions is included, then the total system time to
have the entire SEG-Y file available in memory to perform operations on was 52
seconds, which equates to a system read performance of 19.7 GB/s.
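Parallel reads like this rely on S3 ranged GETs. A sketch of how an object can be split into ranges, one per Lambda function; the 128 MB part size is purely illustrative (the actual file's 8097 parts were defined by its multipart upload):

```python
def byte_ranges(object_size, part_size):
    """Split an object into inclusive (start, end) byte ranges, one per
    parallel ranged GET (HTTP header: Range: bytes=start-end)."""
    ranges = []
    start = 0
    while start < object_size:
        end = min(start + part_size, object_size) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

# A 1 TB object split into 128 MB ranges gives 8192 parallel reads:
parts = byte_ranges(1024**4, 128 * 1024**2)
len(parts)  # 8192
parts[0]    # (0, 134217727)
```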
Figure 14 - CloudWatch metrics can be used to produce custom dashboard
components.
Figure 15 - The seismic metadata and trace index capabilities are used to construct a
web-based data management portal on-the-fly.
Figure 16 - Map portal showing seismic trace outlines and other spatial information.
Spatial information is associated with each seismic data file during ingestion.
Figure 17 - Seismic section and metadata opened from the map to review. This uses
the trace level indexing to produce the familiar inline, crossline, random line, gather or
timeslice displays.
Figure 19 - Invocations of the AWS Lambda function against time for the duplicate
detection using S3 Batch Operations.
S3 Batch Operations was used to apply this Lambda function to 592,921 SEG-D files
(7,264 GB) to detect duplicates. Figure 19 shows a graph of the Lambda function
being invoked against time. It took less than 25 minutes for the Batch Operations job
to complete and a peak of 916 Lambda functions were run in parallel. Each hash was
stored in a DynamoDB table and a query across this table discovered that there were
56,154 duplicate SEG-D files. The combined AWS costs of Amazon S3 Batch
Operations, Amazon DynamoDB and AWS Lambda were less than $15 USD.
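The duplicate-detection logic can be sketched as hashing each file's bytes and grouping by digest. The whitepaper does not name the hash function, so MD5 here is an assumption, and the real pipeline stores each digest in DynamoDB rather than in memory:

```python
import hashlib
from collections import defaultdict

def find_duplicates(files):
    """Group (name, data) pairs by content hash; any group with more
    than one member is a set of byte-identical duplicates."""
    by_hash = defaultdict(list)
    for name, data in files:
        by_hash[hashlib.md5(data).hexdigest()].append(name)
    return {h: names for h, names in by_hash.items() if len(names) > 1}

dupes = find_duplicates([
    ("a.segd", b"trace-data"),
    ("b.segd", b"trace-data"),  # byte-identical to a.segd
    ("c.segd", b"other-data"),
])
# dupes holds one group containing a.segd and b.segd
```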
In Figure 20, an inline from a 1TB 3D seismic dataset has been streamed into a
Jupyter Notebook. An ephemeral streaming format was generated from the SEG-Y
data using AWS Lambda functions running in parallel. The custom code compresses
the seismic data and stores it as 293,662 separate parts consuming between 74GB
and 166GB depending on the chosen compression quality. It took approximately 360
seconds to read, compress and write, which corresponds to a rate of ~2.8 GB/s. This
means that these files can be cost effectively recreated and removed based on
usage patterns.
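The per-part read, compress and write step can be sketched as follows; the whitepaper does not specify the codec, so lossless zlib stands in here for whatever quality-dependent compression the real pipeline uses:

```python
import zlib

def compress_part(part, level=6):
    """Compress one part of a seismic object; `level` stands in for the
    'compression quality' knob mentioned above."""
    return zlib.compress(part, level)

raw = bytes(range(256)) * 4096        # ~1 MB stand-in for one SEG-Y part
small = compress_part(raw, level=9)
assert zlib.decompress(small) == raw  # lossless round trip
```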
Figure 20 - SEG-Y data can be transformed on-the-fly into ephemeral formats. An inline
from a 1TB 3D Seismic dataset has been streamed into a Jupyter Notebook and
converted to a NumPy array.
6. Conclusion
Osokey’s experience has been that a layered approach to development is an
effective way to start with the cloud. AWS provides a foundation with a scalable,
reliable and global cloud infrastructure (Figure 21). In this whitepaper Amazon S3,
Amazon DynamoDB and Amazon Athena have been connected using AWS Lambda
into a serverless seismic data management solution, which Osokey call the data
layer.
Figure 21 - Osokey have found it effective to take a layered approach to development,
starting with the AWS foundation of a scalable, reliable and global cloud infrastructure.
AWS CloudFormation is used to simplify the deployment of AWS Resources in any
AWS Region. AWS Identity and Access Management (IAM) is used to separate
Osokey’s custom AWS Lambda functions from customers’ seismic data. A customer
can utilise the monitoring capabilities of AWS to audit Osokey’s access.
Amazon S3 Lifecycle Management Rules and object tags are used to transition
seismic data from S3 Standard Storage Class to S3 Glacier Storage Class. This
simplifies archiving because the location of the seismic data does not change. By
using Amazon Athena with Amazon S3 Inventory Reports, a data manager can
quickly establish which seismic data is archived. When operational requirements
change, seismic data can be restored to the S3 Standard Storage Class within hours.
The AWS Lambda ingestion pipeline is triggered automatically by new SEG-Y or
SEG-D files. The custom code extracts and identifies pertinent metadata and stores
this in Amazon DynamoDB in the customer’s AWS Account. The massive parallel
read performance of Amazon S3 makes it viable to re-extract additional metadata
on-demand and, with the flexibility of DynamoDB, permanently store this metadata to
enhance search capabilities. Amazon DynamoDB Global Secondary Indexes (GSIs)
can be created to query this metadata.
The ingestion pipeline also creates a trace level index for each file; with this, you can
read in parallel from many parts of the file without contention or slowdown. By
utilising the stored metadata and trace index capabilities, access and analytics
layers can be added to the solution. Osokey utilise the outputs from this architecture
to provide a Software as a Service (SaaS) solution that delivers cloud-based data
management, collaboration & data analysis for seismic data stored on AWS (Figure
22). Seismic data can be quickly located, viewed with a few clicks and streamed
globally.
The performance of Amazon S3 enables on-demand transformation of SEG-Y data
and a read, compress and write rate of 2.8 GB/s was achieved from a 1TB seismic
file. The ETag of this file (29014b75882070511aa863b2f90b2e37-8097) shows that
it is stored as 8097 parts on S3. This performance also enables services like S3
Batch Operations to be used to process files in bulk, and an example of finding
duplicates amongst 592,921 SEG-D files was completed within 25 minutes, with total
AWS costs of less than $15 USD.
This architecture and approach simplifies the adoption of cloud for seismic data
because there is no need to transcribe your data before ingestion. Moving to cloud
can be a lift and shift of the SEG-Y or SEG-D format data rather than a read, identify
and convert before uploading. Once the data is in the cloud other AWS Services can
be integrated to deliver continuous innovation.
Figure 22 - The outputs from the seismic ingestion pipeline can be integrated into a
web-based seismic data management, viewing and collaboration solution.