06 - IBM Watsonx - Data Competitive Insights

IBM watsonx.data
Competitive Insights
Content by:
Danny Arnold
Principal, Learning Content Development | Data & AI
darnold@us.ibm.com
Presenter:
Farah Auni Hisham
Technical Enablement Specialist | Data & AI
farah.hisham@ibm.com
Seller guidance and legal disclaimer
IBM and Business Partner Internal Use Only

Slides in this presentation marked as "IBM and Business Partner Internal Use Only" are for IBM and Business Partner use and should not be shared with clients or anyone else outside of IBM or the Business Partners’ company.

© IBM Corporation 2023. All Rights Reserved.

The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.

References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth, or other results.

All client examples described are presented as illustrations of how those clients have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by client.

All statements in this report attributable to Gartner represent IBM’s interpretation of data, research opinion or viewpoints published as part of a syndicated subscription service by Gartner, Inc., and have not been reviewed by Gartner. Each Gartner publication speaks as of its original publication date (and not as of the date of this presentation). The opinions expressed in Gartner publications are not representations of fact and are subject to change without notice.
watsonx.data Competitive Insights
Agenda
• Types of competitors
• Primary competitors
  • Background
  • Key strengths and weaknesses
  • Summary
• Secondary competitors
  • Background
  • Key strengths and weaknesses
  • Summary
• Competitive positioning
• Watsonx.data differentiators
• Latest Updates
• Objection handling
• Setting traps for competitors
watsonx.data Competitors
1. Primary Competitors
• Data Lakehouse
• Augmented Data Warehouse
2. Secondary Competitors
• Data Lake(house) Offerings
IBM and Business Partner – Internal Use Only

Primary competitors


Types of (Primary) Competitors

A data lakehouse combines the best features of data warehouses and data lakes to provide cost optimization for clients. Compute and storage are separated so that data can be accessed from different engines.

An augmented data warehouse may be able to access different types of data, but all compute processing is performed through the data warehouse engine. There is no compute cost optimization allowing different engines; only the compute amount used by the data warehouse engine can be adjusted.

Data lakehouse competitors
• Designed for both structured and unstructured data.
• Based on open table and data formats for data storage.
• Fit for purpose query engine for different use cases.
• Cost optimization for compute engine and storage.
• Separate compute and storage.

Augmented data warehouse competitors
• Primarily designed for structured and semi-structured data.
• Uses proprietary or open file formats, but supports open table format for data storage.
• Query processing uses data warehouse engine.
• Compute and storage are not completely separated.
watsonx.data Competitor Overview (Weaknesses not mentioned)
All competitors are relatively new companies (within the past decade) and are rapidly growing in the public cloud market.

Data Lakehouse Competitors

Databricks
Databricks pioneered the term “Lakehouse”. It’s currently positioned as a leader in the emerging Lakehouse market. Founded in 2013 by the creators of Apache Spark, Databricks offers a unified analytics platform for a variety of use-cases such as data engineering, machine learning, data science, and AI.

Dremio
Dremio has gained significant recognition for its proprietary Dremio query engine (Sonar) technology. It is positioned as a leader in the data lakehouse market, with a growing market share. Founded in 2015, Dremio offers an open, cloud-native data lakehouse engine that simplifies and accelerates data processing and analytics.

Starburst
Starburst has made a name for itself in the data access and analytics space. It is positioned as a leader in the enterprise data access market, with a growing market share. Founded in 2017, Starburst offers a cloud-native platform that enables fast and easy access to data across a range of sources.

Amazon Athena
Amazon Athena was first released in 2016 and is the AWS data lakehouse offering that utilizes Apache Spark for analytics on data in open file formats and the Trino engine for SQL queries. Amazon Athena combines with other AWS services, like AWS Lake Formation for data governance, to build a complete lakehouse solution.

Others (augmented data warehouse competitors)

Snowflake
Founded in 2012, Snowflake has made significant strides in the cloud data warehousing market and is currently positioned as a leader in the cloud data platform space. Snowflake supports open table formats but locks clients into the Snowflake environment, which is closed and controlled (not open-source based). It has a single SQL query engine, a limited ability to access data outside of a Snowflake data warehouse, and no hybrid cloud capability.

Amazon Redshift Spectrum
Amazon Redshift Spectrum is a Redshift service that allows direct queries on data stored in Amazon S3 files without having to load the data into an Amazon Redshift data warehouse. Amazon Redshift Spectrum requires an active Amazon Redshift data warehouse cluster to execute queries, so it is tightly integrated with Amazon Redshift and extends the data warehouse to access external tables in Amazon S3.

Primary Competitors
Background

watsonx.data Competitors Background (1 of 3)
Background information by competitor

Databricks
• Databricks has partnered with AWS, Microsoft Azure, and Google Cloud Platform, but has special optimizations with Microsoft Azure, including tight integrations with Azure Data Lake Storage, Azure Data Factory, Azure Synapse Analytics, Power BI and other Azure services.
• Databricks focuses on a cloud-only solution, with no on-premises product.
• Databricks introduced its Photon engine to speed up SQL queries over their previous engine for data warehouse workloads. Photon is a compiled C++ query engine for SQL and DataFrame API workloads and delivers 3x to 8x performance gains over the previous query engine.
• It’s built on open-source Delta Lake and Delta Sharing components, but Databricks has proprietary extensions to move clients to Databricks versus open-source.
• In May 2023, Databricks announced an investment in the privacy and access control market. They invested in the modern cloud focused protection platform, Immuta, and acquired Okera, a data access and governance platform.

Dremio
• Based on community driven standards such as Apache Arrow, Apache Iceberg, and Apache Parquet.
• Has the Dremio Sonar query engine to assist in providing self-service analytics along with a shared semantic layer for governed, self-service data access to provide a consistent view of the data along with transparent query acceleration.
• Dremio provides deployment flexibility as it can be deployed within the public cloud or deployed standalone on-premises.

watsonx.data Competitors Background (2 of 3)

Starburst
• Starburst has expanded the focus from public cloud-only to include an on-premises deployment option with Starburst Enterprise.
• Starburst has connectors to many data sources including data lakes (Hadoop HDFS, Ceph, MinIO, and Dell/EMC ECS), data warehouses (Teradata, Oracle Exadata, IBM Netezza Performance Server (NPS)), message queues/NoSQL (Apache Kafka, MongoDB, and Elastic), and current and legacy databases (MySQL, PostgreSQL, IBM Db2, Microsoft SQL Server, and Oracle Database).

Amazon Athena
• Amazon Athena includes both Apache Spark and the AWS Trino engines and is central to the AWS data lakehouse strategy. Combined with AWS Glue as the AWS serverless data integration service, AWS provides a strong data lakehouse offering.

watsonx.data Competitors Background (3 of 3)
Other Competitors

Snowflake
• Snowflake has an extensive array of partners (200+) that help it provide services and functionality for data fabric and other use case capabilities that Snowflake lacks.
• Snowflake provides many data sets and applications from partners in the Snowflake Marketplace. However, rather than providing data sets for usage, Snowflake includes the data sets as part of a Snowflake data warehouse, driving additional data warehouse usage and a larger number of deployed data warehouses.
• Snowflake is moving into transactional data support with their Unistore hybrid table offering (in private preview as of September 2023) to be able to accommodate more types of client workloads.

Amazon Redshift Spectrum
• Amazon Redshift Spectrum is a component of the Amazon Redshift data warehouse offering.
• Redshift Spectrum provides the ability for a client to query data within Amazon S3 files directly, without having to move the data into Amazon Redshift.
• Query processing charges within Amazon Redshift Spectrum are based on the amount of data processed by the query (not the query result).
• Amazon Redshift Spectrum by itself is not a data lakehouse solution as it can only work with data files within Amazon S3.
Primary Competitors:
Strengths & weaknesses

Primary Competitive landscape
Data lakehouse competitors
Details
Deployment options Public cloud only Public cloud & on-premises Public cloud & on-premises AWS only
Query engines • Apache Spark • Dremio Sonar (proprietary) • Starburst (Trino-based) • Apache Spark
• Photon • Amazon
(Trino-based)
Open table format support • Delta Lake • Apache Iceberg • Apache Iceberg • Apache Iceberg
• Delta Lake (Parquet only)
Others (Augmented data warehouse competitors)
Details

Snowflake
• Deployment options: Public cloud only
• Query engines: Snowflake
• Open table format support: Supports Apache Iceberg but primarily uses Cloud Object Storage (COS)

Amazon Redshift Spectrum
• Deployment options: AWS only
• Query engines: Amazon Redshift
• Open table format support: None, uses Amazon S3 files defined as external tables

Primary competitors at a glance
(Each slide places the competitor logos on a scale; only the scale labels and the names below survive from the graphic.)

Hybrid cloud (scale from Worst to Best)
• Only single cloud deployment option
• Multi-cloud deployment options
• Both hybrid cloud and multi-cloud deployment options
Named on the scale: watsonx.data, Amazon Athena, Amazon Redshift Spectrum, Dremio

Multiple query engines (scale from Worst to Best)
• Single data warehouse engine
• Single data lakehouse engine
• Two query engines
• Three or more query engines
Named on the scale: watsonx.data, Amazon Redshift Spectrum, Dremio, Amazon Athena

Open-source based (scale from Worst to Best)
• NO open-source components
• Single open-source component
• Multiple open-source components and single vendor focused community
• Multiple open-source components and strong community
Named on the scale: watsonx.data, Dremio, Amazon Athena, Amazon Redshift Spectrum

Data governance (scale from Worst to Best)
• NO data governance capability
• Limited data governance capability
• Data governance requires a separate product or service
• Enterprise data governance for all assets in the data lakehouse
Named on the scale: watsonx.data, Dremio, Amazon Redshift Spectrum, Amazon Athena

Market presence (scale from Less to More)
• Developing market presence (GA July 2023)
• Limited market presence
• Leading market presence
Named on the scale: watsonx.data, Amazon Athena, Amazon Redshift Spectrum, Dremio

Secondary
competitors


Microsoft OneLake overview
Microsoft OneLake
• The “OneDrive for data”
• Separates the data lake into workspaces
• All engines can access the data in OneLake
• Primarily supports Microsoft Azure but does support Amazon S3 files
Follows the Azure Databricks standards for open table format to define “tables” within the data lake
• Delta Lake with Parquet files
Other data sources must be defined as “shortcuts” (connections) and are treated as “files” versus “tables”

Microsoft OneLake overview
Microsoft OneLake
• Can be accessed by Azure Data Lake Storage (ADLS) compatible applications
• Azure Databricks and Azure HDInsight are two of the applications that can access OneLake
OneLake is divided into workspaces
• Allows data to be accessed easily while providing separation and security between different groups of data sources
OneLake allows multiple data accesses within each workspace
• Lakehouses, data warehouses, and other data sources (shortcuts)
Teradata VantageCloud overview
Teradata VantageCloud: two offerings
• VantageCloud Enterprise: a rename of their earlier fully managed cloud solution
• VantageCloud Lake: a new offering (currently only on AWS) that allows database files to reside on Amazon S3
Targeted at cloud data warehouses
• Separate compute and storage
• Compute that scales in small increments, but can scale very large
• Ability to limit autoscaling to ensure a client does not overspend their budget
Teradata VantageCloud overview
ClearScape Analytics
• Rebranded analytics capabilities that include new features and functions
• IBM has comparable capabilities within the data warehouse engines and AI capabilities
Data Fabric
• Teradata data fabric is composed of QueryGrid, data discovery and catalog, and Teradata Industry Models
Object Storage
• Ability to use cloud object storage for data warehouse data storage (currently only Amazon S3 storage on AWS) with the VantageCloud Lake offering
Google Cloud Platform Data lakehouse architecture overview
Google Cloud Platform (GCP) data lakehouse architecture
• Rather than a specific product to deliver a lakehouse solution, GCP provides an architecture
At a high-level, a data lakehouse should be able to perform a variety of workloads (BI, reports, data science, and ML)
• Data resides in the data lake
• Consists of structured, semi-structured, and unstructured data
• All data sources exist in the single lakehouse with metadata, caching, and indexing existing within the data lake infrastructure
Google Cloud Platform Data lakehouse architecture overview
Google Cloud Platform (GCP) data lakehouse architecture
• GCP provides 4 query engines
  • Dataproc: Hadoop and Spark engine
  • Vertex AI: Unified MLOps platform to enable large scale model building with limited coding
  • BigQuery: SQL query engine
  • Serverless Spark: Spark engine
Partner products that provide data lakehouse capabilities are available from
• Databricks
• Starburst

Oracle Data lake overview
Oracle data lakehouse offerings
• This diagram highlights the various Oracle components (“O” in upper right of box)
• Oracle supports open file formats, but does NOT support open table formats such as Apache Iceberg, Delta Lake, or Hudi
• MySQL HeatWave Lakehouse can query up to 400 TB of data across HeatWave clusters of up to 512 nodes
• Query engines are Oracle Autonomous Database, MySQL HeatWave, and Apache Spark
• MySQL HeatWave Lakehouse supports OCI and AWS to provide multi-cloud deployment options but NOT hybrid cloud deployment
Oracle Data lake overview
Oracle data lakehouse offerings
• Oracle has the new MySQL HeatWave Lakehouse offering, or clients can combine components to create their own data lake
• OCI Data Catalog is a metadata management service that allows clients to discover various (Oracle) data sources within the cloud or on-premises
• The data repositories are either Oracle Autonomous Database, MySQL HeatWave Lakehouse, or Oracle BigData (Apache Hadoop with Spark)
• Files in cloud object storage can be accessed by MySQL HeatWave Lakehouse for open file formats (Parquet, CSV, and others)
Secondary Competitors:

Secondary competitors at a glance
(Each slide places the competitor logos on a scale; only the scale labels and the names below survive from the graphic.)

Hybrid cloud (scale from Worst to Best)
• Only single cloud deployment option
• Multi-cloud deployment options
• Both hybrid cloud and multi-cloud deployment options
Named on the scale: watsonx.data

Multiple query engines (scale from Worst to Best)
• Single data warehouse engine
• Single data lakehouse engine
• Two query engines
• Three or more query engines
Named on the scale: watsonx.data

Open-source based (scale from Worst to Best)
• NO open-source components
• Single open-source component
• Multiple open-source components and single vendor focused community
• Multiple open-source components and strong community
Named on the scale: watsonx.data

Data governance (scale from Worst to Best)
• NO data governance capability
• Limited data governance capability
• Data governance requires a separate product or service
• Enterprise data governance for all assets in the data lakehouse
Named on the scale: watsonx.data

Market presence (scale from Less to More)
• Developing market presence (GA July 2023)
• Limited market presence
• Leading market presence
Named on the scale: watsonx.data

Competitive positioning
Competitive positioning options
Primary competitors
Competitive (everything or nothing) strategy
• Complete data lakehouse solution
Coexistence (surround) strategy
• Incomplete data lakehouse solution
• Limited scope offerings

Competitive positioning options
Secondary competitors
Competitive (everything or nothing) strategy
• Use for ALL completely new competitive opportunities where no competitor currently has a lakehouse solution in place
Coexistence (surround) strategy
• If a client is already using a secondary competitor’s lakehouse solution, then a coexistence strategy is best

watsonx.data Differentiators
• No other data lakehouse offering has integrated data warehouse engines in addition to the Apache Spark and open-source query engines
• The cloud hyperscalers (AWS, Microsoft Azure, and GCP) along with Databricks provide no hybrid cloud deployment capability
• Deployment flexibility in other clouds: no other data lakehouse offering can be deployed as easily across different cloud platforms
• Other data lakehouse competitors do NOT have the level of experience with mission critical applications, and level of research in query optimization and query processing, as IBM
• Watsonx.data plus other IBM data sources (Netezza Performance Server and Db2) deliver a query performance spectrum not offered by other data lakehouse competitors
• Watsonx.data and its selection of Apache Iceberg and Presto delivers an open solution versus a single contributor open-source lock-in

Latest Updates

Objection handling
Clients may raise a competitor’s background and strengths as objections to watsonx.data.
Let’s look at an example of how to handle an objection and craft an IBM response.
Objection handling against Databricks
Objection
Databricks has partnerships with all three major cloud vendors (AWS, Microsoft Azure, and GCP). IBM watsonx.data is not available on all clouds as a fully managed service, and it is important to have the flexibility to choose any of these cloud providers.
Competitive objections

Strengths & weaknesses Databricks
(1 of 6)
Strengths
• Market leader in the data lakehouse market and has mindshare of clients
• Unity catalog for metadata management and data governance of all Databricks data assets including ML models
• Has published and certified 100 TB TPC-DS benchmark to validate query performance
Weaknesses
• Has a difficult learning curve and setup complexity, it takes time for clients to be productive with Databricks
• High cost for poorly optimized workloads in production resulting in higher-than-expected costs for clients
• No hybrid cloud or on-premises deployment option, ONLY public cloud

watsonx.data Differentiators

Objection
Databricks has partnerships with all three major cloud vendors (AWS, Microsoft Azure, and GCP). IBM watsonx.data is not available on all clouds as a fully managed service, and it is important to have the flexibility to choose any of these cloud providers.

IBM response
• IBM watsonx.data can be deployed on any cloud provider that supports Red Hat OpenShift or within a private cloud or on-premises environment as a self managed solution.
• Although IBM watsonx.data is not available on all three cloud vendors (AWS, Microsoft Azure, and GCP) as a fully managed service, watsonx.data can be deployed on all three cloud providers and IBM will provide that option in the future as client demand dictates.
Objection handling

Setting traps
Sellers can set traps for competitors during watsonx.data discussions with clients using:
✓ Competitor’s weaknesses
✓ Watsonx.data differentiators
Let’s look at an example of how to set your trap questions.
Setting traps against Databricks

Trap to set/question to ask
Ask the client if they want simple, easy to understand pricing for their data lakehouse.

Reason
Databricks measures consumption and calculates cost through Databricks (consumption) Units (DBUs). The cost per DBU varies based on different use cases (Databricks SQL, Databricks All Purpose Compute for Interactive Workloads, Delta Live Tables (DLT), and others). This makes Databricks pricing complex and makes it difficult for clients to understand what a Databricks Data Lakehouse will cost.
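
The arithmetic behind this trap can be sketched in a few lines of Python. This is only an illustrative sketch: the workload names come from the slide, but the DBU volumes and the per-DBU rates are hypothetical placeholders, not published Databricks prices. It shows that the same number of DBUs produces a different bill depending on the workload type, which is what makes the total hard for a client to predict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadUsage:
    """DBU consumption for one workload type in a billing period."""
    workload_type: str
    dbus_consumed: float
    rate_per_dbu: float  # hypothetical placeholder, NOT a published price


def estimate_cost(usages):
    """Total cost = sum over workload types of (DBUs consumed x rate per DBU)."""
    return sum(u.dbus_consumed * u.rate_per_dbu for u in usages)


# Same DBU volume for each workload type, different (hypothetical) rates.
usages = [
    WorkloadUsage("Databricks SQL", 1000, 0.25),
    WorkloadUsage("All Purpose Compute", 1000, 0.50),
    WorkloadUsage("Delta Live Tables", 1000, 0.75),
]

for u in usages:
    print(f"{u.workload_type}: {u.dbus_consumed * u.rate_per_dbu:.2f}")
print(f"Total: {estimate_cost(usages):.2f}")
```

In a client conversation, real rates would come from the vendor's current price list; the point of the sketch is only that the estimate needs a separate rate for every workload type in use.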

Link to watsonx Packaging and Pricing (owner: Jason Foss)
RECAP
watsonx.data
Differentiators

Setting traps against Databricks

Trap to set/question to ask
Ask the client if they want simple, easy to understand pricing for their data lakehouse.

Reason
Databricks measures consumption and calculates cost through Databricks (consumption) Units (DBUs). The cost per DBU varies based on different use cases (Databricks SQL, Databricks All Purpose Compute for Interactive Workloads, Delta Live Tables (DLT), and others). This makes Databricks pricing complex and makes it difficult for clients to understand what a Databricks Data Lakehouse will cost.

Trap to set/question to ask
Ask the client if they want their data lakehouse SQL workloads to be as performant as SQL workloads within a data warehouse.

Reason
Although Databricks has improved their SQL query performance with their new Photon engine, it will still not be as performant as what IBM offers with their Netezza and Db2 Warehouse engines today. IBM watsonx.data provides multiple engines to provide fit-for-purpose performance for data warehouse workloads.

The integrated IBM watsonx.data ecosystem for maximum workload coverage and optimal price-performance
IBM watsonx.data functionality
Integrations at GA
Diagram components: Db2 for z/OS, Db2W, Netezza, Spark, Presto, watsonx.data Metadata Store, IBM Knowledge Catalog, object storage
1 Analyze Z data easily and securely with Db2 for z/OS Data Gate
2 Warehouses can access data in the lakehouse
3 The lakehouse can access data residing in Db2/Netezza
4 Easily promote data between the warehouse and lakehouse
5 Query routing service, multiple engines can access same data lake data
6 KC policies enforced by the lakehouse via metadata service
RECAP
Setting traps

Summary
• The key difference between an augmented data warehouse and a data lakehouse is that the data warehouse is the compute engine for all queries versus multiple engines that allow clients to cost optimize their queries.
  • Limited ability for a data warehouse to access external data sources.
• Amazon Athena, Amazon Redshift Spectrum, and Snowflake provide both competitive takeout and coexistence sales opportunities.
  • Will provide data access and query capability for data external to a data warehouse and/or AWS.
• Watsonx.data is a better lakehouse choice for clients that are already using IBM Db2 or Netezza Performance Server (NPS) than offerings from Databricks, Dremio, and Starburst due to the integration of watsonx.data with Db2 and NPS.
• Databricks is the strongest data lakehouse competitor but does have complex pricing and high cost compared to other offerings.
  • Clients with smaller data lakehouse requirements will find Databricks too costly.
© 2023 International Business Machines Corporation
Thank you

IBM and the IBM logo are trademarks of IBM Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on ibm.com/trademark.

THIS DOCUMENT IS DISTRIBUTED “AS IS” WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT, SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY.

Client examples are presented as illustrations of how those clients have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.

Not all offerings are available in every country in which IBM operates.

IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Red Hat and OpenShift are registered trademarks of Red Hat, Inc. or its subsidiaries in the United States and other countries.
Appendix
Primary Competitors Strength & Weaknesses

Strengths & weaknesses Dremio
(2 of 6)
Strengths
• Has a proprietary Sonar query engine that seems to deliver better performance than open-source query engines
• Offers both public cloud and on-premises deployment options
• Has competitive pricing with a low-end, no-cost package that allows clients to easily try Dremio for small projects
Weaknesses
• Star schema design and table joins are not optimal, clients may need to re-write queries and re-design table structures
• Has limited marketing presence, clients may not be aware of the Dremio offerings
• Fully managed Dremio is only available on AWS

Strengths & weaknesses Starburst
(3 of 6)
Strengths
• Has many data connectors for traditional databases, non-relational databases, columnar databases, and other data sources
• Offers both public cloud and on-premises deployment options
• Provides RBAC for table level, column level, and row level access control
Weaknesses
• Starburst is the primary contributor to Trino making it less open-source than Presto
• Designed for interactive (ad-hoc) queries, not optimized or designed for long-running queries for data warehouse workloads
• Limited data governance capability and no data lineage capability

Strengths & weaknesses Amazon Athena
(4 of 6)
Strengths
• Part of AWS and integrates well with other AWS services
• Provides an Apache Spark and an Amazon Trino-based engine to provide multiple query engines
• Uses AWS Glue for data integration and AWS Lake Formation for data governance
Weaknesses
• Expensive if queries are not optimized and/or do not use a proper partitioning strategy
• Non-Amazon S3 data sources require programming code and Athena Federated Query using AWS Lambda (separate AWS service)
• Inconsistent performance due to the serverless design with no dedicated compute configuration

Strengths & weaknesses Snowflake
(5 of 6)
Other Competitors
Strengths
• Offers consumption-based pricing that appeals to clients
• Provides complete separation between the data on storage and the compute engine within a virtual warehouse
• Provides dynamic scale-up of compute with no downtime
Weaknesses
• Each T-shirt size scale-up increment increases cost by 2x over the prior T-shirt size
• Limited hybrid cloud capability for on-premises data sources, only supports storage devices that follow the S3 file standard
• Automatic scale-out of multiple virtual warehouse clusters during high query concurrency periods increases cost by 100% for each scale-out virtual warehouse cluster
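
The two cost weaknesses above (2x per T-shirt size step, plus 100% per scale-out virtual warehouse cluster) compound. A minimal Python sketch of that arithmetic follows; the function name and the idea of expressing cost relative to a single cluster of the smallest size are assumptions made for illustration, and no real prices are used.

```python
def relative_cost(size_steps: int, clusters: int = 1) -> int:
    """Cost relative to one cluster of the smallest T-shirt size.

    size_steps: number of scale-up increments above the smallest size
                (each increment doubles the cost, per the slide).
    clusters:   number of running virtual warehouse clusters
                (each scale-out cluster adds 100% of one cluster's cost).
    """
    if size_steps < 0 or clusters < 1:
        raise ValueError("size_steps must be >= 0 and clusters must be >= 1")
    return (2 ** size_steps) * clusters


# Three scale-up increments, then scale-out to three clusters under load.
print(relative_cost(0))              # baseline: 1
print(relative_cost(3))              # 2 * 2 * 2 = 8
print(relative_cost(3, clusters=3))  # 8 * 3 = 24
```

Under these assumptions, a warehouse three sizes above the smallest that scales out to three clusters costs 24 times the baseline while it runs, which is the point to raise when a client asks about predictable cost.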
Strengths & weaknesses Amazon Redshift Spectrum
(6 of 6)
Other Competitors
Strengths
• Infrequently accessed data can remain in Amazon S3
• Part of AWS, so Amazon Redshift integrates with other AWS services
• Provides the ability for Amazon Redshift clients to access Amazon S3 files directly
Weaknesses
• Requires a running Amazon Redshift data warehouse cluster to execute Redshift Spectrum queries
• Query processing incurs Amazon Redshift compute usage plus Redshift Spectrum data processing charges
• Can access only Amazon S3 files that have been defined as external tables within an Amazon Redshift data warehouse cluster
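
The "compute usage plus data processing charges" weakness above can be expressed as a simple two-part formula. The Python sketch below is illustrative only: the function name, parameter names, and both rates are hypothetical, not AWS list prices. It also reflects the earlier background point that the data processing charge depends on the amount of data processed by the query, not on the size of the result.

```python
def spectrum_query_cost(
    tb_processed: float,
    price_per_tb: float,           # hypothetical data processing rate
    cluster_hours: float,
    price_per_cluster_hour: float  # hypothetical Redshift cluster rate
) -> float:
    """Two-part cost: data processed by the query + running cluster time."""
    values = (tb_processed, price_per_tb, cluster_hours, price_per_cluster_hour)
    if any(v < 0 for v in values):
        raise ValueError("all inputs must be non-negative")
    data_charge = tb_processed * price_per_tb
    compute_charge = cluster_hours * price_per_cluster_hour
    return data_charge + compute_charge


# A query that processes 2 TB while the cluster runs for 1 hour.
# The result may be a single row; the charge is still based on 2 TB processed.
print(spectrum_query_cost(2, 4.0, 1, 1.5))
```

The design choice here is to keep the two charges as separate terms, since the client pays the cluster term even when little external data is processed.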
Secondary Competitors Strength & Weaknesses
Strengths & weaknesses Microsoft OneLake
Strengths
• Supports different query engines to allow compute choice based on query requirements
• Allows access to Amazon S3 files in addition to Microsoft Azure data sources
• Can span geographies and allows workspaces for security and workload separation
Weaknesses
• Tables must reside in Delta Parquet files
• Non-Delta Parquet files must be manually and explicitly defined as files using shortcuts (connections)
• No hybrid cloud or multi-cloud deployment options

Strengths
• Teradata is a proven data warehouse provider with many on-premises clients
• Teradata is a leader in the cloud database space in the most recent Gartner Magic Quadrant
• Teradata has strengthened their analytics capabilities with their new ClearScape Analytics component
Weaknesses
• Teradata supports Parquet files, but does NOT support an open table format like Apache Iceberg for table storage
• Teradata does NOT offer multiple query engines to reduce client compute cost
• VantageCloud Lake is difficult to size as both a Principal cluster and Compute clusters are required for workloads using Object File System (OFS)

Strengths
• Google Cloud Platform has the components for a client to build a data lakehouse
• Google Cloud Platform has many tools around ML and AI
• Google Cloud Platform offers storage that is optimized for open-source storage formats
Weaknesses
• Clients must piece together different services to build a data lakehouse solution
• Google Cloud Platform does not have the data lakehouse market presence of AWS and Microsoft Azure
• For a more complete data lakehouse solution, clients must purchase partner solutions from Databricks and Starburst

Strengths
• Oracle offers multiple query engines for their lakehouse: Oracle Autonomous Database, MySQL HeatWave, and Apache Spark
• Oracle tightly integrates the data lakehouse offerings with Oracle cloud-based applications
• Oracle Cloud Infrastructure (OCI) is optimized for Oracle data sources and applications
Weaknesses
• Oracle and their lakehouse offerings are limited to OCI and AWS deployments
• Oracle has no hybrid cloud deployment capability for MySQL HeatWave Lakehouse
• Oracle lakehouse does NOT support any open table formats, so files are limited to cloud object storage within OCI and AWS

Latest Updates
watsonx.data
Latest updates
Databricks announcements
• Lakehouse federation capabilities in Unity Catalog to support external data sources and databases
• Delta Lake 3.0 with Universal Format (UniForm) and Liquid Clustering
IBM competitive analysis
• Unity Catalog update is in private preview
• Limited number of public cloud data warehouses included but NO hybrid cloud
• Delta Lake UniForm is in public preview and Liquid Clustering is announced only (no preview)
• UniForm allows Delta Lake tables to be read by Apache Iceberg clients by generating Iceberg metadata for the table
• Liquid Clustering adjusts the data layout in a Delta Lake table based on partitioning key selection to improve query performance
Unity Catalog will support ONLY these 7 public cloud data sources
For the complete competitive analysis of Databricks Data+AI Summit 2023, see this write-up on Seismic
June 2023 announcements

watsonx.data
Latest updates
Snowflake announcements
• Managed Apache Iceberg tables
• Dynamic Tables
IBM competitive analysis
• Managed Apache Iceberg tables are in private preview
• Preliminary information indicates that the Apache Iceberg table is converted to use the Snowflake catalog and is no longer managed by the Iceberg catalog
• Dynamic Tables are enhanced Materialized Views (MVs) in public preview that bring Snowflake on par with other data warehouses on the public cloud; the goal is to reduce full table rebuilds for data pipelines
Managed Apache Iceberg tables are converted to use the Snowflake catalog
Snowflake Summit 2023, see this write-up on Seismic
June 2023 announcements
watsonx.data
Latest updates Iceberg table support
Snowflake update
• Single Iceberg table type
• Two catalog management options
• External (Iceberg) catalog
• Managed by Snowflake
• Performance implications for catalog management
IBM competitive analysis
• Snowflake wants clients to use Snowflake catalog for Iceberg tables
• Delivers better query performance
• Locks client into Snowflake query engine
• Snowflake claims “catalog integration” if using external Iceberg catalog
• Currently only supports AWS Glue Catalog with catalog integration
• Can only define Iceberg tables as External Tables for production Snowflake use as of October 2023
Currently in PRIVATE preview
Snowflake Iceberg table support, see this write-up on Seismic
September 2023 update
watsonx.data
Latest updates
MySQL HeatWave Lakehouse announcement
• Can access data in open file formats like Apache Parquet, CSV, and others
• Can access database export files from Oracle Database, MySQL, Amazon Aurora, and Amazon Redshift
IBM competitive analysis
• No hybrid cloud deployment option; AWS and OCI are the only supported cloud platforms
• Internal engine processing uses a proprietary format
• Released a non-certified TPC-H benchmark result
MySQL HeatWave Lakehouse, see this write-up on Seismic
July 2023 announcements
watsonx.data
Latest updates
Serverless
Amazon Web Services announcement
• Amazon Redshift Serverless or Amazon Redshift Spectrum can now query Apache Iceberg tables
• Iceberg table access is read-only
• Iceberg table must be cataloged in the AWS Glue Data Catalog
• Iceberg table access is primarily restricted to tables located within AWS
IBM competitive analysis
• Very limited capability in using Iceberg tables from Amazon Redshift
• Requires Amazon Athena or Amazon EMR to write to Iceberg tables
• Not a lakehouse solution and uses proprietary components within Amazon Redshift
• No time travel or data sharing capabilities within Amazon Redshift using Iceberg tables
IN PREVIEW MODE ONLY – August 2023 (non-production use)
For more information on Amazon Redshift Iceberg table support, see this write-up on Seismic
August 2023 announcements
Objection Handling
Objection
Databricks has partnerships with all three major cloud vendors (AWS, Microsoft Azure, and GCP). IBM watsonx.data is not available on all clouds as a fully managed service, and it is important to cloud providers.
IBM response
• IBM watsonx.data can be deployed on any cloud provider that supports Red Hat OpenShift or within a private cloud or on-premises environment as a self-managed solution.
• Although IBM watsonx.data is not available on all three cloud vendors (AWS, Microsoft Azure, and GCP) as a fully managed service, watsonx.data can be deployed on all three cloud providers and IBM will provide that option in the future as client demand dictates.
Objection handling against Dremio
Objection
Dremio provides a simple, fast, and cost-effective platform for analytics, ML, and other data-driven applications. What advantages does IBM watsonx.data have over Dremio?
IBM response
• Performance and reliability: Dremio has a history of issues with new releases. IBM watsonx.data is based on open-source Apache Spark and Presto, which are well-tested and proven technologies. IBM has a long history with query engines and query optimization and will incorporate this knowledge into watsonx.data to continually improve performance and query efficiency.
• Enterprise-class capabilities: IBM provides strong data governance and security around watsonx.data that Dremio cannot provide.
Objection handling against Starburst
Objection
Starburst provides a query engine that can efficiently process an analytic workload across many different data sources. What advantages does IBM watsonx.data provide over Starburst?
IBM response
• IBM watsonx.data provides multiple query engines: Apache Spark and Presto are the query engines currently available (along with the optional Db2 and Netezza Performance Server (NPS) specialized data warehouse engines) versus the single primarily SQL query engine within Starburst.
• IBM watsonx.data provides data governance, security, and deep integration with IBM data sources, including the IBM Z platform, that Starburst does not provide without third-party integrations.
• IBM has a decades-long history of query engine research and development, including query optimizations within IBM Research, that Starburst cannot match.
Competitive objections
Objection handling against Amazon Athena (1 of 2)
Objection
Amazon Athena has an Apache Spark query engine and a Trino SQL query engine and IBM watsonx.data uses an Apache Spark query engine and a Presto query engine. What are the advantages of IBM watsonx.data over Amazon Athena?
IBM response
• While it is true that both Amazon Athena and IBM watsonx.data provide multiple engine support, IBM watsonx.data has an advantage in the number and location of data sources supported.
• IBM watsonx.data has data connectors to many different data sources across hybrid cloud. Amazon Athena is designed for Amazon S3 file access primarily. All other data sources require the use of Amazon Lambda and coding through Amazon Athena Federated Query.
• IBM watsonx.data provides flexible deployment options versus being locked in to AWS.
Objection handling against Amazon Athena (2 of 2)
Objection
Amazon Athena is well-integrated within the AWS ecosystem and AWS is our chosen cloud provider. What is the benefit of adding watsonx.data to our product portfolio?
IBM response
• IBM watsonx.data can provide data lakehouse capabilities for data sources outside of AWS and provides data connectors to many data sources versus having to utilize Amazon Lambda and coding for any data source that is not Amazon S3.
• IBM watsonx.data can be deployed on multiple clouds or private cloud/on-premises for flexibility that Amazon Athena cannot provide.
• IBM has more research and development expertise with query engines and query optimizations than AWS for future enhancements.
Coexistence objections
Objection handling against Snowflake (1 of 2)
Objection
Snowflake scales up and scales out. This provides performance, and the ability to scale dynamically is important for a data lakehouse platform, as workloads vary throughout the day.
IBM response
• Data lakehouses generally scale by adding compute nodes to the environment. Snowflake compute scale-up is always 2x the current compute configuration.
• This is cost prohibitive as data lakehouse data volumes grow: a lakehouse environment with 8 compute nodes likely doesn’t need to automatically scale up to 16 compute nodes.
• The value of a data lakehouse is to be able to access less frequently used data at a lower cost than a typical data warehouse. Without this lower cost possibility, the solution is not a data lakehouse but identical to a data warehouse.
• IBM watsonx.data offers true data lakehouse cost benefits for large volumes of data versus a data warehouse alone.
Competitive objections
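The 8-node versus 16-node point above can be made concrete by comparing doubling-only scale-up with node-at-a-time scaling. This is a minimal sketch under stated assumptions: one unit of cost per compute node, and "granular" scaling meaning single-node increments. The function names are hypothetical and do not correspond to any vendor API.

```python
def nodes_with_doubling(current: int, needed: int) -> int:
    """Double the node count until it covers the needed capacity."""
    if current < 1:
        raise ValueError("current must be at least 1")
    nodes = current
    while nodes < needed:
        nodes *= 2
    return nodes


def nodes_with_granular_scaling(current: int, needed: int) -> int:
    """Add nodes one at a time, so capacity matches the need exactly."""
    return max(current, needed)


if __name__ == "__main__":
    current, needed = 8, 10  # hypothetical workload needing 10 nodes
    doubled = nodes_with_doubling(current, needed)
    granular = nodes_with_granular_scaling(current, needed)
    # Nodes paid for beyond what the workload needs when only doubling is possible.
    print(doubled, granular, doubled - granular)
```

With a workload that needs 10 nodes, doubling from 8 lands on 16 nodes while single-node increments stop at 10, leaving 6 nodes of unused capacity in the doubling case.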
Objection handling against Snowflake (2 of 2)
Objection
Snowflake has been selected as our cloud data platform and we don’t see why we need IBM watsonx.data as a data lakehouse platform. What would be the advantage of adding watsonx.data to our environment?
IBM response
• Snowflake is a cloud data warehouse and in order to access data, you must do everything within the Snowflake environment and ecosystem. Clients are successful only if they can move all data into Snowflake or access all data through a Snowflake virtual warehouse.
• IBM watsonx.data provides the flexibility to work with many external data sources directly. Using its included Presto engine, SQL queries can be executed against data sources while delivering granular compute scaling, allowing insights into all types of data at a much lower cost for data external to Snowflake.
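As an illustration of querying external sources directly with Presto, the sketch below builds a single SQL statement that joins tables from two different catalogs using Presto's catalog.schema.table naming. The catalog, schema, table, and column names are hypothetical, and the helper functions are not part of watsonx.data or any Presto client library; no connection is made, the code only assembles the query text.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build a Presto-style three-part name: catalog.schema.table."""
    for part in (catalog, schema, table):
        # Simple guard so only plain identifiers end up in the SQL text.
        if not part.isidentifier():
            raise ValueError(f"not a plain identifier: {part!r}")
    return ".".join((catalog, schema, table))


def build_federated_join(left: str, right: str, key: str) -> str:
    """Return one SQL statement joining two tables on a shared key column."""
    if not key.isidentifier():
        raise ValueError(f"not a plain identifier: {key!r}")
    return (
        f"SELECT l.*, r.* FROM {left} AS l "
        f"JOIN {right} AS r ON l.{key} = r.{key}"
    )


if __name__ == "__main__":
    # Hypothetical catalogs: one over object storage, one over a database.
    orders = qualified_name("iceberg_data", "sales", "orders")
    customers = qualified_name("db2_source", "crm", "customers")
    print(build_federated_join(orders, customers, "customer_id"))
```

Each catalog maps to a configured connector, so one statement can span both sources without first copying the data into a single store.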
Objection handling against Redshift Spectrum (1 of 2)
Objection
Amazon S3 files are the only source of external data we need to access, and we already use Amazon Redshift as our data warehouse. Why should we choose IBM watsonx.data as our data lakehouse platform?
IBM response
• IBM watsonx.data allows you to access Amazon S3 files as well as many other data sources across a hybrid cloud, without the requirement for a companion data warehouse cluster, as requirements change over time.
• IBM watsonx.data uses its own compute nodes and does not require a data warehouse cluster (unlike Amazon Redshift Spectrum, which requires an Amazon Redshift cluster). Clients pay only for the compute nodes used by the query itself versus the additive cost of Amazon Redshift Spectrum compute plus the Amazon Redshift data warehouse cluster compute.
Objection handling against Redshift Spectrum (2 of 2)
Objection
Amazon Redshift Spectrum is serverless and allows us to query Amazon S3 data without loading the data into Amazon Redshift. How does it benefit our organization to add IBM watsonx.data as a data lakehouse platform?
IBM response
• Adding IBM watsonx.data future-proofs the data lakehouse solution by allowing all types of data sources to be supported within the lakehouse.
• IBM watsonx.data has no dependencies on other services to deliver query results for the data lakehouse and does not compete for compute resources with your data warehouse workload.
• IBM watsonx.data will allow queries on many different data sources without limiting the data lakehouse to only AWS.
Objection handling against Microsoft OneLake (1 of 2)
Objection
Microsoft OneLake is part of Microsoft Fabric and we have selected Fabric as our data fabric solution within Microsoft Azure.
IBM response
• Microsoft OneLake works with Azure Data Lake Storage (ADLS) only and is not a data lakehouse solution outside of Microsoft Azure and S3 files within AWS.
• IBM watsonx.data provides a complete data lakehouse solution that spans both multi-cloud and hybrid cloud environments; Microsoft OneLake does not provide this level of flexibility.
• IBM watsonx.data can work with data that is outside of Microsoft Azure and Amazon S3 and uses the more open Apache Iceberg open table format versus the Databricks-controlled Delta Lake open table format used by Microsoft.
Competitive objections
Objection handling against Microsoft OneLake (2 of 2)
Objection
Microsoft OneLake and Microsoft Fabric provide us with the data lakehouse solution that we need today on Microsoft Azure. What is the advantage of adding IBM watsonx.data to our existing solution?
IBM response
• IBM watsonx.data supports multiple public clouds and private cloud/on-premises deployments, which Microsoft OneLake does not offer as an option.
• IBM watsonx.data can directly query data in Apache Iceberg tables, a format used by several other data lakehouse vendors (Starburst, Dremio, Amazon Athena, and others), instead of creating a shortcut to process non-Delta Lake files within Microsoft OneLake.
• Presto SQL is more of a standard query language than the Kusto Query Language (KQL) developed by Microsoft and contributed to GitHub.
Coexistence objections
Objection handling against Teradata VantageCloud (1 of 2)
Objection
Teradata VantageCloud Lake provides the data lake environment that our organization requires and is multi-cloud.
IBM response
• IBM watsonx.data uses the Apache Iceberg open table format, whereas VantageCloud Lake provides the ability to use cloud object storage.
• IBM watsonx.data uses many open-source components to prevent vendor lock-in for a client. Apache Spark, Presto, and Apache Iceberg are used by watsonx.data, whereas Teradata VantageCloud contains no open-source query engine.
• IBM watsonx.data provides Presto and Spark as query engines to reduce compute cost. Teradata VantageCloud Lake uses the Teradata proprietary query engine.
Competitive objections
Objection handling against Teradata VantageCloud (2 of 2)
Objection
Teradata VantageCloud Lake allows us to use less expensive cloud object storage. What advantage does IBM watsonx.data provide to justify adding it to our environment?
IBM response
• The use of cloud object storage reduces the data storage cost only. IBM watsonx.data provides multiple open-source query engines and specialized data warehouse query engines to allow clients to cost optimize their query workloads based on client performance needs.
• IBM watsonx.data supports multi-cloud and hybrid cloud deployment options; Teradata VantageCloud Lake is only on AWS (June 2023).
• IBM watsonx.data supports many more data sources with connectors and can access any table that uses Apache Iceberg directly. This provides a true data lakehouse capability that VantageCloud Lake cannot deliver.
Coexistence objections
Objection handling against Google Cloud Platform (GCP) (1 of 2)
Objection
Google Cloud is our chosen public cloud platform and provides the data lakehouse architecture that meets our requirements. Why should we consider IBM watsonx.data as a data lakehouse solution?
IBM response
• IBM watsonx.data is a data lakehouse solution versus an architecture that a client must build themselves using components.
• IBM watsonx.data supports both open-source query engines and specialized data warehouse engines versus the single proprietary SQL query engine (Google BigQuery).
• IBM watsonx.data uses the Apache Iceberg open table format that is used by other vendors. This allows many different data assets to be directly queried as tables by watsonx.data, and there are connectors to other external data sources, providing a data lakehouse that can span multi-cloud and hybrid cloud environments.
Competitive objections
Objection handling against Google Cloud Platform (GCP) (2 of 2)
Objection
Google Cloud is our data lakehouse platform. What is the advantage of adding IBM watsonx.data to our existing data lakehouse environment?
IBM response
• IBM watsonx.data extends your current data lakehouse to multi-cloud and hybrid cloud to include data assets on other clouds and even IBM Db2 on Z data assets.
• IBM watsonx.data includes multiple open-source query engines along with support for existing IBM data warehouse specialized engines to ensure your data lakehouse is not locked in to a single cloud platform.
• IBM watsonx.data uses multiple open-source components to ensure that clients are not locked in to a single vendor solution and have flexibility and deployment options.
Coexistence objections
Objection handling against Oracle (1 of 2)
Objection
Oracle provides MySQL HeatWave Lakehouse on both Oracle Cloud Infrastructure (OCI) and Amazon Web Services (AWS). What advantages does IBM watsonx.data have over Oracle?
IBM response
• IBM watsonx.data has many open-source components to provide flexibility and prevent lock-in to the IBM solution for clients. MySQL HeatWave Lakehouse is a proprietary Oracle public cloud only offering.
• IBM watsonx.data can be deployed as a self-managed solution on any public cloud or hybrid cloud platform that supports Red Hat OpenShift. MySQL HeatWave Lakehouse deployment is limited to OCI and AWS.
• IBM watsonx.data uses Apache Iceberg as an open table format versus MySQL HeatWave Lakehouse using open file formats. Open table formats have a catalog and metadata about files.
Competitive objections
Objection handling against Oracle (2 of 2)
Objection
MySQL HeatWave Lakehouse is our data lakehouse solution on OCI. What are the advantages of adding IBM watsonx.data to our existing lakehouse solution?
IBM response
• IBM watsonx.data uses Apache Iceberg as an open table format versus MySQL HeatWave Lakehouse using open file formats. This means that with MySQL HeatWave Lakehouse, the files in cloud object storage do not have a central catalog and a client must configure OCI Data Catalog for the catalog component.
• IBM watsonx.data has more deployment options across multi-cloud and hybrid cloud versus MySQL HeatWave Lakehouse (AWS and OCI).
• IBM watsonx.data uses open-source Apache Iceberg as the open table format and Apache Spark and Ahana Presto as open-source query engines. MySQL HeatWave Lakehouse is a proprietary public cloud only offering by Oracle.
Coexistence objections
Setting Traps
Setting traps against Databricks
Trap to set/question to ask
Ask the client if they want simple, easy to understand pricing for their data lakehouse.
Reason
Databricks measures consumption and calculates cost through Databricks (consumption) Units (DBUs). The cost per DBU
All Purpose Compute for Interactive Workloads, Delta Live Tables (DLT), and others). This makes Databricks pricing complex and makes it difficult for clients to
Trap to set/question to ask
Ask the client if they want their data lakehouse SQL workloads to be as performant as SQL workloads within a data warehouse.
Reason
Although Databricks has improved their SQL query performance with their new Photon engine, it will still not be as performant as what IBM offers with their Netezza and Db2 Warehouse engines today. IBM watsonx.data provides multiple engines to provide fit-for-purpose performance for data warehouse workloads.

Setting traps against Dremio
Trap to set/question to ask
Ask the client if they are considering only AWS to host their fully managed data lakehouse solution.
Reason
Dremio does not offer a fully managed solution on any cloud provider other than AWS.
Trap to set/question to ask
Ask the client if their query workload will be restricted to only SQL for their data lakehouse or whether they will be executing data engineering and/or ML/AI workloads.
Reason
Dremio’s query engine is designed for SQL; it is not designed to handle ML/AI or data engineering workloads. IBM watsonx.data can support SQL queries in addition to other query types.

Setting traps against Starburst
Trap to set/question to ask
Ask the client if they have requirements for ML/AI or data engineering workloads.
Reason
Starburst’s query engine is focused strictly on SQL workloads. IBM watsonx.data provides a Spark engine for efficient processing of ML/AI and data engineering workloads. IBM watsonx.data has multiple engines to allow clients to choose the correct engine for their intended workload and provide workload flexibility and performance optimization that Starburst cannot deliver.
Trap to set/question to ask
Ask the client if they have a workload with long-running queries or ETL workloads that they need to execute.
Reason
Starburst was designed for ad-hoc query workloads and will not perform well for long-running query or ETL workloads. IBM watsonx.data has the flexibility to handle different workload types without impacting lakehouse performance.

Setting traps against Amazon Athena
Trap to set/question to ask
Ask the client if the only data source they need to access is contained within Amazon S3 files.
Reason
Amazon Athena can only directly access Amazon S3 files. All other data sources require the use of Amazon Athena Federated Query, which utilizes the separate Amazon Lambda service and requires coding to access other data services. Amazon Lambda has many restrictions that potential clients need to understand prior to making a lakehouse decision for Amazon Athena.
Trap to set/question to ask
Ask the client if they want predictable lakehouse query costs.
Reason
One of the leading client complaints about Amazon Athena is unpredictable query costs due to poorly optimized queries and excessive charges for long-running queries.

Setting traps against Snowflake
Trap to set/question to ask
Ask the client if cost is a consideration for their data lakehouse solution.
Reason
Snowflake’s 2x difference between scale-up compute configurations and the need to scale out to up to 10 virtual warehouse clusters to handle peak query concurrency result in exponential cost increases to clients that they will not experience with IBM watsonx.data (2x to 3x less expensive).
Trap to set/question to ask
Ask the client if they want a complete data lakehouse solution or just an augmented data warehouse solution.
Reason
Snowflake is primarily a data warehouse platform. Although they have some pieces of a data lakehouse, Snowflake cannot access data everywhere. Snowflake’s ability to access on-premises data is limited. If a client has legacy data sources to incorporate in their data lakehouse, Snowflake will be unable to access this data without data migration to Snowflake.

Setting traps against Amazon Redshift Spectrum
Trap to set/question to ask
Ask the client if they understand that Amazon Redshift Spectrum requires Amazon Redshift data warehouse compute resources in addition to Amazon Redshift Spectrum compute charges.
Reason
Amazon Redshift Spectrum requires an active Amazon Redshift data warehouse cluster and uses compute nodes within Amazon Redshift during query processing. The cost of an Amazon Redshift Spectrum query is the cost of the data scanned in Amazon S3 (per TB) PLUS the compute charges incurred during query execution in Amazon Redshift.
Trap to set/question to ask
Ask the client if all data that they need to query that is external to their Amazon Redshift data warehouse is contained within Amazon S3.
Reason
Amazon Redshift Spectrum can only query data stored in Amazon S3 files that have been defined as external tables in an Amazon Redshift data warehouse cluster. No other data can be queried by Amazon Redshift Spectrum.
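The additive cost structure described for Amazon Redshift Spectrum (data scanned per TB plus Redshift compute) can be written as a one-line formula. The sketch below is illustrative only: the function name is hypothetical and the rates are placeholder values in arbitrary currency units, not AWS pricing.

```python
def spectrum_query_cost(
    tb_scanned: float,
    price_per_tb: float,
    redshift_compute_cost: float,
) -> float:
    """Total query cost = scan charge (per TB) + warehouse compute charge."""
    if min(tb_scanned, price_per_tb, redshift_compute_cost) < 0:
        raise ValueError("inputs must be non-negative")
    scan_charge = tb_scanned * price_per_tb
    return scan_charge + redshift_compute_cost


if __name__ == "__main__":
    # Placeholder inputs: 2 TB scanned at 5.0 units per TB,
    # plus 3.0 units of warehouse compute during the query.
    print(spectrum_query_cost(2.0, 5.0, 3.0))  # 13.0
```

The point of the formula is that the second term never drops to zero, because a running data warehouse cluster is required for every query.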

Setting traps against Microsoft OneLake
Trap to set/question to ask
Ask the client if all the data that will be contained in the data lakehouse exists on Microsoft Azure and Amazon S3 files.
Reason
Microsoft OneLake can only access data files on Microsoft Azure or files contained within Amazon S3 on Amazon Web Services (AWS). If a client has other data requirements, Microsoft OneLake cannot meet their requirements. IBM watsonx.data allows data access across a hybrid and multi-cloud environment.
Trap to set/question to ask
Ask the client if most data sources within the data lakehouse will be stored using the Delta Lake open table format.
Reason
Microsoft OneLake can access tables stored in Azure Data Lake Storage (ADLS) or Amazon S3 storage in Delta Parquet format. Only Databricks and Microsoft ADLS use the Delta Parquet format. More data within AWS uses Apache Iceberg Parquet, including Amazon Athena, Dremio, Starburst, and IBM watsonx.data.

Setting traps against Teradata VantageCloud
Trap to set/question to ask
Ask the client if the largest expense component in their data warehouse cost is storage or compute.
Reason
The fact that Teradata VantageCloud Lake can use cloud object storage for table storage only reduces client cost at the storage layer of the data lake. The compute cost for Teradata VantageCloud will continue to be high for clients. Teradata offers no cost optimization opportunities for client compute charges.
Trap to set/question to ask
Ask the client if their future data lakehouse platform will be exclusively public cloud or whether they will still have a need for a hybrid cloud environment.
Reason
Teradata VantageCloud Lake is only available on the public cloud (AWS as of June 2023). Teradata is moving away from on-premises environments, which will not be supported by future product releases.

Setting traps against Google Cloud Platform (GCP)
Trap to set/question to ask
Ask the client if they want a data lakehouse solution or if they want a data lakehouse architecture that they must implement themselves using individual components.
Reason
Google Cloud does not offer a single data lakehouse offering. A client is forced to choose the various applicable services and put them together to build their data lakehouse solution. Databricks and Starburst offer data lakehouse solutions on GCP.
Trap to set/question to ask
Ask the client if their data lakehouse environment will span multiple clouds or require a hybrid cloud deployment option.
Reason
GCP solutions run on the Google Cloud with limited multi-cloud capabilities and no hybrid cloud capability. If a client has other data sources to include in a data lakehouse, then GCP is not a good choice as a data lakehouse platform.

Setting traps against Oracle
Trap to set/question to ask
Ask the client if their data lakehouse will be restricted to Oracle Cloud Infrastructure (OCI) and Amazon Web Services (AWS) or whether they will need to access external data assets.
Reason
MySQL HeatWave Lakehouse, which is Oracle’s primary data lakehouse offering, is available only on OCI and AWS. If other data platforms are required, MySQL HeatWave Lakehouse is not the correct choice for the data lakehouse solution.
Trap to set/question to ask
Ask the client if they want a data lakehouse solution where the query engine is controlled by a single company (Oracle).
Reason
MySQL is an open-source database, but MySQL HeatWave Lakehouse is a proprietary Oracle lakehouse offering that is only available in the public cloud on OCI and AWS.

