
IBM watsonx.data
Competitive Insights

Content by:
Danny Arnold
Principal, Learning Content Development | Data & AI
darnold@us.ibm.com

Presenter:
Farah Auni Hisham
Technical Enablement Specialist | Data & AI
farah.hisham@ibm.com
Seller guidance and legal disclaimer

IBM and Business Partner Internal Use Only

Slides in this presentation marked as "IBM and Business Partner Internal Use Only" are for IBM and Business Partner use and should not be shared with clients or anyone else outside of IBM or the Business Partners’ company.

© IBM Corporation 2023. All Rights Reserved.

The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.

References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth, or other results.

All client examples described are presented as illustrations of how those clients have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by client.

All statements in this report attributable to Gartner represent IBM’s interpretation of data, research opinion or viewpoints published as part of a syndicated subscription service by Gartner, Inc., and have not been reviewed by Gartner. Each Gartner publication speaks as of its original publication date (and not as of the date of this presentation). The opinions expressed in Gartner publications are not representations of fact and are subject to change without notice.
watsonx.data Competitive Insights
Agenda

• Types of competitors
• Primary competitors
  • Background
  • Key strengths and weaknesses
  • Summary
• Secondary competitors
  • Background
  • Key strengths and weaknesses
  • Summary
• Competitive positioning
• Watsonx.data differentiators
• Latest Updates
• Objection handling
• Setting traps for competitors
watsonx.data Competitors

1. Primary Competitors
• Data Lakehouse
• Augmented Data Warehouse

2. Secondary Competitors
• Data Lake(house) Offerings

IBM and Business Partner – Internal Use Only


Primary competitors


Types of (Primary) Competitors

A data lakehouse combines the best features of data warehouses and data lakes to provide cost optimization for clients. Compute and storage are separated so that data can be accessed from different engines.

An augmented data warehouse may be able to access different types of data, but all compute processing is performed through the data warehouse engine. There is no compute cost optimization allowing different engines, only the compute amount used by the data warehouse engine can be adjusted.

Data lakehouse competitors
• Designed for both structured and unstructured data.
• Based on open table and data formats for data storage.
• Fit for purpose query engine for different use cases.
• Cost optimization for compute engine and storage.
• Separate compute and storage.

Augmented data warehouse competitors
• Primarily designed for structured and semi-structured data.
• Uses proprietary or open file formats, but supports open table format for data storage.
• Query processing uses data warehouse engine.
• Compute and storage are not completely separated.
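The distinction between the two competitor types can be sketched as a small rule. This is only an illustration of the criteria listed above, not an IBM classification tool: the `Offering` type, its field names, and the two example offerings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offering:
    """Hypothetical description of a product; field names are illustrative."""
    name: str
    separate_compute_and_storage: bool
    all_queries_use_warehouse_engine: bool


def classify(offering: Offering) -> str:
    """Apply the two distinguishing criteria from the comparison above."""
    if (offering.separate_compute_and_storage
            and not offering.all_queries_use_warehouse_engine):
        return "data lakehouse"
    return "augmented data warehouse"


# Made-up offerings used only to exercise the rule.
lake = Offering("ExampleLake", True, False)
warehouse = Offering("ExampleWarehouse", False, True)
print(lake.name, "->", classify(lake))
print(warehouse.name, "->", classify(warehouse))
```

An offering that can read external data but still routes every query through the warehouse engine falls on the augmented data warehouse side under this rule.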
watsonx.data Competitor Overview
(Weaknesses not mentioned)

All competitors are relatively new companies (within the past decade) and are rapidly growing in the public cloud market.

Data Lakehouse Competitors

Databricks
Databricks pioneered the term “Lakehouse”. It’s currently positioned as a leader in the emerging Lakehouse market. Founded in 2013 by the creators of Apache Spark, Databricks offers a unified analytics platform for a variety of use-cases such as data engineering, machine learning, data science, and AI.

Dremio
Dremio has gained significant recognition for its proprietary Dremio query engine (Sonar) technology. It is positioned as a leader in the data lakehouse market, with a growing market share. Founded in 2015, Dremio offers an open, cloud-native data lakehouse engine that simplifies and accelerates data processing and analytics.

Starburst
Starburst has made a name for itself in the data access and analytics space. It is positioned as a leader in the enterprise data access market, with a growing market share. Founded in 2017, Starburst offers a cloud-native platform that enables fast and easy access to data across a range of sources.

Amazon Athena
Amazon Athena was first released in 2016 and is the AWS data lakehouse offering that utilizes Apache Spark for analytics on data in open file formats and the Trino engine for SQL queries. Amazon Athena combines with other AWS services, like AWS Lake Formation, for data governance to build a complete lakehouse solution.

Others (augmented data warehouse competitors)

Snowflake
Founded in 2012, Snowflake has made significant strides in the cloud data warehousing market and is currently positioned as a leader in the cloud data platform space. Snowflake supports open table formats but locks clients into the Snowflake environment that is locked and controlled (not open-source based). It has a single SQL query engine and a limited ability to access data outside of a Snowflake data warehouse, and no hybrid cloud capability.

Amazon Redshift Spectrum
Amazon Redshift Spectrum is a Redshift service that allows direct queries on data stored in Amazon S3 files without having to load the data into an Amazon Redshift data warehouse. Amazon Redshift Spectrum requires an active Amazon Redshift data warehouse cluster to execute queries, so it is tightly integrated with Amazon Redshift and extends the data warehouse to access external tables in Amazon S3.


Primary Competitors
Background
watsonx.data Competitors
Background (1 of 3)

Data Lakehouse Competitors

Competitor: Background information

Databricks
• Databricks has partnered with AWS, Microsoft Azure, and Google Cloud Platform, but has special optimizations with Microsoft Azure, including tight integrations with Azure Data Lake Storage, Azure Data Factory, Azure Synapse Analytics, Power BI and other Azure services.
• Databricks focuses on a cloud-only solution, with no on-premises product.
• Databricks introduced its Photon engine to speed up SQL queries over their previous engine for data warehouse workloads. Photon is a compiled C++ query engine for SQL and DataFrame API workloads and delivers 3x to 8x performance gains over the previous query engine.
• It’s built on open-source Delta Lake and Delta Sharing components, but Databricks has proprietary extensions to move clients to Databricks versus open-source.
• In May 2023, Databricks announced an investment in the privacy and access control market. They invested in the modern cloud focused protection platform, Immuta, and acquired Okera, a data access and governance platform.

Dremio
• Based on community driven standards such as Apache Arrow, Apache Iceberg, and Apache Parquet.
• Has the Dremio Sonar query engine to assist in providing self-service analytics along with a shared semantic layer for governed, self-service data access to provide a consistent view of the data along with transparent query acceleration.
• Dremio provides deployment flexibility as it can be deployed within the public cloud or deployed standalone on-premises.
watsonx.data Competitors
Background (2 of 3)

Data Lakehouse Competitors

Competitor: Background information

Starburst
• Starburst has expanded the focus from public cloud-only to include an on-premises deployment option with Starburst Enterprise.
• Starburst has connectors to many data sources including data lakes (Hadoop HDFS, Ceph, MinIO, and Dell/EMC ECS), data warehouses (Teradata, Oracle Exadata, IBM Netezza Performance Server (NPS)), message queues/NoSQL (Apache Kafka, MongoDB, and Elastic), and current and legacy databases (MySQL, PostgreSQL, IBM Db2, Microsoft SQL Server, and Oracle Database).

Amazon Athena
• Amazon Athena includes both Apache Spark and the AWS Trino engines and is central to the AWS data lakehouse strategy. Combined with AWS Glue as the AWS serverless data integration service, AWS provides a strong data lakehouse offering.
watsonx.data Competitors
Background (3 of 3)

Other Competitors

Competitor: Background information

Snowflake
• Snowflake has an extensive array of partners (200+) that help it provide services and functionality for data fabric and other use case capabilities that Snowflake lacks.
• Snowflake provides many data sets and applications from partners in the Snowflake Marketplace. However, rather than providing data sets for usage, Snowflake includes the data sets as part of a Snowflake data warehouse, driving additional data warehouse usage and a larger number of deployed data warehouses.
• Snowflake is moving into transactional data support with their Unistore hybrid table offering (in private preview as of September 2023) to be able to accommodate more types of client workloads.

Amazon Redshift Spectrum
• Amazon Redshift Spectrum is a component of the Amazon Redshift data warehouse offering.
• Redshift Spectrum provides the ability for a client to query data within Amazon S3 files directly, without having to move the data into Amazon Redshift.
• Query processing charges within Amazon Redshift Spectrum are based on the amount of data processed by the query (not the query result).
• Amazon Redshift Spectrum by itself is not a data lakehouse solution as it can only work with data files within Amazon S3.
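The Redshift Spectrum charging model noted above (billing on data processed rather than on the size of the result) can be illustrated with a minimal sketch. The function name, the decimal-terabyte convention, and the price value are assumptions for illustration only; the price is a placeholder and not a quoted AWS rate.

```python
BYTES_PER_TB = 10**12  # decimal terabyte; an assumption of this sketch


def scan_cost(bytes_scanned: int, price_per_tb: float) -> float:
    """Charge for one query when billing follows data scanned, not rows returned."""
    if bytes_scanned < 0 or price_per_tb < 0:
        raise ValueError("bytes_scanned and price_per_tb must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb


PLACEHOLDER_PRICE_PER_TB = 5.0  # hypothetical rate, not a quoted AWS price

# The same question answered by a full scan versus a scan that touches
# only a quarter of a terabyte of the underlying files.
full_scan = scan_cost(2 * BYTES_PER_TB, PLACEHOLDER_PRICE_PER_TB)
pruned_scan = scan_cost(BYTES_PER_TB // 4, PLACEHOLDER_PRICE_PER_TB)
print(full_scan, pruned_scan)
```

Under this model a query that returns a handful of rows can still be expensive if it has to read a large volume of files to find them.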
Primary Competitors:
Strengths & weaknesses



Primary Competitive landscape
Data lakehouse competitors

Details | Databricks | Dremio | Starburst | Amazon Athena
Deployment options | Public cloud only | Public cloud & on-premises | Public cloud & on-premises | AWS only
Query engines | Apache Spark, Photon | Dremio Sonar (proprietary) | Starburst (Trino-based) | Apache Spark, Amazon (Trino-based)
Open table format support • Delta Lake • Apache Iceberg • Apache Iceberg • Apache Iceberg
• Delta Lake (Parquet only)

Others (Augmented data warehouse competitors)

Details | Snowflake | Amazon Redshift Spectrum
Deployment options | Public cloud only | AWS only
Query engines | Snowflake | Amazon Redshift
Open table format support | Supports Apache Iceberg but primarily uses Cloud Object Storage (COS) | None, uses Amazon S3 files defined as external tables


Primary Competitors at a glance

watsonx.data
Primary competitors at a glance: Hybrid cloud
Scale (worst to best): Only single cloud deployment option | Multi-cloud deployment options | Both hybrid cloud and multi-cloud deployment options
[Figure: vendors positioned along the scale; text labels include Amazon Athena, Amazon Redshift Spectrum, and Dremio]


watsonx.data
Primary competitors at a glance: Multiple query engines
Scale (worst to best): Single data warehouse engine | Single data lakehouse engine | Two query engines | Three or more query engines
[Figure: vendors positioned along the scale; text labels include Amazon Redshift Spectrum, Dremio, and Amazon Athena]
watsonx.data
Primary competitors at a glance: Open-source based
Scale (worst to best): NO open-source components | Single open-source component | Multiple open-source components and single vendor focused community | Multiple open-source components and strong community
[Figure: vendors positioned along the scale; text labels include Dremio, Amazon Athena, and Amazon Redshift Spectrum]
watsonx.data
Primary competitors at a glance: Data governance
Scale (worst to best): NO data governance capability | Limited data governance capability | Data governance requires a separate product or service | Enterprise data governance for all assets in the data lakehouse
[Figure: vendors positioned along the scale; text labels include Dremio, Amazon Redshift Spectrum, and Amazon Athena]
watsonx.data
Primary competitors at a glance: Market presence
Scale (less to more): Developing market presence (GA July 2023) | Limited market presence | Leading market presence
[Figure: vendors positioned along the scale; text labels include Amazon Athena, Amazon Redshift Spectrum, and Dremio]


Secondary competitors


Microsoft OneLake overview

Microsoft OneLake
• The “OneDrive for data”
• Separates the data lake into workspaces
• All engines can access the data in OneLake
• Primarily supports Microsoft Azure but does support Amazon S3 files

Follows the Azure Databricks standards for open table format to define “tables” within the data lake
• Delta Lake with Parquet files

Other data sources must be defined as “shortcuts” (connections) and are treated as “files” versus “tables”


Microsoft OneLake overview

Microsoft OneLake
• Can be accessed by Azure Data Lake Storage (ADLS) compatible applications
• Azure Databricks and Azure HDInsight are two of the applications that can access OneLake

OneLake is divided into workspaces
• Allows data to be accessed easily while providing separation and security between different groups of data sources

OneLake allows multiple data accesses within each workspace
• Lakehouses, data warehouses, and other data sources (shortcuts)
Teradata VantageCloud overview

Teradata VantageCloud - two offerings
• VantageCloud Enterprise – a rename of their earlier fully managed cloud solution
• VantageCloud Lake – a new offering (currently only on AWS) that allows database files to reside on Amazon S3

Targeted at cloud data warehouses
• Separate compute and storage
• Compute that scales in small increments, but can scale very large
• Ability to limit autoscaling to ensure a client does not overspend their budget
Teradata VantageCloud overview

ClearScape Analytics
• Rebranded analytics capabilities that include new features and functions
• IBM has comparable capabilities within the data warehouse engines and AI capabilities

Data Fabric
• Teradata data fabric is comprised of QueryGrid, data discovery and catalog, and Teradata Industry Models

Object Storage
• Ability to use cloud object storage for data warehouse data storage (currently only Amazon S3 storage on AWS) – VantageCloud Lake offering
Google Cloud Platform Data lakehouse architecture overview

Google Cloud Platform (GCP) data lakehouse architecture
• Rather than a specific product to deliver a lakehouse solution, GCP provides an architecture

At a high-level, a data lakehouse should be able to perform a variety of workloads (BI, reports, data science, and ML)
• Data resides in the data lake
• Consists of structured, semi-structured, and unstructured data
• All data sources exist in the single lakehouse with metadata, caching, and indexing existing within the data lake infrastructure
Google Cloud Platform Data lakehouse architecture overview

Google Cloud Platform (GCP) data lakehouse architecture
• GCP provides 4 query engines
  • Dataproc – Hadoop and Spark engine
  • Vertex AI – Unified MLOps platform to enable large scale model building with limited coding
  • BigQuery – SQL query engine
  • Serverless Spark – Spark engine

Partner products that provide data lakehouse capabilities are available from
• Databricks
• Starburst


Oracle Data lake overview

Oracle data lakehouse offerings
• This diagram highlights the various Oracle components (“O” in upper right of box)
• Oracle supports open file formats, but does NOT support open table formats such as Apache Iceberg, Delta Lake, or Hudi
• MySQL HeatWave Lakehouse can query up to 400 TB of data across HeatWave clusters of up to 512 nodes
• Query engines are Oracle Autonomous Database, MySQL HeatWave, and Apache Spark
• MySQL HeatWave Lakehouse supports OCI and AWS to provide multi-cloud deployment options but NOT hybrid cloud deployment
Oracle Data lake overview

Oracle data lakehouse offerings
• Oracle has the new MySQL HeatWave Lakehouse offering, or clients can combine components to create their own data lake
• OCI Data Catalog is a metadata management service that allows clients to discover various (Oracle) data sources within the cloud or on-premises
• The data repositories are either Oracle Autonomous Database, MySQL HeatWave Lakehouse, or Oracle BigData (Apache Hadoop with Spark)
• Files in cloud object storage can be accessed by MySQL HeatWave Lakehouse for open file formats (Parquet, CSV, and others)
Secondary Competitors:
Strengths & weaknesses



Secondary Competitors at a glance

watsonx.data
Secondary competitors at a glance: Hybrid cloud
Scale (worst to best): Only single cloud deployment option | Multi-cloud deployment options | Both hybrid cloud and multi-cloud deployment options
[Figure: vendors positioned along the scale]


watsonx.data
Secondary competitors at a glance: Multiple query engines
Scale (worst to best): Single data warehouse engine | Single data lakehouse engine | Two query engines | Three or more query engines
[Figure: vendors positioned along the scale]
watsonx.data
Secondary competitors at a glance: Open-source based
Scale (worst to best): NO open-source components | Single open-source component | Multiple open-source components and single vendor focused community | Multiple open-source components and strong community
[Figure: vendors positioned along the scale]
watsonx.data
Secondary competitors at a glance: Data governance
Scale (worst to best): NO data governance capability | Limited data governance capability | Data governance requires a separate product or service | Enterprise data governance for all assets in the data lakehouse
[Figure: vendors positioned along the scale]
watsonx.data
Secondary competitors at a glance: Market presence
Scale (less to more): Developing market presence (GA July 2023) | Limited market presence | Leading market presence
[Figure: vendors positioned along the scale]


Competitive positioning
Competitive positioning options
Primary competitors
Competitive (everything or nothing) strategy

• Complete data lakehouse solution

Coexistence (surround) strategy
• Incomplete data lakehouse solution
• Limited scope offerings


Competitive positioning options
Secondary competitors
Competitive (everything or nothing) strategy
• Use for ALL completely new competitive opportunities where no competitor currently has a lakehouse solution in place

Coexistence (surround) strategy
• If a client is already using a secondary competitor’s lakehouse solution, then a coexistence strategy is best


watsonx.data
Differentiators
• No other data lakehouse offering has integrated data warehouse engines in addition to the Apache Spark and open-source query engines
• The cloud hyperscalers (AWS, Microsoft Azure, and GCP) along with Databricks provide no hybrid cloud deployment capability
• Deployment flexibility in other clouds – no other data lakehouse offering can be deployed as easily across different cloud platforms
• Other data lakehouse competitors do NOT have the level of experience with mission critical applications, and level of research in query optimization and query processing, as IBM
• Watsonx.data plus other IBM data sources (Netezza Performance Server and Db2) deliver a query performance spectrum not offered by other data lakehouse competitors
• Watsonx.data and its selection of Apache Iceberg and Presto delivers an open solution versus a single contributor open-source lock-in


Latest Updates



Objection handling

Clients may turn a competitor’s background and strengths into objections to watsonx.data.

Let’s look at an example of how to handle an objection and craft an IBM response.
Strengths & weaknesses (1 of 6)

Data Lakehouse Competitors

Strengths
• Market leader in the data lakehouse market and has mindshare of clients
• Unity catalog for metadata management and data governance of all Databricks data assets including ML models
• Has published and certified 100 TB TPC-DS benchmark to validate query performance

Weaknesses
• Has a difficult learning curve and setup complexity; it takes time for clients to be productive with Databricks
• High cost for poorly optimized workloads in production resulting in higher-than-expected costs for clients
• No hybrid cloud or on-premises deployment option, ONLY public cloud


Objection handling against Databricks
Competitive objections

Objection
Databricks has partnerships with all three major cloud vendors (AWS, Microsoft Azure, and GCP). IBM watsonx.data is not available on all clouds as a fully managed service, and it is important to have the flexibility to choose any of these cloud providers.

IBM response
• IBM watsonx.data can be deployed on any cloud provider that supports Red Hat OpenShift or within a private cloud or on-premises environment as a self managed solution.
• Although IBM watsonx.data is not available on all three cloud vendors (AWS, Microsoft Azure, and GCP) as a fully managed service, watsonx.data can be deployed on all three cloud providers and IBM will provide that option in the future as client demand dictates.


Setting traps

Sellers can set traps for competitors during watsonx.data discussions with clients using:
✓ Competitor’s weaknesses
✓ Watsonx.data differentiators

Let’s look at an example of how to set your trap questions.




Link to watsonx Packaging and Pricing (owner: Jason Foss)


Setting traps against Databricks

Trap to set/question to ask
Ask the client if they want simple, easy to understand pricing for their data lakehouse.

Reason
Databricks measures consumption and calculates cost through Databricks (consumption) Units (DBUs). The cost per DBU varies based on different use cases (Databricks SQL, Databricks All Purpose Compute for Interactive Workloads, Delta Live Tables (DLT), and others). This makes Databricks pricing complex and makes it difficult for clients to understand what a Databricks Data Lakehouse will cost.

Trap to set/question to ask
Ask the client if they want their data lakehouse SQL workloads to be as performant as SQL workloads within a data warehouse.

Reason
Although Databricks has improved their SQL query performance with their new Photon engine, it will still not be as performant as what IBM offers with their Netezza and Db2 Warehouse engines today. IBM watsonx.data provides multiple engines to provide fit-for-purpose performance for data warehouse workloads.
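The DBU-based pricing described in the first trap can be made concrete with a small sketch. It only illustrates why a per-use-case rate makes the total hard to predict. The `Workload` type, the consumption figures, and every rate below are hypothetical placeholders, not Databricks list prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """One use case billed in DBUs; all values here are illustrative."""
    name: str
    dbus_per_hour: float
    hours: float
    rate_per_dbu: float  # differs by use case, which drives the complexity


def workload_cost(w: Workload) -> float:
    """Cost of one workload: DBUs consumed times the rate for its use case."""
    return w.dbus_per_hour * w.hours * w.rate_per_dbu


def total_cost(workloads: list[Workload]) -> float:
    """Sum across use cases, each priced at its own per-DBU rate."""
    return sum(workload_cost(w) for w in workloads)


# Placeholder numbers chosen only to show the arithmetic.
monthly = [
    Workload("Databricks SQL", dbus_per_hour=4, hours=10, rate_per_dbu=0.5),
    Workload("Delta Live Tables", dbus_per_hour=8, hours=5, rate_per_dbu=0.25),
]
for w in monthly:
    print(w.name, workload_cost(w))
print("total", total_cost(monthly))
```

To forecast a bill, a client has to estimate three inputs per use case (DBU consumption, hours, and the applicable rate), which is the point the trap question is meant to surface.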


The integrated IBM watsonx.data ecosystem for maximum workload coverage and optimal price-performance
IBM watsonx.data functionality: Integrations at GA

[Diagram: Db2 z/OS (via Data Gate), Db2W, Netezza, Spark, and Presto engines; Watsonx.data Metadata Store; IBM Knowledge Catalog; object storage. Numbered callouts:]

1 Analyze Z data easily and securely with Db2 for z/OS Data Gate
2 Warehouses can access data in the lakehouse
3 The lakehouse can access data residing in Db2/Netezza
4 Easily promote data between the warehouse and lakehouse
5 Query routing service, multiple engines can access same data lake data
6 KC policies enforced by the lakehouse via metadata service

RECAP
Setting traps



Summary

• The key difference between an augmented data warehouse and a data lakehouse is that the data warehouse is the compute engine for all queries versus multiple engines that allow clients to cost optimize their queries.
  • Limited ability for a data warehouse to access external data sources.

• Amazon Athena, Amazon Redshift Spectrum, and Snowflake provide both competitive takeout and coexistence sales opportunities.
  • Will provide data access and query capability for data external to a data warehouse and/or AWS.

• Watsonx.data is a better lakehouse choice for clients that are already using IBM Db2 or Netezza Performance Server (NPS) than offerings from Databricks, Dremio, and Starburst due to the integration of watsonx.data with Db2 and NPS.

• Databricks is the strongest data lakehouse competitor but does have complex pricing and high cost compared to other offerings.
  • Clients with smaller data lakehouse requirements will find Databricks too costly.
© 2023 International Business Machines Corporation

Thank you
IBM and the IBM logo are trademarks of IBM
Corporation, registered in many jurisdictions
worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list
of IBM trademarks is available on ibm.com/trademark.

THIS DOCUMENT IS DISTRIBUTED “AS IS” WITHOUT


ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN
NO EVENT, SHALL IBM BE LIABLE FOR ANY DAMAGE
ARISING FROM THE USE OF THIS INFORMATION,
INCLUDING BUT NOT LIMITED TO, LOSS OF DATA,
BUSINESS INTERRUPTION, LOSS OF PROFIT OR
LOSS OF OPPORTUNITY.

Client examples are presented as illustrations of how


those clients have used IBM products and the results
they may have achieved. Actual performance, cost,
savings or other results in other operating
environments may vary.

Not all offerings are available in every country in which


IBM operates.

IBM’s statements regarding its plans, directions, and


intent are subject to change or withdrawal without
notice at IBM’s sole discretion. Information regarding
potential future products is intended to outline our
general product direction and it should not be relied
on in making a purchasing decision. The information
mentioned regarding potential future products is not
a commitment, promise, or legal obligation to deliver
any material, code or functionality. Information about
potential future products may not be incorporated into
any contract. The development, release, and timing of
any future features or functionality described for our
products remains at our sole discretion.

Red Hat and OpenShift are registered trademarks of


Red Hat, Inc. or its subsidiaries in the United States
and other countries.
Appendix
Primary Competitors Strength & Weaknesses
Strengths & weaknesses: Databricks
(1 of 6)

Strengths
• Market leader in the data lakehouse market and has mindshare of clients
• Unity Catalog for metadata management and data governance of all Databricks data assets including ML models
• Has published and certified 100 TB TPC-DS benchmark to validate query performance

Weaknesses
• Has a difficult learning curve and setup complexity, it takes time for clients to be productive with Databricks
• High cost for poorly optimized workloads in production resulting in higher-than-expected costs for clients
• No hybrid cloud or on-premises deployment option, ONLY public cloud

Data Lakehouse Competitors


IBM and Business Partner – Internal Use Only
Strengths & weaknesses: Dremio
(2 of 6)

Strengths
• Has a proprietary Sonar query engine that seems to deliver better performance than open-source query engines
• Offers both public cloud and on-premises deployment options
• Has competitive pricing with a low-end, no-cost package that allows clients to easily try Dremio for small projects

Weaknesses
• Star schema design and table joins are not optimal, clients may need to re-write queries and re-design table structures
• Has limited marketing presence, clients may not be aware of the Dremio offerings
• Fully managed Dremio is only available on AWS

Data Lakehouse Competitors


IBM and Business Partner – Internal Use Only
Strengths & weaknesses: Starburst
(3 of 6)

Strengths
• Has many data connectors for traditional databases, non-relational databases, columnar databases, and other data sources
• Offers both public cloud and on-premises deployment options
• Provides RBAC for table level, column level, and row level access control

Weaknesses
• Starburst is the primary contributor to Trino making it less open-source than Presto
• Designed for interactive (ad-hoc) queries, not optimized or designed for long-running queries for data warehouse workloads
• Limited data governance capability and no data lineage capability

Data Lakehouse Competitors


IBM and Business Partner – Internal Use Only
Strengths & weaknesses: Amazon Athena
(4 of 6)

Strengths
• Part of AWS and integrates well with other AWS services
• Provides an Apache Spark and an Amazon Trino-based engine to provide multiple query engines
• Uses AWS Glue for data integration and AWS Lake Formation for data governance

Weaknesses
• Expensive if queries are not optimized and/or do not use a proper partitioning strategy
• Non-Amazon S3 data sources require programming code and Athena Federated Query using AWS Lambda (separate AWS service)
• Inconsistent performance due to the serverless design with no dedicated compute configuration

Data Lakehouse Competitors


IBM and Business Partner – Internal Use Only
Strengths & weaknesses: Snowflake
(5 of 6)

Strengths
• Offers consumption-based pricing that appeals to clients
• Provides complete separation between the data on storage and the compute engine within a virtual warehouse
• Provides dynamic scale-up of compute with no downtime

Weaknesses
• Each T-shirt size scale-up increment increases cost by 2x over the prior T-shirt size
• Limited hybrid cloud capability for on-premises data sources, only supports storage devices that follow the S3 file standard
• Automatic scale-out of multiple virtual warehouse clusters during high query concurrency periods increases cost by 100% for each scale-out virtual warehouse cluster
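The two cost rules on this slide (each scale-up increment doubles cost, and each scale-out cluster adds another 100%) can be combined into a rough relative-cost estimate. The sketch below is illustrative only: the function name and parameters are hypothetical, and the result is a multiplier relative to one cluster at the base size, not a vendor price.

```python
def relative_hourly_cost(size_steps: int, clusters: int = 1) -> int:
    """Return cost relative to one cluster at the base size.

    size_steps: scale-up increments above the base size; each one doubles cost.
    clusters:   running scale-out clusters; each one costs as much as the first.
    """
    if size_steps < 0 or clusters < 1:
        raise ValueError("size_steps must be >= 0 and clusters must be >= 1")
    return (2 ** size_steps) * clusters


if __name__ == "__main__":
    # Three size increments above the base, scaled out from 1 to 4 clusters.
    for clusters in (1, 2, 4):
        print(clusters, "cluster(s):", relative_hourly_cost(3, clusters), "x base cost")
```

Under these assumptions, a warehouse three sizes above the base that scales out to four clusters costs 32 times the base configuration while all four clusters are running.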
Other Competitors
IBM and Business Partner – Internal Use Only
Strengths & weaknesses: Amazon Redshift Spectrum
(6 of 6)

Strengths
• Infrequently accessed data can remain in Amazon S3
• Part of AWS, so Amazon Redshift integrates with other AWS services
• Provides the ability for Amazon Redshift clients to access Amazon S3 files directly

Weaknesses
• Requires a running Amazon Redshift data warehouse cluster to execute Redshift Spectrum queries
• Query processing incurs Amazon Redshift compute usage plus Redshift Spectrum data processing charges
• Can access only Amazon S3 files that have been defined as external tables within an Amazon Redshift data warehouse cluster

Other Competitors
IBM and Business Partner – Internal Use Only
Secondary Competitors Strength & Weaknesses
Strengths & weaknesses: Microsoft OneLake

Strengths
• Supports different query engines to allow compute choice based on query requirements
• Allows access to Amazon S3 files in addition to Microsoft Azure data sources
• Can span geographies and allows workspaces for security and workload separation

Weaknesses
• Tables must reside in Delta Parquet files
• Non-Delta Parquet files must be manually and explicitly defined as files using shortcuts (connections)
• No hybrid cloud or multi-cloud deployment options

IBM and Business Partner – Internal Use Only


Strengths & weaknesses: Teradata

Strengths
• Teradata is a proven data warehouse provider with many on-premises clients
• Teradata is a leader in the cloud database space in the most recent Gartner Magic Quadrant
• Teradata has strengthened their analytics capabilities with their new ClearScape Analytics component

Weaknesses
• Teradata supports Parquet files, but does NOT support an open table format like Apache Iceberg for table storage
• Teradata does NOT offer multiple query engines to reduce client compute cost
• VantageCloud Lake is difficult to size as both a Principal cluster and Compute clusters are required for workloads using Object File System (OFS)

IBM and Business Partner – Internal Use Only


Strengths & weaknesses: Google Cloud Platform

Strengths
• Google Cloud Platform has the components for a client to build a data lakehouse
• Google Cloud Platform has many tools around ML and AI
• Google Cloud Platform offers storage that is optimized for open-source storage formats

Weaknesses
• Clients must piece together different services to build a data lakehouse solution
• Google Cloud Platform does not have the data lakehouse market presence of AWS and Microsoft Azure
• For a more complete data lakehouse solution, clients must purchase partner solutions from Databricks and Starburst

IBM and Business Partner – Internal Use Only


Strengths & weaknesses: Oracle

Strengths
• Oracle offers multiple query engines for their lakehouse: Oracle Autonomous Database, MySQL HeatWave, and Apache Spark
• Oracle tightly integrates the data lakehouse offerings with Oracle cloud-based applications
• Oracle Cloud Infrastructure (OCI) is optimized for Oracle data sources and applications

Weaknesses
• Oracle and their lakehouse offerings are limited to OCI and AWS deployments
• Oracle has no hybrid cloud deployment capability for MySQL HeatWave Lakehouse
• Oracle lakehouse does NOT support any open table formats, so files are limited to cloud object storage within OCI and AWS

IBM and Business Partner – Internal Use Only


Latest Updates
watsonx.data
Latest updates

Databricks announcements

• Lakehouse federation capabilities in Unity Catalog to support external data sources
• Delta Lake 3.0 with Universal Format (UniForm) and Liquid Clustering
  • UniForm allows Apache Iceberg clients to read Delta Lake tables by generating Iceberg metadata for the table
  • Liquid Clustering adjusts the data layout in a Delta Lake table based on partitioning key selection to improve query performance

Unity Catalog will support ONLY these 7 public cloud data sources

IBM competitive analysis

• Unity Catalog update is in private preview
  • Limited number of public cloud data warehouses and databases included but NO hybrid cloud
• Delta Lake UniForm is in public preview and Liquid Clustering is announced only (no preview)

For the complete competitive analysis of Databricks Data+AI Summit 2023, see this write-up on Seismic

June 2023 announcements


watsonx.data
Latest updates

Snowflake announcements

• Managed Apache Iceberg tables
• Dynamic Tables

Managed Apache Iceberg tables are converted to use the Snowflake catalog

IBM competitive analysis

• Managed Apache Iceberg tables are in private preview
  • Preliminary information indicates that the Apache Iceberg table is converted to use the Snowflake catalog and is no longer managed by the Iceberg catalog
• Dynamic Tables are enhanced Materialized Views (MVs) in public preview that bring Snowflake on-par with other data warehouses on the public cloud, the goal is to reduce full table rebuilds for data pipelines

For the complete competitive analysis of Snowflake Summit 2023, see this write-up on Seismic

June 2023 announcements
watsonx.data
Latest updates Iceberg table support

Snowflake update

• Single Iceberg table type
• Two catalog management options
  • External (Iceberg) catalog
  • Managed by Snowflake
• Performance implications for catalog management

Currently in PRIVATE preview

IBM competitive analysis

• Snowflake wants clients to use Snowflake catalog for Iceberg tables
  • Delivers better query performance
  • Locks client into Snowflake query engine
• Snowflake claims “catalog integration” if using external Iceberg catalog
  • Currently only supports AWS Glue Catalog with catalog integration
• Can only define Iceberg tables as External Tables for production Snowflake use as of October 2023

For the complete competitive analysis of Snowflake Iceberg table support see this write-up on Seismic

September 2023 update
watsonx.data
Latest updates

MySQL HeatWave Lakehouse announcement

• Can access data in open file formats like Apache Parquet, CSV, and others
• Can access database export files from Oracle Database, MySQL, Amazon Aurora, and Amazon Redshift

IBM competitive analysis

• No hybrid cloud deployment option; AWS and OCI are the only supported cloud platforms
• Internal engine processing uses a proprietary format
• Released a non-certified TPC-H benchmark result

For the complete competitive analysis of MySQL HeatWave Lakehouse, see this write-up on Seismic

July 2023 announcements
watsonx.data
Latest updates
Serverless

Amazon Web Services announcement

• Amazon Redshift Serverless or Amazon Redshift Spectrum can now query Apache Iceberg tables
  • Iceberg table access is read-only
  • Iceberg table must be cataloged in the AWS Glue Data Catalog
  • Iceberg table access is primarily restricted to tables located within AWS

IN PREVIEW MODE ONLY – August 2023 (non-production use)

IBM competitive analysis

• Very limited capability in using Iceberg tables from Amazon Redshift
  • Requires Amazon Athena or Amazon EMR to write to Iceberg tables
• Not a lakehouse solution and uses proprietary components within Amazon Redshift
  • No time travel or data sharing capabilities within Amazon Redshift using Iceberg tables.

For more information on Amazon Redshift Iceberg table support, see this write-up on Seismic

August 2023 announcements
Objection Handling
Objection handling against Databricks

Objection
Databricks has partnerships with all three major cloud vendors (AWS, Microsoft Azure, and GCP). IBM watsonx.data is not available on all clouds as a fully managed service, and it is important to have the flexibility to choose any of these cloud providers.

IBM response
• IBM watsonx.data can be deployed on any cloud provider that supports Red Hat OpenShift or within a private cloud or on-premises environment as a self-managed solution.
• Although IBM watsonx.data is not available on all three cloud vendors (AWS, Microsoft Azure, and GCP) as a fully managed service, watsonx.data can be deployed on all three cloud providers and IBM will provide that option in the future as client demand dictates.

Competitive objections
IBM and Business Partner – Internal Use Only
Objection handling against Dremio

Objection
Dremio provides a simple, fast, and cost-effective platform for analytics, ML, and other data-driven applications. What advantages does IBM watsonx.data have over Dremio?

IBM response
• Performance and reliability: Dremio has a history of issues with new releases. Watsonx.data is based on open-source Apache Spark and Presto, which are well-tested and proven technologies. IBM has a long history with query engines and query optimization and will incorporate this knowledge into watsonx.data to continually improve performance and query efficiency.
• Enterprise-class capabilities: IBM provides strong data governance and security around watsonx.data that Dremio cannot provide.

Competitive objections
IBM and Business Partner – Internal Use Only
Objection handling against Starburst
Objection
Starburst provides a query engine that can efficiently process an analytic workload across many different data sources. What advantages does IBM watsonx.data provide over Starburst?

IBM response
• IBM watsonx.data provides multiple query engines: Apache Spark and Presto are the query engines currently available (along with the optional Db2 and Netezza Performance Server (NPS) specialized data warehouse engines) versus the single primarily SQL query engine within Starburst.
• Watsonx.data provides data governance, security, and deep integration with IBM data sources including the IBM Z platform that Starburst does not provide without 3rd party integrations.
• IBM has a decades-long history of query engine research and development including query optimizations within IBM Research that Starburst cannot match.

Competitive objections
IBM and Business Partner – Internal Use Only
Objection handling against Amazon Athena
(1 of 2)

Objection
Amazon Athena has an Apache Spark query engine and a Trino SQL query engine and IBM watsonx.data uses an Apache Spark query engine and a Presto query engine. What are the advantages of IBM watsonx.data over Amazon Athena?

IBM response
• While it is true that both Amazon Athena and IBM watsonx.data provide multiple engine support, IBM watsonx.data has an advantage in the number and location of data sources supported.
• IBM watsonx.data has data connectors to many different data sources across hybrid cloud. Amazon Athena is designed for Amazon S3 file access primarily. All other data sources require the use of AWS Lambda and coding through Amazon Athena Federated Query.
• IBM watsonx.data provides flexible deployment options versus being locked-in to AWS.
Competitive objections
IBM and Business Partner – Internal Use Only
Objection handling against Amazon Athena
(2 of 2)

Objection
Amazon Athena is well-integrated within the AWS ecosystem and AWS is our chosen cloud provider. What is the benefit of adding watsonx.data to our product portfolio?

IBM response
• IBM watsonx.data can provide data lakehouse capabilities for data sources outside of AWS and provides data connectors to many data sources versus having to utilize AWS Lambda and coding for any data source that is not Amazon S3.
• IBM watsonx.data can be deployed on multiple clouds or private cloud/on-premises for flexibility that Amazon Athena cannot provide.
• IBM has more research and development expertise with query engines and query optimizations than AWS for future enhancements.
Coexistence objections
IBM and Business Partner – Internal Use Only
Objection handling against Snowflake
(1 of 2)
Objection
Snowflake scales-up and scales-out. This provides performance, and the ability to scale dynamically is important for a data lakehouse platform, as workloads vary throughout the day.

IBM response
• Data lakehouses generally scale by adding compute nodes to the environment. Snowflake compute scale-up is always 2x the current compute configuration.
• This is cost prohibitive as data lakehouse data volumes grow; a lakehouse environment with 8 compute nodes likely doesn’t need to automatically scale-up to 16 compute nodes.
• The value of a data lakehouse is to be able to access less frequently used data at a lower cost than a typical data warehouse. Without this lower cost possibility, the solution is not a data lakehouse but identical to a data warehouse.
• IBM watsonx.data offers true data lakehouse cost benefits for large volumes of data versus a data warehouse alone.

Competitive objections
IBM and Business Partner – Internal Use Only
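The 8-to-16 node point in the response above can be illustrated with a small comparison between doubling-only scale-up and node-by-node scaling. This is a sketch under simple assumptions: the function names are hypothetical, and "required nodes" stands in for whatever capacity a workload actually needs.

```python
def nodes_after_doubling(current: int, required: int) -> int:
    """Smallest configuration reachable by repeated 2x scale-up that meets the need."""
    if current < 1:
        raise ValueError("current must be >= 1")
    nodes = current
    while nodes < required:
        nodes *= 2  # scale-up always doubles the current configuration
    return nodes


def nodes_after_granular_scaling(current: int, required: int) -> int:
    """Configuration when nodes can be added one at a time."""
    return max(current, required)


if __name__ == "__main__":
    current, required = 8, 10
    print("doubling:", nodes_after_doubling(current, required))
    print("granular:", nodes_after_granular_scaling(current, required))
```

With 8 nodes in place and a workload that needs 10, doubling lands on 16 nodes while granular scaling stops at 10.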
Objection handling against Snowflake
(2 of 2)

Objection
Snowflake has been selected as our cloud data platform and we don’t see why we need IBM watsonx.data as a data lakehouse platform. What would be the advantage of adding watsonx.data to our environment?

IBM response
• Snowflake is a cloud data warehouse and in order to access data, you must do everything within the Snowflake environment and ecosystem. Clients are successful only if they can move all data into Snowflake or access all data through a Snowflake virtual warehouse.
• IBM watsonx.data provides the flexibility to work with many external data sources directly. Using its included Presto engine, SQL queries can be executed against data sources while delivering granular compute scaling allowing insights into all types of data at a much lower cost for data external to Snowflake.
Coexistence objections
IBM and Business Partner – Internal Use Only
Objection handling against Redshift Spectrum
(1 of 2)

Objection
Amazon S3 files are the only source of external data we need to access, and we already use Amazon Redshift as our data warehouse. Why should we choose IBM watsonx.data as our data lakehouse platform?

IBM response
• IBM watsonx.data allows you to access Amazon S3 files as well as many other data sources across a hybrid cloud without the requirement for a companion data warehouse cluster as requirements change over time.
• IBM watsonx.data uses its own compute nodes and does not require a data warehouse cluster (unlike Amazon Redshift Spectrum, which requires an Amazon Redshift cluster). Clients pay only for the compute nodes used by the query itself versus the additive cost of Amazon Redshift Spectrum compute plus the Amazon Redshift data warehouse cluster compute.
Competitive objections
IBM and Business Partner – Internal Use Only
Objection handling against Redshift Spectrum
(2 of 2)

Objection
Amazon Redshift Spectrum is serverless and allows us to query Amazon S3 data without loading the data into Amazon Redshift. How does it benefit our organization to add IBM watsonx.data as a data lakehouse platform?

IBM response
• Adding IBM watsonx.data future proofs the data lakehouse solution by allowing all types of data sources to be supported within the lakehouse.
• Watsonx.data has no dependencies on other services to deliver query results for the data lakehouse with no competition for compute resources with your data warehouse workload.
• Watsonx.data will allow queries on many different data sources without limiting the data lakehouse to only AWS.

Coexistence objections
IBM and Business Partner – Internal Use Only
Objection handling against Microsoft OneLake
(1 of 2)

Objection
Microsoft OneLake is part of Microsoft Fabric and we have selected Fabric as our data fabric solution within Microsoft Azure.

IBM response
• Microsoft OneLake works with Azure Data Lake Storage (ADLS) only and is not a data lakehouse solution outside of Microsoft Azure and S3 files within AWS.
• IBM watsonx.data provides a complete data lakehouse solution that spans both multi-cloud and hybrid cloud environments; Microsoft OneLake does not provide this level of flexibility.
• IBM watsonx.data can work with data that is outside of Microsoft Azure and Amazon S3 and uses the more open Apache Iceberg open table format versus the Databricks-controlled Delta Lake open table format used by Microsoft.

Competitive objections
IBM and Business Partner – Internal Use Only
Objection handling against Microsoft OneLake
(2 of 2)

Objection
Microsoft OneLake and Microsoft Fabric provide us the data lakehouse solution that we need today on Microsoft Azure. What is the advantage of adding IBM watsonx.data to our existing solution?

IBM response
• IBM watsonx.data supports multiple public clouds and private clouds/on-premises deployments which Microsoft OneLake does not offer as an option.
• IBM watsonx.data can directly query data in Apache Iceberg files, a format used by several other data lakehouse vendors (Starburst, Dremio, Amazon Athena, and others), instead of creating a shortcut to process non-Delta Lake files within Microsoft OneLake.
• Presto SQL is more of a standard query language than the Kusto Query Language (KQL) developed by Microsoft and contributed to GitHub.

Coexistence objections
IBM and Business Partner – Internal Use Only
Objection handling against Teradata VantageCloud
(1 of 2)

Objection
Teradata VantageCloud Lake provides the data lake environment that our organization requires and is multi-cloud.

IBM response
• IBM watsonx.data uses the Apache Iceberg open table format; VantageCloud Lake provides the ability to use cloud object storage.
• IBM watsonx.data uses many open-source components to prevent vendor lock-in by a client. Apache Spark, Presto, and Apache Iceberg are used by watsonx.data, whereas Teradata VantageCloud contains no open-source query engine.
• IBM watsonx.data provides Presto and Spark as query engines to reduce compute cost. Teradata VantageCloud Lake uses the Teradata proprietary query engine.

Competitive objections
IBM and Business Partner – Internal Use Only
Objection handling against Teradata VantageCloud
(2 of 2)

Objection
Teradata VantageCloud Lake allows us to use less expensive cloud object storage. What advantage does IBM watsonx.data provide to justify adding it to our environment?

IBM response
• The use of cloud object storage reduces the data storage cost only; IBM watsonx.data provides multiple open-source query engines and specialized data warehouse query engines to allow clients to cost optimize their query workloads based on client performance needs.
• IBM watsonx.data supports multi-cloud and hybrid cloud deployment options; Teradata VantageCloud Lake is only on AWS (June 2023).
• IBM watsonx.data supports many more data sources with connectors and can access any table that uses Apache Iceberg directly. This provides a true data lakehouse capability that VantageCloud Lake cannot deliver.

Coexistence objections
IBM and Business Partner – Internal Use Only
Objection handling against Google Cloud Platform (GCP)
(1 of 2)

Objection
Google Cloud is our chosen public cloud platform and provides the data lakehouse architecture that meets our requirements. Why should we consider IBM watsonx.data as a data lakehouse solution?

IBM response
• IBM watsonx.data is a data lakehouse solution versus an architecture that a client must build themselves using components.
• IBM watsonx.data supports both open-source query engines and specialized data warehouse engines versus the single proprietary SQL query engine (Google BigQuery).
• IBM watsonx.data uses the Apache Iceberg open table format that is used by other vendors. This allows many different data assets to be directly queried as tables by watsonx.data and there are connectors to other external data sources providing a data lakehouse that can span multi-cloud and hybrid cloud environments.

Competitive objections
IBM and Business Partner – Internal Use Only
Objection handling against Google Cloud Platform (GCP)
(2 of 2)

Objection
Google Cloud is our data lakehouse platform. What is the advantage of adding IBM watsonx.data to our existing data lakehouse environment?

IBM response
• IBM watsonx.data extends your current data lakehouse to multi-cloud and hybrid cloud to include data assets on other clouds and even IBM Db2 on Z data assets.
• IBM watsonx.data includes multiple open-source query engines along with support for existing IBM data warehouse specialized engines to ensure your data lakehouse is not locked-in to a single cloud platform.
• IBM watsonx.data uses multiple open-source components to ensure that clients are not locked-in to a single vendor solution and have flexibility and deployment options.

Coexistence objections
IBM and Business Partner – Internal Use Only
Objection handling against Oracle
(1 of 2)

Objection
Oracle provides MySQL HeatWave Lakehouse on both Oracle Cloud Infrastructure (OCI) and Amazon Web Services (AWS). What advantages does IBM watsonx.data have over Oracle?

IBM response
• IBM watsonx.data has many open-source components to provide flexibility and prevent lock-in to the IBM solution for clients. MySQL HeatWave Lakehouse is a proprietary Oracle public cloud only offering.
• IBM watsonx.data can be deployed as a self-managed solution on any public cloud or hybrid cloud platform that supports Red Hat OpenShift. MySQL HeatWave Lakehouse deployment is limited to OCI and AWS.
• IBM watsonx.data uses Apache Iceberg as an open table format versus MySQL HeatWave Lakehouse using open file formats. Open table formats have a catalog and metadata about files.

Competitive objections
IBM and Business Partner – Internal Use Only
Objection handling against Oracle
(2 of 2)

Objection
MySQL HeatWave Lakehouse is our data lakehouse solution on OCI. What are the advantages of adding IBM watsonx.data to our existing lakehouse solution?

IBM response
• IBM watsonx.data uses Apache Iceberg as an open table format versus MySQL HeatWave Lakehouse using open file formats. This means that the MySQL HeatWave Lakehouse files in cloud object storage do not have a central catalog and a client must configure OCI Data Catalog for the catalog component.
• IBM watsonx.data has more deployment options across multi-cloud and hybrid cloud versus MySQL HeatWave Lakehouse (AWS and OCI).
• IBM watsonx.data uses open-source Apache Iceberg as the open table format and Apache Spark and Ahana Presto as open-source query engines. MySQL HeatWave Lakehouse is a proprietary public cloud only offering by Oracle.

Coexistence objections
IBM and Business Partner – Internal Use Only
Setting Traps
Setting traps against Databricks

Trap to set/question to ask
Ask the client if they want simple, easy to understand pricing for their data lakehouse.

Reason
Databricks measures consumption and calculates cost through Databricks (consumption) Units (DBUs). The cost per DBU varies based on different use cases (Databricks SQL, Databricks All Purpose Compute for Interactive Workloads, Delta Live Tables (DLT), and others). This makes Databricks pricing complex and makes it difficult for clients to understand what a Databricks Data Lakehouse will cost.

Trap to set/question to ask
Ask the client if they want their data lakehouse SQL workloads to be as performant as SQL workloads within a data warehouse.

Reason
Although Databricks has improved their SQL query performance with their new Photon engine, it will still not be as performant as what IBM offers with their Netezza and Db2 Warehouse engines today. IBM watsonx.data provides multiple engines to provide fit-for-purpose performance for data warehouse workloads.
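To make the pricing point concrete, the sketch below totals a bill in which each workload type has its own rate per DBU. It is a minimal illustration only: the workload labels, the rates, and the function name are hypothetical placeholders, not published Databricks prices.

```python
# Hypothetical rate per DBU for each workload type (placeholder numbers).
HYPOTHETICAL_RATE_PER_DBU = {
    "sql": 0.25,
    "all_purpose": 0.5,
    "dlt": 0.75,
}


def estimate_dbu_cost(dbus_by_workload, rates=HYPOTHETICAL_RATE_PER_DBU):
    """Sum consumed DBUs times the rate that applies to each workload type."""
    return sum(dbus * rates[workload] for workload, dbus in dbus_by_workload.items())


if __name__ == "__main__":
    # The same 100 DBUs cost a different amount depending on the workload type.
    print(estimate_dbu_cost({"sql": 100}))
    print(estimate_dbu_cost({"all_purpose": 100}))
    print(estimate_dbu_cost({"sql": 100, "all_purpose": 100}))
```

Because the rate depends on the workload type, the same DBU consumption can produce different totals, which is the source of the pricing complexity described in the reason above.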

IBM and Business Partner – Internal Use Only


Setting traps against Dremio

Trap to set/question to ask
Ask the client if they are considering only AWS to host their fully managed data lakehouse solution.

Reason
Dremio does not offer a fully managed solution on any cloud provider other than AWS.

Trap to set/question to ask
Ask the client if their query workload will be restricted to only SQL for their data lakehouse or whether they will be executing data engineering and/or ML/AI workloads.

Reason
Dremio’s query engine is designed for SQL; it is not designed to handle ML/AI or data engineering workloads. IBM watsonx.data can support SQL queries in addition to other query types.

IBM and Business Partner – Internal Use Only


Setting traps against Starburst

Trap to set/question to ask
Ask the client if they have requirements for ML/AI or data engineering workloads.

Reason
Starburst’s query engine is focused strictly on SQL workloads. IBM watsonx.data provides a Spark engine for efficient processing of ML/AI and data engineering workloads. Watsonx.data has multiple engines to allow clients to choose the correct engine for their intended workload and provide workload flexibility and performance optimization that Starburst cannot deliver.

Trap to set/question to ask
Ask the client if they have a workload with long-running queries or ETL workloads that they need to execute.

Reason
Starburst was designed for ad-hoc query workloads and will not perform well for long-running query or ETL workloads. IBM watsonx.data has the flexibility to handle different workload types without impacting lakehouse performance.
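The idea of choosing the engine that fits the workload can be sketched as a simple lookup. This is illustrative only and is not a watsonx.data API or its query routing service: the workload labels and the function name are hypothetical, and the mapping only restates the pairings named in this deck (Spark for ML/AI and data engineering, Presto for SQL queries, Db2 Warehouse or Netezza for data warehouse workloads).

```python
# Hypothetical workload labels mapped to the engines named in this deck.
ENGINE_BY_WORKLOAD = {
    "ad_hoc_sql": "Presto",
    "ml_ai": "Spark",
    "data_engineering": "Spark",
    "warehouse_sql": "Db2 Warehouse or Netezza",
}


def choose_engine(workload: str) -> str:
    """Return the engine suggested for a workload label, or fail loudly."""
    try:
        return ENGINE_BY_WORKLOAD[workload]
    except KeyError:
        raise ValueError(f"no engine mapping for workload {workload!r}") from None


if __name__ == "__main__":
    for label in ("ad_hoc_sql", "ml_ai", "warehouse_sql"):
        print(label, "->", choose_engine(label))
```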

IBM and Business Partner – Internal Use Only


Setting traps against Amazon Athena

Trap to set/question to ask
Ask the client if the only data source they need to access is contained within Amazon S3 files.

Reason
Amazon Athena can only directly access Amazon S3 files. All other data sources require the use of Amazon Athena Federated Query, which utilizes the separate AWS Lambda service and requires coding to access other data services. AWS Lambda has many restrictions that potential clients need to understand prior to making a lakehouse decision for Amazon Athena.

Trap to set/question to ask
Ask the client if they want predictable lakehouse query costs.

Reason
One of the leading client complaints about Amazon Athena is unpredictable query costs due to poorly optimized queries and excessive charges for long-running queries.

IBM and Business Partner – Internal Use Only


Setting traps against Snowflake

Trap to set/question to ask
Ask the client if cost is a consideration for their data lakehouse solution.

Reason
Snowflake’s 2x difference between scale-up compute configurations and the need to scale out to up to 10 virtual warehouse clusters to handle peak query concurrency result in exponential cost increases for clients, which they will not experience with IBM watsonx.data (2x to 3x less expensive).

Trap to set/question to ask
Ask the client if they want a complete data lakehouse solution or just an augmented data warehouse solution.

Reason
Snowflake is primarily a data warehouse platform. Although it has some pieces of a data lakehouse, Snowflake cannot access data everywhere. Snowflake’s ability to access on-premises data is limited. If a client has legacy data sources to incorporate in their data lakehouse, Snowflake will be unable to access this data without migrating it to Snowflake.
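
The scale-up and scale-out cost reasoning above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming that each scale-up step doubles hourly compute consumption (the 2x difference cited above) and that scale-out multiplies it by the number of active clusters, up to 10. The function name `relative_hourly_cost` and the normalized baseline of 1.0 for the smallest single-cluster configuration are hypothetical and chosen only for illustration; they are not Snowflake list prices.

```python
MAX_CLUSTERS = 10  # upper bound on scale-out clusters cited in the slide


def relative_hourly_cost(scale_up_steps: int, clusters: int) -> float:
    """Return hourly compute cost relative to the smallest single-cluster setup.

    Assumption (hypothetical model): every scale-up step doubles the cost,
    and every additional cluster adds one more full copy of that cost.
    """
    if scale_up_steps < 0:
        raise ValueError("scale_up_steps must be non-negative")
    if not 1 <= clusters <= MAX_CLUSTERS:
        raise ValueError(f"clusters must be between 1 and {MAX_CLUSTERS}")
    return float(2 ** scale_up_steps * clusters)


if __name__ == "__main__":
    # Show how the relative cost grows for 1, 5, and 10 clusters.
    for steps in range(4):
        row = [relative_hourly_cost(steps, c) for c in (1, 5, MAX_CLUSTERS)]
        print(f"scale-up steps={steps}: 1/5/10 clusters -> {row}")
```

Under these assumptions, three scale-up steps combined with 10 clusters cost 80 times the baseline per hour, which is the kind of peak-concurrency growth the cost question is meant to surface.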


Setting traps against Amazon Redshift Spectrum

Trap to set/question to ask
Ask the client if they understand that Amazon Redshift Spectrum requires Amazon Redshift data warehouse compute resources in addition to Amazon Redshift Spectrum compute charges.

Reason
Amazon Redshift Spectrum requires an active Amazon Redshift data warehouse cluster and uses compute nodes within Amazon Redshift during query processing. The cost of an Amazon Redshift Spectrum query is the cost of the data scanned in Amazon S3 (per TB) PLUS the compute charges incurred during query execution in Amazon Redshift.

Trap to set/question to ask
Ask the client if all data that they need to query that is external to their Amazon Redshift data warehouse is contained within Amazon S3.

Reason
Amazon Redshift Spectrum can only query data stored in Amazon S3 files that have been defined as external tables in an Amazon Redshift data warehouse cluster. No other data can be queried by Amazon Redshift Spectrum.
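
The two-part cost of an Amazon Redshift Spectrum query described above (data scanned in Amazon S3 plus Amazon Redshift compute) can be written out as a small calculation. This is a sketch only: the function name `spectrum_query_cost` and every rate below are hypothetical placeholders, not AWS prices, and the compute charge is assumed to be a simple hourly cluster rate multiplied by the query's run time.

```python
def spectrum_query_cost(
    tb_scanned: float,
    price_per_tb: float,
    cluster_hourly_rate: float,
    query_hours: float,
) -> float:
    """Total cost = S3 data-scan charge PLUS Redshift compute charge.

    All inputs are hypothetical; substitute the client's actual rates.
    """
    scan_charge = tb_scanned * price_per_tb
    compute_charge = cluster_hourly_rate * query_hours
    return scan_charge + compute_charge


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    total = spectrum_query_cost(
        tb_scanned=2.0,
        price_per_tb=5.0,
        cluster_hourly_rate=4.0,
        query_hours=0.5,
    )
    print(f"Total query cost: {total:.2f}")
```

Keeping the two charges as separate terms makes it easy to show a client which part of the bill comes from the scan and which comes from the always-required cluster.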


Setting traps against Microsoft OneLake

Trap to set/question to ask
Ask the client if all the data that will be contained in the data lakehouse exists on Microsoft Azure or in Amazon S3 files.

Reason
Microsoft OneLake can only access data files on Microsoft Azure or files stored in Amazon S3 on Amazon Web Services (AWS). If a client has other data requirements, Microsoft OneLake cannot meet them. IBM watsonx.data allows data access across a hybrid and multi-cloud environment.

Trap to set/question to ask
Ask the client if most data sources within the data lakehouse will be stored using the Delta Lake open table format.

Reason
Microsoft OneLake can access tables in Azure Data Lake Storage (ADLS) or Amazon S3 storage that are stored in Delta Parquet format. Only Databricks and Microsoft ADLS use the Delta Parquet format. More data within AWS uses Apache Iceberg Parquet, which is used by Amazon Athena, Dremio, Starburst, and IBM watsonx.data.


Setting traps against Teradata VantageCloud

Trap to set/question to ask
Ask the client whether the largest expense component in their data warehouse cost is storage or compute.

Reason
The fact that Teradata VantageCloud Lake can use cloud object storage for table storage only reduces client cost at the storage layer of the data lake. The compute cost for Teradata VantageCloud will continue to be high for clients. Teradata offers no cost optimization opportunities for client compute charges.

Trap to set/question to ask
Ask the client whether their future data lakehouse platform will be exclusively public cloud or whether they will still need a hybrid cloud environment.

Reason
Teradata VantageCloud Lake is only available on the public cloud (AWS as of June 2023). Teradata is moving away from on-premises environments, which will not be supported by future product releases.


Setting traps against Google Cloud Platform (GCP)

Trap to set/question to ask
Ask the client whether they want a data lakehouse solution or a data lakehouse architecture that they must implement themselves using individual components.

Reason
Google Cloud does not offer a single data lakehouse offering. A client is forced to choose the various applicable services and put them together to build their data lakehouse solution. Databricks and Starburst offer data lakehouse solutions on GCP.

Trap to set/question to ask
Ask the client if their data lakehouse environment will span multiple clouds or require a hybrid cloud deployment option.

Reason
GCP solutions run on Google Cloud with limited multi-cloud capabilities and no hybrid cloud capability. If a client has other data sources to include in a data lakehouse, then GCP is not a good choice as a data lakehouse platform.


Setting traps against Oracle

Trap to set/question to ask
Ask the client whether their data lakehouse will be restricted to Oracle Cloud Infrastructure (OCI) and Amazon Web Services (AWS) or whether they will need to access external data assets.

Reason
MySQL HeatWave Lakehouse, which is Oracle’s primary data lakehouse offering, is available only on OCI and AWS. If other data platforms are required, MySQL HeatWave Lakehouse is not the correct choice for the data lakehouse solution.

Trap to set/question to ask
Ask the client if they want a data lakehouse solution where the query engine is controlled by a single company (Oracle).

Reason
MySQL is an open-source database, but MySQL HeatWave Lakehouse is a proprietary Oracle lakehouse offering that is only available in the public cloud on OCI and AWS.
