06 - IBM watsonx.data Competitive Insights
Content by:
Danny Arnold
Principal, Learning Content Development | Data & AI
darnold@us.ibm.com
Presenter:
Farah Auni Hisham
Technical Enablement Specialist | Data & AI
farah.hisham@ibm.com
Seller guidance
Slides in this presentation marked as "IBM and Business Partner Internal Use Only" are for IBM and Business Partner use and …

Disclaimer
© IBM Corporation 2023. All Rights Reserved.
The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
References in this presentation to IBM products, programs, or services do not imply that they will be available in all … at any time at IBM’s sole discretion based on market opportunities or other factors and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth, or other results.
All client examples described are presented as illustrations of how those clients have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by client.
All statements in this report attributable to Gartner represent IBM’s interpretation of data, research opinion or viewpoints published as part of a syndicated subscription service by Gartner, Inc., and have not been reviewed by Gartner. Each Gartner publication speaks as of its original publication date (and not as of the date of this presentation). The opinions expressed in Gartner publications are not representations of fact and are subject to change without notice.
IBM and Business Partner Internal Use Only
watsonx.data Competitive Insights
Agenda
• Types of competitors
• Primary competitors
  • Background
  • Key strengths and weaknesses
  • Summary
• Secondary competitors
  • Background
  • Key strengths and weaknesses
  • Summary
• Competitive positioning
  • Watsonx.data differentiators
  • Latest updates
• Objection handling
• Setting traps for competitors
watsonx.data Competitors
All competitors are relatively new companies (within the past decade) and are rapidly growing in the public cloud market.

Databricks
Databricks pioneered the term “Lakehouse” and is currently positioned as a leader in the emerging lakehouse market. It was founded in 2013 by the creators of Apache Spark.

Starburst
Starburst has made a name for itself in the data access and analytics space. It is positioned as a leader in the enterprise data access market, with a growing market share. Founded in 2017, Starburst offers a cloud-native platform that enables fast and easy access to data across a range of sources.

Amazon Athena
Amazon Athena was first released in 2016 and is the AWS data lakehouse offering. It uses Apache Spark for analytics on data in open file formats and the Trino engine for SQL queries. Amazon Athena combines with other AWS services, such as AWS Lake Formation for data governance, to build a complete lakehouse solution.

Snowflake
Founded in 2012, Snowflake has made significant strides in the cloud data warehousing market and is currently positioned as a leader in the cloud data platform space. Snowflake supports open table formats but locks clients into a Snowflake-controlled environment that is not open-source based. It has a single SQL query engine and a limited ability to access data outside of a Snowflake data warehouse.

Others
Amazon Redshift Spectrum (augmented data warehouse competitor)
Amazon Redshift Spectrum is a Redshift service that allows direct queries on data stored in Amazon S3 files without having to load the data into an Amazon Redshift data warehouse. Amazon Redshift Spectrum requires an active Amazon Redshift data warehouse cluster to execute queries, so it is tightly integrated with Amazon Redshift and extends the data warehouse to access external tables in Amazon S3.
Competitor background information (1 of 3)
Databricks
• Databricks is built on the open-source Delta Lake and Delta Sharing components, but adds proprietary extensions designed to move clients onto Databricks rather than open source.
Dremio
• Based on community-driven standards such as Apache Arrow, Apache Iceberg, and Apache Parquet.
• Provides the Dremio Sonar query engine for self-service analytics, along with a shared semantic layer that offers governed, self-service data access, a consistent view of the data, and transparent query acceleration.
Competitor background information (2 of 3)
Starburst
• Starburst has connectors to many data sources, including data lakes (Hadoop HDFS, Ceph, MinIO, and Dell/EMC ECS), data warehouses (Teradata, Oracle Exadata, and IBM Netezza Performance Server (NPS)), message queues/NoSQL (Apache Kafka, MongoDB, and Elastic), and current and legacy databases (MySQL, PostgreSQL, IBM Db2, Microsoft SQL Server, and Oracle Database).
Amazon Athena
• Amazon Athena includes both the Apache Spark and Trino engines and is central to the AWS data lakehouse strategy. Combined with AWS Glue, the AWS serverless data integration service, AWS provides a strong data lakehouse offering.
Competitor background information (3 of 3)
Snowflake
• Snowflake provides many data sets and applications from partners in the Snowflake Marketplace. However, rather than providing data sets for standalone use, Snowflake delivers them as part of a Snowflake data warehouse, driving additional data warehouse usage and a larger number of deployed data warehouses.
• Snowflake is moving into transactional data support with its Unistore hybrid table offering (in private preview as of September 2023) to accommodate more types of client workloads.
Amazon Redshift Spectrum
• Amazon Redshift Spectrum is a component of the Amazon Redshift data warehouse offering.
• Redshift Spectrum lets a client query data in Amazon S3 files directly, without having to move the data into Amazon Redshift.
• Query processing charges for Amazon Redshift Spectrum are based on the amount of data scanned by the query, not on the size of the query result.
• Amazon Redshift Spectrum by itself is not a data lakehouse solution, because it can only work with data files in Amazon S3.
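Because Redshift Spectrum bills on data scanned rather than on result size, a query's charge can be estimated from its scanned bytes alone. The Python sketch below illustrates that arithmetic under stated assumptions: `ASSUMED_USD_PER_TB` is a placeholder rate (check current AWS pricing for your region), 1 TB is treated as 10**12 bytes for simplicity, per-query minimums and rounding are ignored, and the `QueryStats` record and both example queries are hypothetical.

```python
from dataclasses import dataclass

# ASSUMPTION: placeholder rate; check current AWS pricing for your region.
ASSUMED_USD_PER_TB = 5.0
BYTES_PER_TB = 10**12  # simplification: decimal terabytes


@dataclass
class QueryStats:
    """Hypothetical per-query statistics."""
    name: str
    bytes_scanned: int  # data read from Amazon S3 files
    rows_returned: int  # size of the result; not part of the charge


def scan_charge_usd(query: QueryStats, usd_per_tb: float = ASSUMED_USD_PER_TB) -> float:
    """Estimate the charge from bytes scanned only (minimums and rounding ignored)."""
    return query.bytes_scanned / BYTES_PER_TB * usd_per_tb


# Two hypothetical queries over the same 2 TB of S3 data.
detail = QueryStats("row-level export", bytes_scanned=2 * BYTES_PER_TB, rows_returned=50_000_000)
summary = QueryStats("single aggregate", bytes_scanned=2 * BYTES_PER_TB, rows_returned=1)

for q in (detail, summary):
    print(f"{q.name}: {q.rows_returned} rows, ${scan_charge_usd(q):.2f}")
```

Both queries cost the same because only scanned bytes enter the formula; under this pricing model, reducing the data scanned is what lowers the charge.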
Other Competitors
IBM and Business Partner – Internal Use Only
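Primary competitors: strengths & weaknesses (details)
• Databricks
  • Deployment options: public cloud only
  • Query engines: Apache Spark, Photon
  • Open table format support: Delta Lake
• Dremio
  • Deployment options: public cloud & on-premises
  • Query engines: Dremio Sonar (proprietary)
  • Open table format support: Apache Iceberg, Delta Lake
• Starburst
  • Deployment options: public cloud & on-premises
  • Query engines: Starburst (Trino-based)
  • Open table format support: Apache Iceberg
• Amazon Athena
  • Deployment options: AWS only
  • Query engines: Apache Spark, Amazon (Trino-based)
  • Open table format support: Apache Iceberg (Parquet only)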
Primary competitors at a glance (charts rating watsonx.data, Dremio, Amazon Athena, and Amazon Redshift Spectrum from worst to best on being open-source based and on data governance, plus market presence from less to more; the market presence chart also labels Data Fabric, Object Storage, Databricks, and Starburst)
Secondary competitors at a glance (charts rating open-source basis and data governance from worst to best, and market presence from less to more)
• No other data lakehouse offering has integrated data warehouse engines in addition to Apache Spark and open-source query engines.
• The cloud hyperscalers (AWS, Microsoft Azure, and GCP), along with Databricks, provide no hybrid cloud deployment capability.
• Deployment flexibility in other clouds: no other data lakehouse offering can be deployed as easily across different cloud platforms.
• Other data lakehouse competitors do NOT have IBM’s level of experience with mission-critical applications or its depth of research in query optimization and query processing.
• Watsonx.data plus other IBM data sources (Netezza Performance Server and Db2) deliver a query performance spectrum not offered by other data lakehouse competitors.
• Watsonx.data, with its selection of Apache Iceberg and Presto, delivers an open solution rather than lock-in to open source controlled by a single contributor.
Competitive objections
Competitor background information
Databricks
• Databricks has partnered with AWS, Microsoft Azure, and Google Cloud Platform, but has special optimizations for Microsoft Azure, including tight integrations with Azure Data Lake Storage, Azure Data Factory, Azure Synapse Analytics, Power BI, and other Azure services.
Objection handling
Reason
Databricks measures consumption and calculates cost through Databricks Units (DBUs), a unit of processing consumption. The cost per DBU varies by use case (Databricks SQL, Databricks All-Purpose Compute for interactive workloads, Delta Live Tables (DLT), and others). This makes Databricks pricing complex and makes it difficult for clients to understand what a Databricks data lakehouse will cost.
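To illustrate why per-workload DBU rates make costs hard to predict, the Python sketch below multiplies DBUs consumed by a per-workload-type rate. It is a minimal sketch under loud assumptions: the rates in `HYPOTHETICAL_USD_PER_DBU` and the workload keys are invented placeholders, not Databricks list prices, and the cloud provider's separate infrastructure charges are ignored.

```python
from typing import Mapping

# ASSUMPTION: hypothetical per-DBU rates for illustration only; they are not
# Databricks list prices. Real rates vary by cloud, region, tier, and workload
# type, and the cloud provider's infrastructure charges (ignored here) are billed
# separately.
HYPOTHETICAL_USD_PER_DBU: Mapping[str, float] = {
    "databricks_sql": 0.70,
    "all_purpose_compute": 0.55,
    "delta_live_tables": 0.30,
}


def estimate_dbu_cost(
    dbus_by_workload: Mapping[str, float],
    rates: Mapping[str, float] = HYPOTHETICAL_USD_PER_DBU,
) -> float:
    """Sum the DBUs consumed per workload type multiplied by that type's rate."""
    total = 0.0
    for workload, dbus in dbus_by_workload.items():
        if workload not in rates:
            raise KeyError(f"no rate assumed for workload type {workload!r}")
        total += dbus * rates[workload]
    return total


# The same 1,000 DBUs of monthly consumption, split two different (hypothetical) ways.
mostly_sql = {"databricks_sql": 800, "all_purpose_compute": 200}
mostly_dlt = {"delta_live_tables": 800, "all_purpose_compute": 200}
print(f"{estimate_dbu_cost(mostly_sql):.2f}")
print(f"{estimate_dbu_cost(mostly_dlt):.2f}")
```

Identical total DBU consumption produces different bills depending on the workload mix, which is the source of the pricing complexity described above.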
RECAP
(Diagram: watsonx.data with its Spark and Presto engines alongside Db2, Db2 Warehouse (Db2W), Netezza, and Db2 for z/OS, plus the watsonx.data metadata store and IBM Knowledge Catalog. Callouts: warehouses can access data in the lakehouse; the lakehouse can access data residing in Db2/Netezza; data can be easily promoted between the warehouse and the lakehouse.)
Setting traps
• Blog entry on Amazon Athena Explained: What is it and When Should I Use it?
• Blog entry on Amazon Redshift Spectrum and how it works
• Blog entry on Exploring AWS Lambda Deployment Limits
• Blog entry on What’s the Difference between Trino and PrestoDB?
© 2023 International Business Machines Corporation
Thank you
IBM and the IBM logo are trademarks of IBM
Corporation, registered in many jurisdictions
worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list
of IBM trademarks is available on ibm.com/trademark.
Strengths & weaknesses (2 of 6)
Dremio
• Strength: Has a proprietary Sonar query engine that seems to deliver better performance than open-source query engines.
• Weakness: Star schema design and table joins are not optimal; clients may need to rewrite queries and redesign table structures.
Amazon Redshift Spectrum
• Strength: Infrequently accessed data can remain in Amazon S3.
• Weakness: Requires a running Amazon Redshift data warehouse cluster to execute Redshift Spectrum queries.
Other Competitors
Secondary competitors: strengths & weaknesses
• Strength: Supports different query engines to allow compute choice based on query requirements.
• Weakness: Tables must reside in Delta Parquet files.
Databricks
• Lakehouse federation capabilities in Unity Catalog to support external data sources
  • The Unity Catalog update is in private preview
  • A limited number of public cloud data warehouses and databases are included, but NO hybrid cloud
• Delta Lake 3.0 with Universal Format (UniForm) and Liquid Clustering
  • UniForm allows Apache Iceberg clients to read Delta Lake tables by generating Iceberg metadata alongside the Delta Lake metadata for the table
  • Liquid Clustering adjusts the data layout in a Delta Lake table based on the selected clustering keys to improve query performance
  • Delta Lake UniForm is in public preview, and Liquid Clustering is announced only (no preview)
Snowflake
• Single Iceberg table type, with two catalog management options:
  • External (Iceberg) catalog
  • Managed by Snowflake
• The catalog management choice has performance implications
• Snowflake wants clients to use the Snowflake catalog for Iceberg tables, which delivers better query performance but locks the client into the Snowflake query engine
• Snowflake claims “catalog integration” when an external Iceberg catalog is used, but currently supports only the AWS Glue Data Catalog with catalog integration
• As of October 2023, Iceberg tables can only be defined as External Tables for production Snowflake use
MySQL HeatWave Lakehouse
• Can access data in open file formats such as Apache Parquet, CSV, and others
• Can access database export files from Oracle Database, MySQL, Amazon Aurora, and Amazon Redshift
• No hybrid cloud deployment option: AWS and OCI are the only supported cloud platforms
• Internal engine processing uses a proprietary format
• Released a non-certified TPC-H benchmark result
Amazon Redshift (August 2023 announcements; IN PREVIEW MODE ONLY, non-production use)
• Amazon Redshift Serverless or Amazon Redshift Spectrum can now query Apache Iceberg tables
• Iceberg table access is read-only
• Iceberg tables must be cataloged in the AWS Glue Data Catalog
• Iceberg table access is primarily restricted to tables located within AWS
• Very limited capability in using Iceberg tables from Amazon Redshift
• Requires Amazon Athena or Amazon EMR to write to Iceberg tables
• Not a lakehouse solution, and uses proprietary components within Amazon Redshift
• No time travel or data sharing capabilities within Amazon Redshift using Iceberg tables
For more information on Amazon Redshift Iceberg table support, see this write-up on Seismic.
Objection Handling
Objection handling against Databricks
Objection: Databricks has partnerships with all three major cloud vendors (AWS, Microsoft Azure, and GCP). IBM watsonx.data is not available on all clouds as a fully managed service, and it is important to have the flexibility to choose any of these cloud providers.
IBM response:
• IBM watsonx.data can be deployed as a self-managed solution on any cloud provider that supports Red Hat OpenShift, or within a private cloud or on-premises environment.
• Although IBM watsonx.data is not available as a fully managed service on all three cloud vendors (AWS, Microsoft Azure, and GCP), it can be deployed on all three, and IBM will provide the fully managed option on them in the future as client demand dictates.
Competitive objections
Objection handling against Dremio
Objection: Dremio provides a simple, fast, and cost-effective platform for analytics, ML, and other data-driven applications. What advantages does IBM watsonx.data have over Dremio?
IBM response:
• Performance and reliability: Dremio has a history of issues with new releases. Watsonx.data is based on open-source Apache Spark and Presto, which are well-tested and proven technologies. IBM has a long history with query engines and query optimization and will incorporate this knowledge into watsonx.data to continually improve performance and query efficiency.
Competitive objections
Objection handling against Starburst
Objection: Starburst provides a query engine that can efficiently process an analytic workload across many different data sources. What advantages does IBM watsonx.data provide over Starburst?
IBM response:
• IBM watsonx.data provides multiple query engines: Apache Spark and Presto are the query engines currently available (along with the optional Db2 and Netezza Performance Server (NPS) specialized data warehouse engines), versus the single, primarily SQL query engine within Starburst.
Objection: Amazon Athena has an Apache Spark query engine and a Trino SQL query engine, and IBM watsonx.data uses an Apache Spark query engine and a Presto query engine. What are the advantages of IBM watsonx.data over Amazon Athena?
IBM response:
• While it is true that both Amazon Athena and IBM watsonx.data support multiple engines, IBM watsonx.data has an advantage in the number and location of data sources supported.
• IBM watsonx.data has data connectors to many different data sources across hybrid cloud. Amazon Athena is designed primarily for Amazon S3 file access; all other data sources require AWS Lambda and coding through Amazon Athena Federated Query.
Objection: Amazon Athena is well-integrated within the AWS ecosystem, and AWS is our chosen cloud provider. What is the benefit of adding watsonx.data to our product portfolio?
IBM response:
• IBM watsonx.data can provide data lakehouse capabilities for data sources outside of AWS and provides data connectors to many data sources, versus having to use AWS Lambda and coding for any data source that is not Amazon S3.
Objection: Snowflake has been selected as our cloud data platform, and we don’t see why we need IBM watsonx.data as a data lakehouse platform. What would be the advantage of adding watsonx.data to our environment?
IBM response:
• Snowflake is a cloud data warehouse, and to access data, you must do everything within the Snowflake environment and ecosystem. Clients are successful only if they can move all data into Snowflake or access all data through a Snowflake virtual warehouse.
Objection: Amazon S3 files are the only source of external data we need to access, and we already use Amazon Redshift as our data warehouse. Why should we choose IBM watsonx.data as our data lakehouse platform?
IBM response:
• IBM watsonx.data allows you to access Amazon S3 files as well as many other data sources across a hybrid cloud, without requiring a companion data warehouse cluster, giving you flexibility as requirements change over time.

Objection: Amazon Redshift Spectrum is serverless and allows us to query Amazon S3 data without loading the data into Amazon Redshift. How does it benefit our organization to add IBM watsonx.data as a data lakehouse platform?
IBM response:
• Adding IBM watsonx.data future-proofs the data lakehouse solution by allowing all types of data sources to be supported within the lakehouse.
• Watsonx.data has no dependencies on other services to deliver query results for the data lakehouse, and its queries do not compete for compute resources with your data warehouse workload.
Coexistence objections
Objection handling against Microsoft OneLake
(1 of 2)
Objection: Microsoft OneLake is part of Microsoft Fabric, and we have selected Fabric as our data fabric solution within Microsoft Azure.
IBM response:
• Microsoft OneLake works with Azure Data Lake Storage (ADLS) and is not a data lakehouse solution for data outside of Microsoft Azure, apart from Amazon S3 files within AWS.
Objection: Microsoft OneLake and Microsoft Fabric provide the data lakehouse solution that we need today on Microsoft Azure. What is the advantage of adding IBM watsonx.data to our existing solution?
IBM response:
• IBM watsonx.data supports multiple public clouds as well as private cloud and on-premises deployments, which Microsoft OneLake does not offer.
Objection: Teradata VantageCloud Lake provides the data lake environment that our organization requires and is multi-cloud.
IBM response:
• IBM watsonx.data uses the Apache Iceberg open table format, whereas VantageCloud Lake provides the ability to use cloud object storage.

Objection: Teradata VantageCloud Lake allows us to use less expensive cloud object storage. What advantage does IBM watsonx.data provide to justify adding it to our environment?
IBM response:
• Cloud object storage reduces only the data storage cost. IBM watsonx.data provides multiple open-source query engines and specialized data warehouse query engines so that clients can cost-optimize their query workloads based on their performance needs.
Objection: Google Cloud is our chosen public cloud platform and provides the data lakehouse architecture that meets our requirements. Why should we consider IBM watsonx.data as a data lakehouse solution?
IBM response:
• IBM watsonx.data is a complete data lakehouse solution, versus an architecture that clients must build themselves from components.
• IBM watsonx.data supports both open-source query engines and specialized data warehouse engines, versus a single proprietary SQL query engine (Google BigQuery).

Objection: Google Cloud is our data lakehouse platform. What is the advantage of adding IBM watsonx.data to our existing data lakehouse environment?
IBM response:
• IBM watsonx.data extends your current data lakehouse to multi-cloud and hybrid cloud, to include data assets on other clouds and even IBM Db2 on Z data assets.
Objection: Oracle provides MySQL HeatWave Lakehouse on both Oracle Cloud Infrastructure (OCI) and Amazon Web Services (AWS). What advantages does IBM watsonx.data have over Oracle?
IBM response:
• IBM watsonx.data has many open-source components that provide flexibility and prevent client lock-in to the IBM solution. MySQL HeatWave Lakehouse is a proprietary Oracle offering available only in the public cloud.

Objection: MySQL HeatWave Lakehouse is our data lakehouse solution on OCI. What are the advantages of adding IBM watsonx.data to our existing lakehouse solution?
IBM response:
• IBM watsonx.data uses Apache Iceberg as an open table format, whereas MySQL HeatWave Lakehouse uses open file formats only. This means that the files in cloud object storage do not have a central catalog, and a client must configure OCI Data Catalog for the catalog component.
Reason
Amazon Athena can only directly access Amazon S3 files. All other data sources require Amazon Athena Federated Query, which uses the separate AWS Lambda service and requires coding to access other data services. AWS Lambda has many restrictions that potential clients need to understand before making a lakehouse decision for Amazon Athena.

Reason
One of the leading client complaints about Amazon Athena is unpredictable query costs due to poorly optimized queries and excessive charges for long-running queries.
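Athena SQL query charges are based on the data scanned, so a poorly optimized query that reads a whole table can cost far more than a tuned one. The minimal Python sketch below shows that arithmetic. Everything in it is an assumption: `ASSUMED_USD_PER_TB` is a placeholder rate (check current AWS pricing), the 3 TB table with 365 daily partitions and 20 columns is hypothetical, and the `bytes_read` helper uses an idealized model (equal-size partitions and columns, columnar reads as with Parquet).

```python
# ASSUMPTIONS: placeholder scan rate (check current AWS pricing) and a hypothetical
# 3 TB table stored as 365 equal daily partitions of 20 equal-size columns.
ASSUMED_USD_PER_TB = 5.0
BYTES_PER_TB = 10**12  # simplification: decimal terabytes

TABLE_BYTES = 3 * BYTES_PER_TB
DAILY_PARTITIONS = 365
COLUMNS = 20


def scan_charge_usd(bytes_scanned: float, usd_per_tb: float = ASSUMED_USD_PER_TB) -> float:
    """Scan-based charge: proportional to the bytes the query reads."""
    return bytes_scanned / BYTES_PER_TB * usd_per_tb


def bytes_read(days: int, columns: int) -> float:
    """Idealized bytes read when partition pruning and columnar reads limit the scan."""
    return TABLE_BYTES * (days / DAILY_PARTITIONS) * (columns / COLUMNS)


# Unoptimized: no partition filter and every column selected.
unoptimized = scan_charge_usd(bytes_read(DAILY_PARTITIONS, COLUMNS))
# Optimized: filter on the partition column (7 days) and select only 2 columns.
optimized = scan_charge_usd(bytes_read(7, 2))

print(f"unoptimized: ${unoptimized:.2f}")
print(f"optimized:   ${optimized:.4f}")
```

Because the charge tracks bytes scanned, the same business question can differ in cost by orders of magnitude depending on how the query and table layout are designed, which is one reason costs are hard for clients to predict.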
Reason
Microsoft OneLake can only access data files on Microsoft Azure or files in Amazon S3 on Amazon Web Services (AWS). If a client has other data requirements, Microsoft OneLake cannot meet them. IBM watsonx.data allows data access across a hybrid and multi-cloud environment.

Reason
Microsoft OneLake can access tables in Azure Data Lake Storage (ADLS) or Amazon S3 storage only when they are stored in Delta Parquet format. Only Databricks and Microsoft ADLS use the Delta Parquet format. More data within AWS uses Apache Iceberg with Parquet, which is supported by Amazon Athena, Dremio, Starburst, and IBM watsonx.data.