
Licensed for Distribution

Cool Vendors in Data Management


Published 7 May 2020 - ID G00721393 - 21 min read
By Analysts Robert Thanaraj, Julian Sun, Eric Thoo, Ehtisham Zaidi, Eric Hunter

Augmented capabilities are becoming the major differentiators for today’s data management
solutions. These Cool Vendors offer data and analytics leaders ways to connect, ingest, analyze
and share data more quickly and at a lower cost.

Overview
Key Findings
■ Gartner’s Data and Analytics Adoption Survey reveals that organizations with relatively high levels
of data and analytics maturity (those that claim to be at “enterprise” or “transformative” levels of
maturity) list integrating multiple data sources (38%) and adding more agility (35%) as their top
internal challenges.

■ Data and analytics leaders responsible for data management are under pressure to deliver projects faster and at a lower cost. This has led to a rise in demand for augmented data management in various offerings, such as active metadata, artificial intelligence (AI)/machine learning (ML) algorithms and data fabric designs that utilize semantic knowledge graphs.

Recommendations
For data and analytics leaders focused on delivering faster results by using augmented data
management capabilities:

■ Start with a business case for introducing augmented capabilities into your data management estate by connecting the potential benefits to business outcomes. Select use cases that have struggled to deliver timely value and that would benefit from the increased efficiency achievable through augmented capabilities.

■ Reevaluate your data management products by making augmented capabilities a must-have selection criterion for new purchases and renewals. Make further investments only in those vendors that exhibit a realistic AI/ML augmentation roadmap for existing and upcoming products.

■ Test the augmented capabilities of data management products and the validity of the automated functionality. Audit the results of any new functionality, because there is a risk of introducing errors and reducing performance, leading to dissatisfaction with augmented capabilities among business users.

Strategic Planning Assumptions


■ Through 2022, manual data management tasks will be reduced by 45% through the addition of
machine learning and automated service-level management.

■ Through 2022, the application of graph processing and graph databases will grow at 100%
annually to accelerate data preparation and integration, and enable more adaptive data science.

Analysis
This research does not constitute an exhaustive list of vendors in any given technology area, but rather
is designed to highlight interesting, new and innovative vendors, products and services. Gartner
disclaims all warranties, express or implied, with respect to this research, including any warranties of
merchantability or fitness for a particular purpose.

What You Need to Know


As data management solutions mature, we are starting to see a shift in buyer focus from “how data
is retained and controlled” to “how data is used and accessed,” especially in the cloud. This is an era
in which humans and machines (the AI/ML engines) work as teammates and partners across the
flow of data within the enterprise — the era of augmented data management. It’s the era of
harnessing human creativity augmented by machines.

Augmented data management refers to the application of AI/ML for optimization and improved
operations. For example, using the existing usage and workload data, an augmented engine can tune
operations and optimize configuration, security and performance. Augmented data management
products can examine large samples of operational data, including actual queries, performance data,
schema and metadata. These solutions can not only tune and optimize the use of products
themselves based on actual usage, including failures and poor performance, but also suggest and
implement new designs, schemas and queries. Augmented data management offers high benefit; however, it will take two to five years to reach mainstream adoption (see “Hype Cycle for Data Management, 2019”).
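
To make this concrete, here is a minimal, hypothetical Python sketch of workload-driven tuning: it mines a query log for frequently filtered columns and proposes candidate indexes. The log format, threshold and table/column names are assumptions for illustration only; this is not any vendor’s implementation, and real augmented engines also weigh execution statistics, cardinality and the existing physical design.

```python
import re
from collections import Counter

def suggest_indexes(query_log, threshold=100):
    """Toy heuristic: columns that appear frequently in WHERE clauses
    become index candidates. Real augmented engines also weigh execution
    times, data cardinality and the existing physical design."""
    where_clause = re.compile(r"WHERE\s+(\w+)\.(\w+)\s*=", re.IGNORECASE)
    hits = Counter()
    for query in query_log:
        for table, column in where_clause.findall(query):
            hits[(table, column)] += 1
    return [f"CREATE INDEX ix_{t}_{c} ON {t} ({c});"
            for (t, c), count in hits.most_common() if count >= threshold]

# Hypothetical usage against a captured workload file:
# print(suggest_indexes(open("query.log").readlines()))
```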

As AI/ML technology matures and becomes more widely adopted, many data management vendors are starting to leverage augmentation to drive better automation in areas that have traditionally relied on intensive manual input: tasks like data integration, data preparation, data quality, data cataloging and optimizing the operations of database management systems. AI/ML-enabled solutions are becoming the primary differentiator among mainstream data management vendors, with the goal of making mundane tasks more automated, transparent and efficient for business users.
Here are several data management tool categories in which data and analytics leaders should be
investing to stay relevant in the new era of augmented data management:

■ Active-metadata-enriched AI/ML algorithms learn over time and make more accurate predictions and decisions regarding key aspects of data management and integration. These include autocorrection of schema drifts (see the sketch after this list), autointegration of next-best data sources, and automanagement of workloads. A data fabric can deliver fully automated data orchestration by integrating data across various data sources, augmented by AI/ML algorithms using active metadata (see “Data Fabrics Add Augmented Intelligence to Modernize Your Data Integration”).

■ A data preparation tool with augmented capabilities can detect schemas by profiling, cataloging
metadata and recommending enrichment, thus improving the data discovery efforts by humans
(see “Market Guide for Data Preparation Tools”).

■ A data catalog with augmented capabilities can deliver a high degree of automation in data
profiling, derivation of lineage, field-level tagging and data quality detection based on rules and
thresholds, thus improving the data inventorying efforts by humans (see “Augmented Data
Catalogs: Now an Enterprise Must-Have for Data and Analytics Leaders”).

■ A database with augmented capabilities like automated elasticity, automated data sharding and
self-optimization eliminates the need for manual tuning and optimization efforts. However, it does
not eliminate expert involvement in schema and application design, nor the need to write efficient
code. Likewise, automating a data warehouse development can be a huge cost and time saver
(see “Automating Data Warehouse Development”).

■ A data quality tool with augmented capabilities can simplify the core data quality tasks like
matching, linking and cleansing with a higher level of accuracy than humans (see “Critical
Capabilities for Data Quality Tools”).

■ A master data management (MDM) platform with augmented capabilities can learn from the
manual data stewardship tasks performed by humans to provide added levels of machine-driven
automation around future governance and stewardship decisions (see “Critical Capabilities for
Master Data Management Solutions”).

■ A knowledge graph tells us what the connections across various data assets mean. A graph makes visible connections we didn’t know existed. When we see them, we have those “aha” moments of insight and discovery — “That’s why this happened when that happened” — recognizing relationships across datasets that always existed but that we never saw (see “An Introduction to Graph Data Stores and Applicable Use Cases”).
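
As a concrete illustration of the schema-drift autocorrection mentioned in the first bullet above, the following Python sketch compares incoming record keys against an expected schema and maps renamed fields to their closest expected names. The field names and similarity cutoff are assumptions for illustration, not any product’s algorithm.

```python
from difflib import get_close_matches

# Assumed target schema for an ingestion pipeline.
EXPECTED_SCHEMA = {"customer_id", "order_date", "amount"}

def correct_drift(record):
    """Map unexpected keys (e.g., a source renaming 'order_date' to
    'order_dt') onto the closest expected field name; pass through
    anything that cannot be matched, for human review."""
    fixed = {}
    for key, value in record.items():
        if key in EXPECTED_SCHEMA:
            fixed[key] = value
        else:
            match = get_close_matches(key, EXPECTED_SCHEMA, n=1, cutoff=0.6)
            fixed[match[0] if match else key] = value
    return fixed

# Drifted input: 'order_dt' and 'amt' are mapped back automatically.
print(correct_drift({"customer_id": 42, "order_dt": "2020-05-07", "amt": 99.0}))
```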

According to Gartner’s 2019 Data Management Strategy Survey, 1 data management teams spend most of their time on data preparation and data integration (36% collectively). Data and analytics leaders are under pressure to connect, ingest, analyze and share data more quickly, and at a lower cost. By introducing augmented capabilities into their data management solutions, they can address these challenges over time. However, only a few traditional vendors have credible augmentation strategies as part of their product roadmaps. The vendors selected in this research are primarily examples of tools that will serve as potential contributors in this new era of augmented data management. They represent only the beginning of a class of offerings that will have to evolve continuously to meet this new demand.

Cinchy
Toronto, Ontario, Canada (www.cinchy.com)

Analysis by Eric Hunter

Why Cool:

Organizations have numerous applications — each with its own data silos. As new applications
emerge within the enterprise, established data silos are reinforced and new silos are introduced.
These silos lack integration and create data duplication that drives complexity for downstream data
and analytics use cases, which typically means significant data integration costs and data
management complexity. With data at the core of digital business, Cinchy seeks to reimagine
applications with data as the focus (in the form of a data fabric) for realizing outcomes in the
modern enterprise.

In its own words, Cinchy seeks to “make data the application” and promote collaboration
around data in a manner that moves beyond traditional application development and analytics
paradigms. Cinchy’s approach incorporates data integration, data management, access controls,
data governance, data catalog, graph-based visualization and packaged business capabilities (PBCs)
within a single platform.

The implementation of Cinchy is rather straightforward — but certainly departs from tradition. Once
implemented, Cinchy provides a central data fabric over which an enterprise can develop application-
like experiences and incorporate incremental data assets and domains. With data at the heart of
Cinchy, users are able to navigate their cataloged enterprise data assets via a rich temporal graph-
based interface and catalog that includes a timeline slider to view data changes over time. This is
complemented with Cinchy-resident application experiences that are enabled through enterprise-
created user interface “skins” to provide full read/write capabilities and support of business rules
over a single instance of data for the enterprise. This scenario has implications for both application
and data and analytics leaders. Internally developed applications are able to improve time to market
through use of Cinchy-resident skins where this paradigm suffices for the demands of the given
application. From an analytics perspective, incremental applications and resulting data attributes can
support organizational analytic demands without the typical data integration costs.

While Cinchy aims to reimagine the role of stand-alone applications in the enterprise, it does not
overlook the need to integrate with such traditional applications. Enterprises leverage Cinchy’s
change data capture (CDC) capabilities for the acquisition of data from existing applications (query-
and event-based CDC are supported). This data is captured and retained within Cinchy’s data fabric
for users to access via skins or explore via queries using a streamlined SQL interface. All queries
written in Cinchy automatically create a REST-based API that can be leveraged for third-party access
from external systems and applications.
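
As a hedged illustration of that query-to-API pattern, the snippet below calls a hypothetical Cinchy-generated endpoint from Python. The URL shape, parameter syntax and bearer-token authentication are assumptions made for the example; consult Cinchy’s documentation for the actual API details.

```python
import requests

# Hypothetical endpoint: Cinchy exposes each saved query as a REST API;
# the exact URL shape, parameter syntax and auth below are assumptions.
URL = "https://cinchy.example.com/API/MyDomain/CustomerOrders"

response = requests.get(
    URL,
    headers={"Authorization": "Bearer <access-token>"},  # placeholder token
    params={"@CustomerId": 42},                          # assumed parameter style
    timeout=30,
)
response.raise_for_status()

# Assume rows come back as JSON; print each returned record.
for row in response.json().get("data", []):
    print(row)
```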

Behind the curtain, Cinchy leverages a relational database (RDBMS) as its core data management
technology. It currently supports Microsoft SQL Server with plans for additional platforms in the
future. Cinchy has abstracted the underlying database to provide unique capabilities such as granular
access controls and data versioning.

Challenges:

While Cinchy’s claim to be “the end of applications” is a bit far-reaching, it provides a unique approach to the market for data and applications. Because its approach departs from so many accepted industry norms, Cinchy will need to invest in educating prospective buyers and the market overall — the status quo will be a key inhibitor to overcome.

For the integration of data from applications outside of Cinchy’s data fabric, one-time and/or incremental data movement is required, as Cinchy does not provide data virtualization capabilities. Cinchy promotes agility and time-to-market improvements for data and applications onboarded to its data fabric. However, should organizations seek to migrate applications and associated data away from Cinchy, the level of effort will be much greater than the time and investment required to onboard them.

As organizations maintain a diverse portfolio of investments in platforms and technologies, Cinchy will likely find itself among other applications rather than being “the end of applications” for most clients. This will require Cinchy to also invest in its partnership ecosystem to ensure that it is positioned and operates efficiently within this diverse technology landscape.

Who Should Care:

Data and analytics leaders prioritizing reduced time to market and improved business agility within fast-changing and increasingly digital environments can consider Cinchy as a platform to deliver critical business outcomes.
Cinchy is primarily focused on customer acquisition in the finance and banking vertical today.
However, cross-industry technology leaders focusing on application delivery or data and analytics
will find Cinchy’s offerings to be a unique departure from long-held methods and paradigms in
relation to data integration/data management, as well as application development and delivery.
Cinchy’s data fabric and surrounding capabilities provide a refreshing take on long-held development
norms referenced by the paradigms noted above. While its underlying technology innovations are
impactful, Cinchy also empowers CIOs to bridge organizational gaps between application, analytics
and business-resident teams in support of business outcomes.

CluedIn
Copenhagen, Denmark (www.cluedin.net)

Analysis by Ehtisham Zaidi

Why Cool:

Data management teams are under constant pressure to provide faster access to integrated data
across increasingly distributed landscapes. In fact, data integration, data ingestion and data
preparation remain prime candidates for automation in 2020. 2 The CluedIn platform looks to
simplify data integration by weaving the common pillars of data management into a consistent data
fabric architecture. CluedIn provides solutions to common data management and integration
challenges. These include integrating data from large and complex applications, automating data
preparation, cataloging this data for the business and then making this data easy to share throughout
the business in a secure manner.

CluedIn is cool because it takes a differentiated approach to data integration, accelerating the process of ingesting, preparing and sharing data from various applications and data silos into a unified data hub of ready-to-use data. Instead of always requiring upfront ETL-based data modeling, in which developers must assign schemas before they are sure about the requirements (leading to models that the business never agrees on or uses), CluedIn relies on the concept of “eventual connectivity.” Eventual connectivity works on the principle of flexible data integration in which schema assignment and data modeling are deferred until more suitable — that is, until requirements are more stable and business users can work with developers to assign schemas and resolve semantics for their specific use cases.

This promise of eventual connectivity is achieved through CluedIn’s underlying technology: a flexible deployment of a polyglot persistence layer of relational and graph databases that processes and persists multistructured data from heterogeneous operational source systems. CluedIn then facilitates the tools and processes needed to bring this data to a level of maturity at which the business can consume it with confidence.

CluedIn provides an integration pattern that utilizes a schemaless graph to automatically connect and integrate records across systems, instead of tedious and complex ETL designs that require developers and architects to find the connections between systems themselves. CluedIn also supports ELT to ingest data without the need for upfront transformation, and then provides embedded tools to address data cataloging, data quality, data preparation and data transformation for further processing. Finally, CluedIn utilizes a combination of batch, streaming (through CDC) and virtual (through data virtualization) technologies to deliver this integrated data for downstream consumption across various analytics and operational requirements.
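
A minimal conceptual sketch of eventual connectivity follows, under assumed record shapes: records from two systems are ingested as-is and indexed by whatever identifier values they carry, and links emerge where values coincide; no upfront ETL mapping or schema assignment is written. This is a toy illustration of the principle, not CluedIn’s implementation.

```python
from collections import defaultdict

# Records ingested as-is from two hypothetical systems; no upfront model.
crm_records = [{"email": "ann@example.com", "name": "Ann"}]
billing_records = [{"email": "ann@example.com", "invoice": "INV-7"}]

# Index every (field, value) pair seen anywhere in the incoming data.
index = defaultdict(list)
for source, records in (("crm", crm_records), ("billing", billing_records)):
    for i, record in enumerate(records):
        for field, value in record.items():
            index[(field, value)].append((source, i))

# "Eventual" links: records sharing an identifier value become connected
# across systems, without any ETL mapping having been written upfront.
links = [nodes for nodes in index.values() if len(nodes) > 1]
print(links)  # [[('crm', 0), ('billing', 0)]]
```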

Challenges:

CluedIn’s market messaging often gets quite technical, and it struggles to connect the value of data fabric with business outcomes. CluedIn faces challenges around market education on the data fabric architecture (see “Data Fabrics Add Augmented Intelligence to Modernize Your Data Integration”). Many companies are simply not aware that data fabric architectures can be adopted to accelerate new projects that require flexible data integration, prior to creating yet another data silo. A related challenge is that end users evaluate data-fabric-enabling technologies like CluedIn only when they reach the limits of scalability and performance of their current data integration technologies, such as ETL/ELT.

CluedIn faces challenges that are otherwise common to relatively small vendors in the data integration space. With a workforce of fewer than 50 employees and a presence in only a few countries (particularly in Europe), CluedIn has limited resources and market visibility for making its offering known to a wide set of potential buyers with augmented data management needs. It will need significant investment in sales and product marketing to educate the mainstream market about the value of data fabric architectures, an area in which conventional, established IT-centric integration teams have limited experience. CluedIn will also need to invest significantly in expanding its partner ecosystem to include value-added resellers, system integrators and global consulting partners that can extend its reach and footprint.

Who Should Care:

Data and analytics leaders who are looking to streamline their complex and distributed data integration tool portfolios for driving data integration augmentation.

Data and analytics leaders across large enterprises and government agencies with legacy and complex system landscapes should pay attention to CluedIn. This vendor could help with large data integration challenges where there is a need to integrate and consolidate data from many systems, and where IT teams comprising architects, developers and data engineers are struggling with productivity and need some level of integration automation support.

CluedIn should also be of interest to organizations with extremely complex integration requirements that urgently need graph-based integration and modeling to preserve complex relationships and context in data, and to those looking for semantics to be assigned by business teams (instead of IT).

Inzata
Tampa, Florida, U.S. (www.inzata.com)

Analysis by Eric Thoo

Why Cool:

Indicative of evolving trends, AI-enabled data management technologies signal new ways of bringing data together to simplify data sharing. Inzata aims to enhance the value of data through augmented profiling, structuring and integration of data varieties to support insights. By introspecting data associations spanning internal and external sources and a spread of data types and structures, Inzata focuses on enriching data for the context of usage scenarios, such as an integrated prospect or customer view. Descriptive capabilities that aid in understanding data and determining its relevance and accuracy are inferred from prebuilt enrichment artifacts, including geocoding, consumer location demographics, political data overlays, environmental data, industry-specific diagnosis and classification, and other curation processes.

Inzata features an in-memory aggregation system with massively parallel execution of the compute-intensive tasks in the data aggregation process, which together support the data reuse and manipulation needed for data management and analytics. Using an array of learning algorithms and descriptive statistics of ingested or available data, Inzata augments and integrates data models, deduces logical points of overlap and connection, and establishes correlations. Once integrated and enriched, the data can be analyzed in-platform, automatically pushed to other data hosting environments, pipelined to data warehouses or SQL databases, distributed via REST API, or accessed directly from Python and R programs using connector libraries.
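
The consumption options described above suggest patterns like the following sketch, which pulls an enriched dataset over REST into a pandas DataFrame. The endpoint, token and response shape are hypothetical; Inzata’s actual connector libraries and API routes would come from its documentation.

```python
import pandas as pd
import requests

# Hypothetical Inzata REST route and token; the real endpoints and
# connector libraries would come from Inzata's documentation.
URL = "https://inzata.example.com/api/datasets/customer_360/rows"

resp = requests.get(URL, headers={"Authorization": "Bearer <token>"}, timeout=60)
resp.raise_for_status()

# Assume the service returns a JSON list of row records.
df = pd.DataFrame(resp.json())
print(df.head())
```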

By enabling data management efforts to visually and iteratively integrate and blend data from multiple sources and types — such as social media, CRM, ERP, Hadoop, or data warehouses in the cloud or on-premises — Inzata seeks to simplify the sharing and dynamic repurposing of data through the combined work of humans and AI. Available on Inzata’s platform as a built-in function, InFlow provides automated assistance for generating data pipeline flows, designed to ease data exploration without reliance on SQL, ETL or manual coding skills. AI-assisted data preparation empowers users — including less-technical roles — to load, blend and model raw and unstructured data into data models useful for supporting analytics and engaging visualizations.

Accessible via web or mobile browser, Inzata’s offerings support cloud-based deployment on AWS and Azure. To support enterprises that favor an on-premises model, Inzata supports private deployment to deliver its data management functionality within an enterprise’s internal infrastructure. Inzata’s offerings also include industry-specific, secure, private cloud options for organizations requiring HIPAA- or CJIS-compliant cloud configurations.

Challenges:

■ Inzata will need to educate prospective buyers about the propositions of AI in data management and the value of Inzata’s business-user-oriented platform architecture, while understanding of augmented data management concepts remains limited in organizations at large.

■ As with all small and early-stage providers, Inzata will need to grow mind share in the markets
addressing data management technologies.

■ Emerging opportunities will test how versatile Inzata’s technology is at working seamlessly and easily with the many diverse data sources that large organizations currently support, including legacy data environments.

Who Should Care:

Data and analytics leaders focused on data management, as well as architects and modelers seeking to increase the reuse of data assets and the use of AI in data management.

Leaders of data-related initiatives with a focus on sharing data across organizational boundaries and
beyond should develop an understanding of augmented data management that continuously adapts
and combines diverse data of interest into a cohesive set to support data and analytics demands.

TigerGraph
Redwood City, California, U.S. (www.tigergraph.com)

Analysis by Julian Sun

Why Cool:
TigerGraph is cool for its ability to democratize graph analytics for enterprise adoption. Its GSQL
language reduces the learning curve for users to perform graph analytics without sacrificing
performance. Its GraphStudio, with a visual design interface provisioned by TigerGraph Cloud, further
lowers the barrier to enable users to get started with graph-based application development tasks in a
sandbox approach on the cloud.

Today, graph data stores and analytics are promising, but enterprise adoption is low. Customers usually choose a graph database and analytics to design and implement complex and expensive algorithms such as PageRank, betweenness centrality and closeness centrality — none of which SQL can solve easily. It takes a lot of effort for developers to learn a new language and use it efficiently. The difficulty of finding skilled graph developers, and the lack of a standardized graph query language, are slowing enterprise adoption of graph technology.

TigerGraph’s GSQL, a declarative graph query language, narrows the gap between asking complex questions and developing a graph-based solution. Its building block is a single SQL-like block that lets users perform high-level declarative traversals. This building block is augmented with innovative ACCUM clauses for parallel processing, which sustain high performance for expensive graph algorithms. GSQL is also friendly to SQL users: it reuses much of SQL’s syntax, semantics and keywords, so SQL developers can learn it and implement graph applications with minimal ramp-up time. As data sizes grow, GSQL’s data definition language (DDL) supports dynamic schema changes to manage enterprise data from a single machine to multiple machines. The expressive loading language can quickly onboard structured and unstructured data sources into a single graph model with easy-to-use loading functionality. The TigerGraph team has created an open-source graph algorithm library in GSQL to encourage broader adoption, and it is actively working with other industry experts to standardize the graph language.
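
To make the ACCUM idea concrete, here is a sketch that submits a small GSQL query through the open-source pyTigerGraph client. The cluster host, credentials, graph name and Person/Friend schema are assumptions, and the GSQL itself is an illustrative rendering of the documented pattern rather than production code.

```python
import pyTigerGraph as tg  # open-source TigerGraph Python client

# Assumed connection details and schema (Person vertices, Friend edges).
conn = tg.TigerGraphConnection(
    host="https://mycluster.i.tgcloud.io",  # hypothetical TigerGraph Cloud host
    graphname="Social",
    username="tigergraph",
    password="<password>",
)

# Sketch of a GSQL query: the ACCUM clause accumulates in parallel across
# all traversed edges, here counting each person's friends.
GSQL = """
CREATE QUERY friend_counts() FOR GRAPH Social {
  SumAccum<INT> @degree;
  Start = {Person.*};
  Result = SELECT p FROM Start:p -(Friend:e)- Person:q
           ACCUM p.@degree += 1
           ORDER BY p.@degree DESC
           LIMIT 10;
  PRINT Result;
}
"""

print(conn.gsql(GSQL))                          # create the query
print(conn.gsql("INSTALL QUERY friend_counts"))  # compile and install it
print(conn.runInstalledQuery("friend_counts"))   # returns JSON results
```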

TigerGraph GraphStudio is a graphical user interface that integrates all the phases of graph analytics. Business users can start with the built-in starter kits — such as anti-fraud, entity resolution, risk scoring and customer 360 — all without writing any code. Each kit includes graph schemas, sample data, preloaded queries and a library of customized graph algorithms such as PageRank, shortest path and community detection. Sophisticated users can write GSQL directly, and the query interface can be automatically generated as a REST API service. This drives reusability, making it easier for new users to leverage and incorporate the existing data infrastructure.

TigerGraph offers a free tier of its enterprise version on TigerGraph Cloud, plus a free developer version for nonproduction use that is limited to one machine, one user and one graph. Users can learn and experience graph analytics with support from a fast-growing community and weekly online office hours.

Recently, TigerGraph granted licenses to government research institutions around the world to use TigerGraph Enterprise Edition to model the spread of COVID-19 for the duration of the crisis, and established a dedicated forum for people to collaborate on. This data-for-good activity indicates the company’s social responsibility and its ambition to nurture the graph community.

Challenges:

■ While GSQL and GraphStudio lower the barrier to use, and the no-code features on the roadmap may further democratize graph capabilities, organizations still need enough competency to code graph algorithms efficiently.

■ GraphStudio lacks storytelling capability to explain insights to casual business users. Its visualization capabilities are not intuitive for visual-based exploration.

■ The ecosystem is not yet complete. TigerGraph Cloud is cloud-agnostic by design but currently runs only on AWS. The connectors supporting different data sources and the output options for graph analytics results are limited.

Who Should Care:

Data and analytics leaders who have tried graph technology on highly interconnected data but failed to load or query trillions of edges with high performance.

TigerGraph is a good fit for organizations that have clear graph problems to solve but could not initially find a solution suitable for enterprise-level adoption. It also fits organizations that require real-time and multihop analytics.


Acronym Key and Glossary Terms


ACCUM accumulators

AI artificial intelligence

API application program interface


CDC change data capture

CJIS Criminal Justice Information Services Division

ELT extract, load, transform

ETL extract, transform, load

HIPAA Health Insurance Portability and Accountability Act

ML machine learning

REST representational state transfer

Evidence
1 Gartner’s Data Management Strategy Survey, 2019: This survey was conducted online between August and September 2019 with Gartner Research Circle Members — a Gartner-managed panel. There were 129 respondents for this survey. The survey was developed collaboratively by a team of Gartner analysts and was reviewed, tested and administered by Gartner’s Research Data and Analytics team.

2 Gartner’s Data and Analytics Adoption Survey, 2019: This study was conducted to learn how organizations use data and analytics. The research was conducted online during November and December 2019 among 272 respondents from North America, Western Europe and APAC regions. Companies from different industries were screened for having annual revenue less than $100 million. Respondents were required to be at manager level or above and to have primary involvement in and responsibility for their organization’s data and analytics solutions, including purchases and investments. The study was developed collaboratively by Gartner analysts and the Primary Research Team, which follows data and analytics management.


© 2020 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. and its
affiliates. This publication may not be reproduced or distributed in any form without Gartner's prior written
permission. It consists of the opinions of Gartner's research organization, which should not be construed as
statements of fact. While the information contained in this publication has been obtained from sources believed to
be reliable, Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information.
Although Gartner research may address legal and financial issues, Gartner does not provide legal or investment
advice and its research should not be construed or used as such. Your access and use of this publication are
governed by Gartner’s Usage Policy. Gartner prides itself on its reputation for independence and objectivity. Its
research is produced independently by its research organization without input or influence from any third party. For
further information, see "Guiding Principles on Independence and Objectivity."
