Atlan - Data Management Report
Meanwhile, data is growing faster than any of these companies can keep up with.
Data teams are becoming more diverse than ever — data engineers, analysts,
analytics engineers, data scientists, product managers, business analysts, citizen
data scientists, and more.
The data tools and infrastructure they use are increasingly complicated. These
include data warehouses, data lakes, lakehouses, databases, real-time data streams,
BI tools, notebooks, modeling tools, and more.
The Rise of DataOps
In 2020, Gartner launched the industry’s first report on DataOps by recognizing
a set of Cool Vendors in DataOps. Since then, DataOps has only grown in
prominence and become more mainstream.
Figure: Google Trends data on global searches for “DataOps” since 2015. The y-axis shows “interest over time,” a
normalized measure of search interest in which 100 represents peak popularity for the term in the given time and region.
In its 2022 Hype Cycle, Gartner predicted that DataOps would fully penetrate
the market in 2 to 5 years and moved the trend from the far left side of its curve to
the “Peak of Inflated Expectations.” Then, on June 23, 2022, Forrester
launched the latest version of its Wave report about data catalogs. But instead
of talking about “Machine Learning Data Catalogs” as before, it renamed
the category to “Enterprise Data Catalogs for DataOps,” announcing the
mainstream arrival of DataOps.
What actually is DataOps?
The first, and perhaps most important, thing to know about DataOps is that it’s
not a product. It’s not a tool.
In fact, it’s not anything you can buy, and anyone trying to tell you otherwise is
trying to trick you.
There’s no standard definition for DataOps. However, you’ll notice that everyone
describes DataOps as something that goes beyond tech or tools. Instead, they focus
on terms like communication, collaboration, integration, experience, and
cooperation.
🔦 SPOTLIGHT
But internally, life was chaos. We dealt with every fire drill possible, leading to
long chains of frustrating phone calls and hours spent trying to figure out what
went wrong. We were breaking trust, and we knew this couldn’t continue.
We came together to tackle these problems with new tooling and practices.
Taking inspiration from best practices in other fields, we stumbled upon what we
now know as DataOps.
It was during this time that we saw what the right tooling and culture can do for
a data team. The chaos decreased, the same massive data projects became
exponentially faster and easier, and the late-night calls became wonderfully
rare. As a result, we were able to accomplish far more with far less. We built
India’s national data platform with an eight-member team in just 12 months,
and many of those team members had never pushed a line of code to production before.
We later wrote down our learnings in our DataOps Culture Code, a set of
principles to help data teams work together, build trust, and collaborate better.
That’s ultimately what DataOps does, and why it’s all the rage today. It helps
data teams stop wasting time on the endless interpersonal and technical
speed bumps that stand between them and the work they love to do. And in
today’s economy, anything that saves time is priceless.
🔦 SPOTLIGHT
All data artifacts, from code and models to data and dashboards, are assets and
should be treated as such: easily discoverable, maintained, and reusable.
As business needs evolve rapidly, data teams need to be a step ahead, not
deluged with three months of backlog. Constantly measure your team’s
velocity, and invest in foundational initiatives to improve cycle times.
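As one illustration of what “measuring velocity” could look like in practice, here is a minimal sketch that computes median cycle time and backlog age from data-request tickets. The `Ticket` record, its fields, and the `velocity_report` helper are hypothetical; they assume your issue tracker can export an opened and (optional) closed timestamp per request.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional

SECONDS_PER_DAY = 86400


@dataclass
class Ticket:
    # Hypothetical shape of a data-request ticket exported from an issue tracker.
    key: str
    opened: datetime
    closed: Optional[datetime] = None  # None while the request is still open


def velocity_report(tickets, as_of):
    """Summarize completed cycle times and the age of the open backlog, in days."""
    cycle_days = [
        (t.closed - t.opened).total_seconds() / SECONDS_PER_DAY
        for t in tickets
        if t.closed is not None
    ]
    backlog_days = [
        (as_of - t.opened).total_seconds() / SECONDS_PER_DAY
        for t in tickets
        if t.closed is None
    ]
    return {
        "completed": len(cycle_days),
        "median_cycle_days": median(cycle_days) if cycle_days else None,
        "open_backlog": len(backlog_days),
        "oldest_open_days": max(backlog_days) if backlog_days else None,
    }


# Illustrative sample data, not real measurements.
tickets = [
    Ticket("DATA-1", datetime(2022, 6, 1), datetime(2022, 6, 4)),   # 3 days
    Ticket("DATA-2", datetime(2022, 6, 2), datetime(2022, 6, 12)),  # 10 days
    Ticket("DATA-3", datetime(2022, 6, 5)),                         # still open
]
report = velocity_report(tickets, as_of=datetime(2022, 6, 15))
print(report)
```

Tracking the median rather than the mean keeps one unusually long project from hiding an otherwise healthy cycle time; the age of the oldest open request is a simple early signal of a growing backlog.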
With the inherent diversity of data teams, it's all too easy to misunderstand
other team members' roles. But that creates trust deficiencies — especially
when things go wrong. Intentionally create systems of trust in your team.
● Make everyone’s work accessible and discoverable to break down "tool" silos.
● Create transparency in data pipelines and lineage so everyone can see and
troubleshoot issues.
● Set up monitoring and alerting systems to proactively know when things break.
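The last two practices reinforce each other: an alert is far more useful when lineage tells you who is affected. Below is a minimal sketch, assuming a hypothetical in-memory lineage map (`LINEAGE`), per-asset freshness limits, and a one-day default limit; a real setup would read these from your orchestrator or metadata platform.

```python
from collections import deque
from datetime import datetime, timedelta

# Hypothetical lineage: each asset maps to the assets built directly from it.
LINEAGE = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "marts.customer_ltv"],
    "marts.revenue": ["dashboard.exec_kpis"],
    "marts.customer_ltv": [],
    "dashboard.exec_kpis": [],
}


def downstream(asset, lineage):
    """All assets reachable from `asset` via lineage edges, in breadth-first order."""
    seen, order = set(), []
    queue = deque(lineage.get(asset, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(lineage.get(node, []))
    return order


def freshness_alerts(last_updated, max_age, lineage, now):
    """Flag assets older than their allowed age, along with everything downstream."""
    alerts = []
    for asset, updated_at in last_updated.items():
        age = now - updated_at
        # Assumption: assets without an explicit limit must refresh at least daily.
        if age > max_age.get(asset, timedelta(days=1)):
            alerts.append({
                "asset": asset,
                "age_hours": age.total_seconds() / 3600,
                "impacted": downstream(asset, lineage),
            })
    return alerts


now = datetime(2022, 6, 23, 9, 0)
alerts = freshness_alerts(
    last_updated={
        "raw.orders": datetime(2022, 6, 21, 9, 0),          # 48 hours old
        "marts.customer_ltv": datetime(2022, 6, 23, 6, 0),  # 3 hours old
    },
    max_age={"raw.orders": timedelta(hours=24)},
    lineage=LINEAGE,
    now=now,
)
for alert in alerts:
    print(f"STALE {alert['asset']} ({alert['age_hours']:.0f}h); impacted: {alert['impacted']}")
```

Only `raw.orders` breaches its limit here, and the alert lists every dashboard and model built on it, so the on-call person knows whom to notify before anyone files a ticket.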
Operationalizing DataOps with a DataOps role
Today, almost every other business function has a dedicated enablement team that
helps it be productive and successful. For example, SalesOps and Sales
Enablement focus on improving productivity, ramp time, and success for a sales
team. DevOps and Developer Productivity Engineering teams focus on
improving collaboration between software teams and productivity for
developers. Why don’t we have a similar function for our data organizations?
A DataOps function helps the rest of the organization achieve value from data.
This function doesn’t execute data or analytics projects itself. Instead, it focuses on
the Tools, Processes, Automation, and Culture that enable everyone else to get
value from data.
Data Consumers
Executives, business users, product managers, compliance, finance, etc.
IMPACT
● Enable self-service
● Reduce dependencies on the data team
● Improve speed of decision-making
Structuring a DataOps function
The best way to think about DataOps teams is through analogies to other
teams: e.g. RevenueOps teams activate revenue data to improve revenue
growth, and ProductOps teams activate product data to build better products.
Activating metadata holds the key to the DataOps dream
In this increasingly diverse data world, metadata holds the key to the elusive
promised land — a single source of truth. There will always be countless tools
and tech in a team’s data infrastructure. But by aggregating all of their diverse
metadata, a team can finally unify context about all their tools, processes, and
data.
All these new forms of metadata are being created by living data systems,
sometimes in real time. This has led to an explosion in the size and scale of
metadata. Metadata is itself becoming big data.
In the past few months, concepts such as the data mesh, data fabric, and
DataOps have been gaining more momentum. However, all of these concepts
are fundamentally based on being able to collect, store, and analyze metadata.
As metadata increases and the intelligence we can derive from it increases, so
too does the number of use cases that it can power. Today, even the most
data-driven organizations have only scratched the surface of what is possible
with metadata. But using metadata to its fullest potential can fundamentally
change how our data systems operate.
In this new paradigm, the old way of approaching metadata is no longer enough.
Metadata is approaching “big data,” and its use cases are growing from simple
data cataloging and governance to automated, programmatic use cases that
power DataOps and data meshes.
Traditionally, data catalogs were built to be passive. They brought metadata
from lots of different tools into the “data catalog” or the “data governance
tool”. The problem with this approach is that it tries to solve a “too many silos”
problem by adding one more siloed tool. User adoption suffers, metadata
stagnates, and these exciting catalogs turn into expensive shelfware.
Active metadata changes this. Instead of just collecting metadata from the rest
of the stack and bringing it into a passive data catalog, active metadata
makes a two-way movement of metadata possible. It sends enriched metadata
back into every tool in the data stack, giving the humans of data context
wherever and whenever they need it: inside the BI tool as they wonder what a
metric actually means, inside Slack when someone sends the link to a data
asset, inside the query editor as they try to find the right column, and inside Jira as
they create tickets for data engineers or analysts.
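To make the “two-way movement” concrete, here is a minimal sketch of a pull, enrich, and push loop. `ToolConnector`, its `pull`/`push` methods, and `sync_active_metadata` are hypothetical stand-ins for real integrations with a BI tool, Slack, or a query editor; the glossary definition and owner are illustrative sample data.

```python
from dataclasses import dataclass, field


@dataclass
class ToolConnector:
    """Hypothetical stand-in for one tool in the stack (BI tool, Slack, query editor...)."""
    name: str
    assets: list                                  # asset names this tool references
    received: dict = field(default_factory=dict)  # context pushed back into the tool

    def pull(self):
        # Inbound direction: report which assets this tool touches.
        return {asset: self.name for asset in self.assets}

    def push(self, asset, context):
        # Outbound direction: surface enriched context inside the tool itself.
        self.received[asset] = context


def sync_active_metadata(connectors, glossary, owners):
    # 1. Pull: aggregate metadata from every tool into one index.
    used_in = {}
    for conn in connectors:
        for asset, tool in conn.pull().items():
            used_in.setdefault(asset, set()).add(tool)
    # 2. Enrich: attach business context (definition, owner, cross-tool usage).
    enriched = {
        asset: {
            "definition": glossary.get(asset, "undocumented"),
            "owner": owners.get(asset, "unassigned"),
            "used_in": sorted(tools),
        }
        for asset, tools in used_in.items()
    }
    # 3. Push: send the enriched context back into every tool that references the asset.
    for conn in connectors:
        for asset in conn.assets:
            conn.push(asset, enriched[asset])
    return enriched


bi = ToolConnector("bi_tool", ["marts.revenue"])
chat = ToolConnector("slack", ["marts.revenue", "marts.customer_ltv"])
sync_active_metadata(
    [bi, chat],
    glossary={"marts.revenue": "Recognized revenue, net of refunds"},
    owners={"marts.revenue": "finance-analytics"},
)
print(bi.received["marts.revenue"])
# {'definition': 'Recognized revenue, net of refunds', 'owner': 'finance-analytics', 'used_in': ['bi_tool', 'slack']}
```

The key difference from a passive catalog is step 3: the BI user sees the definition and owner without leaving the BI tool, and the Slack user learns that `marts.customer_ltv` has no owner yet.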
3 characteristics of active metadata platforms
From siloed tools to embedded collaboration
🔦 SPOTLIGHT
From generic experiences to personalized experiences
🔦 SPOTLIGHT
From closed & manual to open & autonomous
Metadata will also be the key to unlocking new superpowers in the modern
data stack, such as auto-tuning pipelines based on demand or automatically
deprecating unused data assets based on usage metadata.
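As an example of the second superpower, here is a minimal sketch of usage-based deprecation. It assumes two hypothetical inputs: the last time each asset was queried (or `None` if never) and a count of downstream dependents from lineage. The 90-day idle window is an arbitrary assumption, and assets are tagged rather than dropped so owners can object first.

```python
from datetime import datetime, timedelta


def deprecation_candidates(last_queried, downstream_count, now, idle_days=90):
    """Assets nobody has queried for `idle_days` and that nothing is built on top of."""
    cutoff = now - timedelta(days=idle_days)
    candidates = []
    for asset, last in sorted(last_queried.items()):
        idle = last is None or last < cutoff
        if idle and downstream_count.get(asset, 0) == 0:
            candidates.append(asset)
    return candidates


def apply_deprecation(catalog, candidates, now):
    # Tag rather than delete: owners get a chance to object before anything is removed.
    for asset in candidates:
        catalog.setdefault(asset, {}).update(
            status="deprecated", deprecated_on=now.date().isoformat()
        )
    return catalog


now = datetime(2022, 7, 1)
last_queried = {
    "marts.revenue": datetime(2022, 6, 30),
    "tmp.q3_backfill": datetime(2022, 1, 15),
    "scratch.test_join": None,                  # never queried
    "staging.orders": datetime(2021, 12, 1),    # idle, but models depend on it
}
downstream_count = {"staging.orders": 2}

candidates = deprecation_candidates(last_queried, downstream_count, now)
catalog = apply_deprecation({}, candidates, now)
print(candidates)  # ['scratch.test_join', 'tmp.q3_backfill']
```

Combining usage with lineage matters: `staging.orders` is rarely queried directly, but deprecating it would break the models built on it.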
🔦 SPOTLIGHT
This company needed to track and delete data regularly, but the process was
manual. As a result, data sometimes wasn’t deleted, causing contractual breaches
and legal and compliance costs for the firm. They used Atlan’s active metadata
to automate their data deletion processes across petabytes of data and improve
data-contract compliance.
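A retention workflow of this kind can be driven entirely by metadata. The sketch below is a generic illustration, not Atlan’s implementation: `DatasetMetadata`, its fields, and `enforce_retention` are hypothetical, and the actual delete operation is passed in by the caller so the policy logic stays separate from any specific storage system.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DatasetMetadata:
    # Hypothetical metadata fields; in practice these would come from the catalog.
    name: str
    contract_id: str
    ingested_on: date
    retention_days: int


def due_for_deletion(datasets, today):
    """Datasets whose contractual retention window has elapsed."""
    return [
        d for d in datasets
        if today > d.ingested_on + timedelta(days=d.retention_days)
    ]


def enforce_retention(datasets, today, delete_fn):
    """Delete expired datasets via `delete_fn` and return an audit trail."""
    audit = []
    for d in due_for_deletion(datasets, today):
        delete_fn(d.name)  # caller supplies the real delete, e.g. a DROP TABLE
        audit.append({
            "dataset": d.name,
            "contract": d.contract_id,
            "deleted_on": today.isoformat(),
        })
    return audit


deleted = []
datasets = [
    DatasetMetadata("clientA.events_2021", "C-001", date(2021, 6, 1), 365),
    DatasetMetadata("clientA.events_2022", "C-001", date(2022, 6, 1), 365),
    DatasetMetadata("clientB.exports", "C-002", date(2022, 5, 1), 30),
]
audit = enforce_retention(datasets, date(2022, 7, 1), deleted.append)
print(deleted)  # ['clientA.events_2021', 'clientB.exports']
```

Returning an audit trail alongside each deletion gives compliance teams evidence that every contract’s retention terms were honored, which is the gap that manual processes tend to leave.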
The leading active metadata platform for modern data teams
Built by a data team for data teams, Atlan is the active metadata platform for DataOps. Our platform activates
metadata to help data-driven enterprises discover, understand, trust, and collaborate on their data. With
intelligent bots, column-level lineage, and personalized experiences, Atlan creates a single source of truth and
brings context back into the tools where data teams live. Just three years after launch, Atlan is the tool of choice
for a growing list of modern data teams around the world, including WeWork, Plaid, Postman, Scripps Health,
TechStyle, Snapcommerce, and Delhivery.
● Named in Gartner’s inaugural Market Guide for Active Metadata, 3 Hype Cycles, and 7 reports in 2021
● Recognized as a Top 5 Global Innovator in DataOps by IDC in 2022