Atlan - Data Management Report


Data Can Be Chaos.

Work Shouldn’t Be.

How Active Metadata helps modern data organizations embrace the DataOps way

What’s inside:

What is DataOps & Active Metadata?

How to operationalize DataOps and structure a DataOps function

5 real industry implementations of Active Metadata


Let’s face it.
Traditional data management doesn’t work.
75% of executives don’t trust their organization’s data. That’s a huge problem,
given how much money companies are spending on their data these days. To
make matters worse, the average Chief Data Officer’s tenure is only 2.5 years
and only 27% of data projects are actually successful.

Meanwhile, data is growing faster than any of these companies can keep up with.

Data teams are becoming more diverse than ever — data engineers, analysts,
analytics engineers, data scientists, product managers, business analysts, citizen
data scientists, and more.

The data tools and infrastructure they use are increasingly complicated. These
include data warehouses, lakes, lake houses, databases, real-time data streams,
BI tools, notebooks, modeling tools, and more.

All of this has led to data chaos like never before.

“Can I trust this data asset? Where does it come from?”


“Where can I find the latest cleaned data set for our customer master?”
“What does this column name mean?”

Collaboration Challenges · Tribal Knowledge · “Hero” Bottlenecks

The Rise of DataOps
In 2020, Gartner launched the industry’s first report on DataOps by recognizing
a set of Cool Vendors in DataOps. Since then, DataOps has only grown in
prominence and become more mainstream.

Google Trends data on global searches for “DataOps” since 2015. The y-axis shows “interest over time”, or a
normalized version of search interest. 100 represents peak popularity for the term in the given time and region.

In their 2022 Hype Cycle, Gartner predicted that DataOps will fully penetrate
the market in 2-5 years and moved the trend from the far left side of its curve to
the “Peak of Inflated Expectations”. Then, on June 23, 2022, Forrester
launched the latest version of its Wave report about data catalogs. But instead
of talking about “Machine Learning Data Catalogs” like before, they renamed
the category to “Enterprise Data Catalogs for DataOps” — announcing the
mainstream arrival of DataOps.

“DataOps is a collaborative data management practice focused on improving the
communication, integration and automation of data flows between data managers
and data consumers across an organization.”

— Gartner Glossary

What actually is DataOps?
The first, and perhaps most important, thing to know about DataOps is that it’s
not a product. It’s not a tool.

In fact, it’s not anything you can buy, and anyone trying to tell you otherwise is
trying to trick you.

Instead, DataOps is a mindset or a culture, a way to help data teams and
people work together better.

There’s no standard definition for DataOps. However, you’ll see that everyone
talks about DataOps in terms of being beyond tech or tools. Instead, they focus
on terms like communication, collaboration, integration, experience, and
cooperation.

In our mind, DataOps is really about bringing together today’s increasingly
diverse data teams and helping them work across equally diverse tools and
processes. Its principles and processes help teams drive better data
management, save time, and reduce wasted effort.

Think of DataOps as the best parts of Agile, Lean, DevOps, and Product
Thinking, all applied to the field of data management.

🔦 SPOTLIGHT

How a DataOps approach helped our team become 6x more agile and build India’s
National Data Platform

At Atlan, we started as a data team ourselves, solving social good problems
with large-scale data projects. The projects were really cool: we got to work
with organizations like the UN and Gates Foundation on large-scale projects
affecting millions of people.

But internally, life was chaos. We dealt with every fire drill possible, leading to
long chains of frustrating phone calls and hours spent trying to figure out what
went wrong. We were breaking trust, and we knew this couldn’t continue.

We put our minds to solving this problem and came together to tackle it
with new tooling and practices. Taking inspiration from other best
practices, we stumbled upon what we now know as DataOps.

It was during this time that we saw what the right tooling and culture can do for
a data team. The chaos decreased, the same massive data projects became
exponentially faster and easier, and the late-night calls became wonderfully
rare. And as a result, we were able to accomplish far more with far less. Our
eight-member team, many of whom had never pushed a line of code to production
before, built India’s national data platform in just 12 months.

We later wrote down our learnings in our DataOps Culture Code, a set of
principles to help data teams work together, build trust, and collaborate better.

That’s ultimately what DataOps does, and why it’s all the rage today. It helps
data teams stop wasting time on the endless interpersonal and technical
speed bumps that stand between them and the work they love to do. And in
today’s economy, anything that saves time is priceless.

🔦 SPOTLIGHT

The DataOps Culture Code

🤝 It’s a team sport, and collaboration is key

The data team is incredibly diverse. Data scientists, analysts, engineers,
business users... all diverse people, with diverse tools, skill sets, and DNA.
Embrace diversity, and create mechanisms for effective collaboration.

🗄 Treat all data assets as assets or products

All data assets, from code and models to data and dashboards, are assets and
should be treated as such. That means keeping them easily discoverable, well
maintained, and reusable.

🚀 Optimize for agility

As business needs evolve rapidly, data teams need to be a step ahead, not
deluged with three months of backlog. Constantly measure your team’s
velocity, and invest in foundational initiatives to improve cycle times.

● Reduce dependencies between business, analysts, and engineers.
● Enable a documentation-first culture.
● Automate whatever is repetitive.

👥 Create systems of trust

With the inherent diversity of data teams, it's all too easy to misunderstand
other team members' roles. But that creates trust deficiencies — especially
when things go wrong. Intentionally create systems of trust in your team.

● Make everyone’s work accessible and discoverable to break down "tool" silos.
● Create transparency in data pipelines and lineage so everyone can see and
troubleshoot issues.
● Set up monitoring and alerting systems to proactively know when things break.
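
As a rough illustration of the monitoring principle above, a freshness check can be as small as the sketch below. It is only a sketch under stated assumptions: the table names, the `max_age` thresholds, and the `find_stale_tables` helper are hypothetical, and a real setup would read this metadata from the warehouse or orchestrator and send the alert to wherever the team works.

```python
from datetime import datetime, timedelta


def find_stale_tables(last_updated, max_age, now):
    """Return names of tables whose last update is older than their allowed age."""
    stale = []
    for table, updated_at in last_updated.items():
        if now - updated_at > max_age[table]:
            stale.append(table)
    return sorted(stale)


# Hypothetical freshness metadata (assumed values, for illustration only).
now = datetime(2022, 7, 1, 12, 0)
last_updated = {
    "customer_master": datetime(2022, 7, 1, 6, 0),
    "daily_orders": datetime(2022, 6, 28, 6, 0),
}
max_age = {
    "customer_master": timedelta(hours=24),
    "daily_orders": timedelta(hours=24),
}

stale_tables = find_stale_tables(last_updated, max_age, now)
for table in stale_tables:
    # A real system would notify a channel or pager instead of printing.
    print(f"ALERT: {table} has not been refreshed within its expected window")
```

Keeping the thresholds next to the tables they describe makes the expectation itself discoverable, which fits the documentation-first idea in the Culture Code.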

Operationalizing DataOps with a DataOps role
Today, every other domain has a focused enablement function to help it be
productive and successful. For example, SalesOps and Sales
Enablement focus on improving productivity, ramp time, and success for a sales
team. DevOps and Developer Productivity Engineering teams are focused on
improving collaboration between software teams and productivity for
developers. Why don’t we have a similar function for our data organizations?

A DataOps function helps the rest of the organization achieve value from data.
This function doesn’t execute data or analytics projects. Instead, it focuses on
the Tools, Processes, Automation, and Culture that will help the rest of the
organization get value from data.

The consumers of DataOps include…

Data Team
Analysts, analytics engineers, scientists, data engineers, etc.
IMPACT
● Improve productivity of data team
● Improve time to value or speed of delivery
● Reduce ramp time of a new joinee
● Reduce attrition

Data Consumers
Executives, business users, product managers, compliance, finance, etc.
IMPACT
● Enable self-service
● Reduce dependencies on data team
● Improve speed of decision-making

Data Platforms and Business Applications
Automated and programmatic workflows to drive automated data platform use
cases for the business and/or product

Structuring a DataOps function

DataOps is a central function that enables the rest of the organization.
There are two key personas:

DataOps Enablement Leads: They understand data and users, and are great at
cross-team collaboration and bringing people together. DataOps Enablement
Leads often come from backgrounds like Information Architects, Data
Governance Managers, Library Sciences, Data Strategists, Data Evangelists,
and even extroverted Data Analysts and Engineers.

DataOps Enablement Engineers: They are the automation brain in the DataOps
team. Their key strength is sound knowledge of data and how it flows between
systems/teams, acting as both advisors and executors on automation. They are
often former Developers, Data Architects, Data Engineers, and Analytics
Engineers.

WeWork’s DataOps Function

EMILY LAZIO
DATAOPS ENABLEMENT
1. Masters in Information and Library Sciences
   Understands taxonomy and structure
2. Children’s Librarian
   Energetic, extroverted, and great at bringing people together
3. Information Architect in WeWork’s Design team
   Understands the data ecosystem and user research

YONG LU
DATAOPS ENGINEERING
1. Masters in Computer Science
   Understands data and technology
2. Engineer and Data Management Leader
   Systems thinker who is great at simplifying complex problems
3. Data Engineering Lead in WeWork’s Engineering
   Internal data “guru” who can identify patterns for automation

The best way to think about DataOps teams is through analogies to other
teams: e.g. RevenueOps teams activate revenue data to improve revenue
growth, and ProductOps teams activate product data to build better products.

DataOps teams activate “data data” (aka metadata) to help organizations
achieve value from data.

Activating metadata holds the key to the DataOps dream
In this increasingly diverse data world, metadata holds the key to the elusive
promised land — a single source of truth. There will always be countless tools
and tech in a team’s data infrastructure. But by aggregating all of their diverse
metadata, a team can finally unify context about all their tools, processes, and
data.

All these new forms of metadata are being created by living data systems,
sometimes in real time. This has led to an explosion in the size and scale of
metadata. Metadata is itself becoming big data.

In the past few months, concepts such as the data mesh, data fabric, and
DataOps have been gaining more momentum. However, all of these concepts
are fundamentally based on being able to collect, store, and analyze metadata.
As metadata increases and the intelligence we can derive from it increases, so
too does the number of use cases that it can power. Today, even the most
data-driven organizations have only scratched the surface of what is possible
with metadata. But using metadata to its fullest potential can fundamentally
change how our data systems operate.

For this new paradigm, where metadata is approaching “big data” and the use
cases of metadata are growing from simple data cataloging and governance to
automated and programmatic use cases to power DataOps and data meshes,
the old way of approaching metadata is no longer enough.

“The increased demand for orchestrating existing and new systems has rendered
traditional metadata practices insufficient.”

— Gartner, Market Guide for Active Metadata Management

Traditionally, data catalogs were built to be passive. They brought metadata
from lots of different tools into the “data catalog” or the “data governance
tool”. The problem with this approach is that it tries to solve a “too many silos”
problem by adding one more siloed tool. User adoption suffers, metadata
stagnates, and these exciting catalogs turn into expensive shelfware.

Active metadata changes this. Instead of just collecting metadata from the rest
of the stack and bringing it back into a passive data catalog, active metadata
makes a two-way movement of metadata possible. It sends enriched metadata
back into every tool in the data stack, giving the humans of data context
wherever and whenever they need it — inside the BI tool as they wonder what a
metric actually means, inside Slack when someone sends the link to a data
asset, inside the query editor as they try to find the right column, and inside Jira as
they create tickets for data engineers or analysts.

Active metadata also enables tons of programmatic use cases through
automation, e.g. data deprecation by automatically purging low-quality or
outdated data products, or data quality management by automatically
stopping downstream pipelines when a data quality issue is detected and using
past records to predict what went wrong and fix it without human intervention.
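
To make the data quality example concrete, here is a minimal sketch of how lineage metadata could decide which pipelines to stop when an upstream check fails. The lineage map, the asset names, and the `downstream_assets` helper are hypothetical; this is not Atlan's API, and a real system would hand the resulting list to its orchestrator rather than print it.

```python
from collections import deque


def downstream_assets(lineage, failed_asset):
    """Breadth-first walk of the lineage graph, starting at the failed asset.

    Returns every asset that directly or indirectly reads from it,
    in the order they are reached.
    """
    seen = {failed_asset}
    queue = deque([failed_asset])
    order = []
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical lineage: each asset maps to the assets that read from it.
lineage = {
    "raw_orders": ["clean_orders"],
    "clean_orders": ["revenue_model", "orders_dashboard"],
    "revenue_model": ["exec_dashboard"],
}

# Suppose a data quality check on clean_orders has just failed.
to_pause = downstream_assets(lineage, "clean_orders")
print("Pause these downstream assets:", to_pause)
```

The upstream `raw_orders` table is untouched, which is the point: lineage lets the automation stop only what the bad data could actually reach.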

3 characteristics of active metadata platforms
Siloed tool → Embedded collaboration

When I have a question, the last thing I want to do is jump to a different
tool, find my login, search for the dashboard, and look at lineage. I want
context where I am, when I need it.

Imagine a world where data catalogs don’t live in their own “third website”.
Instead, a user can get all the context where they need it: in a BI dashboard
or whatever tool they’re already in, whether that’s Slack, Jira, the query
editor, or the data warehouse.

🔦 SPOTLIGHT

Monster.com
Global job search behemoth

Monster improved their enterprise’s collaboration by creating a single source
of truth with Atlan’s business glossary. They are unlocking embedded
collaboration by activating metadata from Atlan into Looker, so the team
always has metadata and context at their fingertips.

“We used to have two starting points, either Looker or Atlan, and users had
to make a choice. Now, with Atlan metadata showing up on Looker, users don’t
have to choose, which is awesome!”

Sara Swart, VP of Strategic Planning

From generic experiences to personalized experiences

Data teams are diverse. Analysts, engineers, scientists, and architects all
have their own preferences. Data engineers care about pipeline health and
data quality tests, whereas analysts care about column descriptions and
frequency distribution. But passive metadata tools treat us all the same with
the same generic experience.
If Netflix can serve personalized experiences to you and me, why serve the
same generic experience to all the humans of data?
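
One simple way to picture personalization is a per-persona filter over the same underlying metadata. The sketch below assumes a hypothetical `PERSONA_FIELDS` mapping and made-up field names; it only illustrates the idea and does not describe how any particular product implements it.

```python
# Hypothetical mapping from persona to the metadata fields that persona cares about.
PERSONA_FIELDS = {
    "data_engineer": ["name", "pipeline_status", "quality_tests_passed"],
    "analyst": ["name", "column_descriptions", "frequency_distribution"],
}


def personalized_view(asset_metadata, persona):
    """Keep only the fields relevant to the persona.

    Personas without a configured field list see the full metadata.
    """
    fields = PERSONA_FIELDS.get(persona)
    if fields is None:
        return dict(asset_metadata)
    return {key: asset_metadata[key] for key in fields if key in asset_metadata}


# Example metadata for one asset (assumed values, for illustration only).
asset = {
    "name": "customer_master",
    "pipeline_status": "healthy",
    "quality_tests_passed": 12,
    "column_descriptions": {"cust_id": "Unique customer identifier"},
    "frequency_distribution": {"region": {"EMEA": 40, "APAC": 60}},
}

engineer_view = personalized_view(asset, "data_engineer")
analyst_view = personalized_view(asset, "analyst")
print(engineer_view)
print(analyst_view)
```

Both views come from the same record, so there is still one source of truth; only the presentation changes with the person looking at it.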

🔦 SPOTLIGHT

How a multi-billion dollar investment management firm used personalization to
make their data mesh a reality

The data mesh is built upon the paradigm of domain-based personalization.
This firm receives data from 10,000+ external data feeds. They use Atlan’s
granular Personas and Purposes, along with detailed data and metadata policy
management, to serve the right products to the right users in the right
domains.

From closed & manual to open & autonomous

Metadata will also be the key to unlocking new superpowers in the modern
data stack, such as auto-tuning pipelines based on demand or automatically
deprecating unused data assets based on usage metadata.
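
As a sketch of the second idea, usage metadata can be turned into a list of deprecation candidates with a simple cutoff. The 90-day window, the asset names, and the `deprecation_candidates` helper are assumptions for illustration; a real workflow would likely notify owners before anything is archived.

```python
from datetime import date, timedelta


def deprecation_candidates(usage, today, unused_days=90):
    """Return assets with no recorded query in the last `unused_days` days.

    `usage` maps an asset name to the date it was last queried,
    or None if it has never been queried.
    """
    cutoff = today - timedelta(days=unused_days)
    return sorted(
        asset
        for asset, last_queried in usage.items()
        if last_queried is None or last_queried < cutoff
    )


# Hypothetical usage metadata, e.g. derived from warehouse query logs.
usage = {
    "orders_2019_backup": date(2021, 11, 2),
    "customer_master": date(2022, 6, 30),
    "tmp_experiment_v2": None,  # never queried
}

candidates = deprecation_candidates(usage, today=date(2022, 7, 1))
print(candidates)
```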

🔦 SPOTLIGHT

Programmatic data deletion for a leading media analytics platform to manage
external data providers

This company needed to track and regularly delete data from external data
providers, but this happened manually. This sometimes led to data not being deleted, resulting in
contractual breaches and costing the firm from a legal and compliance
perspective. They used Atlan’s active metadata to automate manual data
deletion processes for petabytes of data to improve data-contract compliance.
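
The source does not describe how this automation was built, so the following is only a hedged sketch of the general pattern: attach each provider's retention window to its datasets as metadata, then compute what is due for deletion. The provider names, retention periods, and the `datasets_due_for_deletion` helper are all hypothetical.

```python
from datetime import date, timedelta


def datasets_due_for_deletion(datasets, retention_days, today):
    """Return names of datasets whose provider retention window has expired."""
    due = []
    for ds in datasets:
        limit = retention_days[ds["provider"]]
        if today > ds["received_on"] + timedelta(days=limit):
            due.append(ds["name"])
    return due


# Hypothetical contract terms: days each provider's data may be retained.
retention_days = {"provider_a": 30, "provider_b": 365}

# Hypothetical dataset metadata, tagged with provider and arrival date.
datasets = [
    {"name": "feed_a_2022_05", "provider": "provider_a", "received_on": date(2022, 5, 1)},
    {"name": "feed_a_2022_06", "provider": "provider_a", "received_on": date(2022, 6, 20)},
    {"name": "feed_b_2021_12", "provider": "provider_b", "received_on": date(2021, 12, 1)},
]

due = datasets_due_for_deletion(datasets, retention_days, today=date(2022, 7, 1))
print("Due for deletion:", due)
```

Running a check like this on a schedule replaces the manual tracking step, so a missed deletion shows up as a visible item on a list rather than as a contractual breach discovered later.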

The leading active metadata platform for modern data teams

Built by a data team for data teams, Atlan is the active metadata platform for DataOps. Our platform activates
metadata to help data-driven enterprises discover, understand, trust, and collaborate on their data. With
intelligent bots, column-level lineage, and personalized experiences, Atlan creates a single source of truth and
brings context back into the tools where data teams live. Just three years after launch, Atlan is the tool of choice
for a growing list of modern data teams around the world, including WeWork, Plaid, Postman, Scripps Health,
TechStyle, Snapcommerce, and Delhivery.

Pioneering the Active Metadata and DataOps categories

Named a Leader in the Forrester Wave™: Enterprise Data Catalogs for DataOps, Q2 2022

Recognized as a Gartner Cool Vendor in DataOps in 2020

Named in Gartner’s inaugural Market Guide for Active Metadata, 3 Hype Cycles, and 7 reports in 2021

Recognized as a Top 5 Global Innovator in DataOps by IDC in 2022

Deep partnerships and integrations across the modern data stack

First data catalog validated as a Snowflake Ready Technology Partner

Native integration with Unity Catalog, including column-level lineage

Named an AWS Advanced Technology Partner and Marketplace Seller

