


Data Observability
Monte Carlo Special Edition

by Stephanie Diamond

Data Observability For Dummies®, Monte Carlo Special Edition

Published by
John Wiley & Sons, Inc.
111 River St.
Hoboken, NJ 07030-5774
www.wiley.com
Copyright © 2023 by John Wiley & Sons, Inc., Hoboken, New Jersey

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any
form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise,
except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without
the prior written permission of the Publisher. Requests to the Publisher for permission should be
addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ
07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Trademarks: Wiley, For Dummies, the Dummies Man logo, Dummies.com, and related trade
dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in
the United States and other countries, and may not be used without written permission. Monte
Carlo and MC and related trade dress are registered trademarks of Monte Carlo Data, Inc. in the
United States, and may not be used without written permission. All other trademarks are the
property of their respective owners. John Wiley & Sons, Inc., is not associated with any product
or vendor mentioned in this book.

LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: WHILE THE PUBLISHER AND AUTHORS HAVE
USED THEIR BEST EFFORTS IN PREPARING THIS WORK, THEY MAKE NO REPRESENTATIONS
OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF
THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION
ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES REPRESENTATIVES, WRITTEN SALES
MATERIALS OR PROMOTIONAL STATEMENTS FOR THIS WORK. THE FACT THAT AN ORGANIZATION,
WEBSITE, OR PRODUCT IS REFERRED TO IN THIS WORK AS A CITATION AND/OR POTENTIAL
SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE PUBLISHER AND AUTHORS
ENDORSE THE INFORMATION OR SERVICES THE ORGANIZATION, WEBSITE, OR PRODUCT MAY
PROVIDE OR RECOMMENDATIONS IT MAY MAKE. THIS WORK IS SOLD WITH THE UNDERSTANDING
THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING PROFESSIONAL SERVICES. THE ADVICE
AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR YOUR SITUATION. YOU SHOULD
CONSULT WITH A SPECIALIST WHERE APPROPRIATE. FURTHER, READERS SHOULD BE AWARE
THAT WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN
THIS WORK WAS WRITTEN AND WHEN IT IS READ. NEITHER THE PUBLISHER NOR AUTHORS
SHALL BE LIABLE FOR ANY LOSS OF PROFIT OR ANY OTHER COMMERCIAL DAMAGES, INCLUDING
BUT NOT LIMITED TO SPECIAL, INCIDENTAL, CONSEQUENTIAL, OR OTHER DAMAGES.

For general information on our other products and services, or how to create a custom For
Dummies book for your business or organization, please contact our Business Development
Department in the U.S. at 877-409-4177, contact info@dummies.biz, or visit www.wiley.com/go/
custompub. For information about licensing the For Dummies brand for products or services,
contact BrandedRights&Licenses@Wiley.com.
ISBN 978-1-394-21844-8 (pbk); 978-1-394-21845-5 (ebk)

Publisher’s Acknowledgments
We’re proud of this book and of the people who worked on it. Some of the
people who helped bring this book to market include the following:
Project Manager: Rebecca Senninger
Acquisitions Editor: Traci Martin
Senior Managing Editor: Rev Mengle
Client Account Manager: Molly Daugherty
Production Editor: Pradesh Kumar

Table of Contents
INTRODUCTION................................................................................................ 1
About This Book.................................................................................... 1
Icons Used in This Book........................................................................ 1
Beyond the Book................................................................................... 2

CHAPTER 1: Introducing Data Observability........................................ 3


Learning About Data Observability..................................................... 3
Understanding data observability................................................. 4
Differentiating it from application observability.......................... 4
Exploring Why Data Observability Matters Now............................... 5
Identifying the consequences of data downtime........................ 5
Exploring the Business Case for Data Observability......................... 7
Deciding When to Invest in Data Observability................................. 8
Scaling your data stack.................................................................... 8
Increased time firefighting data quality issues............................ 8
More data consumers at your company....................................... 9
Moving to a self-service analytics model...................................... 9
Pioneering the Future of Data Observability..................................... 9

CHAPTER 2: Discovering the Five Pillars of Data Observability............................. 11
Examining the Five Pillars of Data Observability............................. 11
Uncovering Key Components of Data Observability Platforms...................................... 12
Unlocking automated data quality monitoring.......................... 13
Receiving alerts and notifications................................................ 13
Automated root cause analysis.................................................... 13
Supporting collaboration.............................................................. 13
Achieving end-to-end visibility..................................................... 14

CHAPTER 3: Recognizing the Limits of Testing and Data Quality Monitoring.................................... 15
Navigating the Tradeoffs of Testing.................................................. 15
Advantages..................................................................................... 16
Disadvantages................................................................................ 16
Viewing the Tradeoffs of Data Quality Monitoring.......................... 17
Advantages..................................................................................... 17

Disadvantages................................................................................ 18
How Data Observability Goes Beyond Testing and Monitoring.................................................... 18

CHAPTER 4: Applying Best Practices for Scalable Data Observability..................................................... 21
Five Best Practices for Data Observability........................................ 21
Step 1: Set up domains................................................................. 22
Step 2: Set up dedicated channels............................................... 22
Step 3: Set up freshness, volume, and schema monitors......... 22
Step 4: Set up custom monitors for data quality checks.......... 23
Step 5: Track SLAs, SLOs, and SLIs for important tables........... 23
Using Data Quality Metrics That Actually Matter............................ 25

CHAPTER 5: Top Ten Reasons to Invest in Data Observability..................................................... 27

Introduction
As data becomes increasingly important to today's businesses, ensuring its quality and reliability has never been more critical. High-quality data is vital for any organization's success, from building new products to driving accurate decision-making and, more recently, powering AI and ML use cases.

This book explores the evolving landscape of data management and the crucial role that detecting, resolving, and preventing data issues (also known as data observability) plays in this process.

About This Book


Welcome to Data Observability For Dummies, Monte Carlo Special
Edition. This book covers the information you need to get the
most from a data observability platform.

We cover several topics, including the following:

»» Introducing the concept of data observability


»» Understanding the five pillars of data observability
»» Exploring the limits of testing and data quality monitoring
»» Creating a step-by-step strategy to implement data
observability
»» Reviewing the top ten reasons to invest in data observability

Icons Used in This Book


Throughout this book, we use different icons to highlight impor-
tant information. Here’s what they mean:

The Tip icon highlights information that can make doing things
easier or faster.

The Remember icon points out things you need to remember
when searching your memory bank.

The Warning icon alerts you to things that can harm you or your
company.

Beyond the Book


In this book, business leaders like you discover more about effec-
tively implementing data observability in your organization.

If you want resources beyond what this short book offers, visit montecarlodata.com/blog to discover more about the following topics:

»» Improving data quality at scale


»» Building reliable data platforms
»» The latest technologies and trends in the data engineering
landscape, including data observability
»» Data observability case studies
»» Data observability best practices
»» The latest technologies and trends in data engineering,
beyond data observability

IN THIS CHAPTER
»» Taking a modern approach to data quality management

»» Introducing and investigating data downtime

»» Assessing when to invest in data observability

Chapter 1
Introducing Data Observability

You've likely heard the phrase “Data is the new software.”
This statement illustrates data’s great importance to most
businesses in today’s economy. Like software applications,
data is a strategic resource, but if the data is incomplete, inaccu-
rate, or unreliable, it can have detrimental impacts on the entire
company.

This chapter introduces you to data observability and explains the value data observability brings to your organization.

Learning About Data Observability


Data observability is relatively new to the modern tech stack. Its use has become critical to organizations because the consequences of bad data quality grow more severe as data becomes increasingly central to driving decision-making and powering digital services.

Do you think quality won’t impact your company? Think again.
According to Gartner, through 2025, 80 percent of organizations seeking to scale digital business will fail because they don't take a modern approach to data and analytics governance (https://blogs.gartner.com/andrew_white/2021/01/12/our-top-data-and-analytics-predicts-for-2021).

Understanding data observability


At its core, data observability provides your organization with a
comprehensive understanding of the health of your data at each
stage in its life cycle, across data pipelines, infrastructure, and
the data itself. Data observability ensures that data teams are the
first to know — and fix — when data quality issues arise, and
in turn, provides companies with confidence in the accuracy
and reliability of their data.

For more details about the fundamental principles of data observability, see Chapter 2.

Differentiating it from application observability
It’s important to distinguish data observability from another key
concept, application observability:

»» Data observability monitors analytical data pipelines, infrastructure, and data inputs. It's leveraged by data engineers, data scientists, data analysts, and other data professionals to ensure data pipeline uptime while keeping instances of bad data and broken dashboards to a minimum.
»» Application observability refers to a software engineering team's ability to determine an application's internal state based on its outputs. This means you monitor applications to ensure they're performing correctly and fix problems quickly when they're not (such as when the website is down). In the context of a software application, observability typically involves collecting and analyzing metrics, traces, and logs.

Application observability focuses on the performance and reliability of software applications. Data observability focuses on the health and reliability of data within an organization's systems.

Exploring Why Data Observability
Matters Now
One of the main reasons data observability matters more now than ever before is the impact of data downtime. Monte Carlo has found that data downtime tops the list of pain points for the many worldwide data leaders they work with (https://www.montecarlodata.com/blog-what-is-data-observability/).

Data downtime is defined as periods of time when data is incomplete, erroneous, missing, or otherwise inaccurate.

Data downtime affects everyone. It can be due to such things as buggy pull requests, schema changes, software failures, and more. Obviously, it's imperative that you do everything you can to minimize it.

Identifying the consequences of data downtime
Consider the consequences of data downtime. It’s detrimental
to all parts of your business — including your customers — as
shown in Figure 1-1.

FIGURE 1-1: Some of the consequences of data downtime include lost time, lost trust, and lost revenue.

As the time to detect issues increases, you see a severe impact on
your business, which includes:

»» Lost time: Your data engineering team experiences lost time while frantically trying to identify the root cause of the problem. The team must halt other essential projects while they search for an answer.
»» Lost trust: Business stakeholders and customers lose trust
in the data and avoid using it. This loss of trust defeats the
purpose of having data if you can’t rely on it. Bad data can
also cause regulatory infringements, inaccurate reporting,
and other issues that affect public perception.
»» Lost revenue: If your external customers find that your
customer service people don’t have the correct information
or can’t help them, they move to competitors. This results in
lost revenue.

AVOIDING A DATA DISASTER


A data observability platform can help your company avoid serious
problems. Following are two real-world disasters that happened to
leading companies because of bad data:

• Equifax: Global credit giant Equifax issued millions of customers the wrong credit scores because they trained a machine learning (ML) model on bad data. This error resulted in a stock drop of 5 percent the day after the incident (https://www.cnn.com/2022/08/03/business/equifax-wrong-credit-scores/index.html).
• Unity Technologies: Unity, the ad monetization company, experienced a revenue loss of $110 million due to bad ads data fueling its targeting algorithms (https://seekingalpha.com/news/3836713-unity-crashes-20-as-guidance-shows-slowing-growth-ad-delay-could-hurt-revenue-for-a-year).

These are just two examples of how unreliable data can have dire consequences for your organization. They make investing in a data observability platform a way to future-proof your organization.

Exploring the Business Case for
Data Observability
When you think about data observability, you probably think about how it helps you avoid the consequences of data downtime: wasted resources, customer churn, and reputational risk, to name a few. However, there are many more benefits derived from using data observability.

Your investment in data observability leads to these four outcomes:

»» Enhance decision-making: Your company's future depends on the big decisions you make. Data observability drives better insights and gives your leaders confidence in their decisions.
»» Stand out among competitors: It’s difficult to stand above
the competition in today’s crowded sea of online products.
Having reliable data to power your products sets your
company apart from competitors by giving you the data-
driven edge.
»» Remove bottlenecks: Automated tooling removes bottle-
necks. It reduces labor costs and increases team capacity. It
also significantly speeds up data engineering workflows.
»» Create trusted data products: Reliable data leads to more
reliable and trusted data products, including dashboards,
reports, large-language models, and other assets that drive
value for your stakeholders.

BEST-IN-CLASS DATA
OBSERVABILITY
A best-in-class data observability framework provides the following
tactical benefits, including:

• Being the first to know about data quality issues in production.

• Fully understanding the impact of the issue.

• Understanding where and how the data broke.

• Taking action to fix the issue.

• Collecting learnings so you can prevent issues from occurring again.

These results make data observability frameworks a valuable addition to your organization and make collaboration easier.

Deciding When to Invest in Data Observability
Investing in data observability is vital to ensure your systems and
enterprise health. In the following sections, we consider when it’s
essential to invest.

Scaling your data stack


Is your data stack scaling with more data sources, tables, and
complexity? Your business is constantly evolving and expanding
if things are going well (or even if they’re not). If you have more
and more moving parts, it’s time to invest in data observability.

A good rule is to consider investing in data observability if you have over fifty tables.

Increased time firefighting data quality issues
If your team spends at least 30 percent of its time firefighting data
quality issues, it’s a good time to invest in data observability. You

want your data engineers to be innovating and building new data
products to drive growth, not fixing ad hoc data quality problems.

More data consumers at your company


Your team probably has more data consumers than they did a
year ago. This means that you have an ever-increasing number
of business stakeholders relying on analytical data in their day-
to-day jobs. More eyeballs on the data, and more data being produced for new use cases, put your company at risk for an increase in data downtime. So this would be a perfect time to
invest in data observability.

Moving to a self-service analytics model


When you move to a self-service analytics model, your business
users can interact directly with data, and they’re relieved not to be
faced with any bottlenecks to data access. This move also means
your data team isn’t burdened with fulfilling ad hoc requests.
Win-win.

However, one crucial issue can break down the entire process: your business users need to be able to trust the data they are using. Your self-service model will fail if they don't trust their dashboards and reports, or if the data fueling their most critical data products and services is erroneous. That's why it's critically important that you have reliable data. The best way to do that is to invest in data observability.

Pioneering the Future of Data Observability
As data teams increase their scope in the organization and data
use cases grow, the data team is more impactful to the bottom
line than ever before. Now, everyone across the business lever-
ages data every day to drive insights, power digital services, and
even train ML models. Today, data is a product.

To support this shift, data teams are moving from the CFO’s office
to the Engineering organization, where they’re held accountable
for supporting or embedding within functional teams across the
company. Even titles and roles are changing to match this new reality, with Heads of Data and Analytics becoming Heads of Data,
ML, and AI, and data engineering teams newly responsible for
supporting both data and AI use cases.

Current applications of generative AI in data and engineering are focused almost exclusively on scaling productivity, such as GitHub Copilot. In many ways, we don't know what the future of generative AI holds, but we do know that data teams will play a big part in its success.

There’s an exciting opportunity for LLMs to help with data qual-


ity, but the even more powerful thesis is that data quality and
reliability can help LLMs. In fact, LLMs serving production use
cases cannot exist without a solid foundation: having lots of high
quality, reliable, trusted data.

To meet this need, we need to look beyond just the data to achieve
data reliability. Unreliable data doesn’t live in a silo. It’s impacted
by all three ingredients of the data ecosystem: data, code, and
infrastructure.

This broader vision reflects how software engineering tackles detection and resolution, too. Application observability starts with infrastructure but analyzes way more than that to detect and resolve software downtime; root cause analysis takes into account code, infrastructure, services, network, and plenty of other factors.

As with software reliability, data reliability isn't achieved in a vacuum. It's often impacted by multiple factors, frequently acting in tandem or compounding on one another. It's time to start treating it that way.

IN THIS CHAPTER
»» Introducing the five pillars of data observability

»» Reviewing key components of data observability platforms

Chapter 2
Discovering the Five Pillars of Data Observability

High-quality data holds the power to drive strategies and shape outcomes. It provides a roadmap to driving business growth and enhanced decision-making.

This chapter explores the five pillars of data observability that help businesses achieve data reliability, and in the process, unlock their data's full potential. We also look at the key components of modern data observability platforms.

Examining the Five Pillars of Data Observability
The five pillars are the guiding principles to ensure data quality
and reliability. We look at each in turn.

»» Pillar 1: Freshness: Data freshness is essential. Is your data recent? When it comes to decision-making, using stale data wastes time and money.

Ask: When was the last time this table was updated? How
up-to-date is your data? Is this table updating at the right
time each day?
»» Pillar 2: Volume: Volume gives you an insight into the
completeness of your data tables and the health of your
data sources.
Ask: Do you have too many or too few rows? If there is a
significant change, will you know? Significant changes in
volume should be noted and investigated.
»» Pillar 3: Schema: Schema refers to the organization of your data. In fact, schema changes are the number one cause of data downtime issues, and major changes can indicate broken data.
Ask: Has the organization of the data changed? Who makes changes to the data schema, and when?
»» Pillar 4: Quality: The Quality pillar tells you if your collected data falls within an expected range. It gives you insight into whether you can trust your tables based on what you typically expect from your data.
Ask: Is the data in a normal range? Are there too many null values? Is it properly formatted and complete?
»» Pillar 5: Lineage: Lineage maps your data's upstream sources to downstream ingestors and paints an end-to-end, comprehensive picture of your data landscape. Monitoring this helps you determine where errors or outages have occurred.
Ask: How are data assets connected across your data stack, upstream and downstream? Which dashboards are powered by a given table? And at which points in the pipeline is the data transformed?

Good lineage automatically aggregates metadata that addresses governance, business, and technical guidelines associated with specific tables and downstream reports.

Uncovering Key Components of Data Observability Platforms
As you decide which data observability platform to choose, you
need to be aware of the key components that must be included.
The best data observability solutions include automated monitoring, alerts, and notifications, automated root cause analysis tools, collaboration functionalities, and end-to-end lineage across your data environments down to the BI layer.

Unlocking automated data quality monitoring
It’s critically important for your data team to rely on automated
monitoring to alert them to data issues before your customer
does. Your data observability platform must proactively and auto-
matically monitor data anomalies across every production table
and down to the most critical fields in your tables.

Receiving alerts and notifications


The value of a data observability platform is that it provides alerts
and notifications directly to existing communications channels,
such as Slack, Microsoft Teams, and JIRA, telling you that issues
need attention. Finding out about issues after your stakeholders do is never fun.
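As a rough illustration of the idea (not Monte Carlo's integration; the webhook URL and message fields below are placeholders), a homegrown monitor might push an anomaly alert into a Slack channel through an incoming webhook like this:

```python
"""Illustrative only: pushing a data-incident alert to a Slack incoming webhook.

A data observability platform handles this routing for you; the webhook URL
and message fields here are placeholders.
"""
import json
import urllib.request

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"  # placeholder


def send_alert(table: str, check: str, detail: str) -> None:
    # Slack incoming webhooks accept a JSON body with a "text" field.
    payload = {"text": f"Data incident on {table}\nCheck: {check}\nDetail: {detail}"}
    request = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)  # add retries and error handling in practice


# Example (commented out because the webhook URL above is a placeholder):
# send_alert("analytics.orders", "freshness", "No update in 26 hours (expected: 24)")
```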

Automated root cause analysis


Without a data observability platform in place, identifying the root
cause of a problem is costly and time-consuming. Given the nature
of most data quality issues, root cause analysis without a single
interface to pinpoint root cause and impacts requires input from
multiple stakeholders and systems. We also highly suggest investing in a data observability platform distinct from warehouses/lakehouses — while tempting, warehouses and lakehouses are unable to truly centralize root cause analysis or lineage — because they're just one part of your larger data environment.

Supporting collaboration
A vital benefit of a best-in-class data observability platform is
that it facilitates increased collaboration among data engineers,
data analysts, and data scientists. A platform:

»» Provides transparency in data quality across every data stakeholder.
»» Breaks down silos to ensure everyone is using the right data and allows teams to organize data by domain ownership.
»» Helps teams better understand data health and collaborate
on improving data quality.

Achieving end-to-end visibility
Your platform should provide a single view into data health across
your entire ecosystem, including data lakes, warehouses, ETL,
business intelligence tools, and catalogs via rich, end-to-end
data lineage. Data lineage leverages metadata to aggregate logs
and other information about your data without accessing the data
itself.

WHAT TO WATCH OUT FOR


Your data provides a competitive advantage if you choose the right
data observability platform. To ensure quick time to value, your data
observability platform shouldn’t require manual work upfront to get
started. You shouldn’t have to:

• Make upfront investments to get started. Your platform shouldn't require you to modify data pipelines, write new code, or use a special coding language. It should connect to your existing stack quickly and seamlessly.
• Move your stored data. Your tools should monitor your data at
rest. For both security and compliance reasons, you should be
able to use the data where it is currently stored.
• Spend resources configuring and maintaining noisy rules.
Your platform should require minimal configuration and no
threshold setting. It should use machine learning models to learn
your environment and data.
• Conduct prior mapping. You shouldn’t have to map what needs
to be monitored and in what way before using your platform.
• Get a report that stops at ‘field X in table Y has values lower than Z today.’ Your platform should provide rich context to enable rapid triage and troubleshooting, as well as the insights necessary to improve data reliability over time, including SLAs for the five pillars of data observability.

Data observability tools prevent data quality issues from happening in the first place. They expose rich information about your data assets so that you can proactively make changes when data downtime occurs, and in the process, evade future breakages without having to write new tests.

IN THIS CHAPTER
»» Looking at the limits of traditional approaches

»» Understanding tradeoffs in testing and data quality monitoring

»» Using data observability to enhance traditional data quality monitoring methods

Chapter 3
Recognizing the Limits of Testing and Data Quality Monitoring

Companies have relied on testing and data monitoring for several years to manage data quality. While these traditional methods have their place, they often fall short as organizations' data landscapes become more complex and dynamic. As data increases in volume and complexity, traditional approaches just aren't enough. You can no longer rely on investigating a single point of failure.

In this chapter, we look at the strengths and weaknesses of testing and data quality monitoring and see the value that data observability brings.

Navigating the Tradeoffs of Testing


At its most basic level, data testing is used to catch specific known
problems in your data pipelines. In the following sections, we look
at the strengths and weaknesses of testing.
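For example, a hand-written data test simply encodes assumptions you already know about a table. The sketch below (the orders table and its constraints are hypothetical, and no particular testing framework is implied) asserts that a key is unique and a required column is never NULL:

```python
"""Sketch of hand-written data tests for known assumptions (hypothetical table)."""
import sqlite3


def run_data_tests(conn):
    cur = conn.cursor()
    # Known assumption 1: order_id is unique.
    cur.execute("SELECT COUNT(*) - COUNT(DISTINCT order_id) FROM orders")
    assert cur.fetchone()[0] == 0, "duplicate order_id values found"
    # Known assumption 2: amount is never NULL.
    cur.execute("SELECT COUNT(*) FROM orders WHERE amount IS NULL")
    assert cur.fetchone()[0] == 0, "NULL amounts found"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
    conn.execute("INSERT INTO orders VALUES (1, 10.0), (2, 5.5)")
    run_data_tests(conn)
    print("all data tests passed")
```

Tests like these catch the failure modes you can anticipate; the rest of the chapter looks at what happens when you can't.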

Advantages
Testing brings many benefits to an organization. These positives
include the following:

»» Identifying known problems: Testing helps you catch known problems in your data pipelines; it warns you when new data or code disrupts your original assumptions.
»» Being cost-effective: Testing is, for the most part, free. You
usually don’t incur a cost to write code, and open source
testing solutions are widely available.
»» Providing flexibility: You can write custom tests based on
your knowledge of the data and the specific requirements of
your use case.

Disadvantages
As important as testing is, it has several disadvantages when used
at scale. These include:

»» Lack of trust: Data testing alone doesn't usually provide the level of quality that earns your data consumers' trust. With testing, data engineers need to have enough specialized knowledge to understand the data's common characteristics, constraints, and thresholds.
»» Lack of scalability: Applying the most common, basic tests using programmatic solutions is inefficient and time-consuming. Data tests are static and manual, and as a result, unable to update when pipelines change or new data is ingested.
»» Limited coverage: Because these tests are designed based
on known issues, they miss out on detecting unknown
problems. Data can break for a million different reasons,
and testing only covers the bare minimum.
»» Not efficient: Despite being free, data testing can be a
worse economic choice due to the inefficiencies and wasted
engineering time. Teams can’t afford to spend 30 percent or
more of their workweek firefighting data quality issues.

It’s not doable to add tests immediately to each new dataset. Even
if you could double your capacity to write tests, it might not be a
good use of your company’s time, resources, and budget.

CASE STUDY: CHECKOUT.COM
Challenge: Checkout.com, a growing fintech company, faced chal-
lenges maintaining data quality and observability as data volume
increased. Manual testing and monitoring became impractical, lead-
ing to visibility issues and delayed data resolution.

Approach: Checkout.com partnered with Monte Carlo to deploy a data observability platform. It used dedicated domains, code-based monitors, a central UI, ML-based anomaly detection, and integrations with Datadog and PagerDuty. This approach automated monitoring, improved visibility, accelerated incident resolution, and enabled scalable operations.

Solution: Using Monte Carlo’s data observability solution, Checkout.


com achieved scalability, comprehensive data quality coverage, and
efficient issue identification and resolution.

Viewing the Tradeoffs of Data Quality Monitoring
Data quality monitoring refers to automatically monitoring data
quality by querying the data directly and identifying anomalies
using machine learning algorithms.
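Real platforms learn what "anomalous" means with machine learning models; as a deliberately simplified stand-in, the sketch below flags a daily row count that drifts more than three standard deviations from recent history (the counts are made up):

```python
"""Toy stand-in for ML-based anomaly detection on a data quality metric."""
from statistics import mean, stdev


def is_anomalous(history, today, z_threshold=3.0):
    # Flag values far outside the recent distribution of the metric.
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu  # no historical variation: any change is suspicious
    return abs(today - mu) / sigma > z_threshold


# Hypothetical daily row counts for a table over the past two weeks.
history = [10_120, 10_340, 9_980, 10_210, 10_400, 10_050, 10_300,
           10_180, 10_260, 9_990, 10_310, 10_220, 10_150, 10_280]
print(is_anomalous(history, today=4_200))    # True: volume dropped sharply
print(is_anomalous(history, today=10_190))   # False: within the normal range
```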

Advantages
You can’t understate the importance of data quality monitoring.
It’s crucial for any organization that wants to leverage its data
efficiently and effectively. It provides:

»» Real-time alerts: Monitoring can provide real-time alerts about potential issues, enabling faster reaction times.
»» Proactive issue detection: It can help identify issues before
they cause significant downstream effects.
»» Improved coverage: Monitoring can provide better coverage against unknown unknowns. Teams can set thresholds for what is anomalous, including acceptable levels of NULL values that might shift over time.

Unknown unknowns refer to data quality issues you had no way of predicting. The bad news is that they don't often make themselves known until they've impacted downstream systems. An example would be a code change that causes an API to stop collecting data feeding an important new product.

Disadvantages
Although data quality monitoring is vital, its use has some short-
comings. These include:

»» Inability to identify key tables: If you narrowly apply monitoring across a specific set of key tables, it may not be as effective because it is challenging to identify the most critical tables due to constantly shifting teams, platforms, and consumption patterns.
»» Lack of context: Tables may lack context around previous
data downtime incidents, so teams may not be aware of the
impact of an incident on downstream consumers, making
prioritization challenging.
»» Resolution limitation: Despite identifying issues, data
quality monitoring doesn’t fix the problems. Teams still
need to allocate time and resources to conduct root cause
analysis, impact analysis, and incident mitigation, which
can be manual and time intensive.
»» False positives/negatives: Simple threshold-based alerts
can lead to many false positives (when there’s no actual
issue) or false negatives (missing real issues).
»» Limited scope: As data environments grow more complex
and interconnected, there’s a need for a solution that covers
all production tables and is fully integrated across your stack.
This may not be possible with data quality monitoring alone,
even those that leverage metadata monitoring.

How Data Observability Goes Beyond Testing and Monitoring
Compared to testing and monitoring, data observability provides
a more comprehensive approach to managing and improving data
quality at scale.

Just as software systems rely on DevOps best practices to ensure highly reliable and performant software, data systems require similar processes to maintain reliability at scale. Data observability uses automated tracking, rulemaking, root cause analysis, and impact analysis to catch and mitigate these unexpected problems.

DevOps aims to improve team collaboration, automate as much as possible, quickly deliver quality software, and respond immediately to issues and changes. DataOps applies these same tenets, and data observability is a critical piece of the DataOps workflow.

Consider some benefits a holistic data observability approach offers over traditional methods. You get:

»» A centralized view of your data ecosystem. It exposes rich lineage, schema, historical changes, freshness, volume, users, queries, and more to gain a holistic understanding of data health over time.
»» The ability to address data issues immediately. Automatic monitoring, alerting, and root cause analysis ensure you can address data issues while requiring minimal configuration and no threshold-setting.
»» The ability to track upstream and downstream depen-
dencies (lineage). This allows you to monitor data flow from
your pull request in GitHub all the way down to the business
intelligence layer.
»» Custom and automatically generated rules that use
machine learning to identify abnormalities. Based on
historical behavior, you can set rules unique to your data in
addition to out-of-the-box rules for freshness, volume, and
schema.
»» Enhanced data reliability insights. Understanding what
data matters most to the business, how your data reliability
SLAs are tracking, and the health of your respective data
domains and data products with dashboarding provides
immediate, automated insights about your company’s
data quality.

CASE STUDY: MERCARI
Challenge: Mercari, an e-commerce marketplace, faced challenges
ensuring reliable data at scale, including managing data infrastruc-
ture, effective pipeline monitoring, and maintaining data trust.

Approach: To tackle these challenges, Mercari utilized Monte Carlo's monitoring platform and formed a dedicated Data Reliability Engineering (DRE) team, set clear goals, empowered on-call engineers, adjusted alerts, resolved data incidents, and conducted post-mortems and pre-mortems.

Results: Mercari’s data observability efforts improved data trust,


resolved incidents more efficiently, enhanced pipeline monitoring,
optimized infrastructure, and increased preparedness for contingen-
cies. These outcomes positively impacted e-commerce operations,
reinforcing a commitment to reliable data.

IN THIS CHAPTER
»» Establishing your data observability strategy

»» Reviewing metrics that matter

»» Calculating data downtime

Chapter 4
Applying Best Practices for Scalable Data Observability

You need to establish a clear strategy when you start working with a data observability program. Your strategy provides you with a roadmap to a successful launch and helps your data teams perform better and foster a higher level of data trust with stakeholders.

This chapter introduces a five-step process that you can follow to establish your strategy and a formula that you can use to calculate data downtime.

Five Best Practices for Data Observability


In the following sections, we look at five steps you can take to
develop your data observability strategy. These steps include
setting up: (1) domains, (2) dedicated channels, (3) monitors
for freshness, volume, and schemas, (4) custom monitors, and
(5) tracking for SLAs, SLOs, and SLIs.

Step 1: Set up domains
To start, it’s essential to set up domains for all your teams that
utilize different datasets. Domains provide a workspace for indi-
vidual teams and reduce white noise for those unaffected by inci-
dents impacting these datasets. Specific domains allow teams to
see only the data they are interested in.

Step 2: Set up dedicated channels


Next, you can set up channels where your team collaborates.
Establishing proper routing and alert groups is essential for any
incident management routine. (For example, Monte Carlo has
native integrations with Slack, Microsoft Teams, PagerDuty,
OpsGenie, JIRA, and email.)

You should receive an average of one to three daily alerts per channel for the best results. If you have more than five per day, you should consider filtering the alerts into different channels or removing them altogether.
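To make the domain-and-channel idea concrete, here is a small sketch of routing incidents to per-domain channels and spotting channels that exceed the alert-volume guideline above. The domain names and channel mapping are hypothetical.

```python
"""Sketch of routing alerts to per-domain channels (hypothetical domains)."""
from collections import Counter

# Hypothetical mapping of data domains to team channels.
DOMAIN_CHANNELS = {
    "finance": "#data-alerts-finance",
    "marketing": "#data-alerts-marketing",
    "platform": "#data-alerts-platform",
}


def route(alert):
    # Send each incident to its owning domain's channel, or a catch-all.
    return DOMAIN_CHANNELS.get(alert["domain"], "#data-alerts-unrouted")


alerts_today = [
    {"domain": "finance", "table": "revenue_daily", "check": "freshness"},
    {"domain": "finance", "table": "invoices", "check": "volume"},
    {"domain": "marketing", "table": "campaigns", "check": "schema"},
]

per_channel = Counter(route(alert) for alert in alerts_today)
for channel, count in per_channel.items():
    note = " (noisy: consider re-filtering)" if count > 3 else ""
    print(f"{channel}: {count} alert(s) today{note}")
```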

Step 3: Set up freshness, volume, and schema monitors
As the third step of your strategy, you need monitors for your
essential production data. A strong data observability platform
will implement data freshness, volume, and schema monitors
automatically, out-of-the-box. These monitors harness rich
metadata to track issues as they arise, without needing to moni-
tor data directly, the cost of which can add up over time.

The purpose of implementing these monitors is to provide a complete view of your data environment, understand its overall health, and identify critical touchpoints within different teams. A minimal sketch of such metadata-driven monitors appears after the list below.

We look at each monitor in turn:

»» Freshness monitors: Freshness monitors ensure that your data team works with the most recent data assets, which is critical for accurate analytics, product development, decision-making, and operations.
»» Volume monitors: Volume monitors keep track of the amount of data generated or processed within a specific time period. Any significant or unexpected fluctuations in data volume may indicate a potential issue with your source system or pipeline. (An example would be a failure in data ingestion or a surge in data generation.)
»» Schema monitors: Schema monitors observe changes in
the structure or format of the data. As data sources evolve,
there can be changes in the data schema. If these changes
are not adequately managed, they can lead to downstream
issues in the data pipelines, potentially causing data loss or
inaccuracies.
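As one way to picture "metadata, not data": the queries below read only warehouse metadata tables, so they can run frequently without scanning production data. The column names follow Snowflake-style INFORMATION_SCHEMA and vary by warehouse; the ANALYTICS schema is a hypothetical example.

```python
"""Sketch of metadata-driven freshness, volume, and schema monitors.

These queries read warehouse metadata only (no full table scans); column
names follow Snowflake-style INFORMATION_SCHEMA and differ by warehouse.
"""
METADATA_QUERIES = {
    # Freshness: when was each monitored table last modified?
    "freshness": (
        "SELECT table_name, last_altered "
        "FROM information_schema.tables WHERE table_schema = 'ANALYTICS'"
    ),
    # Volume: warehouse-maintained row counts, compared day over day.
    "volume": (
        "SELECT table_name, row_count "
        "FROM information_schema.tables WHERE table_schema = 'ANALYTICS'"
    ),
    # Schema: column inventory, diffed against the previous snapshot.
    "schema": (
        "SELECT table_name, column_name, data_type "
        "FROM information_schema.columns WHERE table_schema = 'ANALYTICS'"
    ),
}


def snapshot(conn):
    """Run each metadata query and return the results keyed by monitor type."""
    results = {}
    for monitor, sql in METADATA_QUERIES.items():
        cur = conn.cursor()
        cur.execute(sql)
        results[monitor] = cur.fetchall()
    return results

# A scheduler would call snapshot(conn) periodically and compare successive
# snapshots to raise freshness, volume, and schema alerts.
```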

Step 4: Set up custom monitors for data quality checks
At this point, you need to focus on layering custom monitors for
your company’s critical data assets. Determine which data tables
or fields are most crucial to your business and define the qual-
ity metrics you want to monitor. Then establish the thresholds
for each. With the thresholds and metrics in place, you’re alerted
to deviations, anomalies, and other incidents affecting that given
table or field.

Make sure to create tests to show that the monitors identify and
alert you to data quality issues. Also, establish a regular review
process to decide whether the metrics and data you monitor are
still relevant to the business.

Step 5: Track SLAs, SLOs, and SLIs for important tables
Tracking Service Level Agreements (SLAs), Service Level Objectives (SLOs), and Service Level Indicators (SLIs) for important tables is a crucial part of data reliability and is step five in our strategy.

»» Service Level Agreements (SLAs): This is a contract between a customer and a service provider that specifies the quality, availability, and responsibilities of the service that will be provided.
»» Service Level Objectives (SLOs): This is a component of an SLA that focuses on a particular metric, such as system uptime or response time.

»» Service Level Indicators (SLIs): This is a measure that
offers insight into the specific aspects of a system or service’s
performance. It serves as an indicator of the level of service
a customer is experiencing at any given time.

To get started with tracking, you should:

»» Define data reliability with SLAs: Clearly define what reliable data means to your organization. You need to understand what data you're working with, how it's used, and who uses it.
You use SLAs to ensure data teams and stakeholders speak
the same language and care about the same metrics.
»» Choose the right SLIs: You need to decide on the key
metrics or SLIs that accurately reflect the reliability of
your data. These could be metrics such as data freshness,
accuracy, or completeness. For each important table,
define these SLIs and make sure they are quantifiable
and measurable.
»» Track data reliability with SLOs: Define your SLOs, the target values or ranges for each SLI. These should be realistic, based on your historical data, and reflect real-world circumstances. Set SLOs that are attainable and within the thresholds defined by your SLAs. (A small sketch of declaring and evaluating SLOs in code follows this list.)
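As a small sketch of what this looks like in practice (the table, SLIs, and targets are hypothetical), you might declare the SLOs for an important table as data and evaluate each run's measurements against them:

```python
"""Sketch of declaring and evaluating SLOs for an important table (values hypothetical)."""
from dataclasses import dataclass


@dataclass
class SLO:
    sli: str          # the measurement, e.g., hours since the last update
    target: float     # the objective for that measurement
    comparison: str   # "<=" or ">="


ORDERS_TABLE_SLOS = [
    SLO(sli="hours_since_last_update", target=6.0, comparison="<="),
    SLO(sli="null_rate_customer_id", target=0.001, comparison="<="),
    SLO(sli="row_count_vs_7day_avg", target=0.8, comparison=">="),
]


def evaluate(slos, measurements):
    """Return the SLIs that are out of bounds for this run."""
    breaches = []
    for slo in slos:
        value = measurements[slo.sli]
        ok = value <= slo.target if slo.comparison == "<=" else value >= slo.target
        if not ok:
            breaches.append((slo.sli, value, slo.target))
    return breaches


print(evaluate(ORDERS_TABLE_SLOS, {
    "hours_since_last_update": 9.5,
    "null_rate_customer_id": 0.0004,
    "row_count_vs_7day_avg": 0.95,
}))  # -> [('hours_since_last_update', 9.5, 6.0)]
```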

CASE STUDY: SEATGEEK


Challenge: SeatGeek, an online ticketing marketplace, faced data
quality issues that impacted its daily operations. Despite having
data experts and advanced tools, it spent significant time resolving
data errors reported by users.

Approach: SeatGeek sought a solution to identify and address data anomalies faster while reducing the overall number and types of issues. It explored options to improve its capability to identify the root causes of internal data failures.

Solution: SeatGeek implemented end-to-end data observability using Monte Carlo's ML-enabled anomaly detection and reporting. This allowed SeatGeek to proactively identify anomalies before they affected business users, even across third-party data sources.

Using Data Quality Metrics That Actually Matter
Consider measuring data downtime if you want to understand which data quality metrics matter most. Monte Carlo calls data downtime a data leader's North Star KPI. This is because it provides crucial insights into several key areas:

»» Measures data reliability: Data downtime measures how reliable your data is. If you have a lot of downtime, it may suggest data availability, quality, or accuracy issues.
»» Directly impacts your business: Downtime directly impacts
your business operations. If data isn’t available when
needed, it delays decision-making, disrupts operations,
and can even result in lost opportunities or revenue.
»» Protects customer trust: Consistent data availability is key
to maintaining a strong reputation and customer trust. If
you’re providing customer data, frequent or extended data
downtime can damage your reputation and customer trust.
»» Surfaces data quality improvement opportunities:
Measuring downtime allows you to identify patterns or
recurring issues that lead to data unavailability, staleness, or
inaccuracy. These insights point you toward areas where you
can make improvements.
»» Tracks incident management: Data downtime is influ-
enced by how quickly data incidents are detected and
resolved (time-to-detection and time-to-resolution).
Therefore, keeping track of data downtime can provide
feedback on the effectiveness of your overall incident
management and data engineering workflows.

Figure 4-1 shows the formula for data downtime. It involves measuring the number of data incidents, time-to-detection, and time-to-resolution.

The formula is stated as follows: for each data incident, you add the time it took to detect the incident and the time it took to resolve it, and you then multiply that sum by the total number of such incidents over a given period of time.
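Written compactly, with N, TTD, and TTR as defined in the list that follows, the formula in Figure 4-1 reads:

\[
\text{Data Downtime} \;=\; N \times (\text{TTD} + \text{TTR})
\]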

FIGURE 4-1: The formula for data downtime.

Here’s how you define these terms:

»» Number of data incidents (N): This reflects the total number of errors or anomalies across all data pipelines. It can indicate areas where you need to dedicate more resources to optimizing the data systems and processes.
»» Time-to-detection (TTD): This metric measures the median
time from when an incident is created until it’s been
identified either manually or via a data observability tool.
Faster response times can help mitigate the impact of data
incidents downstream and prevent serious implications.
»» Time-to-resolution (TTR): This measures the median time from when an incident is detected until the incident is fixed. Efficient resolution times contribute to minimizing data downtime.

As you get started with data observability, you need to be strategic, choose the right metrics, and try to improve continuously. Data observability isn't just about data reliability; it's about turning data into reliable insights.

Chapter 5
Top Ten Reasons to Invest in Data Observability

Data observability allows your business to measure and improve the overall reliability and quality of your company's data. For this reason, it has become one of the most powerful technologies in modern data engineering. Consider the top ten reasons you should invest in data observability. Data observability:

»» Improves the reliability of your data. The primary data observability use case is to improve data quality to mitigate data downtime. This is accomplished by reducing data incidents and detecting and fixing them faster. The other main use cases include preventing, detecting, and resolving schema changes, data freshness issues, data volume issues, and data distribution issues.
»» Mitigates the risk of code, data, and system failures.
Data observability platforms quickly identify and address
coding errors or changes that impact data quality. They
ensure data accuracy and timeliness, and detect system
failures, improving data integrity and reliability.
»» Saves time. Saving data engineers’ time is among the most
valuable use cases of data observability. It frees your team to
focus on what’s important and allows the entire organization
to perform at its best. Recent studies suggest that data observability can save two days per week of a data engineer's time (https://www.montecarlodata.com/state-of-data-quality/ and https://resources.montecarlodata.com/resources/data-quality-survey-1?lx=2D5AuS).
»» Increases revenue. Higher quality data means a higher quality
product. A data observability platform ensures that you have
better data quality, which improves your marketing campaigns’
effectiveness, enables more innovation, and delights your
customers, among other positive outcomes for your business.
»» Avoids rising or additional data infrastructure costs. Data
observability platforms help you avoid unnecessary data
infrastructure charges by alerting your business if (according to
your contract) you’re close to exceeding your data storage and
compute limits. It can also help you avoid breaking data rules
and incurring non-compliance or regulatory fines, as well as
deleting stale or duplicate data that’s racking up storage costs.
»» Improves DataOps processes. DataOps is a method that
helps you handle and use data better and faster. It combines
different strategies from agile development, DevOps, and lean
manufacturing to provide high-quality data. Data observability
helps DataOps by letting teams see and fix issues in the data
process in real time. This makes the data more reliable and
accurate, which improves how teams manage and use data.
»» Improves data visibility and transparency. Data observability tools offer visibility and transparency by providing a comprehensive view of your organization's data health. This allows data teams to showcase the overall quality of the data.
»» Builds trust with the business. Building data trust hinges
on proactive data quality management. Data teams can
quickly identify and fix issues by utilizing data observability
and reducing the time to detection and resolution of data
issues, thereby fostering trust among business stakeholders.
»» Accelerates self-service analytics. Data observability solutions speed up self-service analytics by ensuring high-quality data for users to discover, understand, and trust. It also catches issues and anomalies before stakeholders do, which improves data quality.
»» Scales artificial intelligence (AI) and machine learning
(ML) reliability. Utilizing a data observability platform, data
teams monitor the quality and reliability of the data used to
train generative AI and ML systems. This leads to improved
trust in models and their output.

WILEY END USER LICENSE AGREEMENT
Go to www.wiley.com/go/eula to access Wiley’s ebook EULA.
