Dummies For Data Observability
by Stephanie Diamond
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Data Observability For Dummies®, Monte Carlo Special Edition
Published by
John Wiley & Sons, Inc.
111 River St.
Hoboken, NJ 07030-5774
www.wiley.com
Copyright © 2023 by John Wiley & Sons, Inc., Hoboken, New Jersey
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any
form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise,
except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without
the prior written permission of the Publisher. Requests to the Publisher for permission should be
addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ
07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Trademarks: Wiley, For Dummies, the Dummies Man logo, Dummies.com, and related trade
dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in
the United States and other countries, and may not be used without written permission. Monte
Carlo and MC and related trade dress are registered trademarks of Monte Carlo Data, Inc. in the
United States, and may not be used without written permission. All other trademarks are the
property of their respective owners. John Wiley & Sons, Inc., is not associated with any product
or vendor mentioned in this book.
For general information on our other products and services, or how to create a custom For
Dummies book for your business or organization, please contact our Business Development
Department in the U.S. at 877-409-4177, contact info@dummies.biz, or visit
www.wiley.com/go/custompub. For information about licensing the For Dummies brand for products or services,
contact BrandedRights&Licenses@Wiley.com.
ISBN 978-1-394-21844-8 (pbk); 978-1-394-21845-5 (ebk)
Publisher’s Acknowledgments
We’re proud of this book and of the people who worked on it. Some of the
people who helped bring this book to market include the following:
Project Manager: Rebecca Senninger
Acquisitions Editor: Traci Martin
Client Account Manager: Molly Daugherty
Table of Contents
INTRODUCTION
About This Book
Icons Used in This Book
Beyond the Book
Disadvantages
How Data Observability Goes Beyond Testing and Monitoring
Introduction
As data becomes increasingly important to today’s businesses,
ensuring its quality and reliability has never been
more critical. High-quality data is vital to any organization’s
success, from building new products and driving accurate
decision-making to, more recently, powering AI and ML use
cases.
The Tip icon highlights information that can make doing things
easier or faster.
The Remember icon points out things you need to remember
when searching your memory bank.
The Warning icon alerts you to things that can harm you or your
company.
If you want resources beyond what this short book offers, visit
montecarlodata.com/blog to discover more about the following
topics:
IN THIS CHAPTER
»» Taking a modern approach to data
quality management
Chapter 1
Introducing Data
Observability
You’ve likely heard the phrase “Data is the new software.”
This statement illustrates data’s great importance to most
businesses in today’s economy. Like software applications,
data is a strategic resource, but if the data is incomplete,
inaccurate, or unreliable, it can have detrimental impacts on
the entire company.
Do you think data quality won’t impact your company? Think again.
According to Gartner, through 2025, 80 percent of organizations
seeking to scale digital business will fail because they
didn’t take a modern approach to data and analytics governance
(https://blogs.gartner.com/andrew_white/2021/01/12/our-top-data-and-analytics-predicts-for-2021).
Exploring Why Data Observability
Matters Now
One of the main reasons data observability matters more now
than ever before is the impact of data downtime. Monte Carlo
has found that data downtime tops the list of pain points for
the many data leaders worldwide that it works with
(https://www.montecarlodata.com/blog-what-is-data-observability/).
As the time to detect issues increases, you see a severe impact on
your business, which includes:
These are just two examples of how unreliable data can have dire
consequences for your organization. It makes investing in a data
observability platform a way to future-proof your organization.
Exploring the Business Case for
Data Observability
When you think about data observability, you probably
think about how it helps you avoid the consequences of data
downtime — wasted resources, customer churn, and reputational
risk, to name a few. However, there are many more benefits
derived from using data observability.
BEST-IN-CLASS DATA
OBSERVABILITY
A best-in-class data observability framework provides the following
tactical benefits:
want your data engineers to be innovating and building new data
products to drive growth, not fixing ad hoc data quality problems.
However, one crucial issue can break down the entire process.
Your business users need to be able to trust the data they are
using. Your self-service model will fail if they don’t trust their
dashboards and reports, or the data fueling their most critical
data products and services is erroneous. That’s why it’s critically
important that you have reliable data. The best way to do that is
to invest in data observability.
To support this shift, data teams are moving from the CFO’s office
to the Engineering organization, where they’re held accountable
for supporting or embedding within functional teams across the
company. Even titles and roles are changing to match this new
reality, with Heads of Data and Analytics becoming Heads of Data,
ML, and AI, and data engineering teams newly responsible for
supporting both data and AI use cases.
To meet this need, we need to look beyond just the data to achieve
data reliability. Unreliable data doesn’t live in a silo. It’s impacted
by all three ingredients of the data ecosystem: data, code, and
infrastructure.
IN THIS CHAPTER
»» Introducing the five pillars of data
observability
Chapter 2
Discovering the
Five Pillars of Data
Observability
High-quality data holds the power to drive strategies and
shape outcomes. It provides a roadmap to driving business
growth and enhanced decision-making.
Ask: When was the last time this table was updated? How
up-to-date is your data? Is this table updating at the right
time each day?
»» Pillar 2: Volume: Volume gives you an insight into the
completeness of your data tables and the health of your
data sources.
Ask: Do you have too many or too few rows? If there is a
significant change, will you know? Significant changes in
volume should be noted and investigated.
»» Pillar 3: Schema: Schema refers to the organization of your
data. In fact, schema changes are the number one cause of data
downtime issues, and major changes can indicate broken data.
Ask: Has the organization of the data changed? Who makes
changes to the data schema, and when?
»» Pillar 4: Quality: The Quality pillar tells you if your collected
data falls within an expected range. It gives you insight into
whether you can trust your tables based on what you
typically expect from your data.
Ask: Is the data in a normal range? Are there too many
null values? Is it properly formatted and complete?
»» Pillar 5: Lineage: Lineage maps your data’s upstream sources
to downstream ingestors and paints an end-to-end,
comprehensive picture of your data landscape. Monitoring this helps
you determine where errors or outages have occurred.
Ask: How are data assets connected across your data stack,
upstream and downstream? Which dashboards are powered
by a given table? And at which points in the pipeline is the
data transformed?
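To make the questions above concrete, here is a minimal Python sketch of three of the checks: freshness, volume, and quality. The function names, the 24-hour freshness window, and the sample numbers are illustrative assumptions, not the behavior of any particular platform.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(last_updated: datetime, max_age: timedelta, now: datetime) -> bool:
    """Freshness: was the table updated recently enough?"""
    return now - last_updated <= max_age


def volume_change(previous_rows: int, current_rows: int) -> float:
    """Volume: relative change in row count between two loads."""
    if previous_rows <= 0:
        raise ValueError("previous_rows must be positive")
    return (current_rows - previous_rows) / previous_rows


def null_rate(values: list) -> float:
    """Quality: share of missing (None) values in a column."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


# Hypothetical observations for one table.
now = datetime(2023, 6, 1, 12, 0, tzinfo=timezone.utc)
last_load = datetime(2023, 6, 1, 3, 0, tzinfo=timezone.utc)

fresh = is_fresh(last_load, timedelta(hours=24), now)         # updated 9 hours ago
change = volume_change(previous_rows=10_000, current_rows=4_000)
nulls = null_rate([12.5, None, 9.0, None])

print(fresh, change, nulls)
```

In this sample, the table is fresh, but the row count dropped by 60 percent and half the column values are missing, so the volume and quality checks would both deserve a closer look.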
collaboration functionalities, and end-to-end lineage across your
data environments down to the BI layer.
Supporting collaboration
A vital benefit of a best-in-class data observability platform is
that it facilitates increased collaboration among data engineers,
data analysts, and data scientists. A platform:
Achieving end-to-end visibility
Your platform should provide a single view into data health across
your entire ecosystem, including data lakes, warehouses, ETL,
business intelligence tools, and catalogs via rich, end-to-end
data lineage. Data lineage leverages metadata to aggregate logs
and other information about your data without accessing the data
itself.
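One way to picture end-to-end lineage is as a graph you can walk from any asset to everything that depends on it. The sketch below is a simplified illustration in Python. The `LINEAGE` mapping and the asset names are hypothetical, and a real platform would build this graph from metadata and logs rather than from a hand-written dictionary.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets that read from it.
LINEAGE = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["analytics.daily_revenue", "analytics.customer_ltv"],
    "analytics.daily_revenue": ["dashboard.revenue"],
    "analytics.customer_ltv": [],
    "dashboard.revenue": [],
}


def downstream_assets(lineage: dict, start: str) -> list:
    """Return every asset reachable from `start`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque(lineage.get(start, []))
    while queue:
        asset = queue.popleft()
        if asset in seen:
            continue  # already visited through another path
        seen.add(asset)
        order.append(asset)
        queue.extend(lineage.get(asset, []))
    return order


# Which tables and dashboards are affected if staging.orders breaks?
impacted = downstream_assets(LINEAGE, "staging.orders")
print(impacted)
```

Walking the graph this way answers the question of which dashboards are powered by a given table without reading any of the data itself.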
IN THIS CHAPTER
»» Looking at the limits of traditional
approaches
Chapter 3
Recognizing the Limits
of Testing and Data
Quality Monitoring
Companies have relied on testing and data monitoring for
several years to manage data quality. While these traditional
methods have their place, they often fall short as
organizations’ data landscapes become more complex and
dynamic. As data increases in volume and complexity, traditional
approaches just aren’t enough. You can no longer rely on
investigating a single point of failure.
Advantages
Testing brings many benefits to an organization. These positives
include the following:
Disadvantages
As important as testing is, it has several disadvantages when used
at scale. These include:
It isn’t feasible to add tests to each new dataset immediately. Even
if you could double your capacity to write tests, it might not be a
good use of your company’s time, resources, and budget.
CASE STUDY: CHECKOUT.COM
Challenge: Checkout.com, a growing fintech company, faced chal-
lenges maintaining data quality and observability as data volume
increased. Manual testing and monitoring became impractical, lead-
ing to visibility issues and delayed data resolution.
Advantages
You can’t overstate the importance of data quality monitoring.
It’s crucial for any organization that wants to leverage its data
efficiently and effectively. It provides:
Unknown unknowns refer to data quality issues you had no way of
predicting. The bad news is that they don’t often make themselves
known until they’ve impacted downstream systems. An example
would be a code change that causes an API to stop collecting data
feeding an important new product.
Disadvantages
Although data quality monitoring is vital, its use has some short-
comings. These include:
As with software systems relying on DevOps best practices to
ensure highly reliable and performant software, data systems
require similar processes to maintain reliability at scale. Data
observability uses automated tracking, rulemaking, root cause
analysis, and impact analysis to catch and mitigate these unex-
pected problems.
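As a rough illustration of what automated tracking can look like, the Python sketch below flags a daily row count that sits far outside its recent history, without anyone writing a fixed threshold by hand. It uses a simple z-score, which is an assumption chosen for brevity and not a description of how any specific platform detects anomalies. The function name, the threshold of 3, and the sample counts are all hypothetical.

```python
from statistics import mean, stdev


def is_anomalous(history: list, latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` if it is more than `threshold` standard deviations
    away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    spread = stdev(history)
    if spread == 0:
        return latest != history[0]  # any change from a constant series
    return abs(latest - mean(history)) / spread > threshold


# Hypothetical row counts for the last seven daily loads.
daily_rows = [10_120, 9_980, 10_050, 10_210, 9_940, 10_070, 10_000]

print(is_anomalous(daily_rows, 10_100))  # an ordinary day
print(is_anomalous(daily_rows, 4_000))   # a sudden drop
```

Because the baseline comes from the data's own history, a check like this can surface a problem nobody thought to write a test for.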
CASE STUDY: MERCARI
Challenge: Mercari, an e-commerce marketplace, faced challenges
ensuring reliable data at scale, including managing data infrastruc-
ture, effective pipeline monitoring, and maintaining data trust.
IN THIS CHAPTER
»» Establishing your data observability
strategy
Chapter 4
Applying Best Practices
for Scalable Data
Observability
You need to establish a clear strategy when you start working
with a data observability program. Your strategy provides
you with a roadmap to a successful launch and helps
your data teams perform better and foster a higher level of data
trust with stakeholders.
Step 1: Set up domains
To start, it’s essential to set up domains for all your teams that
use different datasets. Domains provide a workspace for individual
teams and reduce noise for those unaffected by incidents
impacting these datasets. Specific domains allow teams to
see only the data they are interested in.
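The idea behind domains can be sketched in a few lines of Python: each domain owns a set of datasets, and an incident is routed only to the domains that own an affected table. The `DOMAINS` mapping, the team names, and the table names are hypothetical, and real platforms typically configure domains through their own settings rather than through code like this.

```python
# Hypothetical mapping of domains to the datasets each team owns.
DOMAINS = {
    "marketing": {"analytics.campaigns", "analytics.web_sessions"},
    "finance": {"analytics.daily_revenue", "analytics.invoices"},
}


def domains_for_incident(affected_tables: set, domains: dict) -> list:
    """Return the domains (sorted by name) that own at least one affected table."""
    return sorted(
        name for name, tables in domains.items() if tables & affected_tables
    )


# An incident on the invoices table should reach finance only.
notify = domains_for_incident({"analytics.invoices"}, DOMAINS)
print(notify)
```

Here the marketing team never sees the alert, which is the noise reduction that domains are meant to provide.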
data volume may indicate a potential issue with your source
system or pipeline. (An example would be a failure in data
ingestion or a surge in data generation.)
»» Schema monitors: Schema monitors observe changes in
the structure or format of the data. As data sources evolve,
there can be changes in the data schema. If these changes
are not adequately managed, they can lead to downstream
issues in the data pipelines, potentially causing data loss or
inaccuracies.
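A schema monitor boils down to comparing the structure you expect with the structure you observe. The following Python sketch shows one simple way to do that, assuming each schema is available as a mapping of column name to type. The function name, column names, and type labels are illustrative assumptions.

```python
def diff_schema(expected: dict, observed: dict) -> dict:
    """Compare two {column: type} mappings and report what changed."""
    return {
        "added": sorted(observed.keys() - expected.keys()),
        "removed": sorted(expected.keys() - observed.keys()),
        "retyped": sorted(
            col
            for col in expected.keys() & observed.keys()
            if expected[col] != observed[col]
        ),
    }


# Hypothetical schemas for an orders table before and after a source change.
expected = {"order_id": "INTEGER", "amount": "FLOAT", "created_at": "TIMESTAMP"}
observed = {"order_id": "STRING", "amount": "FLOAT", "currency": "STRING"}

changes = diff_schema(expected, observed)
print(changes)
```

Any non-empty entry in the result is a change worth alerting on, because a removed or retyped column is a common way for downstream pipelines to break.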
Make sure to create tests to show that the monitors identify and
alert you to data quality issues. Also, establish a regular review
process to decide whether the metrics and data you monitor are
still relevant to the business.
»» Service Level Indicators (SLIs): This is a measure that
offers insight into the specific aspects of a system or service’s
performance. It serves as an indicator of the level of service
a customer is experiencing at any given time.
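As a hedged example of an SLI for data, the Python sketch below measures the share of table loads that arrived within a target delay. The function name, the 60-minute target, and the sample delays are assumptions made for illustration. Your own indicator might track completeness or accuracy instead.

```python
def freshness_sli(delays_minutes: list, target_minutes: float) -> float:
    """Share of loads that landed within the target delay."""
    if not delays_minutes:
        raise ValueError("need at least one observation")
    on_time = sum(d <= target_minutes for d in delays_minutes)
    return on_time / len(delays_minutes)


# Hypothetical delays, in minutes, for the last eight loads of one table.
sli = freshness_sli([12, 45, 20, 90, 30, 15, 25, 40], target_minutes=60)
print(f"{sli:.1%} of loads arrived on time")
```

Seven of the eight loads met the target, so the indicator comes out at 87.5 percent.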
Using Data Quality Metrics
That Actually Matter
Consider measuring data downtime if you want to understand
which data quality metrics matter most. Monte Carlo calls data
downtime a data leader’s North Star KPI. This is because it
provides crucial insights into several key areas:
The formula is stated as follows: For each data incident, you add
up the time it took to detect the incident and the time it took to
resolve it and then multiply that sum by the total number of such
incidents over a given period of time.
FIGURE 4-1: The formula for data downtime.
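The formula described above can be written as a short Python function. This sketch assumes the detection and resolution times are averages across incidents and are measured in hours. Those units, the function name, and the sample figures are illustrative assumptions.

```python
def data_downtime(
    incidents: int,
    avg_hours_to_detect: float,
    avg_hours_to_resolve: float,
) -> float:
    """Data downtime = number of incidents x (time to detect + time to resolve)."""
    if incidents < 0:
        raise ValueError("incidents cannot be negative")
    return incidents * (avg_hours_to_detect + avg_hours_to_resolve)


# Hypothetical month: 10 incidents, 4 hours to detect and 9 hours to resolve on average.
downtime = data_downtime(incidents=10, avg_hours_to_detect=4.0, avg_hours_to_resolve=9.0)
print(f"{downtime} hours of data downtime")
```

With these sample figures, the result is 130 hours. Cutting either the number of incidents or the time to detect and resolve them reduces the total, which is why the metric is useful for tracking progress.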
Chapter 5
Top Ten Reasons
to Invest in Data
Observability
Data observability allows your business to measure and
improve the overall reliability and quality of your company’s
data. For this reason, it has become one of the most powerful
technologies in modern data engineering. Consider the top ten
reasons you should invest in data observability. Data observability:
observability can save two days per week of a data engineer’s
time. (https://www.montecarlodata.com/state-of-data-quality/ and
https://resources.montecarlodata.com/resources/data-quality-survey-1?lx=2D5AuS)
»» Increases revenue. Higher quality data means a higher quality
product. A data observability platform ensures that you have
better data quality, which improves your marketing campaigns’
effectiveness, enables more innovation, and delights your
customers, among other positive outcomes for your business.
»» Avoids rising or additional data infrastructure costs. Data
observability platforms help you avoid unnecessary data
infrastructure charges by alerting your business if (according to
your contract) you’re close to exceeding your data storage and
compute limits. They can also help you avoid breaking data rules
and incurring non-compliance or regulatory fines, and help you
delete stale or duplicate data that’s racking up storage costs.
»» Improves DataOps processes. DataOps is a method that
helps you handle and use data better and faster. It combines
different strategies from agile development, DevOps, and lean
manufacturing to provide high-quality data. Data observability
helps DataOps by letting teams see and fix issues in the data
process in real time. This makes the data more reliable and
accurate, which improves how teams manage and use data.
»» Improves data visibility and transparency. Data observability
tools offer visibility and transparency by providing a
comprehensive view of your organization’s data health. This
allows data teams to showcase the overall quality of the data.
»» Builds trust with the business. Building data trust hinges
on proactive data quality management. By using data
observability, data teams can quickly identify and fix issues,
reducing the time to detection and resolution and thereby
fostering trust among business stakeholders.
»» Accelerates self-service analytics. Data observability
solutions speed up self-service analytics by ensuring
high-quality data for users to discover, understand, and
trust. They also catch issues and anomalies before stakeholders
do, which improves data quality.
»» Scales artificial intelligence (AI) and machine learning
(ML) reliability. Utilizing a data observability platform, data
teams monitor the quality and reliability of the data used to
train generative AI and ML systems. This leads to improved
trust in models and their output.
WILEY END USER LICENSE AGREEMENT
Go to www.wiley.com/go/eula to access Wiley’s ebook EULA.