The Anatomy of A Data Product Data Products Are Building Blocks


23.08.2023, 13:26 The Anatomy of a Data Product. Data Products are building blocks of… | by Eric Broda | Towards Data Science


The Anatomy of a Data Product


Eric Broda
Published in Towards Data Science
11 min read · Aug 19, 2022


Data Products are the foundational building blocks of an enterprise Data Mesh. But what
exactly is a Data Product, how does it work, how can it be identified, and how can it
be built quickly?

Photo by Vitor Santos on Unsplash

https://towardsdatascience.com/the-anatomy-of-a-data-product-d3140f068311 1/17

Making Data Easy to Find, Share, Consume, and Govern


I think of Data Products by the value they provide to an organization: Data Products (and
the Data Mesh within which they operate) make data easy to find, consume, share, and
govern. And to deliver this value, our job as practitioners is to make Data Products easy to
build, deploy, secure, and manage.

In this article I will answer two key questions:

How are data products designed, and how do they work such that they make data easy
to find, consume, share, and govern?

What capabilities, APIs, and lifecycle need to be established to make Data Products
easy to build, deploy, secure, and manage?

Simply put, if you can answer these questions, then, first, you will be able to explain why
Data Products are foundational to your Data Mesh journey, and second, you will
understand the capabilities necessary to accelerate the adoption and buildout of Data
Products in your enterprise Data Mesh.

Before you start, this article assumes that you have a high-level understanding of Data
Mesh. If you need some background information on Data Mesh, a number of great articles
are available here (patterns), here (architecture), here (principles) and here
(lessons learned). For interested readers, a full set of Data Mesh patterns is available here
and here.

Data Product = Data Domains + Product Thinking


In her fantastic book, Data Mesh: Delivering Data-Driven Value at Scale, Zhamak Dehghani
says that Data Products are the “architecture quantum” in a Data Mesh. They are “the
smallest unit of architecture that can be independently deployed and managed.” She goes on
to say that Data Products are “discoverable, understandable, trustworthy, addressable,
interoperable and composable, secure, natively accessible, and valuable on its own”.

I would offer a complementary definition: A data product is the combination of a “data
domain” and “product thinking”.


Figure 1, Data Products = Data Domains + Product Thinking

Let’s unpack this a bit starting with “product thinking”. I like some insights found in a
recent article in Harvard Business Review: First, a product marshals an organization’s
production capabilities to “deliver and capture value”. Second, there is an “end customer
who purchases and uses that product”. Lastly, a Product has an owner and team that
creates a long-term plan to ensure that “products can be continuously improved to make
them more successful” delivered by a group that focusses on “outcomes instead of
outputs”.

To paraphrase, product thinking means ensuring that your product meets a specific
business need and delivers some tangible value, has a long-term time horizon, and has a
clear and empowered owner who acts in not only the enterprise’s but also the customer’s
interest.

Unfortunately, defining “data domain” is not as simple since this term tends to be quite
ambiguous in large enterprises. For the Chief Data Officer, governance, regulation, and
privacy are central concerns, leading to coarsely grained domains: all customers rather
than current customers or Canadian customers, for example.

Similarly, the data architect may consider customers to be a subset of the “party” domain,
which includes current clients as well as prospects. And the application developer may view
a customer as a unique identifier linking that customer’s accounts and transactions.

For the purposes of this article, I define a data domain as a set of identifiable, real, related
data that is managed consistently, and which has some measure of quality and accuracy.


So, now let’s combine these ideas and create a practical definition of a Data Product. A
Data Product has/is:

Clear boundaries, to establish an identifiable set of related data

An empowered owner, to provide the organizational resources and decision making
needed to make data valuable and trustworthy, and provide a long-term view of the
product’s evolution

Part of an ecosystem of consumers and producers, that demands data interoperability,
consistency, and quality to deliver value to the enterprise

Enabled by a platform, that makes data discoverable, addressable, accessible, and
interoperable

Published metadata, that enables discovery and self-serve while making data
understandable

Federated governance, that recognizes the power of local autonomy to implement
enterprise policies and make data secure
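
To make the definition above a little more concrete, here is a minimal sketch of how those elements might be captured in a single descriptor. The class name, field names, and the example values are all hypothetical and chosen only for illustration; they are not part of any Data Mesh standard.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProductDescriptor:
    """Hypothetical descriptor; field names are illustrative, not a standard."""

    name: str                 # clear boundary: which set of related data is in scope
    owner: str                # empowered owner accountable for the product
    domain: str               # the data domain the product belongs to
    consumers: tuple = ()     # ecosystem: known downstream consumers
    metadata: dict = field(default_factory=dict)  # published metadata
    policies: tuple = ()      # enterprise policies implemented locally

    def missing_elements(self) -> list:
        """Return the names of definition elements that are still empty."""
        checks = {
            "owner": self.owner,
            "metadata": self.metadata,
            "policies": self.policies,
        }
        return [key for key, value in checks.items() if not value]


product = DataProductDescriptor(
    name="commercial-lending-clients-uk",
    owner="uk-commercial-lending",
    domain="client",
    metadata={"description": "Current commercial lending clients in the UK"},
)
print(product.missing_elements())  # ['policies']
```

A check like `missing_elements` is one simple way a platform could flag a data product that has an owner and metadata but no governance policies attached yet.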

Architecture of a Data Product


With this definition in hand, let’s explore the architecture of a data product.

Figure 2, Architecture of a Data Product

A Data Product architecture has components that make it:


Interoperable: Interoperable interfaces (queries as well as APIs, pipelines, files, and
events) are available to address consumption and ingestion needs. Additional
interfaces, typically implemented as APIs, are also available to observe, operate,
secure, and manage the data product. Each interface has a contract (for example, APIs
use OpenAPI specifications) that formalizes interactions.

Bounded: Data products store any type of data that has a clearly defined boundary and
owner. While analytic data is a primary use case, both operational and engagement
data can also be managed within a data product.

Self-Aware: A data product automatically captures changes and information about itself.
All data product changes can be captured and distributed as “events” within the data
product, to other data products, or to interested parties across the enterprise.

Discoverable: Each data product contains its own “Registry” that publishes its data
product metadata, ownership information, policies, and any additional enabling
behaviours. The data product registry is the “one-stop-shop” for developers, data
scientists, and data analysts to find, consume, share, and govern data managed by a
specific data product. It is also the entry point to behaviours specific to that data
product, enabling sophisticated interactions that allow users to request access to data, or
“owners” to create new data products.

Secure: Data products ensure that all data is secure both at rest and in motion. Our
objective is to ensure that all data products operate in a “Zero-Trust”
container/environment.

Historical and Temporal: Changes to data state, and exceptions raised while using the data
product, are captured and managed in an immutable log to support federated governance,
diagnosis of security issues, and (when data state changes are aggregated) data
lineage.

Shareable: A data product has “ports” that allow data managed by the data product to
be ingested or consumed. Information and events (for example, a data change or an
API call) can be communicated using bulk pipelines or in near real-time inside the
data product domain, between data products, as well as across the organization using a
robust, reliable, and resilient backbone.
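
The “Self-Aware” and “Historical and Temporal” characteristics above can be illustrated with a small append-only event log. This is only a sketch under stated assumptions: the `ChangeEvent` and `ChangeLog` names, the event kinds, and the in-memory storage are hypothetical, and a real data product would more likely publish these events to an enterprise backbone.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeEvent:
    """One immutable record of something that happened in the data product."""

    sequence: int
    recorded_at: datetime
    kind: str    # hypothetical kinds: "data_changed", "api_called"
    detail: str


class ChangeLog:
    """Append-only log: events can be added and read, never edited or removed."""

    def __init__(self):
        self._events = []

    def append(self, kind, detail):
        event = ChangeEvent(
            sequence=len(self._events) + 1,
            recorded_at=datetime.now(timezone.utc),
            kind=kind,
            detail=detail,
        )
        self._events.append(event)
        return event

    def events(self, kind=None):
        """Return a read-only snapshot, optionally filtered by event kind."""
        return tuple(e for e in self._events if kind is None or e.kind == kind)

    def lineage(self):
        """Aggregate data state changes, in order, into a simple lineage trail."""
        return [e.detail for e in self.events("data_changed")]


log = ChangeLog()
log.append("data_changed", "loaded from CRM extract")
log.append("api_called", "GET /discover")
log.append("data_changed", "deduplicated client records")
print(log.lineage())
```

Because the events are frozen and the log only exposes `append` and read-only views, the same records can serve governance reviews, security diagnosis, and lineage without a separate store for each.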

Data Product Interoperable Interfaces: The Core Data Product Enabler


When we think of interoperable interfaces, there are two that are top-of-mind: the
interfaces (pipelines, APIs, etc.) that ingest data into a data product, and the queries used to
consume data in a data product.


However, in large enterprises interoperable interfaces have several expectations (in some
cases mandatory requirements):

Formal Contracts: Each ingestion or consumption path, whether pipelines, queries,
APIs, or events, should be defined by a formal and published contract. In some cases,
the contracts will be specific to the tool used (DBT, etc.) but in other cases, such as APIs
or events, formal specifications such as OpenAPI and JSON Schema, respectively, are
common.

Formal Versioning: Contracts should be versioned, thereby allowing backward
compatibility. Now, in fairness, in smaller environments where data is shared with few
participants this may not be important. However, in larger enterprises where data is
shared widely it is crucial to ensure that downstream systems do not choke when
upstream systems change data formats.

Formal Security: This is tricky: each tool may offer a different security approach, and
worse, some may not have a robust or complete security model. Still, this does not
negate the need for securing your producer and consumer interfaces; rather, it just
makes it harder to do.
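
To illustrate why formal contracts and versioning matter together, here is a minimal sketch of a backward-compatibility check between two contract versions. It assumes a deliberately simplified contract (a mapping of field name to type name) and a hypothetical "major.minor" version scheme; real contracts expressed in OpenAPI or JSON Schema carry far more detail.

```python
def breaking_changes(old_fields, new_fields):
    """List changes that would break consumers built against the old contract."""
    problems = []
    for name, old_type in old_fields.items():
        if name not in new_fields:
            problems.append(f"removed field: {name}")
        elif new_fields[name] != old_type:
            problems.append(
                f"changed type of {name}: {old_type} -> {new_fields[name]}"
            )
    return problems


def next_version(current, old_fields, new_fields):
    """Suggest the next 'major.minor' version for a contract change."""
    major, minor = (int(part) for part in current.split("."))
    if breaking_changes(old_fields, new_fields):
        return f"{major + 1}.0"        # consumers must opt in to the new major
    if new_fields != old_fields:
        return f"{major}.{minor + 1}"  # additive change, backward compatible
    return current                     # nothing changed


v1 = {"client_id": "string", "country": "string"}
v2 = {**v1, "segment": "string"}                      # adds a field
v3 = {"client_id": "integer", "country": "string"}    # retypes and removes

print(next_version("1.0", v1, v2))  # 1.1
print(next_version("1.1", v2, v3))  # 2.0
print(breaking_changes(v2, v3))
```

The point of the sketch is the rule, not the code: additive changes keep downstream systems working, while removals and type changes should force an explicit new major version.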

Figure 3, Data Product Interfaces

While the producer and consumer interfaces are important, we should not overlook the
crucial nature of interfaces that enable discovery, observability, and manageability. In fact,
most of these interfaces are implemented as APIs which means that you can take
advantage of the capabilities offered by OpenAPI specifications:


Formal Contracts: OpenAPI and AsyncAPI offer well-documented, battle-tested
specifications that act as formal synchronous and asynchronous contracts for use within
enterprises.

Formal Versioning: OpenAPI specifications permit a flexible method of versioning
APIs to gracefully manage contract changes over time.

Formal Security: OpenAPI specifications provide a robust, well understood, and well
documented approach to defining security schemes that define “scopes” which map
directly to roles; with a little bit of due diligence, these scopes can be implemented
using OAuth2 (a common security approach) and connected to an enterprise’s
identity book of record.
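
As an illustration of the contract, versioning, and security points above, the following sketch holds a small OpenAPI 3.0 document as a Python dictionary and checks a caller's granted scopes against it. The structure (`info.version`, `components.securitySchemes`, an `oauth2` scheme with `flows` and `scopes`, and an operation-level `security` list) follows the OpenAPI 3.0 specification; the scheme name `dataProductAuth`, the scope `metadata:read`, the token URL, and the `is_allowed` helper are assumptions made up for this example.

```python
# A minimal OpenAPI 3.0 document, held as a dict for illustration.
spec = {
    "openapi": "3.0.3",
    "info": {"title": "Client Data Product", "version": "1.2.0"},
    "paths": {
        "/discover": {
            "get": {
                "summary": "Return data product metadata",
                "security": [{"dataProductAuth": ["metadata:read"]}],
                "responses": {"200": {"description": "Data product metadata"}},
            }
        }
    },
    "components": {
        "securitySchemes": {
            "dataProductAuth": {
                "type": "oauth2",
                "flows": {
                    "clientCredentials": {
                        # Hypothetical identity provider endpoint.
                        "tokenUrl": "https://idp.example.com/oauth2/token",
                        "scopes": {
                            "metadata:read": "Read data product metadata"
                        },
                    }
                },
            }
        }
    },
}


def is_allowed(spec, path, method, granted_scopes):
    """Check granted scopes against an operation's security requirements.

    In OpenAPI, the entries of the `security` list are alternatives, and all
    schemes inside one entry must be satisfied. Simplification for this
    sketch: only operation-level `security` is read, and an operation with
    no requirements is treated as open.
    """
    granted = set(granted_scopes)
    requirements = spec["paths"][path][method].get("security", [])
    if not requirements:
        return True
    return any(
        all(set(scopes) <= granted for scopes in requirement.values())
        for requirement in requirements
    )


print(is_allowed(spec, "/discover", "get", ["metadata:read"]))  # True
print(is_allowed(spec, "/discover", "get", []))                 # False
```

In practice the scopes would arrive in a validated OAuth2 access token issued by the enterprise identity provider, and the mapping from scopes to roles would live there rather than in the data product.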

Data Product Value Chain


A Data Product’s value increases proportionately to its use in the enterprise. At its earliest
stage, a Data Product (in a Data Mesh) is discoverable, addressable, interoperable, self-
describing, trustworthy, and secure. According to the originator of Data Mesh, Zhamak
Dehghani, these are the basic characteristics of a data product and constitute the building
blocks of all further successive value offered by the data product.

With these basic attributes in place, a data product can begin to be used in the enterprise.
And if designed well, then the data product can now make data easy to find, consume,
share, and govern. And as data is more easily and frequently consumed and shared,
newfound agility and speed result. And with this agility and speed come true business
value:

Faster and better insights, that are key to creating an outstanding customer
experience or quickly addressing changing market needs.

Improved time-to-market, especially for end consumer products heavily reliant upon
data.

Lower Delivery Costs, as speed and agility shorten delivery durations.


Figure 4, Data Product Value Chain

But how can data products be delivered quickly, consistently, and securely? That is where
the “Data Product Factory” comes in.

A Data Product Factory establishes repeatable steps to make data products:

Easy to build, by providing templates that simplify the building of a Data Product;
these templates generate microservices/APIs with built-in discoverability (the
“/discover” endpoint) and observability (“/observe”, “/usage”, “/logs”, and “/alerts”
endpoints).

Easy to secure, by providing extensions to the aforementioned templates to enable
OAuth2-based security for each API/microservice; and with a bit more due
diligence, these templates can also target the generation of a “zero-trust” run-time
environment for our data product APIs/microservices and their data.

Easy to deploy, by providing extensions to the aforementioned templates to generate
the APIs/microservices (and if needed, data) into a “container” (for example, Docker)
or a Kubernetes Pod, making it easy to deploy Data Products to either on-premises or
cloud environments and relatively easy to include them in a DevSecOps pipeline.

Easy to manage, by hooking the generated APIs/microservices (“/logs” and “/alerts”
endpoints, for example) into enterprise management and monitoring tools.
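
As a rough sketch of what a factory template might generate, the function below builds a tiny in-process API exposing the five endpoints named above. The endpoint paths come from this article; everything else (the function name `build_data_product_api`, the response shapes, and the in-memory state) is a hypothetical simplification, and a real template would generate a deployable microservice rather than a Python closure.

```python
import json


def build_data_product_api(name, owner):
    """Return a request handler exposing the standard factory endpoints."""
    state = {"calls": 0, "logs": [], "alerts": []}

    def discover():
        return {"name": name, "owner": owner, "endpoints": sorted(routes)}

    def observe():
        return {"status": "up"}

    def usage():
        return {"calls": state["calls"]}

    def logs():
        return {"entries": list(state["logs"])}

    def alerts():
        return {"alerts": list(state["alerts"])}

    routes = {
        "/discover": discover,
        "/observe": observe,
        "/usage": usage,
        "/logs": logs,
        "/alerts": alerts,
    }

    def handle(path):
        """Handle a GET request; returns (status code, JSON body)."""
        state["calls"] += 1
        state["logs"].append(f"GET {path}")
        handler = routes.get(path)
        if handler is None:
            state["alerts"].append(f"unknown endpoint: {path}")
            return 404, json.dumps({"error": "not found"})
        return 200, json.dumps(handler())

    return handle


api = build_data_product_api("commercial-lending-clients-uk", "uk-commercial-lending")
status, body = api("/discover")
print(status, body)
```

Because every generated data product answers the same paths in the same way, enterprise monitoring tools can be pointed at “/logs” and “/alerts” without any per-product integration work.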

Data Product Identification


Now we can see why Data Products are so important in our Data Mesh journey. And we
have also seen how we can accelerate the delivery of Data Products using our Factory.

Figure 5, Data Product Identification

So, clearly Data Products make a lot of sense! But how do we identify them? Fortunately,
there are a lot of hints that help us find Data Products in an enterprise:

Conway’s Law: Applying Conway’s Law (to paraphrase, your systems and data will
follow your organization structure) to Data Products means that ownership migrates to
groups aligned closely to organizational units (lines of business, etc.) that have deep
knowledge of the data as well as direct accountability for delivering results with it, and
hence hold decision and funding rights for the data.

CDO Data Domains: A data domain map (enterprise or group) identifies business
entities that are of significant value to the enterprise. These entities provide “hints”
that may identify data product candidates. However, note that in many cases enterprise
domains may need to be sub-divided into finer grained domains to map to Data
Products.

Business Architecture: A business architecture (enterprise or group) identifies
important business capabilities. These capabilities usually translate quite easily to data
domains which, like CDO domains, provide “hints” that may identify data product
candidates. Once again, enterprise domains may need to be sub-divided into finer
grained domains to map to Data Products.

Industry / Commercial Models: Commercial models, constructed based upon many
decades of experience, identify core entities in particular industries. In Financial
Services, there are several established commercial models including those from BIAN
and Teradata (FSLDM, Financial Services Logical Data Model).

But there is one lesson learned that I would be remiss not to share: Granularity matters!
So-called enterprise data domains (for example, “enterprise client”) are too coarsely
grained to suit a data product, making it quite difficult to define data boundaries and
owners. Rather, finer-grained data boundaries map much better to “owners” and hence
to data products (“commercial lending clients in UK”).

Data Mesh: An Ecosystem of Data Products


No data product stands alone. Rather, all data products are part of, and operate in, an
ecosystem. We call this ecosystem a “Data Mesh”.

Figure 6, Data Mesh: An Ecosystem of Data Products

With this simple observation, we can now delegate several simple yet specific
responsibilities to the Enterprise Data Mesh.

Chief Prognosticator of Data Mesh concept: The Data Mesh is first and foremost a
concept (a marketing message, an executive imperative, a demarcation for an
enterprise data journey) whose primary purpose is to describe and communicate the
organizational construct and logical architecture abstraction that binds data
products into an ecosystem.

The Agent of Data Product Discoverability: Data Mesh is the owner of the “Enterprise
Data Product Registry”, that makes Data Products easy to find, consume, share, and
govern.


Keeper of Data Product Protocols: Data Mesh establishes the protocols by which data
can be shared inside a data product, between data products, and with the broader
organization. As a result, it becomes a key consumer and/or stakeholder in an
enterprise’s common communications, pipeline, and/or event streaming backbone.
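
The “Enterprise Data Product Registry” responsibility described above can be sketched as a simple lookup service. This is a minimal illustration only: the class name, the keyword-based search, and the example URLs are hypothetical, and the sketch assumes each data product exposes its own “/discover” endpoint that the enterprise registry points to.

```python
class EnterpriseDataProductRegistry:
    """Hypothetical mesh-level registry that points to each data product."""

    def __init__(self):
        self._entries = {}

    def register(self, name, owner, discover_url, keywords):
        """Add a data product; names must be unique across the mesh."""
        if name in self._entries:
            raise ValueError(f"data product already registered: {name}")
        self._entries[name] = {
            "owner": owner,
            "discover_url": discover_url,
            "keywords": {keyword.lower() for keyword in keywords},
        }

    def find(self, keyword):
        """Return the names of data products tagged with the keyword."""
        wanted = keyword.lower()
        return sorted(
            name
            for name, entry in self._entries.items()
            if wanted in entry["keywords"]
        )

    def locate(self, name):
        """Return the product's own discovery endpoint for further detail."""
        return self._entries[name]["discover_url"]


registry = EnterpriseDataProductRegistry()
registry.register(
    "commercial-lending-clients-uk",
    owner="uk-commercial-lending",
    discover_url="https://data.example.com/clients-uk/discover",
    keywords=["client", "lending", "UK"],
)
registry.register(
    "retail-deposits-ca",
    owner="ca-retail-banking",
    discover_url="https://data.example.com/deposits-ca/discover",
    keywords=["deposits", "client", "Canada"],
)
print(registry.find("client"))
print(registry.locate("retail-deposits-ca"))
```

Keeping only pointers at the enterprise level leaves the detailed metadata with each data product and its owner, which fits the federated model described in this article.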

Concluding Thoughts
In this article I discussed how data products work such that they make data easy to find,
consume, share, and govern. And I also introduced the “Data Product Factory” that makes
Data Products easy to build, deploy, secure, and manage.

I am hopeful that, with the insights from this article, you will first be able to explain
why Data Products are foundational to your Data Mesh journey, and second, you will
understand the capabilities necessary to accelerate the adoption and buildout of Data
Products in your enterprise Data Mesh.

***

All images in this document except where otherwise noted have been created by Eric Broda (the
author of this article). All icons used in the images are stock PowerPoint icons and/or are free
from copyrights.

The opinions expressed in this article are mine alone and do not necessarily reflect the views of my
clients.
