
www.hcltech.com

Data Mesh: A Business Oriented Framework for Quicker Insights

WHITEPAPER
TABLE OF CONTENTS
About Centralization of Data Platforms and their evolution
Organizational Setup – Major Organizations with Central Data Platform Setups
There are challenges… and opportunities
What is a Data Mesh?
Organizational Setup – Data Mesh Scenario
Reference Architecture – Cloud Data Mesh
Data Mesh in Manufacturing Enterprise
Conclusion
Author Info

DATA MESH: A BUSINESS ORIENTED FRAMEWORK FOR QUICKER INSIGHTS 2


About Centralization of Data
Platforms and their evolution
Data platforms have evolved over the last four decades. The evolution began with enterprise
data warehouses and data marts and then moved on to data lakes. Data lakes themselves have
moved from on-premises implementations to the cloud.

There are subtle differences and many similarities in the way these platforms have evolved. ETL
(Extract, Transform, Load) came into being with the advent of data warehousing, while ELT
(Extract, Load, Transform) became popular with MPP (Massively Parallel Processing) systems and
data lakes. A data warehouse, in conjunction with data marts, supports data analysts in their
operational reporting and data mining needs, whereas data lakes are platforms for data
scientists who aim to discover patterns in the data and put them to use. The means of access
differ as well: in a data warehouse, a SQL (Structured Query Language) interface is standard,
whereas in data lakes, raw files and APIs (Application Programming Interfaces) are the
preferred choices.

Despite these differences, the platforms share one trait: they are all built around a central data
store with a common development paradigm, in which data is extracted from many sources,
transformed, and loaded into the central store. Only the sequence of transformation and loading
differs between data warehouse and data lake implementations.
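
The difference in sequencing can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `extract` and `transform` helpers, the `region` field, and the in-memory lists standing in for the central store are hypothetical and do not represent any particular product.

```python
from typing import Iterable

Row = dict


def extract(sources: Iterable[list[Row]]) -> list[Row]:
    """Gather rows from every source into one list."""
    return [row for source in sources for row in source]


def transform(rows: list[Row]) -> list[Row]:
    """Hypothetical cleanup step: normalize the 'region' field to upper case."""
    return [{**row, "region": row["region"].upper()} for row in rows]


def run_etl(sources: Iterable[list[Row]], store: list[Row]) -> None:
    """Warehouse style: transform first, then load only the cleaned rows."""
    store.extend(transform(extract(sources)))


def run_elt(sources: Iterable[list[Row]], store: list[Row]) -> list[Row]:
    """Lake style: load the raw rows first, transform later when reading."""
    store.extend(extract(sources))
    return transform(store)


# Two hypothetical source systems feeding one central store.
sources = [[{"id": 1, "region": "emea"}], [{"id": 2, "region": "apac"}]]
warehouse: list[Row] = []
lake: list[Row] = []

run_etl(sources, warehouse)
curated_view = run_elt(sources, lake)
```

In both cases the data ends up in a single central store. The only difference is whether that store holds transformed rows (`warehouse`) or raw rows (`lake`).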

Organizational Setup – Major
Organizations with Central
Data Platform Setups
As organizations have grown over time, they have largely been divided into two parts. The first
focuses on the core business components or business areas and reports to the COO (Chief
Operating Officer). The second provides infrastructure and technology services and reports to
the CIO (Chief Information Officer). Most of these organizations have a dedicated data services
group, structured as a one-to-one map with the business groups, as shown in Figure 1.

Each group under data services focuses on its respective business area and has its own
hierarchy under a Business Relationship Manager (BRM) and an engineering manager. The BRM
is a liaison between business users and the development team and is supported by a team of
business analysts, each of whom may be assigned to one or many business users. On the
engineering side, the technical leads and the development team report to the engineering
manager, who works in parallel with the BRM.

Figure 1: Organizational Setup – Major Organizations with Central Data Platform

The problem with this structure is that the teams are siloed. The domain’s operational systems
team is cross-functional, as is the team of data scientists and business analysts. In between
sits a team of data platform engineers who specialize in technology and work without a
business domain perspective. Moreover, the separation of the development and business teams
adds complexity and ultimately slows activities down. It is also difficult to find information
about data. These groups appear to work in tandem, but they hold quite different views of the
system, and those views are not uniform. As a result, when a critical issue arises, it is
exceedingly difficult to find and correct, which increases the resolution time.

There are challenges… and
opportunities
Despite the organizational challenges, centralized data platforms have served well. On the
technical side, however, it is always a challenge to support, standardize, and promote agile
development, migration, and optimization of these platforms.

During the implementation of centralized data platforms, the focus is mostly on the data
management pipeline and the type of storage. Deciding on an efficient data strategy, i.e., one
that minimizes data movement and copying, is always a challenge.

With the centralized development and management of these platforms, domain-driven design and
product thinking receive little attention. The emphasis falls on cloud adoption, data science,
data lakes, and other recent trends, at the expense of building self-sufficient, product-minded
teams.

These challenges also present an opportunity to leap towards another evolution: moving away
from a central data platform to an arrangement that allows for distributed ownership, i.e., a
Data Mesh. In a Data Mesh setup, the business case, rather than the technology options, decides
the location of the data. Moreover, it enables datasets to be stitched together rather than
duplicated. It lets distributed, domain-driven teams focus on creating datasets that are rich
for consumption.

What is a Data Mesh?
The term Data Mesh was coined in 2019 to describe an architectural and organizational
paradigm. It challenges the assumption that data needs to be centralized for analytical use. It
claims that innovation ownership must be federated among data domain owners.

The data domain owners are responsible for supplying their data as products. A self-serve data
platform supports the creation of these data products by abstracting the technical complexity
of serving them. The paradigm also requires federated governance, implemented through
automation, to make domain-oriented data products interoperable.

There are four major constituents of a Data Mesh:

• Domain-Driven Distributed Data Architecture: An approach for developing the structure and
language of the system to match the business domain. Each domain deals with one specific
aspect of the business. Domains can be segregated based on their placement. For example, a
domain aligned with the source deals with the facts and reality of the business. Its data is
usually final, time series, or historical; changes to such data are infrequent and are
permanently captured. A domain aligned with consumption, in contrast, focuses on making data
fit for consumption. Its data is mostly a collection or a projection; it changes frequently and
can be recreated from the source whenever needed. Each domain may apply acquisition,
transformation, and loading as needed and share data with other domains in the enterprise. In
summary, domains are the first level of partition, while data pipelines come second.

• Domain data set as a product: Datasets are produced by these domains and are hence termed
data products. Within a domain, various pipelines may generate these datasets. The pipelines
can be polyglot, that is, they may deal with different formats and modes such as streams or
batches. Data products are discoverable, i.e., metadata helps consumers find the datasets
being created. They are addressable, which means that they can be uniquely identified within
the enterprise. They are trustworthy, i.e., they are defined with, and checked against, certain
Service Level Objectives (SLOs). They are self-describing, i.e., they carry metadata that
consumers can refer to before using the dataset. They are interoperable, i.e., they are
governed by global standards that make them easier to use with other domains in the
enterprise. Finally, they are secured, i.e., they are governed by global access and not
federated access.

• Data Infrastructure as a platform: It supplies the tools, processing frameworks, and storage
solutions that domain owners use to create their data pipelines and data products. The
platform is scalable, with polyglot storage on demand. It enables encryption for data at rest
and in motion and supplies unified data access control across all storage systems. It makes
data products discoverable and supports metrics collection and sharing with various
stakeholders. The platform supplies self-service tools or templates and is domain agnostic.

• Ecosystem Governance: It enables interoperability and discoverability. Interoperability is
important for the domains to work together. Discoverability allows the domains to share
information about their datasets and data products with each other, so that these are used in
a suitable manner and for the right purpose. Federated and global ecosystem governance acts
as an umbrella over the data mesh, supplying automated federated identity, automated policy
enforcement, encryption, access control, etc.

Figure 2: Data Mesh Components
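
The data product attributes listed above can be made concrete with a small Python sketch of a product descriptor and an in-memory catalog. This is only one possible shape, under stated assumptions: the field names, the `Catalog` class, the approved format list, and the example identifier are all hypothetical and are not taken from any specific data mesh tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    product_id: str            # addressable: unique across the enterprise
    domain: str                # owning domain
    description: str           # self-describing: text consumers can read
    schema: dict[str, str]     # self-describing: column name -> type
    data_format: str           # interoperable: must follow a global standard
    freshness_slo_hours: int   # trustworthy: a published objective
    allowed_roles: frozenset[str] = field(default_factory=frozenset)  # secured


class Catalog:
    """Hypothetical in-memory registry that makes products discoverable."""

    APPROVED_FORMATS = {"parquet", "avro", "json"}  # assumed global standard

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        # Reject identifiers that are already taken and unapproved formats.
        if product.product_id in self._products:
            raise ValueError(f"duplicate id: {product.product_id}")
        if product.data_format not in self.APPROVED_FORMATS:
            raise ValueError(f"format not approved: {product.data_format}")
        self._products[product.product_id] = product

    def search(self, keyword: str) -> list[str]:
        # Discoverability: match the keyword against product descriptions.
        kw = keyword.lower()
        return sorted(
            pid for pid, p in self._products.items()
            if kw in p.description.lower()
        )

    def can_read(self, product_id: str, role: str) -> bool:
        # Access is decided centrally, from the registered descriptor.
        return role in self._products[product_id].allowed_roles


catalog = Catalog()
shipments = DataProduct(
    product_id="supply-chain.shipments.v1",
    domain="supply-chain",
    description="Daily shipment records per supplier",
    schema={"shipment_id": "string", "supplier": "string"},
    data_format="parquet",
    freshness_slo_hours=24,
    allowed_roles=frozenset({"recall-analyst"}),
)
catalog.register(shipments)
```

A consumer could then call `catalog.search("shipment")` to find the product and `catalog.can_read("supply-chain.shipments.v1", "recall-analyst")` to check access before subscribing.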

Organizational Setup – Data
Mesh Scenario
As domains become the first level of partition, the change to the organization structure is very
noticeable. The core business groups are better placed to define the domains, as they are
closest to the action. Domain teams under these business groups deal with the data products
and the datasets.

Figure 3: Organizational Setup – Data Mesh Scenario

As shown in Figure 3, the organization under the CIO (Chief Information Officer) has shrunk. The
Business Relationship Managers (BRMs) now perform the role of product owners. The business
analysts, who were earlier part of the CIO organization, now work alongside the product owners
within the domain. The engineering manager’s role is largely diminished or merged with that of
the product owner. The technical team leads and the development team, formerly under the
engineering manager, are now part of the business group itself.

This means that within a domain, product owners now have direct control over both the
cross-functional and the specialized teams, which reduces the hierarchical complexity and
latency that were present earlier. The organization under the CIO focuses on building the
frameworks and tools that the domain teams require, including self-service tools, accelerators,
templates, and solutions, thereby bringing all the domain teams to the same level. This leads
to standardization and automated policy enforcement, and it supports interoperability and
discoverability.

Reference Architecture – Cloud
Data Mesh
In terms of the technical implementation of the Data Mesh, there are certain important modules
and components that must be covered. Figure 4 provides a reference architecture by expanding
each of the Data Mesh constituents – Domains, Data Products, Data Infrastructure as a platform,
and Federated Global Governance Ecosystem.

Figure 4: Reference Architecture – Cloud Data Mesh

A domain consists of various data products, such as data pipelines or datasets. Each data
pipeline, in turn, follows an extraction, transformation, and loading flow. Instead of
integrating directly with the sources and targets, it uses standard ports. Ports come in
different types, such as files, streams, and events, and provide an abstract, standardized way
to integrate. Besides these, there are two more ports: one shares metrics and enables auditing,
and the other shares the metadata that helps consumers discover the datasets.
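
The port idea described above can be sketched in Python as follows. This is an illustrative sketch, not the reference architecture itself: `InputPort`, `OutputPort`, `Pipeline`, and the list-backed port classes are hypothetical names, and the in-memory lists stand in for real files, streams, or event sources.

```python
from typing import Callable, Iterable, Iterator, Protocol

Record = dict


class InputPort(Protocol):
    def read(self) -> Iterator[Record]: ...


class OutputPort(Protocol):
    def write(self, records: Iterable[Record]) -> int: ...


class ListInputPort:
    """Stand-in for a file, stream, or event source."""

    def __init__(self, records: list[Record]) -> None:
        self._records = records

    def read(self) -> Iterator[Record]:
        return iter(self._records)


class ListOutputPort:
    """Stand-in for a file, stream, or event target."""

    def __init__(self) -> None:
        self.records: list[Record] = []

    def write(self, records: Iterable[Record]) -> int:
        before = len(self.records)
        self.records.extend(records)
        return len(self.records) - before  # number of rows written


class Pipeline:
    """A pipeline that only talks to ports, never to concrete systems."""

    def __init__(self, name: str, source: InputPort, target: OutputPort,
                 schema: dict[str, str]) -> None:
        self.name = name
        self._source = source
        self._target = target
        self._schema = schema
        self._metrics = {"rows_in": 0, "rows_out": 0}

    def run(self, transform: Callable[[Record], Record]) -> None:
        rows = list(self._source.read())
        self._metrics["rows_in"] += len(rows)
        self._metrics["rows_out"] += self._target.write(
            transform(row) for row in rows
        )

    def metrics_port(self) -> dict[str, int]:
        # Shared for monitoring and auditing.
        return dict(self._metrics)

    def metadata_port(self) -> dict:
        # Shared so that consumers can discover the dataset.
        return {"name": self.name, "schema": dict(self._schema)}


source = ListInputPort([{"vin": "vin001"}, {"vin": "vin002"}])
target = ListOutputPort()
pipeline = Pipeline("builds", source, target, schema={"vin": "string"})
pipeline.run(lambda row: {**row, "vin": row["vin"].upper()})
```

Because the pipeline depends only on the port interfaces, a list-backed port could be swapped for a file or stream implementation without changing the pipeline code.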

To build these data pipelines and data products, the domain team uses developer tools that the
underlying platform exposes. The Data Infrastructure as a Platform provides common components
for compute, database and storage, processing, and management. For compute, the options
include dedicated, virtualized, or serverless computing engines. For database and storage, they
include block storage, relational databases, and other specialized storage media. For
processing, the engines can be dedicated or serverless to help process data at scale. For
management, a set of tools gives domain owners, product owners, and their teams information
about the platform components, so that they can understand their data products and the
platform.

These tools are provided to the domain teams as an abstracted framework or as self-service
tools to ensure consistency and standardization. This also makes it possible to build controls
for interoperability, discoverability, and identity management. In addition, because data mesh
advocates keeping the data close to the source, data virtualization solutions are preferred, as
they minimize replication and copying to other storage.

On top of this sits a layer of federated and global ecosystem governance, which handles
interoperability and discoverability for the whole implementation by enforcing controls over
access and policy management. Authentication services such as IAM (Identity and Access
Management) from AWS (Amazon Web Services) act as a security blanket around the services
and tools. The architecture depicted in Figure 4 uses various AWS components and is for
reference only. An actual implementation may use components from other cloud providers, open
source, or a hybrid.
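
Automated policy enforcement of the kind this governance layer performs could look like the following Python sketch. It is a simplified illustration under assumptions: the three policies, the descriptor keys (`owner`, `encrypted_at_rest`, `contains_pii`), and the `evaluate` function are hypothetical, and a real implementation would more likely rely on the policy tooling of the chosen cloud or governance product.

```python
from typing import Callable, Optional

Descriptor = dict
# A policy returns None when satisfied, or a message describing the violation.
Policy = Callable[[Descriptor], Optional[str]]


def require_owner(d: Descriptor) -> Optional[str]:
    return None if d.get("owner") else "missing owner"


def require_encryption(d: Descriptor) -> Optional[str]:
    if d.get("encrypted_at_rest") is True:
        return None
    return "encryption at rest not enabled"


def require_pii_tag(d: Descriptor) -> Optional[str]:
    return None if "contains_pii" in d else "PII classification missing"


# Defined once, globally, and applied to every domain's products.
GLOBAL_POLICIES: list[Policy] = [require_owner, require_encryption, require_pii_tag]


def evaluate(descriptor: Descriptor) -> list[str]:
    """Return every violation found; an empty list means compliant."""
    violations = []
    for policy in GLOBAL_POLICIES:
        message = policy(descriptor)
        if message is not None:
            violations.append(message)
    return violations


compliant = {"owner": "crm-team", "encrypted_at_rest": True, "contains_pii": True}
non_compliant = {"owner": "supply-chain-team", "encrypted_at_rest": False}

compliant_result = evaluate(compliant)
non_compliant_result = evaluate(non_compliant)
```

The policies are defined centrally while the descriptors are owned by the domains, which mirrors the split between global governance and distributed ownership.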

Data Mesh in Manufacturing
Enterprise
Let us understand this using an example from the manufacturing segment. In this example, a
manufacturing enterprise is building a recall management solution, which needs data from
various sources across different business areas. These include supply chain, manufacturing
operations, CRM (Customer Relationship Management), and data about dealers and partners, all
of which must be stitched together for the recall management solution.

In an implementation involving a central data store, the ideal approach would be to build a
data warehouse, in which a relational data model connects the data across the various
segments. One could also use a more performant graph-based data store to create the knowledge
base. Given the organizational structure, building such a system would require the BRM
(Business Relationship Manager) to liaise with the business owners of the various segments or
with the other BRMs supporting those segments. The BRM is usually part of the IT organization,
works in tandem with the business owners of the segments, and tries to bridge the gap between
the business unit and the development team.

This means that acquiring data from manufacturing operations, supply chain, and other groups
requires a large exercise to determine which data can be acquired, along with its formats,
schema, frequency, and volume. What makes this complex is that these attributes keep evolving
as the business requirements change. When requirements and data attributes change, the
pipelines and processes around them need to change as well, leading to a vicious cycle of
capturing the changes, analyzing the impact, planning incremental changes, deployment, and
maintenance. Finding the root cause of a problem is difficult, because it must be backtracked
every time. This ongoing exercise is required to make sure that the systems correctly ingest
the data and analyze and aggregate it in the central data store.

Apart from this, ensuring data quality is another challenge. For example, at an automotive
manufacturer, a data discrepancy may mean that the engine number, the chassis number, and
sometimes the customer ID are not reflected in the system. To find where the issue occurred,
each of the pipelines must be backtracked and validated.

In an alternative implementation involving data mesh, the ideal setup would map each of the
business segments to a domain. For example, Supply Chain, Manufacturing Operations, CRM, and
Recall Management would be individual domains. Each domain is responsible for the data
products (datasets) it provides. Each data product has metadata associated with it, including
a data product identifier (unique across the enterprise), schema, format, frequency, and
volume information. The data products are created by a team that understands the functional
aspects of the segment and works closely together to design, build, and publish them. During
this process, the team may subscribe to datasets from other domains, enrich and transform
them, and validate them for data quality before sharing them with consumers. Consumers are the
actual users of the data product and can be individuals or other domains in the enterprise.
Domains are free to pick and apply the tools provided to them by the underlying platform, which
keeps the tools standardized and easier to maintain.
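
As a rough illustration of this setup, the Python sketch below shows a Recall Management domain consuming datasets published by two other domains. All names and values are hypothetical: the `validate` and `vehicles_to_recall` functions, the field names, and the sample records are assumptions made for the example and are not real data.

```python
Record = dict


def validate(rows: list[Record], required: list[str]) -> list[Record]:
    """Quality gate a publishing domain runs before sharing a dataset."""
    for row in rows:
        missing = [key for key in required if row.get(key) in (None, "")]
        if missing:
            raise ValueError(f"record {row} is missing {missing}")
    return rows


# Datasets as published by the Manufacturing Operations and CRM domains.
builds = validate(
    [
        {"vin": "VIN001", "part_batch": "B-17"},
        {"vin": "VIN002", "part_batch": "B-18"},
        {"vin": "VIN003", "part_batch": "B-17"},
    ],
    required=["vin", "part_batch"],
)
customers = validate(
    [
        {"vin": "VIN001", "customer_id": "C-9"},
        {"vin": "VIN002", "customer_id": "C-4"},
    ],
    required=["vin", "customer_id"],
)


def vehicles_to_recall(defective_batch: str, builds: list[Record],
                       customers: list[Record]) -> list[Record]:
    """Recall Management stitches the two datasets together by VIN."""
    owner_by_vin = {c["vin"]: c["customer_id"] for c in customers}
    return [
        {"vin": b["vin"], "customer_id": owner_by_vin.get(b["vin"])}
        for b in builds
        if b["part_batch"] == defective_batch
    ]


recall_list = vehicles_to_recall("B-17", builds, customers)
```

Here each publishing domain validates its own dataset before sharing it, so the consuming domain can join the datasets without re-checking every upstream pipeline. A vehicle with no matching CRM record, such as `VIN003`, appears with a `customer_id` of `None`.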

Issues related to data quality, such as missing data or lack of precision, or issues related to a
change in the schema are usually not prevalent in a data mesh implementation. The consumer
domain in this example, Recall Management, can recreate the knowledge base from the raw data
in case of a major data discrepancy, if one happens at all.

Summarizing the differences between the two approaches:

Conclusion
Data mesh is a fairly new concept, and it counters the premise of having a central data
platform, be it a data warehouse or a data lake. These central data platforms have served us
well over the last several decades. It is evident that implementing the data mesh paradigm
requires transformation at the enterprise technology and data architecture levels.

The way business and IT teams are structured today may change drastically. The top three
technical challenges are unified data access across various data storage systems, linking
various domains and products at scale, and providing data infrastructure as a self-serve
platform. Certain edge cases can also arise in any enterprise implementation of the data mesh.
For example, what if a data domain has multiple owners? Also, when implementing the data mesh
over the cloud, there is an extremely high probability of settling on the PaaS (Platform as a
Service) offerings of the cloud service provider. These work well if they are the sole
constituents of the infrastructure platform. Things get complex as we introduce other storage
and data processing systems such as Oracle, Teradata, Snowflake, and others, which have their
own closed, proprietary access and management controls. On the governance side, global
interoperability, access control, etc. are a challenge today, even in on-premises and
single-cloud implementations. Given that future implementations will move towards hybrid and
multi-cloud setups, it is going to take a lot of coordination between enterprise IT, cloud
service providers, and incumbent vendors.

Data mesh certainly has many advantages over a central data store implementation, and it can
retain them if there is the right balance and acceptance of changes in organizational
structure, processes, and technology.

Author Info
Puneet Sachdeva

Puneet has around 14 years of industry experience covering a wide spectrum of tools, platforms,
and frameworks in Big Data, Data Warehousing, and Business Intelligence.

Throughout, he has been actively involved in the conception, implementation, and execution of
initiatives comprising data integration, data processing, and data analytics.

HCL Technologies (HCL) empowers global enterprises with technology for the next decade today. HCL’s
Mode 1-2-3 strategy, through its deep-domain industry expertise, customer-centricity and entrepreneurial
culture of ideapreneurship™ enables businesses to transform into next-gen enterprises.

HCL offers its services and products through three lines of business - IT and Business Services (ITBS),
Engineering and R&D Services (ERS), and Products & Platforms (P&P). ITBS enables global enterprises
to transform their businesses through offerings in areas of Applications, Infrastructure, Digital Process
Operations, and next generation digital transformation solutions. ERS offers engineering services and
solutions in all aspects of product development and platform engineering while, under P&P, HCL provides
modernized software products to global clients for their technology and industry specific requirements.
Through its cutting-edge co-innovation labs, global delivery capabilities, and broad global network, HCL
delivers holistic services in various industry verticals, categorized under Financial Services, Manufacturing,
Technology & Services, Telecom & Media, Retail & CPG, Life Sciences, and Healthcare and Public Services.

As a leading global technology company, HCL takes pride in its diversity, social responsibility, sustainability,
and education initiatives. As of 12 months ending on March 31, 2020, HCL has a consolidated revenue of
US$ 10 billion and its 159,000 ideapreneurs operate out of 50 countries.

www.hcltech.com
For more details contact: ers.info@hcl.com
Follow us on twitter: http://twitter.com/hclers and our blog http://ers.hclblogs.com/
Visit our website: http://www.hcltech.com/engineering-services/
