Data Observability & Discovery Platform - OpenMetadata - by Amit Singh Rathore - Geek Culture - Medium
Data discovery is a crucial first step of the data consumption workflow. It
answers questions about a dataset such as: what is its source, where is it
stored, what does it mean, how recent and relevant is it, how is it used by
others, and how did it come into its current form (lineage). Data discovery is
therefore an essential part of a data platform.
Multiple companies have built their own metadata platforms, differing in the
tools they selected for four major capabilities: search (Solr), attribute
lookup (databases), entity relations (graph databases), and regular refresh of
metadata (schedulers/queues). A few of the major ones are Amundsen, DataHub,
Atlas, Metacat, Databook, and Marquez. Each product has its own way and
specification of collecting metadata. Some support a range of sources, while
others have very limited integrations.
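To make the four capabilities concrete, here is a minimal in-memory sketch. It is not how any of the products above are implemented: the `ToyCatalog` class, its method names, and the sample assets are all hypothetical, and plain Python dictionaries stand in for the search engine, database, graph store, and scheduler.

```python
from collections import defaultdict


class ToyCatalog:
    """In-memory stand-in for the four capabilities of a metadata platform."""

    def __init__(self):
        self.attributes = {}               # attribute lookup (a database in practice)
        self.index = defaultdict(set)      # keyword search (a search engine in practice)
        self.upstream = defaultdict(set)   # entity relations (a graph database in practice)

    def refresh(self, name, description, sources=()):
        """Re-ingest one asset; a scheduler or queue would call this periodically."""
        # Drop stale index entries so a refresh replaces rather than accumulates.
        for names in self.index.values():
            names.discard(name)
        self.attributes[name] = {"description": description}
        text = (name + " " + description).lower().replace("_", " ")
        for token in text.split():
            self.index[token].add(name)
        self.upstream[name] = set(sources)

    def search(self, keyword):
        """Return the names of assets whose name or description has the keyword."""
        return sorted(self.index.get(keyword.lower(), set()))

    def lineage(self, name):
        """Return every transitive upstream asset of `name`."""
        seen, stack = set(), list(self.upstream.get(name, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.upstream.get(node, ()))
        return sorted(seen)


catalog = ToyCatalog()
catalog.refresh("raw_orders", "orders exported from the shop database")
catalog.refresh("clean_orders", "deduplicated orders", sources=["raw_orders"])
catalog.refresh("daily_revenue", "revenue per day", sources=["clean_orders"])

print(catalog.search("orders"))           # ['clean_orders', 'raw_orders']
print(catalog.lineage("daily_revenue"))   # ['clean_orders', 'raw_orders']
```

Real platforms differ mainly in which system backs each of these three structures and in how `refresh` is triggered.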
In general, the catalog/metadata segment of the data platform has the following
shortcomings.
5. Undiscoverable ML assets
An open standard for collecting metadata could be a sound solution to the lack
of efficient discovery and observability, and a solid foundation for the
next-gen data platform.
OpenMetadata
OpenMetadata is touted as an open standard for metadata. A single place to
discover, collaborate, and get your data right.
OpenMetadata has its own specification, which can be found here. Each schema
definition is mapped to the data/asset entity type.
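The idea of one schema per entity type can be illustrated with a small sketch. The schema below is hypothetical and heavily simplified: the field names and the `validate` helper are assumptions made for illustration and are not copied from the OpenMetadata specification.

```python
# Hypothetical, simplified schema for a "table" entity. The field names are
# illustrative and are not taken from the OpenMetadata specification.
TABLE_SCHEMA = {
    "required": {"name": str, "columns": list},
    "optional": {"description": str, "owner": str},
}

SCHEMAS = {"table": TABLE_SCHEMA}  # one schema definition per entity type


def validate(entity_type, entity):
    """Return a list of problems; an empty list means the entity conforms."""
    schema = SCHEMAS[entity_type]
    problems = []
    for field, expected in schema["required"].items():
        if field not in entity:
            problems.append(f"missing required field: {field}")
        elif not isinstance(entity[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    for field, expected in schema["optional"].items():
        if field in entity and not isinstance(entity[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    known = set(schema["required"]) | set(schema["optional"])
    for field in sorted(set(entity) - known):
        problems.append(f"unknown field: {field}")
    return problems


good = {"name": "daily_revenue", "columns": ["day", "revenue"], "owner": "finance"}
bad = {"columns": "day,revenue", "colour": "blue"}

print(validate("table", good))  # []
print(validate("table", bad))
```

Because every producer and consumer validates against the same definitions, metadata collected from different sources ends up in one consistent shape.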
SAML Protected Metadata APIs: for producing and consuming metadata, built on
schemas, for user interfaces and integration of tools, systems, and services.
OpenMetadata User Interface: an easy-to-use interface for users to discover
and collaborate on all data.
OpenMetadata components
Server: UI & API
Elasticsearch: Search & Analytics engine
Ingestion: Airflow
OpenMetadata features
Support for personas using RBAC
Activity Feeds: show all change events linked to assets in a single view
Task workflow for raising requests to data owners for any changes
Metadata versioning
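Metadata versioning and activity feeds fit together naturally: every change produces a new version and a change event. The sketch below shows that relationship under stated assumptions. The `VersionedEntity` class, the integer version counter, and the event fields are all hypothetical and do not reflect OpenMetadata's actual data model or version numbering.

```python
import copy


class VersionedEntity:
    """Keeps every past state of an asset's metadata plus a change log."""

    def __init__(self, name, metadata):
        self.name = name
        self.versions = [copy.deepcopy(metadata)]  # index 0 holds version 1
        self.events = []                            # change events, oldest first

    @property
    def current(self):
        return self.versions[-1]

    def update(self, user, **changes):
        """Apply changes as a new version; no-op updates create no version."""
        updated = {**self.current, **changes}
        if updated == self.current:
            return len(self.versions)
        for field, new in changes.items():
            old = self.current.get(field)
            if old != new:
                self.events.append({
                    "asset": self.name, "user": user, "field": field,
                    "old": old, "new": new, "version": len(self.versions) + 1,
                })
        self.versions.append(updated)
        return len(self.versions)


def activity_feed(entities):
    """Flatten the change events of several assets into a single view."""
    return [event for entity in entities for event in entity.events]


orders = VersionedEntity("clean_orders", {"owner": "data-eng", "tier": "silver"})
orders.update("alice", owner="analytics")
orders.update("alice", owner="analytics")  # unchanged, so no new version
orders.update("bob", tier="gold", description="deduplicated orders")

print(len(orders.versions))         # 3
print(orders.versions[0]["owner"])  # data-eng
for event in activity_feed([orders]):
    print(event["version"], event["user"], event["field"])
```

Keeping old versions intact is what lets a user see who changed an owner or description, and what the value was before.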
Happy cataloging!!!!