01 - IBM Data Lake Solutions & Technologies - Le Nhan Tam
Solutions & Technologies
Fastest growing market: Asia-Pacific
Difficult to understand and difficult to trust
3 © IBM Corporation
What problem(s) are we trying to solve with a Data Lake?
Ideally, the data is well organized and can be found easily
The desire…
• Ideally, data from throughout the organization will be categorized, integrated and easily found, like a well-managed library.
What problem(s) are we trying to solve with a Data Lake?
As we collect data
• Can we preserve clarity?
• Do we know what we are collecting?
• Can we find the data we need?
The objective of an effective Data Lake
• Allow users to easily find the data they need
• ‘Shop for data’
• Collect and aggregate data from multiple sources
• Minimize the need for lengthy IT involvement
Systems of Insight
What is a Data Lake?
IBM’s viewpoint on the Data Lake
A group of repositories
to trusted data
© 2019 IBM Corporation
EL-T
SQL
Evolving to address the challenges
AI Data Lake
Need for automation (DataOps and MLOps), focus on ML/AI workloads, hybrid cloud.
Design principle: microservices, business outcomes.
Data Warehouse
Proliferation of data silos and the need for enterprise insight led to the data warehouse.
Design principle: focus on KPIs, known metrics, business intelligence.
Major Architecture and Technology shifts runaway
IBM AI Data Lake
• An analytics sandbox for exploring data to gain insight.
• An enterprise-wide catalog to find data across the enterprise and to link business terms to technical metadata.
• An environment where users can access vast amounts of raw data at low cost.
• Tools and technologies for processing large volumes of data.
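The catalog idea above, linking a business term to the technical metadata behind it, can be sketched in a few lines of Python. This is only an illustration: the `Catalog` and `TechnicalAsset` classes, the system names, and the paths are hypothetical and are not taken from any IBM product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class TechnicalAsset:
    """A physical data location (all values used below are hypothetical)."""
    system: str   # e.g. an object store or a warehouse
    path: str     # table or file path inside that system
    column: str   # column that carries the business concept


@dataclass
class Catalog:
    """Toy enterprise catalog: business term -> technical assets."""
    _terms: Dict[str, List[TechnicalAsset]] = field(default_factory=dict)

    def link(self, term: str, asset: TechnicalAsset) -> None:
        # Normalize case so "Customer ID" and "customer id" are one term.
        self._terms.setdefault(term.lower(), []).append(asset)

    def find(self, term: str) -> List[TechnicalAsset]:
        # Return a copy so callers cannot mutate the catalog by accident.
        return list(self._terms.get(term.lower(), []))


catalog = Catalog()
catalog.link("Customer ID", TechnicalAsset("object-store", "raw/crm/customers.parquet", "cust_id"))
catalog.link("Customer ID", TechnicalAsset("warehouse", "sales.orders", "customer_key"))

for asset in catalog.find("customer id"):
    print(asset.system, asset.path, asset.column)
```

A user who "shops for data" searches by the business term and gets back every physical location that carries it, without needing to know table or column names in advance.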
AI Data Lake
Business Value
What is a Data Lake?
IBM’s Data Lake – designed for data access – with safeguards
What is a Data Lake?
Personas/roles supported by the Data Lake
Systems of Automation
Other Data Lakes
Information Management and Governance Fabric
Data Lake Operations
What is a Data Lake?
The Data Lake sub-systems
Systems of Automation
Other Data Lakes
Information Management and Governance Fabric
Data Lake Operations
What is a Data Lake?
Who benefits from a Data Lake?
Personas, from Business to IT: Campaign Manager (LOB), Business Analyst, Data Scientist/Developer, Data Strategist, Data Steward, IT Security & Governance
• LOB users, business analysts and data scientists can easily find the information they need without extensive IT involvement.
• Data strategists and data stewards can make information available to users in an organized and well-governed manner.
• IT security and governance teams can be assured that information is governed according to well-defined organizational and regulatory policies.
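One way to picture "easy access, but governed by policy" is a small clearance check per persona. The sketch below is an assumption-laden illustration: the classification levels, the role names, and the `MAX_LEVEL_BY_ROLE` policy table are all hypothetical, not rules from the deck or from any IBM governance tool.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical classification levels, ordered from least to most sensitive.
LEVELS = {"public": 0, "internal": 1, "confidential": 2}

# Hypothetical policy: the most sensitive level each persona may read.
MAX_LEVEL_BY_ROLE = {
    "business_analyst": "internal",
    "data_scientist": "internal",
    "data_steward": "confidential",
}


@dataclass(frozen=True)
class Dataset:
    name: str
    classification: str  # must be a key of LEVELS


def can_read(role: str, dataset: Dataset) -> bool:
    """Return True if the role's clearance covers the dataset's classification."""
    allowed = MAX_LEVEL_BY_ROLE.get(role)
    if allowed is None:
        return False  # unknown roles get no access by default
    return LEVELS[dataset.classification] <= LEVELS[allowed]


def visible_datasets(role: str, datasets: List[Dataset]) -> List[str]:
    """Names of the datasets a given role is allowed to see."""
    return [d.name for d in datasets if can_read(role, d)]


datasets = [
    Dataset("web_clicks", "public"),
    Dataset("sales_orders", "internal"),
    Dataset("customer_pii", "confidential"),
]

print(visible_datasets("business_analyst", datasets))
print(visible_datasets("data_steward", datasets))
```

Keeping the policy in one table means users find data on their own, while the security and governance team changes access in a single place.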
IBM Reference Architecture for AI Data Lake
Microservices-based architecture deployed on Red Hat OpenShift
Legend: Maybe Existing
Data Sources
• Machine and sensor data
• Image and video
• Enterprise content
• Transaction and application data
• Social data
• Third-party data
Components
• Data Catalog + governance
• Streaming
• Message Hub
• Data Integration & Transformation
• Spark cluster (ad-hoc query)
• Hadoop cluster (transformation)
• Object Store – Raw data + processed data (parquet file)
• Enterprise Data Warehouse
• Data Virtualization
• SQL + REST API
• Data as a Service
• Data Science tools
• ML Model Deployment
• Dashboard/Reporting
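The object store in this architecture keeps raw data and processed data side by side. A minimal sketch of that raw-to-processed flow follows, with loud assumptions: an in-memory dict stands in for the object store, plain Python stands in for the transformation cluster, the `raw/` and `processed/` key prefixes are hypothetical, and JSON replaces parquet so the example needs only the standard library.

```python
import csv
import io
import json
from typing import Dict

# In-memory stand-in for an object store: object key -> bytes.
object_store: Dict[str, bytes] = {}


def land_raw(source: str, name: str, payload: bytes) -> str:
    """Store incoming data untouched under a raw/ prefix (hypothetical layout)."""
    key = f"raw/{source}/{name}"
    object_store[key] = payload
    return key


def transform(raw_key: str) -> str:
    """Clean a raw CSV object and write the result under processed/."""
    text = object_store[raw_key].decode("utf-8")
    rows = csv.DictReader(io.StringIO(text))
    cleaned = [
        {"customer_id": r["customer_id"].strip(), "amount": float(r["amount"])}
        for r in rows
        if r["amount"].strip()  # drop rows with a missing amount
    ]
    # Same path as the raw object, but under processed/ and with a new suffix.
    processed_key = raw_key.replace("raw/", "processed/", 1).rsplit(".", 1)[0] + ".json"
    object_store[processed_key] = json.dumps(cleaned).encode("utf-8")
    return processed_key


raw_key = land_raw("transactions", "2019-01.csv", b"customer_id,amount\n c1 ,10.5\nc2,\nc3,4\n")
processed_key = transform(raw_key)
print(processed_key)
```

The raw object is never modified, so a transformation can be rerun or changed later without going back to the source system.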
Reference architecture with IBM products
• Data Catalog + governance: CP4D-WKC
• Data Integration & Transformation: IBM BigIntegrate
• Spark cluster (ad-hoc query): CP4D-Analytics Engine
• Hadoop cluster (transformation): IBM Advanced Data Preparation
• Object Store – Raw data + processed data (parquet file): IBM ESS
• Also shown: Data Sources (machine and sensor data, transaction and application data, social data, third-party data), Dashboard/Reporting, ML Model Deployment, SQL + REST API, Data Science tools, Data as a Service
Self-Service Data Preparation
Auto Assign Business Terms
Automated Core Governance & Master Data Management Services
• Policy Management & Enforcement
• Consent Management
• Business Glossary Management
• Data Lineage
• Data Archival & Disposal
• Model Governance & Bias Reporting
• Entity Management & Resolution
• Data Quality Management
AI, ML & Optimization
Compliance Reporting
Discovery & Exploration
Data Access Engine (SQL, APIs, NLQ)
Data Integration
• Data Movement
• Data Replication
Data stores
• Flat Files, NoSQL, Object Store, Relational, DV, Hadoop, …
On-prem
Data sources
• Transaction and application data, social data, third-party data, image and video, enterprise content
“AI Data Lake” – Think Big, Start Small …
Existing Hadoop Infrastructure
• Upgrade existing Hortonworks/Cloudera to CDP
• Augment with Cloud Pak for Data - DataOps/MLOps
Lite Data Lake (Cloud Object Storage + Spark)
• Lead with minimal Cloud Pak for Data & Object store
New Data Lake Initiatives and Data Warehouse replacements
• Look at use cases: customer behavior analytics, data warehouse modernization, audit and compliance analytics …
Object store, BigSQL, BI/Dashboard, WKC
Cloud Pak for Data
Delivers the foundational platform for deploying an information architecture for AI, on any cloud
Virtualized Data Access
Data Search for
Data
AutoAI, ML/DL Models, Python/R Functions, Notebooks, Decision Optimization, Data Pipelines
Machine Learning Runtimes, Deep Learning Runtimes
Deployment methods: Batch, Edge
Scalable & modern infrastructure, Hadoop Execution Engine, On x86, Power + GPUs, Management & Monitoring
IBM Watson Machine Learning
Embed Machine Learning and Deep Learning in your Business