WED 0830 McKnight William COLOR 10015
[Chart labels: Data Maturity, # Companies, Business Success]
Low Maturity Organization
AI is disruptive
Data is the Foundation
A New Dataset: Bio Data
Reference Data Architecture
[Figure: reference data architecture. Components: Streaming, Data Lake, Metadata, Data Warehouse, Data Virtualization, MDM, Advanced Analytics, Machine Learning, Visualization, Dashboards/KPI, Reporting. Data labels: Wearables, Devices, Apps, Transcripts, R&D, Encounters, Records, External data, Transactions. Other labels: Billing, CxO, HR; Discovery & Innovation, Improvements, Strategic Vision.]
Streaming selection criteria
• Scalable consumption
• Guaranteed delivery
• Persistent storage
• Conditional routing
Data Lake selection criteria
• Understand consumption/usage patterns, e.g., to study patient outcomes via IoT devices
• Decide architecture: Hadoop, relational, or cloud
• Data-centric security
• Can manage technical, business, and operational metadata
• Mature data prep and deep support of data provisioning and workflow
• [...] lake does not become a “swamp” with decentralized self-service
Metadata selection criteria
• Data catalog
• Governance & HIPAA compliance
• Enable self-service
• Privacy & protection
• Support for hybrid cloud/multiple platforms
• Support for governance and management of data assets
MDM selection criteria
• Workflow & business process management with KPI/measurement capabilities
• Data consistency between operational applications
• Interenterprise data sharing
• Loading, synchronization & integration
• Multi domain and governance support
• Data services for broader architecture approaches
• Analytical MDM support
• Support for hybrid cloud
Data Warehouse selection criteria
• High performance at scale in terms of data volume and concurrent users
• Cloud deployable
• Separate compute and storage architecture
• Flexible, but predictable, pricing model
• Fit within the ecosystem
• Vendor customer satisfaction
Data Virtualization selection criteria
• Data acquisition from ops to analytics
• Support for MDM
• Query optimization
• Scalability and data model complexity
• Data services (API) layer
• Security and governance integration
Analytics and visualization selection criteria
• Development/integration support for R (or other stats tool)
• Interactive visual data exploration
• Supports hybrid cloud architecture
• In-memory/caching
• Governed data discovery capability
• Extranet deployment and mobile authoring
• Sharing and collaboration
• Embedded content
• Ease of use and visual appeal
[Figure: axes "Data Cultivation" and "Usage Understanding by the Builders"; plotted items: Data Warehouse, Data Mart, Data Lake]
HDFS vs Cloud Storage
• Cloud Storage is more scalable, persistent, and available, and less expensive
• Public cloud providers back up Cloud Storage and support compression, lowering the cost of big data
• HDFS has better query performance
• HDFS natively supports the Parquet & ORC storage formats; these can also be stored on Cloud Storage when the query engine supports it
• Cloud Storage has object size limits and PUT size limits
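The object size limits and PUT size limits in the last bullet can be made concrete with a small planning calculation. The sketch below is only an illustration: the default limits are assumptions modeled on values that have been documented for Amazon S3 and should be checked against the provider's current documentation, and `StorageLimits` and `plan_upload` are hypothetical names, not part of any cloud SDK.

```python
import math
from dataclasses import dataclass

MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4


@dataclass(frozen=True)
class StorageLimits:
    # Illustrative defaults (assumption); verify against your provider.
    max_single_put: int = 5 * GIB    # largest object allowed in one PUT
    max_object_size: int = 5 * TIB   # largest object allowed at all
    min_part_size: int = 5 * MIB     # smallest multipart part
    max_part_size: int = 5 * GIB     # largest multipart part
    max_parts: int = 10_000          # most parts in one multipart upload


def plan_upload(object_size: int,
                limits: StorageLimits = StorageLimits(),
                preferred_part_size: int = 64 * MIB) -> dict:
    """Decide between a single PUT and a multipart upload for one object."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if object_size > limits.max_object_size:
        raise ValueError("object exceeds the maximum object size")
    if object_size <= limits.max_single_put:
        return {"method": "single_put", "parts": 1, "part_size": object_size}

    # Smallest part size that keeps the part count within max_parts.
    needed = math.ceil(object_size / limits.max_parts)
    part_size = max(preferred_part_size, needed, limits.min_part_size)
    if part_size > limits.max_part_size:
        raise ValueError("no valid part size for this object")
    parts = math.ceil(object_size / part_size)
    return {"method": "multipart", "parts": parts, "part_size": part_size}


if __name__ == "__main__":
    print(plan_upload(1 * GIB))
    print(plan_upload(100 * GIB))
```

Under these assumed limits, a data lake ingest job would need to split any file above the single-PUT limit into parts, which is one of the practical differences from writing a large file to HDFS.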
Beyond the Data Warehouse
[Figure labels: Data Lake, DW (data warehouse), two DMs (data marts)]
Operational Big Data Platform Selection
[Figure: platform types positioned by Data Size and Workload Complexity: Key-Value, Document, Column Store, Graph, SQL]
NoSQL and NewSQL
NewSQL is:
• Scale-out in-memory computing (IMC) for OLTP
– 100,000+ transactions per second ingest
– 1-2 millisecond response times
• ACID compliant
• SQL
• Supports sharding
• Use cases: capital markets data feeds, financial trade, telco record streams, sensor-based distribution systems, wireless, online gaming, fraud detection, digital ad exchanges, and microtransaction systems
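The sharding bullet above can be illustrated with a minimal sketch. It assumes the simplest scheme, a stable hash of the key taken modulo the number of shards; real NewSQL products may use range partitioning or consistent hashing instead. The names `shard_for` and `ShardedStore` are hypothetical and do not belong to any product.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable (process-independent) hash."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """In-memory stand-in for a sharded table: one dict per shard."""

    def __init__(self, num_shards: int) -> None:
        if num_shards <= 0:
            raise ValueError("num_shards must be positive")
        self._shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self._shards[shard_for(key, len(self._shards))][key] = value

    def get(self, key: str):
        return self._shards[shard_for(key, len(self._shards))].get(key)

    def shard_sizes(self) -> list:
        return [len(shard) for shard in self._shards]


if __name__ == "__main__":
    store = ShardedStore(num_shards=4)
    for i in range(1000):
        store.put(f"trade-{i}", {"qty": i})
    print(store.shard_sizes())  # rows spread across the four shards
```

Each key always lands on the same shard, so a single-key transaction touches one node; the trade-off of plain modulo hashing is that changing the shard count remaps most keys.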
Every CURATED Analytic Database Is A…
• THE Data Warehouse
– Value-Added Components: Modeling for Access, Data Quality, Tooling, Conformed Dimensions, Data Governance, Etc.
• Staging for the Data Warehouse
• A Dependent Data Mart (Fed from the Data Warehouse)
• A Data Lake
• A Big Data Cluster
• An Independent Data Mart, reasonably created
Get the Benefits of Cloud Computing with your Cloud Deployment
• On-Demand and Self Service
• Broad Network Access
• Resource Pooling
• Rapid Elasticity
• Measured Service
Source: Cloud Security and Privacy: An Enterprise Perspective on Risks and Compliance (Mather, Kumaraswamy & Latif)
The Money Tree Doesn’t Exist
Every Project is Burdened (with Grander Opportunity)
Managing Master Data is a Repetitive Task
Stream Processing
• ETL is Insufficient for this combination:
– Data platforms operating at an enterprise-wide scale
– A high variety of data sources
– Real-time/streaming data
• Enter Message-Oriented Middleware, aka streaming and message queuing technology
[Figure labels: Multi-Threaded, Cluster, Web Services, Big Data, Parallel, Math Libraries, support API, Analysis Tools, Request - Response, Change logs, Messaging or Streaming Platform, Stream processing, DW, Hadoop]
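The contrast with batch ETL can be sketched in a few lines. The example below is an in-process stand-in only: a `queue.Queue` plays the role of the messaging or streaming platform, and the consumer updates a running total per source as each event arrives instead of waiting for a batch load. A production deployment would use a real broker; the function names and the `(source, value)` event shape are assumptions made for illustration.

```python
import queue
import threading
from collections import defaultdict

_STOP = object()  # sentinel telling the consumer that the stream has ended


def producer(events, q: queue.Queue) -> None:
    """Publish each event to the queue, then signal end of stream."""
    for event in events:
        q.put(event)
    q.put(_STOP)


def consumer(q: queue.Queue, totals: dict) -> None:
    """Process events one at a time as they arrive (no batch window)."""
    while True:
        event = q.get()
        if event is _STOP:
            break
        source, value = event
        totals[source] += value


def run_pipeline(events) -> dict:
    q = queue.Queue(maxsize=100)  # bounded queue applies back-pressure
    totals = defaultdict(float)
    consumer_thread = threading.Thread(target=consumer, args=(q, totals))
    producer_thread = threading.Thread(target=producer, args=(events, q))
    consumer_thread.start()
    producer_thread.start()
    producer_thread.join()
    consumer_thread.join()
    return dict(totals)


if __name__ == "__main__":
    readings = [("sensor-a", 1.0), ("sensor-b", 2.5), ("sensor-a", 3.0)]
    print(run_pipeline(readings))
```

The queue decouples the producer from the consumer, so new sources or new downstream targets (for example a DW loader and a Hadoop loader) can be added without changing the other side.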
Streaming data pipelines
Data Virtualization