
Comparing Data Platforms and Creating a Modern Data Architecture

Presented by: William McKnight


President, McKnight Consulting Group
williammcknight
www.mcknightcg.com
(214) 514-1444
Data Maturity is Highly Correlated to Business Success

[Figure: chart relating Data Maturity to Business Success, showing # Companies]
Low Maturity Organization

AI is disruptive
Data is the Foundation
A New Dataset: Bio Data

Reference Data Architecture

[Diagram: architecture components, each with its selection criteria, listed below]

STREAMING: SELECTION CRITERIA
• Scalable consumption
• Guaranteed delivery
• Persistent storage
• Conditional routing

DATA LAKE: SELECTION CRITERIA
• Understand consumption/usage patterns, e.g., to study patient outcomes via IoT devices
• Decide architecture: Hadoop, relational, or cloud
• Data-centric security
• Can manage technical, business, and operational metadata
• Mature data prep and deep support of data integration for chosen architecture
• Data management practices to ensure the data lake does not become a “swamp”

METADATA: SELECTION CRITERIA
• Data catalog
• Governance & HIPAA compliance
• Enable self-service
• Privacy & protection
• Support for hybrid cloud/multiple platforms

DATA INTEGRATION: SELECTION CRITERIA
• Support for governance and management of data assets
• Data acquisition from ops to analytics
• Support for MDM
• Data consistency between operational applications
• Interenterprise data sharing
• Data services for broader architecture approaches
• Support for hybrid cloud

MDM: SELECTION CRITERIA
• Workflow & business process management with KPI/measurement capabilities
• Loading, synchronization & integration
• Multi domain and governance support
• Analytical MDM support

DATA WAREHOUSE: SELECTION CRITERIA
• High performance at scale in terms of data volume and concurrent users
• Cloud deployable
• Separate compute and storage architecture
• Flexible, but predictable, pricing model
• Fit within the ecosystem
• Vendor customer satisfaction

VIRTUALIZATION: SELECTION CRITERIA
• Query optimization
• Supports hybrid cloud architecture
• In-memory/caching
• Data services (API) layer
• Security and governance integration

ADVANCED ANALYTICS, MACHINE LEARNING, VISUALIZATION, DASHBOARDS/KPI, REPORTING: SELECTION CRITERIA
• Agile and centralized BI provisioning and workflow with decentralized self-service
• Development/integration support for R (or other stats tool)
• Interactive visual data exploration
• Scalability and data model complexity
• Governed data discovery capability
• Extranet deployment and mobile authoring
• Sharing and collaboration
• Embedded content
• Ease of use and visual appeal

Other diagram labels: Wearables, Devices, Apps, Transcripts, R&D, Encounters, Records, External data, Transactions, Billing, CxO, HR; Discovery & Innovation, Outcome & Operational Improvements, Strategic Vision
[Figure: Data Warehouse, Data Mart, and Data Lake positioned by Data Cultivation against Usage Understanding by the Builders]
HDFS vs Cloud Storage
• Cloud Storage is more scalable, persistent, and available, and less expensive
• Public Cloud Providers back up Cloud Storage and support compression, lowering the cost of big data
• HDFS has better query performance
• HDFS is closely associated with the Parquet & ORC storage formats, though current query engines can also read both from Cloud Storage
• Cloud Storage imposes object size limits and PUT size limits
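
The PUT size limit matters in practice because a large file may have to be split into parts before it can be uploaded. The sketch below shows only that planning arithmetic. `plan_upload` is a hypothetical helper, and the limits passed to it are placeholder values for illustration, not any provider's published numbers.

```python
import math

GiB = 1024 ** 3


def plan_upload(object_bytes, put_limit_bytes, object_limit_bytes, part_bytes):
    """Return how many upload requests one object needs.

    Every limit is supplied by the caller; real values depend on the provider.
    """
    if object_bytes > object_limit_bytes:
        raise ValueError("object exceeds the store's maximum object size")
    if part_bytes > put_limit_bytes:
        raise ValueError("part size exceeds the single-PUT limit")
    if object_bytes <= put_limit_bytes:
        return 1  # small enough for a single PUT
    # Otherwise split into fixed-size parts (the last part may be smaller).
    return math.ceil(object_bytes / part_bytes)


if __name__ == "__main__":
    # Hypothetical limits, chosen only to exercise both branches.
    print(plan_upload(12 * GiB, put_limit_bytes=5 * GiB,
                      object_limit_bytes=5000 * GiB, part_bytes=1 * GiB))
    print(plan_upload(2 * GiB, put_limit_bytes=5 * GiB,
                      object_limit_bytes=5000 * GiB, part_bytes=1 * GiB))
```

HDFS avoids this step for the writer because the file system itself splits files into blocks.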
Beyond the Data Warehouse

[Figure: a Data Lake alongside a Data Warehouse (DW) and Data Marts (DM)]

Operational Big Data Platform Selection

[Figure: platform types (Key-Value, Document, Column Store, SQL, Graph) positioned by Data Size against Workload Complexity]

NoSQL and NewSQL
NewSQL is:
• Scale-out in-memory computing (IMC) for OLTP
– 100,000+ transactions per second ingest
– 1-2 millisecond response times
• ACID compliant
• SQL
• Supports Sharding
• Use cases: capital markets data feeds, financial trade, telco record streams, sensor-based distribution systems, wireless, online gaming, fraud detection, digital ad exchanges, and micro transaction systems
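
Sharding, listed above, spreads rows across nodes by key so that each node handles only part of the write load. A minimal sketch of hash-based shard routing follows. `shard_for` and `ShardedStore` are hypothetical names, the shards are plain in-process dictionaries, and this is not how any particular NewSQL product implements it.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash.

    hashlib is used rather than the built-in hash(), which is randomized
    per process for strings and so would not route consistently.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store: one dict per shard stands in for one node."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)


if __name__ == "__main__":
    store = ShardedStore(num_shards=4)
    for i in range(1000):
        store.put(f"trade-{i}", {"qty": i})
    # Show how evenly the keys spread across the shards.
    print([len(shard) for shard in store.shards])
    print(store.get("trade-42"))
```

Plain modulo routing remaps most keys when the shard count changes. Real systems usually reduce that with consistent hashing or range-based partitioning.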
Every CURATED Analytic Database Is A…
• THE Data Warehouse
– Value-Added Components: Modeling for Access, Data Quality, Tooling, Conformed Dimensions, Data Governance, Etc.
• Staging for the Data Warehouse
• A Dependent Data Mart (Fed from the Data Warehouse)
• A Data Lake
• A Big Data Cluster
• An Independent Data Mart, reasonably created
Get the Benefits of Cloud Computing with your Cloud Deployment
• On-Demand and Self Service
• Broad Network Access
• Resource Pooling
• Rapid Elasticity
• Measured Service

Source: Cloud Security and Privacy: An Enterprise Perspective on Risks and Compliance (Mather, Kumaraswamy & Latif)

The Money Tree Doesn’t Exist

Hitch your Architecture and Maturity Efforts to an Application Budget

Every Project is Burdened (with Grander Opportunity)

Managing Master Data is a Repetitive Task

Stream Processing
• ETL is Insufficient for this combination:
– Data platforms operating at an enterprise-wide scale
– A high variety of data sources
– Real-time/streaming data
• Enter Message-Oriented Middleware, aka streaming and message queuing technology

[Figure: a messaging or streaming platform connecting request-response traffic, change logs, stream processing, and streaming data pipelines to the DW and Hadoop. Other labels as extracted: Multi-Threaded, Cluster, Web Services, Big Data, Parallel, Math Libraries, support, API, Analysis Tools, Technical Support, IDE / Developer GUI]

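
The contrast with batch ETL can be sketched in a few lines: a producer publishes events to a queue, and a consumer updates its results as each event arrives instead of waiting for a scheduled load. This is an in-process illustration only. Python's standard `queue.Queue` stands in for the messaging platform, and the event shape `(source, amount)` and all function names are assumptions made for the example.

```python
import queue
import threading
from collections import defaultdict

SENTINEL = None  # marks the end of the stream in this toy example


def producer(events, q):
    """Publish each event to the queue, then signal completion."""
    for event in events:
        q.put(event)
    q.put(SENTINEL)


def consumer(q, totals):
    """Process events one at a time as they arrive."""
    while True:
        event = q.get()
        if event is SENTINEL:
            break
        source, amount = event
        totals[source] += amount  # running aggregate, updated per event


def run_pipeline(events):
    q = queue.Queue(maxsize=100)  # bounded queue applies back-pressure
    totals = defaultdict(float)
    t_cons = threading.Thread(target=consumer, args=(q, totals))
    t_prod = threading.Thread(target=producer, args=(events, q))
    t_cons.start()
    t_prod.start()
    t_prod.join()
    t_cons.join()
    return dict(totals)


if __name__ == "__main__":
    stream = [("sensor", 1.0), ("web", 2.0), ("sensor", 3.0)]
    print(run_pipeline(stream))
```

A production platform adds what the queue here lacks: persistence, delivery guarantees, and multiple independent consumers per stream.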
Data Virtualization

[Figure: an Enterprise Data Virtualization layer over Data Warehouses, Big Data, Marts & Cubes, Operational Data Stores, File Systems, and Transactional Sources]

Enterprise-wide data fabric providing consistent and timely access to all structured and semi-structured data
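
The idea of a virtualization layer can be sketched as a function that answers one question by reading two sources in place, without first copying either into a common store. In this illustration an in-memory SQLite database stands in for a warehouse and a CSV text for a file-system source. The table, column, and function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for a file-system source.
ORDERS_CSV = "customer_id,amount\n1,100.0\n2,40.0\n1,25.5\n"


def make_warehouse():
    """Build a stand-in warehouse with a small customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "Acme"), (2, "Globex")])
    return conn


def customer_totals(conn, orders_file):
    """Join warehouse customers with file-based orders at query time."""
    # Each row is an (id, name) pair, so dict() builds an id -> name lookup.
    names = dict(conn.execute("SELECT id, name FROM customers"))
    totals = {}
    for row in csv.DictReader(orders_file):
        name = names[int(row["customer_id"])]
        totals[name] = totals.get(name, 0.0) + float(row["amount"])
    return totals


if __name__ == "__main__":
    print(customer_totals(make_warehouse(), io.StringIO(ORDERS_CSV)))
```

A real virtualization product also handles query optimization, caching, and security across sources, which this sketch leaves out.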
William’s grab bag
• You might have a need for an in-memory database
• For high unstructured content in a workload, use an Unstructured Data Platform
• Platform differently, approach application curation differently
• More columnar
• Are GPU databases the future?
• Are hybrid databases the future?