Solution Design & Target Architecture For Hadoop As An Enterprise-Wide Shared Service
What data goes into Hadoop?
New data types: Documents, Email, Voice to Text, Web Logs, Click Streams, Social Networks, Sensors, Machine Generated, Geolocation
Tableau, Microstrategy, Pentaho
SAS, Mahout, R statistical libraries, Spark
What is Hadoop?
Data Access
Batch: MapReduce
Script: Pig
SQL: Hive, Apache Drill, Impala, Spark SQL
Online DB: HBase, Accumulo
Real-Time Streams: Storm, Spark Streaming
In-memory: Spark
Search: Solr, ElasticSearch
Graph: Giraph, Graph-X
Others
Data Integration & Governance
Data Workflow, Data Lifecycle: Falcon
Real-time and Batch Ingest: Flume, Sqoop, WebHDFS, NFS
Metadata Management (HCatalog)
Security
Authentication, Authorization, Accountability, Data Protection across:
Storage: HDFS
Resources: YARN
Access: Hive, Drill, Spark SQL, Impala
Pipeline: Falcon
Cluster: Knox
Operations
Data Protection: Snapshots, Disaster Recovery/Business Continuity
Scheduling: Oozie
Data Management
Multitenant Processing: YARN
Storage: HDFS
CRM, SCM etc.
Environment: Linux, Windows
Deployment Model: On Premise, Appliance, Virtualize, Commodity HW, Cloud/Hosted
Chart (US$ 000s, Min to Max): HADOOP, NAS, SAN, EDW / MPP, Engineered System (e.g., Oracle Exadata)
Ranges shown: 0.250 to 1, 10 to 20, 12 to 18, 20 to 80, 36 to 180
Note 1: Hardware, software, installation