Hadoop Week 1
Week 1 – Introduction to HDFS
Week 2 – Setting Up Hadoop Cluster
Week 3 – Map-Reduce Basics, types and formats
Week 4 – PIG
Week 5 – HIVE
Week 6 – HBASE
Week 7 – ZOOKEEPER
Week 8 – SQOOP
What are we going to cover today?
Part 1
• Understand what is Big Data
• What is Hadoop
• Limitations of Existing EDW Solutions
• Hadoop Differentiating Factors & Why Hadoop
• Hadoop Ecosystem Components
Part 2
• Introduction to HDFS
• HDFS Anatomy
Takeaways from Week 1 Training
• Basics of Hadoop
• Basics of HDFS
Part 1
What is Big Data?
Lots of Data (Terabytes or Petabytes)
Systems and enterprises generate huge amounts of data, from terabytes to even petabytes of information.
An airline jet collects 10 terabytes of sensor data for every 30 minutes of flying time.
Source: http://www.emc.com/leadership/programs/digital-universe.htm, which was based on the 2011 IDC Digital Universe Study
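To make the jet-engine figure above concrete, this short Python sketch converts it into an average data rate and a per-flight total. It assumes decimal units (1 TB = 10^12 bytes) and uses a hypothetical 6-hour flight; neither assumption comes from the slide.

```python
# Back-of-the-envelope arithmetic for the jet-engine figure.
# Assumptions (not from the slide): decimal units, 1 TB = 10**12 bytes,
# and a hypothetical 6-hour flight used only for illustration.

TB = 10**12  # bytes in a decimal terabyte

data_per_half_hour_tb = 10          # figure quoted on the slide
seconds_per_half_hour = 30 * 60     # 1800 seconds


def bytes_per_second(tb: float, seconds: int) -> float:
    """Average ingest rate in bytes per second."""
    return tb * TB / seconds


def data_for_flight(hours: float) -> float:
    """Terabytes collected over a flight of the given length."""
    return data_per_half_hour_tb * hours * 2  # two half-hours per hour


rate = bytes_per_second(data_per_half_hour_tb, seconds_per_half_hour)
print(f"average rate: {rate / 10**9:.2f} GB/s")   # prints "average rate: 5.56 GB/s"
print(f"6-hour flight: {data_for_flight(6):.0f} TB")  # prints "6-hour flight: 120 TB"
```

Even a single aircraft sustains several gigabytes per second, which is the scale that motivates distributed storage.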
Hidden Treasure
– HDFS
• Hadoop Distributed File System
– MapReduce
• Programming model for processing and generating large datasets
[Diagram: JobTracker; Map and Reduce tasks; DataNodes / TaskTracker]
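The MapReduce programming model can be illustrated with the classic word-count problem. The sketch below is a single-process, in-memory simulation in Python; it is not Hadoop's Java API (which uses Mapper and Reducer classes), and the function names map_phase, shuffle, reduce_phase and word_count are hypothetical. In a real cluster the map and reduce functions run as tasks on the TaskTrackers, and the framework performs the shuffle between them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in one record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all intermediate values by key, keys in sorted order."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce over the input records."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["big data needs hadoop", "hadoop stores big data in hdfs"]
    print(word_count(docs))
```

In a real job only the map and reduce logic is user-written; grouping by key is the framework's responsibility, which is why the same user code works whether the input is two lines or spread across many DataNodes.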
Problems with the Current System
Solution: A Combined Storage and Compute Layer
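A combined storage and compute layer means the computation runs on the nodes that already hold the data, instead of pulling the data across the network to a separate compute tier. The Python sketch below illustrates that idea under stated assumptions: a 128 MB block size (the HDFS default from Hadoop 2 onward; Hadoop 1 used 64 MB), a replication factor of 3, a simplified round-robin replica placement instead of HDFS's rack-aware policy, and hypothetical node names. The toy scheduler runs each block's map task on the least-loaded node that holds a replica of that block.

```python
from typing import Dict, List

MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # assumed block size (Hadoop 2+ default)
REPLICATION = 3        # assumed replication factor (HDFS default)


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the size of each block; only the last block may be smaller."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Simplified round-robin placement (NOT HDFS's rack-aware policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def schedule_map_tasks(placement: Dict[int, List[str]]) -> Dict[int, str]:
    """Data-local scheduling: run each block's map task on a node that
    already stores the block, choosing the least-loaded replica holder."""
    load: Dict[str, int] = {}
    assignment: Dict[int, str] = {}
    for block, holders in placement.items():
        node = min(holders, key=lambda n: load.get(n, 0))
        load[node] = load.get(node, 0) + 1
        assignment[block] = node
    return assignment


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]  # hypothetical DataNodes
    blocks = split_into_blocks(500 * MB)          # 3 x 128 MB + 1 x 116 MB
    placement = place_replicas(len(blocks), nodes)
    for block, node in schedule_map_tasks(placement).items():
        print(f"block {block} ({blocks[block] // MB} MB): "
              f"replicas on {placement[block]}, map task on {node}")
```

Because every map task is assigned to a node in its block's replica list, no block has to cross the network before processing starts; replication also gives the scheduler a choice of nodes when one is busy.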
Differentiating Factors
Vertical scalability (upgrading servers and storage) is not always the solution.
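One way to see why scaling out helps is to estimate how long it takes just to read the data. The figures here are illustrative assumptions, not from the slides: 1 TB of data and about 100 MB/s of sustained read throughput per disk. The calculation is idealized and ignores network transfer, coordination overhead and failures.

```python
# Idealized scan-time estimate (assumptions, not from the slides):
# 1 TB of data, ~100 MB/s sustained read throughput per disk,
# data split evenly across disks that are read in parallel.
TB = 10**12
DISK_THROUGHPUT = 100 * 10**6  # bytes per second per disk (assumed)


def read_time_seconds(data_bytes: int, disks: int,
                      per_disk: int = DISK_THROUGHPUT) -> float:
    """Time to scan data spread evenly over `disks` disks read in parallel."""
    return data_bytes / (disks * per_disk)


for disks in (1, 10, 100):
    t = read_time_seconds(TB, disks)
    print(f"{disks:>3} disk(s): {t:>8.0f} s ({t / 3600:.2f} h)")
```

Under these assumptions one disk needs 10,000 s (roughly 2.8 hours) to scan 1 TB, while 100 disks read in parallel need about 100 s. Upgrading a single server can only make its own disks so fast, whereas in this idealized model adding commodity machines adds disks, and therefore read throughput, roughly linearly.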