Chapter 2 - Hadoop System
HADOOP SYSTEM
Module Contents
Big Data and its Challenges
Why Hadoop?
Hadoop and its Characteristics
Hadoop Core Components
Map Reduce
HDFS
YARN
Hadoop Architecture
Map Reduce
Map
Takes the input data set and divides it into chunks,
converting each chunk into a new format:
a set of key-value pairs.
Reduce
Combines the key-value pairs produced by Map into a
smaller set of tuples.
MapReduce lets us perform operations over big data
such as filtering, sorting, and many similar ones.
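The Map and Reduce steps can be illustrated with a word count, a minimal single-process sketch in Python rather than real Hadoop code. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample chunks are assumptions made for illustration; in Hadoop the grouping step between Map and Reduce is handled by the framework.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one chunk of text into (key, value) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all values by key, as happens between Map and Reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single tuple."""
    return (key, sum(values))


def word_count(chunks):
    """Run Map on every chunk, group by key, then Reduce each group."""
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(pairs)
    return sorted(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input, already divided into two chunks.
chunks = ["big data needs hadoop", "hadoop stores big data"]
print(word_count(chunks))
```

Each chunk is mapped independently, which is what allows the Map step to be spread across many machines.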
HDFS
NameNode
DataNode
Secondary NameNode
HDFS Components
NameNode
Centralized piece of HDFS (known as the Master).
Stores the metadata.
Responsible for monitoring the health status of
the slave nodes and for assigning tasks (such as
storing and replicating blocks) to the DataNodes.
DataNode
The actual unit that stores the data (known as the
Slave).
Reports its health status and task status to the
NameNode in the form of a heartbeat.
If it fails to respond, the NameNode considers this
slave node dead and reassigns its tasks to the next
available DataNode.
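The heartbeat mechanism described above can be sketched as a toy model. This is not the HDFS implementation: the class `NameNodeMonitor`, the function `reassign_tasks`, the 10-second timeout, and the node and block names are all hypothetical, and time is passed in as plain numbers so the example is deterministic.

```python
class NameNodeMonitor:
    """Toy model of a master tracking the last heartbeat of each slave."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_heartbeat = {}

    def receive_heartbeat(self, datanode_id, now):
        """Record that a DataNode reported in at time `now`."""
        self.last_heartbeat[datanode_id] = now

    def dead_nodes(self, now):
        """Nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node for node, seen in self.last_heartbeat.items()
            if now - seen > self.timeout_seconds
        )

    def live_nodes(self, now):
        """Nodes that reported within the timeout window."""
        return sorted(
            node for node, seen in self.last_heartbeat.items()
            if now - seen <= self.timeout_seconds
        )


def reassign_tasks(assignments, monitor, now):
    """Move tasks from dead nodes to live nodes, round-robin."""
    dead = set(monitor.dead_nodes(now))
    live = monitor.live_nodes(now)
    if not live:
        return dict(assignments)  # nowhere to move the work
    result = {}
    next_live = 0
    for task, node in assignments.items():
        if node in dead:
            result[task] = live[next_live % len(live)]
            next_live += 1
        else:
            result[task] = node
    return result


monitor = NameNodeMonitor(timeout_seconds=10)
monitor.receive_heartbeat("dn1", now=25)
monitor.receive_heartbeat("dn2", now=0)   # dn2 then goes silent
monitor.receive_heartbeat("dn3", now=20)

assignments = {"block-a": "dn1", "block-b": "dn2", "block-c": "dn2"}
print(monitor.dead_nodes(now=30))
print(reassign_tasks(assignments, monitor, now=30))
```

At time 30 only `dn2` has been silent for longer than the timeout, so its two tasks are spread over the remaining nodes.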
Secondary NameNode
Not a backup of the NameNode; it acts as a buffer
for the NameNode.
Immediate updates to the NameNode's FS-image are
recorded in the edit log, and the Secondary NameNode
periodically merges them into the final FS-image
(a checkpoint).
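The relationship between the FS-image and the edit log can be sketched as follows. This is a simplified model under stated assumptions: the real FS-image and edit log are binary on-disk formats, while here the image is a plain dictionary, the log is a list of tuples, and the function name `checkpoint` and the two operations (`create`, `delete`) are hypothetical.

```python
def checkpoint(fsimage, edit_log):
    """Apply every logged edit to a copy of the image.

    Returns the merged image and a fresh, empty edit log.
    """
    merged = dict(fsimage)
    for op, path, *args in edit_log:
        if op == "create":
            merged[path] = args[0]      # args[0] is the file size in bytes
        elif op == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return merged, []


# Hypothetical state: one file in the image, three pending edits.
fsimage = {"/data/a.txt": 128}
edit_log = [
    ("create", "/data/b.txt", 256),
    ("delete", "/data/a.txt"),
    ("create", "/data/c.txt", 64),
]

fsimage, edit_log = checkpoint(fsimage, edit_log)
print(fsimage)
print(edit_log)
```

After the merge the edit log is empty again, which keeps it from growing without bound between checkpoints.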
YARN
YARN Components
Resource Manager
Core component of YARN, considered the Master.
Responsible for providing a generic and flexible framework
to administer the computing resources in a Hadoop cluster.
Node Manager
It is the Slave and serves the Resource Manager.
A Node Manager runs on every node in the cluster.
Its main responsibility is to monitor the status of the
Containers and the App Manager.
App Manager
Manages data processing in the Containers and
requests Container resources from the
Resource Manager.
Container
The Container is where the actual data processing
takes place.
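How the four YARN components interact can be sketched with a toy model. This does not use the YARN API: the classes below only borrow the component names, and the first-fit allocation policy, memory sizes, and node names are assumptions chosen for illustration.

```python
class NodeManager:
    """Slave: tracks free memory and the containers on one node."""

    def __init__(self, name, memory_mb):
        self.name = name
        self.free_mb = memory_mb
        self.containers = []

    def launch(self, container_id, memory_mb):
        """Reserve memory on this node for a new container."""
        self.free_mb -= memory_mb
        self.containers.append(container_id)


class ResourceManager:
    """Master: hands out containers on nodes that have capacity."""

    def __init__(self, node_managers):
        self.node_managers = node_managers
        self._next_id = 0

    def allocate(self, memory_mb):
        """First-fit allocation; returns (container_id, node) or None."""
        for nm in self.node_managers:
            if nm.free_mb >= memory_mb:
                self._next_id += 1
                container_id = f"container_{self._next_id}"
                nm.launch(container_id, memory_mb)
                return container_id, nm.name
        return None


class AppManager:
    """Requests one container per task from the Resource Manager."""

    def __init__(self, resource_manager):
        self.resource_manager = resource_manager

    def run(self, task_count, memory_per_task_mb):
        placements = []
        for _ in range(task_count):
            grant = self.resource_manager.allocate(memory_per_task_mb)
            if grant is None:
                break  # cluster is full; remaining tasks would wait
            placements.append(grant)
        return placements


nodes = [NodeManager("node1", 2048), NodeManager("node2", 1024)]
app = AppManager(ResourceManager(nodes))
placements = app.run(task_count=4, memory_per_task_mb=1024)
print(placements)
```

The cluster in this example has room for three 1024 MB containers, so the fourth request is not granted.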
Hadoop Ecosystem
Data Storage
General Purpose Execution Engines
Database Management Tools
Data Abstraction Engines
Real-time Data Streaming
Graph-Processing Engines
Machine Learning
Cluster Management
Q&A