Chapter 2 - Hadoop System


BIG DATA ESSENTIALS

HADOOP SYSTEM

Le Thi Minh Chau


Faculty Of Information Technology
HCMC University Of Technology And Education
Module Contents

• Introduction to Big Data and Hadoop
• Hadoop Core Components
• Hadoop Ecosystem

Big Data and its Challenges

• Big data is a term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data-processing applications.
• Systems and enterprises generate huge amounts of data, from terabytes up to petabytes of information.
• It is very difficult to manage data at this scale.

Why Hadoop?
Hadoop and its Characteristics

• Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
• It is an open-source data management technology with scale-out storage and distributed processing.

Hadoop Core Components

• MapReduce
• HDFS
• YARN

Hadoop Architecture
MapReduce

• A distributed data processing model and execution environment that runs on large clusters of commodity machines.
• Also called MR.
• Programs are inherently parallel.

MapReduce

• Map
  ◦ Takes a data set and divides it into chunks, converting each record into a new format in the form of key-value pairs.
• Reduce
  ◦ The key-value pairs produced by the map step are grouped by key and reduced to a smaller set of tuples.
• MapReduce enables us to perform various operations over big data, such as filtering, sorting, and many similar ones.

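
The Map and Reduce steps above can be illustrated with a word count, the usual introductory example. This is a minimal single-process sketch in plain Python, not Hadoop's actual API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for illustration, and a real cluster would run the map and reduce calls in parallel on different machines.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) key-value pairs."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single tuple."""
    return (key, sum(values))

def word_count(chunks):
    # Each chunk is mapped independently, which is what makes the job parallel.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(mapped)
    return sorted(reduce_phase(key, values) for key, values in grouped.items())

# Hypothetical input: two chunks of one larger data set.
chunks = ["big data needs hadoop", "hadoop stores big data"]
print(word_count(chunks))
```

Filtering fits the same shape: the map step would emit a pair only for records that pass the filter, and the reduce step would pass the surviving pairs through unchanged.
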
HDFS

• Apache Hadoop file system: HDFS
  ◦ HDFS stands for Hadoop Distributed File System.
  ◦ A distributed, scalable, and portable file system written in Java for the Hadoop framework.
  ◦ Provides high-throughput access to application data.
  ◦ Runs on large clusters of commodity machines.
  ◦ Is used to store large datasets.

HDFS Components

• NameNode
• DataNode
• Secondary NameNode

HDFS Components

• NameNode
  ◦ The centralized piece of HDFS (known as the master).
  ◦ Stores the metadata.
  ◦ Responsible for monitoring the health status of the slave nodes and for assigning tasks (such as block replication) to the DataNodes.

HDFS Components

• DataNode
  ◦ The actual unit that stores the data (known as the slave).
  ◦ Reports its health status and task status to the NameNode in the form of a heartbeat.
  ◦ If it fails to respond, the NameNode considers this slave node dead and reassigns its tasks to the next available DataNode.

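
The heartbeat mechanism described above can be sketched as a small simulation. This is an illustrative toy, not Hadoop's implementation: the `NameNode` class, its method names, the node and task names, and the timeout of 10 time units are all assumptions, and time is passed in explicitly so the example is deterministic.

```python
class NameNode:
    """Toy master that tracks DataNode heartbeats (illustrative only)."""

    def __init__(self, timeout):
        self.timeout = timeout   # time units without a heartbeat before a node counts as dead
        self.last_seen = {}      # DataNode name -> time of its last heartbeat
        self.tasks = {}          # task name -> DataNode currently responsible

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def assign(self, task, node):
        self.tasks[task] = node

    def live_nodes(self, now):
        return sorted(n for n, t in self.last_seen.items() if now - t <= self.timeout)

    def check(self, now):
        """Reassign tasks held by nodes whose heartbeat is overdue."""
        live = self.live_nodes(now)
        for task, node in self.tasks.items():
            if node not in live and live:
                self.tasks[task] = live[0]   # next available DataNode
        return live

nn = NameNode(timeout=10)
nn.heartbeat("dn1", now=0)
nn.heartbeat("dn2", now=0)
nn.assign("task-A", "dn1")
nn.heartbeat("dn2", now=12)   # dn1 stays silent from here on
live = nn.check(now=15)
print(live, nn.tasks)
```

At time 15 the last heartbeat from `dn1` is 15 units old, which exceeds the timeout, so `task-A` moves to `dn2`.
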
HDFS Components

• Secondary NameNode
  ◦ Not a backup of the NameNode; it acts as a buffer for the NameNode.
  ◦ Recent updates to the file system are recorded in the edit log. The Secondary NameNode periodically merges the edit log into the FsImage to produce the final, up-to-date FsImage, so the NameNode does not have to do this work itself.

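
The relationship between the edit log and the FS-image can be sketched as replaying a list of logged changes onto a snapshot. This is a simplified model under stated assumptions: the `checkpoint` function, the `("create" | "delete", path, value)` log format, and the sample paths are hypothetical and do not reflect Hadoop's on-disk formats.

```python
def checkpoint(fsimage, edit_log):
    """Replay logged operations on a copy of the image and return the merged image."""
    merged = dict(fsimage)   # leave the old image untouched
    for op, path, value in edit_log:
        if op == "create":
            merged[path] = value
        elif op == "delete":
            merged.pop(path, None)
    return merged

# Illustrative metadata: path -> number of blocks.
fsimage = {"/logs/a.txt": 3}
edit_log = [("create", "/logs/b.txt", 1), ("delete", "/logs/a.txt", None)]

new_image = checkpoint(fsimage, edit_log)
edit_log = []   # after a checkpoint the log can start fresh
print(new_image)
```

Merging regularly keeps the edit log short, which is why this work is handed to a separate helper node.
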
YARN

• YARN stands for Yet Another Resource Negotiator.
• Introduced as an update to Hadoop in version 2.
• Responsible for resource management and job scheduling.

YARN Components

• Resource Manager
  ◦ The core component of YARN, considered the master.
  ◦ Responsible for providing a generic and flexible framework to administer the computing resources in a Hadoop cluster.
• Node Manager
  ◦ The slave, which serves the Resource Manager.
  ◦ A Node Manager runs on every node in the cluster.
  ◦ Its main responsibility is to monitor the status of the Containers and the App Manager.

YARN Components

• App Manager (the ApplicationMaster in YARN's own terminology)
  ◦ Manages data processing in the Containers and requests Container resources from the Resource Manager.
• Container
  ◦ Where the actual data processing takes place.

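
How the four YARN components interact can be sketched with a toy allocator. This is a deliberately simplified model, not YARN's scheduler or API: the class names mirror the slide terms, while the first-fit placement rule, the memory-only resource model, and the node names and sizes are assumptions for illustration.

```python
class NodeManager:
    """One per node: tracks free memory and the containers running there."""

    def __init__(self, name, memory_mb):
        self.name = name
        self.free_mb = memory_mb
        self.containers = []   # list of (app_id, memory_mb)

class ResourceManager:
    """Master: decides which node hosts each requested container."""

    def __init__(self, node_managers):
        self.node_managers = node_managers

    def allocate(self, app_id, memory_mb):
        # First-fit placement (an assumption; real schedulers are more elaborate).
        for nm in self.node_managers:
            if nm.free_mb >= memory_mb:
                nm.free_mb -= memory_mb
                nm.containers.append((app_id, memory_mb))
                return nm.name
        return None   # no node has enough free memory

class AppManager:
    """Per application: asks the Resource Manager for containers."""

    def __init__(self, app_id, resource_manager):
        self.app_id = app_id
        self.rm = resource_manager

    def request_containers(self, count, memory_mb):
        return [self.rm.allocate(self.app_id, memory_mb) for _ in range(count)]

nodes = [NodeManager("node1", 4096), NodeManager("node2", 2048)]
rm = ResourceManager(nodes)
app = AppManager("app-1", rm)
placements = app.request_containers(count=4, memory_mb=2048)
print(placements)
```

The fourth request returns `None` because the three granted containers already use all 6144 MB in this hypothetical cluster.
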
Hadoop Ecosystem

• Data Storage
• General Purpose Execution Engines
• Database Management Tools
• Data Abstraction Engines
• Real-time Data Streaming
• Graph-Processing Engines
• Machine Learning
• Cluster Management

Q&A