
Shivajirao Kadam Institute of Technology and Management-Technical Campus, Indore (M.P.)
Department of Computer Science and Engineering

Lecture on “YARN in Hadoop”

Introduction to YARN
• YARN stands for “Yet Another Resource Negotiator”. It was introduced in Hadoop 2.0 to remove the Job Tracker bottleneck that was present in Hadoop 1.0.
• YARN is the resource-management layer and a core component of Hadoop v2.0.
• YARN opens up Hadoop by allowing data stored in HDFS to be processed with batch, stream, interactive, and graph processing engines.
• YARN can run different types of distributed applications, not only MapReduce.
• YARN lets users pick the tool that fits the requirement, such as Spark for real-time processing, Hive for SQL, HBase for NoSQL, and others.
• Apart from resource management, YARN also performs job scheduling.
In MapReduce (Hadoop 1.0)
• The Job Tracker assigned map and reduce tasks to a number of subordinate processes called Task Trackers. The Task Trackers periodically reported their progress to the Job Tracker.
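
The master/worker relationship described above can be sketched as a toy model. This is only an illustration under stated assumptions: the class names `JobTracker` and `TaskTracker`, the `heartbeat` method, and the round-robin assignment are hypothetical stand-ins written for this lecture, not Hadoop's actual classes or scheduling logic.

```python
from dataclasses import dataclass, field


@dataclass
class TaskTracker:
    """Hypothetical stand-in for an MRv1 worker process."""
    name: str
    tasks: list = field(default_factory=list)

    def heartbeat(self) -> dict:
        # Report the tasks this tracker is currently holding.
        return {"tracker": self.name, "running": list(self.tasks)}


class JobTracker:
    """Hypothetical single master: assigns tasks and collects reports."""

    def __init__(self, trackers):
        self.trackers = trackers
        self.reports = {}

    def assign(self, tasks):
        # Round-robin assignment (an assumption of this sketch); it ignores
        # data locality and free slots.
        for i, task in enumerate(tasks):
            self.trackers[i % len(self.trackers)].tasks.append(task)

    def collect_heartbeats(self):
        # Every tracker reports to the one master, so the master's load
        # grows with the size of the cluster.
        for tracker in self.trackers:
            report = tracker.heartbeat()
            self.reports[report["tracker"]] = report["running"]
        return self.reports


trackers = [TaskTracker("tt1"), TaskTracker("tt2")]
job_tracker = JobTracker(trackers)
job_tracker.assign(["map-0", "map-1", "map-2", "reduce-0"])
reports = job_tracker.collect_heartbeats()
print(reports)
```

In this sketch all assignment and all progress reports pass through one `JobTracker` object, which is the structural reason a single master limits how far the design can scale.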
Why YARN?
• In Hadoop version 1.0, also referred to as MRv1 (MapReduce Version 1), MapReduce performed both processing and resource management functions.
• It consisted of a single master, the Job Tracker, which allocated resources, performed scheduling, and monitored the processing jobs.
• This design resulted in a scalability bottleneck due to the single Job Tracker.
• The practical limits of such a design are reached with a cluster of 5000 nodes and 40,000 tasks running concurrently.
• Apart from this limitation, computational resources are used inefficiently in MRv1.
• To overcome these issues, YARN was introduced in Hadoop version 2.0 in 2012 by Yahoo and Hortonworks.
• The basic idea behind YARN is to relieve MapReduce by taking over the responsibility for resource management and job scheduling.
• YARN gave Hadoop the ability to run non-MapReduce jobs within the Hadoop framework.
• In Hadoop 2.0, YARN introduced the concepts of the Application Master and the Resource Manager. The Resource Manager monitors resource utilization across the Hadoop cluster.
• Hadoop developers use YARN extensively for writing applications. It lets them create applications that work with huge amounts of data and manipulate it efficiently.
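
The split between a cluster-wide Resource Manager and per-application Application Masters can be sketched as follows. This is a minimal toy model, not the YARN client API: the class names, the `allocate`, `request`, and `utilization` methods, the first-fit placement, and the node sizes are all assumptions made for illustration.

```python
class ResourceManager:
    """Toy cluster-wide tracker of memory; not the real YARN API."""

    def __init__(self, node_memory_mb):
        # Map of node name -> free memory in MB (hypothetical nodes).
        self.free = dict(node_memory_mb)
        self.total = sum(node_memory_mb.values())

    def allocate(self, app_id, memory_mb):
        # First-fit placement (an assumption of this sketch): grant a
        # container on the first node with enough free memory.
        for node, free_mb in self.free.items():
            if free_mb >= memory_mb:
                self.free[node] = free_mb - memory_mb
                return {"app": app_id, "node": node, "memory_mb": memory_mb}
        return None  # no capacity left for this request

    def utilization(self):
        # Fraction of cluster memory currently handed out.
        used = self.total - sum(self.free.values())
        return used / self.total


class ApplicationMaster:
    """Toy per-application master that asks the manager for containers."""

    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm
        self.containers = []

    def request(self, count, memory_mb):
        for _ in range(count):
            container = self.rm.allocate(self.app_id, memory_mb)
            if container is not None:
                self.containers.append(container)
        return len(self.containers)


rm = ResourceManager({"node1": 4096, "node2": 4096})
mr_app = ApplicationMaster("mapreduce-job", rm)
spark_app = ApplicationMaster("spark-job", rm)

mr_granted = mr_app.request(count=3, memory_mb=1024)
spark_granted = spark_app.request(count=2, memory_mb=2048)
print(mr_granted, spark_granted, rm.utilization())
```

Here a MapReduce application and a non-MapReduce application share the same cluster through one manager, which is the idea behind running different engines side by side.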
YARN vs. MapReduce

Criteria | YARN | MapReduce
Type of processing | Real-time, batch, and interactive processing with multiple engines | Batch processing with a single engine
Cluster resource optimization | Excellent due to central resource management | Average due to fixed Map and Reduce slots
Suitable for | MapReduce and non-MapReduce applications | Only MapReduce applications
Managing cluster resources | Done by YARN | Done by JobTracker
Namespace | With YARN, Hadoop supports multiple namespaces | Only one namespace could be supported, i.e., HDFS
YARN Features
• Multi-tenancy
  • YARN gives access to various proprietary and open-source engines, making Hadoop a standard platform for real-time, interactive, and batch processing tasks that can access and parse the same dataset.
• Cluster Utilization
  • YARN lets you use the Hadoop cluster dynamically, rather than in the static manner in which MapReduce applications used it, which is a better-optimized way of utilizing the cluster.
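
A small worked comparison may make the static-versus-dynamic point concrete. It assumes a hypothetical node that can run 8 equally sized tasks at once, split in the static case into 4 map slots and 4 reduce slots; the function names and numbers are illustrative only.

```python
def run_with_fixed_slots(map_tasks, reduce_tasks, map_slots, reduce_slots):
    # Static model: a slot only accepts its own task type, so idle reduce
    # slots cannot pick up waiting map tasks (and vice versa).
    return min(map_tasks, map_slots) + min(reduce_tasks, reduce_slots)


def run_with_containers(map_tasks, reduce_tasks, containers):
    # Dynamic model: containers are generic, so any waiting task can use one.
    return min(map_tasks + reduce_tasks, containers)


# Hypothetical workload: a job still in its map phase.
pending_maps, pending_reduces = 8, 0

fixed = run_with_fixed_slots(pending_maps, pending_reduces,
                             map_slots=4, reduce_slots=4)
dynamic = run_with_containers(pending_maps, pending_reduces, containers=8)
print(fixed, dynamic)
```

Under these assumptions the static split runs 4 tasks while 4 reduce slots sit idle, and the generic containers run all 8.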
• Scalability
  • The scheduler in YARN's Resource Manager allows Hadoop to scale out to and manage clusters of thousands of nodes.
• Compatibility
  • YARN is highly compatible with existing Hadoop MapReduce applications, so projects built on MapReduce in Hadoop 1.0 can move to Hadoop 2.0 with YARN easily.
Thank you
