Introduction To YARN

Shivajirao Kadam Institute of Technology and Management-Technical Campus, Indore (M.P.)
Department of Computer Science and Engineering
Lecture on “YARN in Hadoop”
Introduction to YARN
• YARN stands for “Yet Another Resource Negotiator”. It was introduced in Hadoop 2.0 to remove the Job Tracker bottleneck that was present in Hadoop 1.0.
• YARN is the central new component of Hadoop 2.0.
• YARN opens up Hadoop by allowing data stored in HDFS to be processed in batch, stream, interactive, and graph processing modes.
• It makes it possible to run distributed applications other than MapReduce.
• YARN enables users to pick the tool that fits each requirement, such as Spark for real-time processing, Hive for SQL, HBase for NoSQL, and others.
• Apart from resource management, YARN also performs job scheduling.
In MapReduce (Hadoop 1.0)
• The Job Tracker assigned map and reduce tasks to a number of subordinate processes called Task Trackers. The Task Trackers periodically reported their progress to the Job Tracker through heartbeats.
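To make the Hadoop 1.0 model concrete, here is a minimal Python sketch of a single Job Tracker handing out tasks when Task Trackers send heartbeats. It is a toy model, not Hadoop's Java code: the names `JobTracker`, `TaskTracker`, `on_heartbeat`, and `free_slots` are hypothetical, and it assumes each tracker has a fixed number of map and reduce slots and that a heartbeat may carry a progress report. Note that one object holds every pending task, assignment, and progress report for the whole cluster, which is the single-master design the following slides identify as the bottleneck.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TaskTracker:
    """Hypothetical MRv1 worker with a fixed number of map and reduce slots."""
    name: str
    map_slots: int
    reduce_slots: int
    running: list = field(default_factory=list)  # (kind, task_id) pairs

    def free_slots(self, kind: str) -> int:
        # A map slot can only run map tasks, a reduce slot only reduce tasks.
        total = self.map_slots if kind == "map" else self.reduce_slots
        used = sum(1 for k, _ in self.running if k == kind)
        return total - used


class JobTracker:
    """Hypothetical single master: it schedules every task and records all progress."""

    def __init__(self, tasks):
        self.pending = deque(tasks)  # (kind, task_id) pairs waiting for a slot
        self.assigned = {}           # task_id -> Task Tracker name
        self.progress = {}           # task_id -> fraction complete, as last reported

    def on_heartbeat(self, tracker, progress=None):
        """Record the tracker's progress report, then hand out tasks that fit its free slots."""
        self.progress.update(progress or {})
        handed_out = []
        for _ in range(len(self.pending)):
            kind, task_id = self.pending.popleft()
            if tracker.free_slots(kind) > 0:
                tracker.running.append((kind, task_id))
                self.assigned[task_id] = tracker.name
                handed_out.append(task_id)
            else:
                self.pending.append((kind, task_id))  # no free slot of this kind; retry later
        return handed_out


# Usage: two trackers, each with 2 map slots and 1 reduce slot.
jt = JobTracker([("map", "m1"), ("map", "m2"), ("map", "m3"), ("reduce", "r1")])
tt1 = TaskTracker("tt1", map_slots=2, reduce_slots=1)
tt2 = TaskTracker("tt2", map_slots=2, reduce_slots=1)
print(jt.on_heartbeat(tt1))                           # ['m1', 'm2', 'r1']
print(jt.on_heartbeat(tt2))                           # ['m3']
print(jt.on_heartbeat(tt1, {"m1": 0.5, "m2": 0.25}))  # []
```

The third task `m3` waits for a second heartbeat because `tt1` has no free map slot, even though the scheduling decision is made centrally.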
Why YARN?
• In Hadoop version 1.0, which is also referred to as MRv1 (MapReduce version 1), MapReduce performed both processing and resource management functions.
• MRv1 had a single master, the Job Tracker, which allocated resources, performed scheduling, and monitored the processing jobs.
• This design created a scalability bottleneck, because every job in the cluster depended on the single Job Tracker.
• The practical limits of such a design are reached at around 5,000 nodes and 40,000 tasks running concurrently.
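• Apart from this limitation, MRv1 uses computational resources inefficiently, because each node's capacity is split into fixed Map and Reduce slots that cannot be used for the other type of task.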
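The effect of fixed Map and Reduce slots on utilization can be shown with a small calculation. This sketch compares them with a pool of general-purpose containers under simplifying assumptions: every task needs exactly one slot or container, and the function names are hypothetical. Real YARN containers are sized in memory and virtual cores rather than counted as identical units.

```python
def fixed_slot_utilization(map_demand, reduce_demand, map_slots, reduce_slots):
    """MRv1-style: a map slot can only run a map task, a reduce slot only a reduce task."""
    busy = min(map_demand, map_slots) + min(reduce_demand, reduce_slots)
    return busy / (map_slots + reduce_slots)


def pooled_utilization(map_demand, reduce_demand, containers):
    """Simplified YARN-style pool: any free container can run any task."""
    busy = min(map_demand + reduce_demand, containers)
    return busy / containers


# A map-heavy phase: 12 map tasks waiting, no reduce tasks yet,
# on a cluster with 8 map slots + 4 reduce slots (12 units in total).
print(fixed_slot_utilization(12, 0, map_slots=8, reduce_slots=4))  # about 0.67: 4 reduce slots sit idle
print(pooled_utilization(12, 0, containers=12))                    # 1.0
```

With the same 12 units of capacity, the fixed-slot layout leaves a third of the cluster idle while map tasks are still waiting.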
• To overcome these issues, YARN was introduced in Hadoop 2.0 in 2012 by Yahoo and Hortonworks.
• The basic idea behind YARN is to relieve MapReduce of resource management and job scheduling by taking over those responsibilities.
• YARN gave Hadoop the ability to run non-MapReduce jobs within the Hadoop framework.
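• In Hadoop 2.0, YARN introduced the concepts of the Application Master and the Resource Manager. The Resource Manager monitors resource utilization across the Hadoop cluster, while each application gets its own Application Master, which negotiates resources from the Resource Manager and manages that application's tasks.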
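The division of labour between the Resource Manager and per-application Application Masters can be sketched in Python as follows. This is a toy model, not the YARN API: `ResourceManager`, `ApplicationMaster`, `Node`, `allocate`, and `request` are hypothetical names, resources are reduced to memory in MB, and the Application Master is not itself placed in a container as it would be on a real cluster. The point it illustrates is that the Resource Manager only tracks cluster resources, while each Application Master keeps track of its own containers.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical worker node that offers memory (in MB) for containers."""
    name: str
    capacity_mb: int
    used_mb: int = 0

    @property
    def free_mb(self) -> int:
        return self.capacity_mb - self.used_mb


class ResourceManager:
    """Hypothetical cluster-wide allocator: it tracks resources, not individual tasks."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app_id, count, container_mb):
        """Grant up to `count` containers of `container_mb` each, filling nodes in order."""
        granted = []
        for node in self.nodes:
            while len(granted) < count and node.free_mb >= container_mb:
                node.used_mb += container_mb
                granted.append((app_id, node.name))
        return granted

    def utilization(self) -> float:
        total = sum(n.capacity_mb for n in self.nodes)
        return sum(n.used_mb for n in self.nodes) / total


class ApplicationMaster:
    """Hypothetical per-application master: negotiates containers for its own tasks."""

    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm
        self.containers = []  # (app_id, node name) pairs granted so far

    def request(self, count, container_mb):
        self.containers += self.rm.allocate(self.app_id, count, container_mb)
        return len(self.containers)


# Usage: a MapReduce job and a Spark job share one two-node cluster.
rm = ResourceManager([Node("node1", 4096), Node("node2", 4096)])
mr_am = ApplicationMaster("mapreduce-job", rm)
spark_am = ApplicationMaster("spark-job", rm)
print(mr_am.request(3, 2048))     # 3
print(spark_am.request(2, 2048))  # 1 (only 2048 MB was left)
print(rm.utilization())           # 1.0
```

Because any application framework can supply its own Application Master, the same Resource Manager serves MapReduce and non-MapReduce jobs alike.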
• YARN is used extensively by Hadoop developers for writing applications. It lets them create applications that work with huge amounts of data and manipulate that data efficiently.
YARN vs. MapReduce
Criteria | YARN | MapReduce (Hadoop 1.0)
Type of processing | Real-time, batch, and interactive processing with multiple engines | Batch processing with a single engine
Cluster resource optimization | Excellent, due to central resource management | Average, due to fixed Map and Reduce slots
Suitable for | MapReduce and non-MapReduce applications | Only MapReduce applications
Managing cluster resources | Done by YARN | Done by the Job Tracker
Namespace | Multiple namespaces are supported (via HDFS Federation in Hadoop 2.x) | Only one namespace could be supported, i.e., HDFS
YARN Features
• Multi-tenancy: YARN lets various proprietary and open-source engines run on Hadoop as a standard platform for real-time, interactive, and batch processing, with all of them able to access and parse the same dataset.
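In practice, YARN shares one cluster among several tenants through its pluggable scheduler; the Capacity Scheduler, for example, gives each queue a guaranteed share of the cluster and lets queues borrow capacity that others leave idle. The function below is a simplified, hypothetical illustration of that idea (`share_cluster` is not a Hadoop API, and the real scheduler also handles maximum capacities, preemption, and multiple resource types): each queue first gets its demand up to its guaranteed percentage, and leftover containers are then lent, in alphabetical queue order, to queues that still want more.

```python
def share_cluster(total, guaranteed_pct, demand):
    """Split `total` containers among tenant queues (simplified, capacity-style).

    guaranteed_pct: queue -> percent of the cluster guaranteed to that queue
    demand:         queue -> containers the queue currently wants
    """
    # Step 1: every queue gets its demand, capped at its guaranteed share.
    alloc = {q: min(demand.get(q, 0), total * pct // 100)
             for q, pct in guaranteed_pct.items()}
    # Step 2: capacity that some queues left unused is lent to queues that still want more.
    spare = total - sum(alloc.values())
    for q in sorted(guaranteed_pct):
        extra = min(spare, demand.get(q, 0) - alloc[q])
        alloc[q] += extra
        spare -= extra
    return alloc


# Usage: batch, interactive (e.g. SQL) and streaming tenants share 100 containers.
print(share_cluster(
    100,
    {"batch": 50, "interactive": 30, "streaming": 20},
    {"batch": 80, "interactive": 10, "streaming": 20},
))
# {'batch': 70, 'interactive': 10, 'streaming': 20}
```

The interactive queue needs only 10 of its 30 guaranteed containers, so the batch queue can temporarily use the other 20 instead of leaving them idle.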
• Cluster utilization: YARN lets you use the Hadoop cluster dynamically rather than in the static way MapReduce applications used it, which makes for a better, more optimized use of the cluster.
• Scalability: the scheduler in YARN's Resource Manager allows Hadoop to extend to and manage clusters of thousands of nodes.
• Compatibility: YARN is highly compatible with existing Hadoop MapReduce applications, so projects that work with MapReduce in Hadoop 1.0 can move to Hadoop 2.0 with YARN without major difficulty.
Thank you
