Advancements in YARN Resource Manager, Fig. 1 [figure; y-axis ticks: 50, 100; x-axis: Time]
Application variety Users' interest has expanded from batch analytics applications (e.g., MapReduce) to include streaming, iterative (e.g., machine learning), and interactive computations.

Large shared clusters Instead of using dedicated clusters for each application, diverse workloads are consolidated on clusters of thousands or even tens of thousands of machines. This consolidation avoids unnecessary data movement, allows for better resource utilization, and enables pipelines with different application classes.

High resource utilization Operating large clusters involves a significant cost of ownership. Hence, cluster operators rely on resource managers to achieve high cluster utilization and improve their return on investment.

Predictable execution Production jobs typically come with Service Level Objectives (SLOs), such as completion deadlines, which have to be met in order for the output of the jobs to be consumed by downstream services. Execution predictability is often more important than pure application performance when it comes to business-critical jobs.

This diverse set of requirements has introduced new challenges to the resource management layer. To address these new demands, YARN has evolved from a platform for batch analytics workloads to a production-ready, general-purpose resource manager that can support a wide range of applications and user requirements over large shared clusters. In the remainder of this entry, we first give a brief overview of YARN's architecture and dedicate the rest of the entry to the new functionality that has been added to YARN over the last few years.

YARN Architecture

YARN follows a centralized architecture in which a single logical component, the resource manager (RM), allocates resources to jobs submitted to the cluster. The resource requests handled by the RM are intentionally generic, while the specific scheduling logic required by each application is encapsulated in the application master (AM) that any framework can implement. This allows YARN to support a wide range of applications using the same RM component. YARN's architecture is depicted in Fig. 2. Below we describe its main components. The new features, which appear in orange, are discussed in the following sections.

Node Manager (NM) The NM is a daemon running at each of the cluster's worker nodes. NMs are responsible for monitoring resource availability at the host node, reporting faults, and managing containers' life cycle (e.g., start, monitor, pause, and kill containers).
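The container life cycle that the NM manages can be pictured as a small state machine. The sketch below is a simplified illustration: the state names and transitions are assumptions for the example, not YARN's internal container states.

```python
# Toy state machine for a container's life cycle as managed by the NM
# (start, monitor, pause, kill). States and transitions are simplified
# assumptions for illustration, not YARN's actual internal states.

TRANSITIONS = {
    ("NEW", "start"): "RUNNING",
    ("RUNNING", "pause"): "PAUSED",
    ("PAUSED", "resume"): "RUNNING",
    ("RUNNING", "kill"): "KILLED",
    ("PAUSED", "kill"): "KILLED",
}

class Container:
    def __init__(self, cid):
        self.cid = cid
        self.state = "NEW"

    def handle(self, event):
        # Reject transitions that are not part of the life cycle,
        # e.g., starting a container that was already killed.
        nxt = TRANSITIONS.get((self.state, event))
        if nxt is None:
            raise ValueError(f"illegal {event!r} in state {self.state}")
        self.state = nxt
        return self.state

c = Container("container_01")
c.handle("start")
c.handle("pause")
print(c.state)  # PAUSED
```

Guarding every transition in one table keeps the NM-side bookkeeping auditable: any event that is not explicitly allowed is a fault to report rather than a silent state change.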
Advancements in YARN Resource Manager, Fig. 2 YARN architecture and overview of new features (in orange). [Diagram labels: YARN Client, Application State, Cluster State, NM Service, AM Service, Guaranteed Scheduler, Opportunistic Scheduler]
Resource Manager (RM) The RM runs on a dedicated machine, arbitrating resources among various competing applications. Multiple RMs can be used for high availability, with one of them being the master. The NMs periodically inform the RM of their status, which is stored in the cluster state. The RM-NM communication is heartbeat-based for scalability. The RM also maintains the resource requests of all applications (application state). Given its global view of the cluster and based on application demand, resource availability, scheduling priorities, and sharing policies (e.g., fairness), the scheduler performs the matchmaking between application requests and machines and hands leases, called containers, to applications. A container is a logical resource bundle (e.g., 2 GB RAM, 1 CPU) bound to a specific node.

YARN includes two scheduler implementations, namely, the Fair and Capacity Schedulers. The former imposes fairness between applications, while the latter dedicates a share of the cluster resources to groups of users.

Jobs are submitted to the RM via the YARN Client protocol and go through an admission control phase, during which security credentials are validated and various operational and administrative checks are performed.

Application Master (AM) The AM is the job orchestrator (one AM is instantiated per submitted job), managing all of the job's life cycle aspects, including dynamically increasing and decreasing resource consumption, managing the execution flow (e.g., running reducers against the output of mappers), and handling faults. The AM can run arbitrary user code, written in any programming language. By delegating all these functions to AMs, YARN's architecture achieves significant scalability, programming model flexibility, and improved upgrading/testing.

An AM will typically need to harness resources from multiple nodes to complete a job. To obtain containers, the AM issues resource requests to the RM via heartbeats, using the AM Service interface. When the scheduler assigns a resource to the AM, the RM generates a lease for that resource. The AM is then notified and presents the container lease to the NM for launching the container at that node. The NM checks the authenticity of the lease and then initiates the container execution.

In the following sections, we present the main advancements made in YARN, in particular with respect to resource utilization, scalability, support for services, and execution predictability.
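Before moving on, the matchmaking described above, fitting requested resource bundles onto nodes with free capacity and handing back node-bound leases, can be sketched as follows. This is a simplified first-fit model, not YARN's actual scheduler; all class and function names are illustrative.

```python
# Simplified model of the RM scheduler's matchmaking: requests are
# (memory, vcores) bundles, and a granted lease ("container") is bound
# to a specific node. Illustrative sketch, not YARN's scheduler code.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_mem_mb: int
    free_vcores: int

@dataclass
class Lease:            # the "container" handed back to the AM
    node: str
    mem_mb: int
    vcores: int

def schedule(requests, nodes):
    """Greedy first-fit matchmaking of (mem_mb, vcores) requests to nodes."""
    leases = []
    for mem_mb, vcores in requests:
        for node in nodes:
            if node.free_mem_mb >= mem_mb and node.free_vcores >= vcores:
                node.free_mem_mb -= mem_mb      # commit the node's capacity
                node.free_vcores -= vcores
                leases.append(Lease(node.name, mem_mb, vcores))
                break   # request satisfied; move to the next one
    return leases

nodes = [Node("n1", 4096, 4), Node("n2", 2048, 2)]
leases = schedule([(2048, 1), (2048, 1), (2048, 1)], nodes)
print([l.node for l in leases])  # ['n1', 'n1', 'n2']
```

A real scheduler additionally weighs locality preferences, priorities, and sharing policies (e.g., fairness) when picking the node, rather than taking the first fit.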
performance and predictability of jobs that opt out of overcommitted resources.

Cluster Scalability

A single YARN RM can manage a few thousand nodes. However, production analytics clusters at big cloud companies often comprise tens of thousands of machines, crossing YARN's limits (Burd et al. 2017).

YARN's scalability is constrained by the resource manager, as load increases proportionally to the number of cluster nodes and the application demands (e.g., active containers, resource requests per second). Increasing the heartbeat intervals could improve scalability in terms of the number of nodes, but would be detrimental to utilization (Vavilapalli et al. 2013) and would still pose problems as the number of applications increases.

Instead, as of Apache Hadoop 2.9 (YARN Federation 2017), a federation-based approach scales a single YARN cluster to tens of thousands of nodes. This approach divides the cluster into smaller units, called subclusters, each with its own YARN RM and NMs. The federation system negotiates with the subcluster RMs to give applications the experience of a single large cluster, allowing applications to schedule their tasks to any node of the federated cluster.

The state of the federated cluster is coordinated through the State Store, a central component that holds information about (1) subcluster liveness and resource availability, via heartbeats sent by each subcluster RM, (2) the YARN subcluster at which each AM is being deployed, and (3) the policies used to impose global cluster invariants and perform load rebalancing.

To allow jobs to seamlessly span subclusters, the federated cluster relies on the following components:

Router A federated YARN cluster is equipped with a set of routers, which hide the presence of multiple RMs from applications. Each application gets submitted to a router, which, based on a policy, determines the subcluster for the AM to be executed, gets the subcluster URL from the State Store, and redirects the application submission request to the appropriate subcluster RM.

AMRM Proxy This component runs as a service at each NM of the cluster and acts as a proxy for every AM-RM communication. Instead of directly contacting the RM, applications are forced by the system to access their local AMRM Proxy. By dynamically routing the AM-RM messages, the AMRM Proxy provides the applications with transparent access to multiple YARN RMs. Note that the AMRM Proxy is also used to implement the local scheduler for opportunistic containers and could be used to protect the system against misbehaving AMs.

This federated design is scalable, as the number of nodes each RM is responsible for is bounded. Moreover, through appropriate policies, the majority of applications will be executed within a single subcluster; thus the number of applications present at each RM is also bounded. As the coordination between subclusters is minimal, the cluster's size can be scaled almost linearly by adding more subclusters. This architecture can provide tight enforcement of scheduling invariants within a subcluster, while continuous rebalancing across subclusters enforces invariants in the whole cluster.

A similar federated design has been followed to scale the underlying store (HDFS Federation 2017).

Long-Running Services

As already discussed, YARN's target applications were originally batch analytics jobs, such as MapReduce. However, a significant share of today's clusters is dedicated to workloads that include stream processing, iterative computations, data-intensive interactive jobs, and latency-sensitive online applications. Unlike batch jobs, these applications benefit from long-lived containers (from hours to months) to amortize
container initialization costs, reduce scheduling load, or maintain state across computations. Here we use the term services for all such applications.

Given their long-running nature, these applications have additional demands, such as support for restart, in-place upgrade, monitoring, and discovery of their components. To avoid using YARN's low-level API for enabling such operations, users have so far resorted to AM libraries such as Slider (Apache Slider 2017). Unfortunately, these external libraries only partially solve the problem, e.g., due to the lack of common standards for YARN to optimize resource demands across libraries, or version incompatibilities between the libraries and YARN.

To this end, the upcoming Apache Hadoop 3.1 release adds first-class support for long-running services in YARN, allowing for both traditional process-based and Docker-based containers. This service framework allows users to deploy existing services on YARN, simply by providing a JSON file with their service specifications, without having to translate those requirements into low-level resource requests at runtime.

The main component of YARN's service framework is the container orchestrator, which facilitates service deployment. It is an AM that, based on the service specification, configures the required requests for the RM and launches the corresponding containers. It deals with various service operations, such as starting components given specified dependencies, monitoring their health and restarting failed ones, scaling component resources up and down, upgrading components, and aggregating logs.

A RESTful API server has been developed to allow users to manage the life cycle of services on YARN via simple commands, using framework-independent APIs. Moreover, a DNS server enables service discovery via standard DNS lookups and greatly simplifies service failovers.

Scheduling services Apart from the aforementioned support for service deployment and management, service owners also demand precise control of container placement to optimize the performance and resilience of their applications. For instance, containers of services are often required to be collocated (affinity) to reduce network costs or separated (anti-affinity) to minimize resource interference and correlated failures. For optimal service performance, even more powerful constraints are useful, such as complex intra- and inter-application constraints that collocate services with one another or put limits on the number of specific containers per node or rack.

When placing containers of services, cluster operators have their own, potentially conflicting, global optimization objectives. Examples include minimizing the violation of placement constraints, the resource fragmentation, the load imbalance, or the number of machines used. Due to their long lifetimes, services can tolerate longer scheduling latencies than batch jobs, but their placement should not impact the scheduling latencies of the latter.

To enable high-quality placement of services in YARN, Apache Hadoop 3.1 introduces support for rich placement constraints (Placement constraints 2017).

Jobs with SLOs

In production analytics clusters, the majority of the cluster resources is usually consumed by production jobs. These jobs must meet strict Service Level Objectives (SLOs), such as completion deadlines, for their results to be consumed by downstream services. At the same time, a large number of smaller best-effort jobs are submitted to the same clusters in an ad hoc manner for exploratory purposes. These jobs lack SLOs, but they are sensitive to completion latencies.

Resource managers typically allocate resources to jobs based on the instantaneous enforcement of job priorities and sharing invariants. Although simpler to implement and impose, this instantaneous resource provisioning makes it challenging to meet job SLOs without sacrificing low latency for best-effort jobs.

To ensure that important production jobs will have predictable access to resources, YARN was extended with the notion of reservations, which provide users with the ability to reserve resources over (and ahead of) time. The ideas around
reservations first appeared in Rayon (Curino et al. 2014) and are part of YARN as of Apache Hadoop 2.6.

This guarantees that periodic production jobs will have guaranteed access to resources and thus predictable execution.
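A reservation can be pictured as admission control over a time-indexed capacity plan: a request is admitted only if it fits under the cluster's capacity in every time step it spans. The following is a toy model in the spirit of the Rayon ideas, not YARN's actual ReservationSystem API; all names are illustrative.

```python
# Toy model of reservation-based admission control: capacity is tracked
# per discrete time step, and a reservation is admitted only if it fits
# in every step it spans. Illustrative sketch, not YARN's actual API.

class ReservationPlan:
    def __init__(self, capacity, horizon):
        self.capacity = capacity
        self.used = [0] * horizon      # committed resources per time step

    def try_reserve(self, start, end, amount):
        """Admit a reservation of `amount` resources over steps [start, end)."""
        if any(self.used[t] + amount > self.capacity for t in range(start, end)):
            return False               # would overcommit some time step
        for t in range(start, end):    # commit the reservation
            self.used[t] += amount
        return True

plan = ReservationPlan(capacity=100, horizon=10)
print(plan.try_reserve(0, 5, 60))   # production job: admitted -> True
print(plan.try_reserve(2, 4, 50))   # 60 + 50 > 100 at steps 2-3 -> False
print(plan.try_reserve(5, 8, 80))   # disjoint in time: admitted -> True
```

Because admission is checked ahead of time, capacity left over in each step can still be handed to best-effort jobs without endangering the reserved production work.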
nodes with public IPs, and 40% of queue A has to be on dev machines.

Changing queue configuration Several companies use YARN's Capacity Scheduler to share clusters across non-coordinating user groups. A hierarchy of queues isolates jobs from each department of the organization. Due to changes in the resource demands of each department, the queue hierarchy, or the cluster condition, operators modify the amount of resources assigned to each organization's queue and to the sub-queues used within that department. However, queue reconfiguration has two main drawbacks: (1) setting and changing configurations is a tedious process that can only be performed by modifying XML files; (2) queue owners cannot perform any modifications to their sub-queues; the cluster admin must do it on their behalf.

To address these shortcomings, Apache Hadoop 2.9 (OrgQueue 2017) allows configurations to be stored in an in-memory database instead of XML files. It adds a RESTful API to programmatically modify the queues. This has the additional benefit that queues can be dynamically reconfigured by automated services, based on the cluster conditions or on organization-specific criteria. Queue ACLs allow queue owners to perform modifications on their part of the queue structure.

Timeline server Information about current and previous jobs submitted in the cluster is key for debugging, capacity planning, and performance tuning. Most importantly, observing historic data enables us to better understand the cluster and the jobs' behavior in aggregate and to holistically improve the system's operation.

The first incarnation of this effort was the application history server (AHS), which supported only MapReduce jobs. The AHS was superseded by the timeline server (TS), which can deal with generic YARN applications. In its first version, the TS was limited to a single writer and reader that resided at the RM. Its applicability was therefore limited to small clusters.

Apache Hadoop 2.9 includes a major redesign of the TS (YARN TS v2 2017), which separates the collection (writes) from the serving (reads) of data and performs both operations in a distributed manner. This brings several scalability and flexibility improvements.

The new TS collects metrics at various granularities, ranging from flows (i.e., sets of YARN applications logically grouped together) to jobs, job attempts, and containers. It also collects cluster-wide data, such as user and queue information, as well as configuration data.

The data collection is performed by collectors that run as services at the RM and at every NM. The AM of each job publishes data to the collector of the host NM. Similarly, each container pushes data to its local NM collector, while the RM publishes data to its dedicated collector. The readers are separate instances that are dedicated to serving queries via a REST API. By default, Apache HBase (Apache HBase 2017) is used as the backing storage, which is known to scale to large amounts of data and read/write operations.

Conclusion

YARN was introduced in Apache Hadoop at the end of 2011 as an effort to break the strong ties between Hadoop and MapReduce and to allow generic applications to be deployed over a common resource management fabric. Since then, YARN has evolved into a fully fledged production-ready resource manager, which has been deployed on shared clusters comprising tens of thousands of machines. It handles applications ranging from batch analytics to streaming and machine learning workloads to low-latency services, while achieving high resource utilization and supporting SLOs and predictability for production workloads. YARN enjoys a vital community with hundreds of monthly contributions.