YARN - YoussefEtman
Architecture
Before we start, there are six components we need to get familiar with: ResourceManager, Scheduler,
ApplicationsManager, ApplicationMaster, NodeManager, and Container.
YARN provides its core services via two types of long‐running daemon: a resource manager (one per cluster) to manage
the use of resources across the cluster, and node managers running on all the nodes in the cluster to launch and monitor
containers.
Resource manager:
There is only one of these per cluster, and it consists of two parts:
1. ApplicationsManager
2. Scheduler
ApplicationsManager:
The ApplicationsManager is responsible for:
1. Job submission: it is the first thing the client contacts in order to submit a job. When it receives the job, it looks for a NodeManager with enough free resources to launch a container, in which it starts the ApplicationMaster (we will explain what these are shortly; a submission sketch follows below).
2. Fault tolerance for the ApplicationMaster: if the ApplicationMaster runs into a problem, the ApplicationsManager can restart it on another NodeManager.
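A minimal sketch of that submission step in Java, assuming the stock YARN client API (the application name, launch command, and container sizes below are placeholders, not values from these notes):

import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class SubmitApp {
    public static void main(String[] args) throws Exception {
        // The client talks to the ResourceManager (its ApplicationsManager part).
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Ask the ApplicationsManager for a new application id and context.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("demo-app"); // placeholder name

        // Describe the container that will run the ApplicationMaster.
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        amContainer.setCommands(Collections.singletonList("/path/to/am-command")); // placeholder
        appContext.setAMContainerSpec(amContainer);
        appContext.setResource(Resource.newInstance(1024, 1)); // 1 GB, 1 vcore for the AM

        // Submit: the ApplicationsManager finds a NodeManager with enough free
        // resources and launches the ApplicationMaster in a container there.
        yarnClient.submitApplication(appContext);
    }
}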
ApplicationMaster:
There is one of these per application.
The per‐application ApplicationMaster has the responsibility of:
1. Negotiating appropriate resource containers from the Scheduler.
2. Tracking the containers’ status and monitoring for application progress.
As agreed above, the first step is to submit the application/job to the ApplicationsManager, which then goes and launches the ApplicationMaster.
Scheduler:
The Scheduler is responsible for allocating resources to the various running applications.
The Scheduler performs no monitoring or tracking of status for the application. Also, it offers no guarantees about
restarting failed tasks either due to application failure or hardware failures.
There are 3 scheduler types available: FIFO scheduler, Fair scheduler & Capacity scheduler.
After the ApplicationMaster requests resources from the Scheduler, the Scheduler contacts a NodeManager that has available resources and asks it to launch a container dedicated to this application.
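A minimal sketch of that exchange from the ApplicationMaster's side, assuming the standard AMRMClient API (the resource sizes and priority are illustrative):

import java.util.List;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AmAllocate {
    public static void main(String[] args) throws Exception {
        // The ApplicationMaster registers itself with the ResourceManager first.
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(new YarnConfiguration());
        rmClient.start();
        rmClient.registerApplicationMaster("", 0, ""); // host, RPC port, tracking URL

        // Ask the Scheduler for one container: 1 GB of memory, 1 vcore,
        // with no locality constraint (nodes and racks are null).
        ContainerRequest request = new ContainerRequest(
                Resource.newInstance(1024, 1), null, null, Priority.newInstance(0));
        rmClient.addContainerRequest(request);

        // Grants arrive asynchronously on a later heartbeat; the AM then asks
        // the chosen NodeManagers to launch the granted containers.
        AllocateResponse response = rmClient.allocate(0.0f);
        List<Container> granted = response.getAllocatedContainers();
    }
}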
NodeManager:
The NodeManager is the per‐machine framework agent who is responsible for containers, monitoring their resource usage
(cpu, memory, disk, network) and reporting the same to the ResourceManager/Scheduler.
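Continuing the sketch above, once the Scheduler grants a Container, the ApplicationMaster asks that node's NodeManager to launch it, typically through the NMClient API (the task command is a placeholder):

import java.util.Collections;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class LaunchTask {
    // `container` is one of the containers granted by rmClient.allocate() above.
    static void launch(Container container) throws Exception {
        NMClient nmClient = NMClient.createNMClient();
        nmClient.init(new YarnConfiguration());
        nmClient.start();

        ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
        ctx.setCommands(Collections.singletonList("/path/to/task-command")); // placeholder
        nmClient.startContainer(container, ctx); // the NodeManager launches and monitors it
    }
}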
Container:
A container executes an application‐specific process with a constrained set of resources (memory, CPU, and so on).
1. A client contacts the resource manager (specifically ApplicationsManager) and asks it to run an application master
process/container. (Step 1)
2. The resource manager then finds a node manager that can launch the application master in a container (steps 2a
and 2b).
3. The ApplicationMaster could simply run a computation in the container it is running in and return the result to the
client. Or it could request more containers from the resource manager (specifically the Scheduler) (step 3).
4. The ApplicationMaster then uses these containers to run a distributed computation (steps 4a and 4b).
Note: When the ApplicationMaster requests resources from the Scheduler, it requests two things: the amount of compute resources each container needs (memory, CPU) and container locality.
Container locality means that the ApplicationMaster requests that the containers being launched are placed on
nodes where the data exists. For example, if a container is going to process an HDFS block, the ApplicationMaster
will request that the container be launched on one of the nodes that holds a replica of that block (see the sketch below).
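A sketch of what such a locality-constrained request could look like, reusing the rmClient from the ApplicationMaster sketch above (the hostnames and rack name are hypothetical):

// Request a container on one of the nodes holding a replica of the block.
// relaxLocality=true lets the Scheduler fall back to the same rack, or to
// anywhere in the cluster, if no block-local node frees up in time.
String[] blockHosts = {"node1.example.com", "node2.example.com"}; // hypothetical
String[] racks = {"/rack1"};                                      // hypothetical
ContainerRequest localRequest = new ContainerRequest(
        Resource.newInstance(1024, 1), blockHosts, racks,
        Priority.newInstance(0), true /* relaxLocality */);
rmClient.addContainerRequest(localRequest);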
MapReduce 1 vs YARN:
The distributed implementation of MapReduce in the original version of Hadoop (version 1 and earlier) is
sometimes referred to as “MapReduce 1” to distinguish it from MapReduce 2, the implementation that uses YARN
(in Hadoop 2 and later).
In MapReduce 1, there are two types of daemon that control the job execution process: a jobtracker and one or
more tasktrackers.
In Hadoop 1, JobTracker was responsible for scheduling jobs to TaskTrackers, and also monitoring the progress for
these tasks.
In Hadoop 2, the scheduling is now the ResourceManager’s responsibility & the task progress monitoring is the
ApplicationMaster’s responsibility.
In Hadoop 1, TaskTrackers were responsible for running the tasks and sending progress reports back to the
JobTracker.
In Hadoop 2, the NodeManager is responsible for running the tasks inside containers and for reporting container and node status back to the ResourceManager (task progress itself is reported to the ApplicationMaster).
Scheduler types:
As we said before, there are three types of schedulers: the FIFO Scheduler, the Capacity Scheduler, and the Fair Scheduler.
The FIFO Scheduler has the merit of being simple to understand and not needing any configuration, but it’s not suitable for
shared clusters (clusters where there are multiple users submitting multiple jobs). Large applications will use all the
resources in a cluster, so each application has to wait its turn. On a shared cluster it is better to use the Capacity Scheduler
or the Fair Scheduler. Both of these allow long‐running jobs to complete in a timely manner, while still allowing users who
are running concurrent smaller ad hoc queries to get results back in a reasonable time.
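Which of the three the ResourceManager uses is controlled by the yarn.resourcemanager.scheduler.class property, normally set in yarn-site.xml. A sketch of the same setting made programmatically, for illustration only:

import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SchedulerChoice {
    public static void main(String[] args) {
        YarnConfiguration conf = new YarnConfiguration();
        // YarnConfiguration.RM_SCHEDULER is "yarn.resourcemanager.scheduler.class".
        conf.set(YarnConfiguration.RM_SCHEDULER,
                "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler");
        // Other built-in choices:
        //   org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
        //     (the default in Apache Hadoop)
        //   org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler
    }
}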
In the following examples we have two jobs running: a large job that takes a lot of time (job 1) and a small job (job 2).
(Figure: cluster utilization over time under (i) the FIFO Scheduler, (ii) the Capacity Scheduler, and (iii) the Fair Scheduler.)
With the Fair Scheduler (iii), there is no need to reserve a set amount of capacity, since it will dynamically balance
resources between all running jobs. Just after the first (large) job starts, it is the only job running, so it gets all the
resources in the cluster. When the second (small) job starts, it is allocated half of the cluster resources so that each job is
using its fair share of resources.
Note that there is a lag between the time the second job starts and when it receives its fair share, since it has to wait for
resources to free up as containers used by the first job complete. After the small job completes and no longer requires
resources, the large job goes back to using the full cluster capacity again. The overall effect is both high cluster utilization
and timely small job completion.
Keep me in your prayers.