
Assignment 2

1. Draw a diagram representing how the Active Resource Manager writes its state into
ZooKeeper and how fail-over is handled if the Active Resource Manager fails.

The Resource Manager runs on only one node, and when that node fails we have a problem: we won't
be able to run any jobs in our Hadoop cluster. To avoid the Resource Manager being a single point of
failure, we need to enable High Availability for it. This means we run another Resource Manager
instance on another node in standby mode. With this high-availability configuration in place, when the
active Resource Manager fails, the standby Resource Manager takes over the role of the active
Resource Manager.
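As an illustration, Resource Manager HA is enabled in yarn-site.xml roughly as sketched below. The cluster id, the rm ids, the hostnames, and the ZooKeeper quorum address are placeholder assumptions, not values from this assignment:

```xml
<!-- Sketch of yarn-site.xml for Resource Manager HA; ids and hosts are made up. -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>cluster1</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>master1.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>master2.example.com</value>
</property>
<property>
  <!-- ZooKeeper quorum used for leader election and RM state -->
  <name>hadoop.zk.address</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
```

With this in place, both Resource Managers contend for the ActiveStandbyElectorLock znode in ZooKeeper, and the winner becomes active.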

Persistent Node

Resource Manager 1 will also create another znode, named ActiveBreadCrumb. The
ActiveBreadCrumb znode is not an ephemeral node; it is another type of znode called a persistent
node. This means that when Resource Manager 1 goes down or loses its session with ZooKeeper,
the ActiveBreadCrumb znode, unlike the ActiveStandbyElectorLock znode, does not get deleted.
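The difference between the two znode types can be tried out in ZooKeeper's command-line shell. The paths mirror the ones mentioned above; the data payload "rm1" is a placeholder:

```
# In zkCli.sh: a persistent znode (the default) survives the creating session,
# while an ephemeral znode (-e) is deleted when that session ends.
create    /ActiveBreadCrumb          "rm1"
create -e /ActiveStandbyElectorLock  "rm1"
```

Disconnecting the client removes only the ephemeral znode, which is exactly why the lock znode signals fail-over while the breadcrumb survives.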

2. Draw a diagram representing how YARN separates the resource-management layer from the
processing layer. Also highlight how the Job Tracker's responsibilities are split between the Resource
Manager and the Application Master.
3. a. Write the command which allows us to stop a NameNode daemon.

hadoop-daemon.sh stop namenode


b. Write the command which allows us to start a DataNode daemon, and a command
to stop it.

hadoop-daemon.sh start datanode


hadoop-daemon.sh stop datanode
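Note that hadoop-daemon.sh is deprecated in Hadoop 3 releases. Assuming a Hadoop 3 installation, the equivalent commands are:

```
# Hadoop 3 replacement for hadoop-daemon.sh
hdfs --daemon stop namenode
hdfs --daemon start datanode
hdfs --daemon stop datanode
```

These must be run on the node hosting the daemon in question; the older hadoop-daemon.sh form still works but prints a deprecation warning.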

4. Write a short piece of code which demonstrates the getId() method, which helps us get the globally
unique identifier for the container.

public class ContainerFactory {

    private final AtomicLong nextId;
    private final ApplicationAttemptId customAppAttemptId;

    public ContainerFactory(ApplicationAttemptId appAttemptId, long appIdLong) {
        this.nextId = new AtomicLong(1);
        // getId() returns the globally unique identifier of the application
        ApplicationId appId =
            ApplicationId.newInstance(appIdLong, appAttemptId.getApplicationId().getId());
        this.customAppAttemptId =
            ApplicationAttemptId.newInstance(appId, appAttemptId.getAttemptId());
    }
}
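The same id-generation pattern can be sketched without the YARN classes. The class below is a hypothetical stand-in (not a YARN API): a long stands in for the ApplicationId, and an AtomicLong counter, as in the factory above, yields a unique id per container:

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of globally unique container ids: combine an application id
// with a monotonically increasing per-application counter.
public class ContainerIdSketch {
    private final long appId;                          // stands in for ApplicationId
    private final AtomicLong nextId = new AtomicLong(1);

    public ContainerIdSketch(long appId) {
        this.appId = appId;
    }

    // Each call hands out a fresh id; AtomicLong makes this thread-safe.
    public String getId() {
        return appId + "_" + nextId.getAndIncrement();
    }

    public static void main(String[] args) {
        ContainerIdSketch factory = new ContainerIdSketch(42L);
        System.out.println(factory.getId()); // 42_1
        System.out.println(factory.getId()); // 42_2
    }
}
```

Because the counter is scoped to one application, two containers of the same application can never collide, and the application-id prefix keeps ids distinct across applications.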

5. Draw a diagram representing Hadoop Speculative Execution which highlights breaking the job into
tasks and running them in parallel rather than sequentially.
6. Draw a diagram that shows the architectural design of HDFS Federation. The diagram should
highlight multiple block pools connected to the individual DataNodes.

Multiple Namenodes/Namespaces

In order to scale the name service horizontally, federation uses multiple independent
Namenodes/namespaces. The Namenodes are federated; the Namenodes are independent and do not
require coordination with each other. The Datanodes are used as common storage for blocks by all the
Namenodes. Each Datanode registers with all the Namenodes in the cluster. Datanodes send periodic
heartbeats and block reports. They also handle commands from the Namenodes.
Block Pool

A Block Pool is a set of blocks that belong to a single namespace. Datanodes store blocks for all the block
pools in the cluster. Each Block Pool is managed independently. This allows a namespace to generate
Block IDs for new blocks without the need for coordination with the other namespaces. A Namenode
failure does not prevent the Datanode from serving other Namenodes in the cluster.

A namespace and its block pool together are called a Namespace Volume. It is a self-contained unit of
management. When a Namenode/namespace is deleted, the corresponding block pool at the Datanodes
is deleted. Each namespace volume is upgraded as a unit during a cluster upgrade.
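Assuming two federated nameservices named ns1 and ns2 on made-up hosts, the hdfs-site.xml for such a federation would look roughly like this sketch:

```xml
<!-- Sketch of a federated hdfs-site.xml; nameservice ids and hosts are assumptions. -->
<property>
  <name>dfs.nameservices</name>
  <value>ns1,ns2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1</name>
  <value>nn-host1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns2</name>
  <value>nn-host2.example.com:8020</value>
</property>
```

Every DataNode reads dfs.nameservices, registers with both Namenodes, and stores one block pool per namespace.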

7. Write a short piece of code which sets the full address and IPC port of the NameNode process, thus
resulting in two separate configuration options. Use the hdfs-site.xml file as a reference in your code.

dfs.ha.namenodes.[nameservice ID] - unique identifiers for each NameNode in the nameservice

Configure with a list of comma-separated NameNode IDs. This will be used by DataNodes to determine
all the NameNodes in the cluster. For example, if you used “mycluster” as the nameservice ID previously,
and you wanted to use “nn1”,“nn2” and “nn3” as the individual IDs of the NameNodes, you would
configure this as such:

<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2,nn3</value>
</property>

dfs.namenode.rpc-address.[nameservice ID].[name node ID] - the fully-qualified RPC address for each
NameNode to listen on

<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>machine1.example.com:8020</value>
</property>

<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>machine2.example.com:8020</value>
</property>

<property>
  <name>dfs.namenode.rpc-address.mycluster.nn3</name>
  <value>machine3.example.com:8020</value>
</property>

dfs.namenode.http-address.[nameservice ID].[name node ID] - the fully-qualified HTTP address for each
NameNode to listen on

<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>machine1.example.com:9870</value>
</property>

<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>machine2.example.com:9870</value>
</property>

<property>
  <name>dfs.namenode.http-address.mycluster.nn3</name>
  <value>machine3.example.com:9870</value>
</property>

ClusterID

A ClusterID identifier is used to identify all the nodes in the cluster. When a Namenode is formatted,
this identifier is either provided or auto-generated. The same ID should be used when formatting the
other Namenodes of the cluster.
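For example, the HDFS Federation documentation formats an additional Namenode with the existing ClusterID by passing it to the format command; the id is supplied by the operator:

```
# Reuse the cluster's existing id when formatting another Namenode.
hdfs namenode -format -clusterId <cluster_id>
```

Omitting -clusterId on the first Namenode auto-generates an id, which is then reused for every subsequent Namenode joining the federation.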
