Tula: A Disk Latency Aware Balancing and Block Placement Strategy for Hadoop
Abstract—Heterogeneity can occur for various reasons in a Hadoop cluster. This work primarily focuses on heterogeneity caused by varying read/write latencies of disks. When disk latencies are similar across the datanodes, balancing the blocks uniformly is a suitable choice. However, with time, disk latencies can increase due to mechanical problems and bad sectors. Further, disks that crash and become non-functional are replaced with newer disks, which may be of a newer generation and have a higher RPM. This introduces disk-level heterogeneity into an otherwise homogeneous cluster, and balancing uniformly according to disk utilization may not give optimal job runtime. To address this issue we propose a disk latency aware balancer, which balances the cluster taking both disk latency and disk space utilization into consideration. This balancing strategy ensures that a low latency disk receives more blocks than a high latency disk. Furthermore, we introduce a custom block placement strategy that considers disk latency along with other factors. Our preliminary results show an improvement of up to 20% in job runtime.

Keywords—Load Balancing, Block Placement, Disk Latency

I. INTRODUCTION

Hadoop [1] has become the de facto standard for Big Data analytics on commodity clusters. The Hadoop Distributed File System (HDFS) [2] is one of the core components of Hadoop. In HDFS, a file is partitioned into blocks of equal size (usually 64 MB or 128 MB) and each block is replicated (3 times by default) across the cluster. These replicas are distributed in the cluster according to the default block placement policy. The default block placement policy does not ensure uniform block distribution in the cluster due to various competing considerations. Moreover, imbalance also occurs due to node failures, addition of nodes or racks, node upgrades, etc. Hadoop provides a utility called balancer¹, which balances HDFS according to disk utilization.

Load balancing in Hadoop has been explored in many previous works. [3] proposes a block placement policy that adapts block placement according to disk utilization. [4] proposes a fully distributed approach to load rebalancing. [5] proposes a dynamic algorithm to balance the workload between the racks of a cluster by analysing the log files of Hadoop. [6] handles heterogeneity of nodes by computing a metric called compute ratio, which measures the performance of the nodes on a relative scale by running a job of 1 GB size on each node.

Among these works, only [4] and [5] use the rebalancing approach to load balance the cluster; the rest place the blocks according to their placement strategy when a file is uploaded to the cluster. However, none of these works take the disk read/write latencies into account while performing the balancing operation. In this work, we observe that a disk's performance may degrade with time due to mechanical failures and bad sectors. Also, disks may crash completely and be replaced with newer disks with better RPM. This creates heterogeneity in the cluster, and balancing by disk utilization alone might not be optimal. Tasks running on nodes with high disk latency will be slower, since most of the jobs are data intensive, whereas tasks running on low latency nodes will finish fast, so those nodes will be used to run more tasks than the slower nodes. When the blocks are uniformly distributed, tasks running on a fast node will also access blocks from the slower nodes. This burdens the slow nodes with the additional overhead of streaming blocks to remote tasks, which further degrades the performance of tasks on those nodes. Hence, to improve the performance of jobs in such a scenario, we propose Tula, a disk latency aware balancer (LatencyBalancer) and block placement strategy for Hadoop clusters.

The major contributions of this paper are:
• A disk latency aware block placement strategy.
• A modified balancer algorithm called LatencyBalancer which uses disk latencies as well as disk utilization to balance the cluster.
• Experimental evaluation and comparison of Tula with Hadoop's default block placement and default balancer.

This paper is organized as follows. Section II gives the related background on Hadoop, the default balancer and block placement strategy, the Zero Copy API, and the Message Filter framework, and discusses related work. Section III describes the architecture and implementation of Tula. Section IV evaluates it empirically. Finally, we conclude and discuss future work in Section V.

¹http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer
is present) close to the node where the map task is running. Reading non-local data by mapper tasks can cause network congestion and can degrade the performance of the current job as well as of other jobs running in the cluster. Although one can reduce this problem by increasing the replication factor, that is costly from a cloud provider's perspective.

In a homogeneous cluster, where all the nodes perform the same, it is easy to balance the blocks in the cluster based on disk utilization; the default balancer utility provided by Hadoop comes to our rescue. But when there is some level of heterogeneity in the cluster due to disk failure or disk performance degradation, balancing uniformly will not yield good results. The primary reason for disk failure is bad blocks, which can be inspected by monitoring SMART values [11]. An increase in the reallocated sector count indicates an increase in bad sectors [12], [13]. Moreover, non-functional disks get replaced with newer ones with better RPMs, and disks with many bad sectors perform poorly. Due to this heterogeneity, map tasks running on the nodes with faster disks can perform better than those on the slower nodes. Hence, the faster nodes should be allocated more blocks of data than the slower nodes; otherwise, tasks on faster nodes may end up accessing non-local data, and from slower nodes at that. This motivates us to develop a new balancer and block placement algorithm for Hadoop that takes disk latency into account.

III. TULA: ARCHITECTURE AND IMPLEMENTATION

and syscallwrapper.cc, and one filter file, FilesysFilter.cc, is created. sendfile64 gets triggered when Hadoop blocks are read, as discussed in Section II.

The sendfile64 entry point is the macro SYSCALL_DEFINE4(sendfile64, ..), which is defined in read_write.c. This calls the sys_sendfile64 wrapper, which in turn calls filter_sendfile64 in FilesysFilter.cc; this function checks whether the call is a Hadoop block read request. To do so, it uses the getName(fileDescriptor) function, which returns the filename with its complete path in the local file system for the supplied file descriptor. For a Hadoop block read request, it measures the latency of cumulative read requests totalling 64 MB (the default block size). It exports this value through the /proc filesystem, from where it is later picked up by HDFS. Given below is the pseudo code for the important changes in the message filter.

Listing 1: Find read latency of datanode

class FileSysFilter : public MessageFilter {
protected:
    MessageFilter *filesysfilter;
public:
    FileSysFilter(MessageFilter *filesysfilter);
    int filter_open(const char *filename, int flags, umode_t mode);
    ssize_t filter_sendfile64(int out_fd, int in_fd, loff_t pos,
                              size_t count, char *flag);
};

ssize_t FileSysFilter::filter_sendfile64(int out_fd, int in_fd,
                                         loff_t pos, size_t count,
                                         char *flag) {
    ssize_t size = 1;
    struct timeval start, end;        // timestamps bracketing the read
    char *filename = getName(in_fd);  // resolve the fd to a local path
    if (filename != NULL) {
        if (/* file is a block stored in HDFS */) {
            /* Accumulate the time taken by this read; once the
               cumulative reads reach 64 MB, export the measured
               latency via the /proc filesystem. */
        }
    }
    return size;  // pseudocode: the actual read and its result are elided
}
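On the HDFS side, the datanode then only needs to read this exported value before reporting it to the namenode. The following is a minimal Java sketch of that step; the /proc entry name "/proc/hdfs_read_latency" is a hypothetical name chosen for illustration, since the paper does not state the actual entry.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Sketch: read the disk read latency that the kernel message filter
 * exports through /proc. The entry name below is an assumption.
 */
public class ProcLatencyReader {
    private static final String PROC_ENTRY = "/proc/hdfs_read_latency";

    /** Returns the exported latency value, or -1 if it is unavailable. */
    public static long readLatency() {
        try {
            String value =
                new String(Files.readAllBytes(Paths.get(PROC_ENTRY))).trim();
            return Long.parseLong(value);
        } catch (IOException | NumberFormatException e) {
            return -1; // entry missing or malformed; caller can fall back
        }
    }
}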
by the protoc compiler, and a Java file DatanodeProtocolProtos.java gets generated, containing getter and setter functions for all the types defined in the protocol buffer message structure, such as setLatency(long value).

The file DatanodeProtocol.java in the package org.apache.hadoop.hdfs.server.protocol declares the interface DatanodeProtocol, which is used by a datanode to communicate with the namenode. Both the namenode and the datanode implement it.

The file DatanodeProtocolClientSideTranslatorPB.java in the package org.apache.hadoop.hdfs.protocolPB defines the class DatanodeProtocolClientSideTranslatorPB, which implements the DatanodeProtocol interface. It translates client-side requests made on the DatanodeProtocol interface to the RPC server implementing DatanodeProtocolPB. This is where the BlockReport and Heartbeat messages get constructed using a namenode rpcProxy object. In the blockReport function, while building the block report, we add the latency of the datanode.

The file DatanodeProtocolServerSideTranslatorPB.java in the package org.apache.hadoop.hdfs.protocolPB defines the class DatanodeProtocolServerSideTranslatorPB, which implements DatanodeProtocolPB. Its blockReport function is used by the namenode to retrieve the latencies of all datanodes, via the getter function on the request object of type BlockReportRequestProto.

The init function is used to classify each datanode as high, aboveAverage, belowAverage or low with respect to a threshold value. After this step, the LatencyBalancer tries to balance the cluster in multiple iterations, first matching high with low, then high with belowAverage, then aboveAverage with belowAverage, and then moves the blocks. For each block to be moved, a proxy source (a node that is comparatively free in terms of CPU, memory and network utilization) is found, which can be the source node itself (since every block has several replicas). The block is then moved to the destination node, and after the operation completes successfully, the block at the actual source node is deleted.

Algorithm 1 Initialize average latency utilization product
function initAvgOfLatencyUtilizationProduct
Input: List of DatanodeStorageReport, reports;
       HashMap of datanode IP address and latency, latencies
foreach StorageType t : StorageTypes do
    double sum = 0;
    foreach DatanodeStorageReport r : reports do
        foreach StorageReport s : r.getStorageReports() do
            if t == s.getStorage().getStorageType() then
                sum += latencies.get(r.getDatanodeInfo().getIpAddr())
                       * getUtilization(r, t);
    avgProducts.set(t, sum / reports.size());
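Rendered in Java, Algorithm 1 looks roughly as follows. The container types here are simplified stand-ins for Hadoop's DatanodeStorageReport, StorageReport and StorageType, and getUtilization is modelled as used space over capacity, so this is a sketch of the listing above rather than the exact Tula source.

import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Sketch of Algorithm 1: average (latency x utilization) per storage type. */
public class LatencyBalancerInit {
    enum StorageType { DISK, SSD, ARCHIVE, RAM_DISK }

    static class StorageReport {
        final StorageType type;
        final long capacity, dfsUsed;
        StorageReport(StorageType type, long capacity, long dfsUsed) {
            this.type = type; this.capacity = capacity; this.dfsUsed = dfsUsed;
        }
    }

    static class DatanodeStorageReport {
        final String ipAddr;
        final List<StorageReport> storageReports;
        DatanodeStorageReport(String ipAddr, List<StorageReport> reports) {
            this.ipAddr = ipAddr; this.storageReports = reports;
        }
    }

    /** Average latency-utilization product, one entry per storage type. */
    final Map<StorageType, Double> avgProducts = new EnumMap<>(StorageType.class);

    void initAvgOfLatencyUtilizationProduct(List<DatanodeStorageReport> reports,
                                            Map<String, Long> latencies) {
        for (StorageType t : StorageType.values()) {
            double sum = 0;
            for (DatanodeStorageReport r : reports) {
                for (StorageReport s : r.storageReports) {
                    if (t == s.type) { // only storages of the type being averaged
                        sum += latencies.get(r.ipAddr) * utilization(s);
                    }
                }
            }
            avgProducts.put(t, sum / reports.size());
        }
    }

    /** Fraction of capacity in use on one storage (simplified getUtilization). */
    private static double utilization(StorageReport s) {
        return s.capacity == 0 ? 0 : (double) s.dfsUsed / s.capacity;
    }
}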
the datanode latencies from the respective /proc filesystem entries to the namenode. The namenode maintains a HashMap of read latencies for each datanode in the cluster. Given below is the algorithm for the custom block placement. The algorithm

Preliminary results show that the job runtime has improved by up to 20%.

A. Default Balancer vs Latency Balancer

1) Environment: All experiments were done on a 3-node (VM) homogeneous cluster connected by a 1 Gbps LAN. The VMs have been configured such that the master node's disk latency is half that of the other two nodes. Each node has an Intel(R) Xeon(R) E5-1620 CPU with 4 cores at 3.6 GHz, a 10240 KB L3 cache, 4 GB RAM and 8 GB swap space, and runs the BOSS MOOL operating system. We use the Hadoop 2.7.1 source code for implementing the LatencyBalancer.

2) Datasets: We use the Pizza & Chili corpus [14] as the input data for the word count and word mean programs. The input file is 10 GB.
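The custom placement algorithm itself is cut off in this excerpt, but its stated idea, favouring datanodes with lower disk latency, can be illustrated with a small sketch. The weighting scheme and class name below are assumptions made for illustration, not the paper's actual algorithm: a candidate is chosen with probability proportional to the inverse of its reported read latency.

import java.util.List;
import java.util.Map;
import java.util.Random;

/**
 * Illustrative sketch only: pick a target datanode for a new block replica
 * with probability proportional to 1/latency, so low-latency disks receive
 * more blocks. This is an assumed weighting, not Tula's published algorithm.
 */
public class LatencyAwareChooser {
    private final Random rng = new Random();

    /** @param latencies read latency per candidate datanode IP (e.g., ms per 64 MB) */
    public String chooseTarget(List<String> candidates, Map<String, Long> latencies) {
        double totalWeight = 0;
        for (String node : candidates) {
            totalWeight += 1.0 / latencies.get(node);
        }
        // Sample a node: faster disks (smaller latency) get larger slices.
        double r = rng.nextDouble() * totalWeight;
        for (String node : candidates) {
            r -= 1.0 / latencies.get(node);
            if (r <= 0) {
                return node;
            }
        }
        return candidates.get(candidates.size() - 1); // numeric edge case
    }
}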
2) Datasets: Two datasets of size 2 GB and 2.5 GB are generated using the TeraGen program shipped with Hadoop.

3) Results: We use a completely heterogeneous setting to compare the default block placement and our custom block placement policy by running the TeraSort job on the 2 GB and 2.5 GB data respectively. Figures 4a and 4b show the blocks allocated to the various nodes by the default block placement policy of Hadoop and by our custom block placement policy for the 2 GB and 2.5 GB TeraGen datasets respectively. Figure 4c shows the corresponding runtime comparison. We observe that job performance improves in a heterogeneous environment when blocks are allocated using the custom block placement strategy.

REFERENCES

[1] "Apache Hadoop," http://hadoop.apache.org.

[2] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, "The Hadoop distributed file system," in 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST). IEEE, 2010, pp. 1-10.

[3] N. E. Pius, L. Qin, F. Yang, and Z. H. Ming, "Optimizing hadoop block placement policy & cluster blocks distribution," vol. 6, pp. 1224-1231, 2013. [Online]. Available: http://waset.org/Publications?p=70

[4] H.-C. Hsiao, H.-Y. Chung, H. Shen, and Y.-C. Chao, "Load rebalancing for distributed file systems in clouds," IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 5, pp. 951-962, 2013.