A Review On Storage and Task Scheduling in Heterogeneous Hadoop Clusters
Volume 2, Issue 10, October 2015. ISSN 2348-4853, Impact Factor 1.317
INTRODUCTION
The data explosion caused by the growth of the internet during the last and current decades has created
the need for Big Data storage and management tools. This data explosion, i.e. the generation of huge
amounts of data, has made it difficult to store, manage, and retrieve information using traditional data
processing techniques. As a solution to this problem, Hadoop, a distributed data management platform,
was developed. Hadoop is an open-source Apache project, developed in large part at Yahoo!. Owing to its
efficiency in handling Big Data, internet service providers such as Yahoo, Facebook, Amazon, Twitter,
and Alibaba prefer Hadoop for the storage, management, and analysis of data.
Hadoop has two components:
1. Hadoop Distributed File System (HDFS)
2. MapReduce
www.ijafrc.org
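Table: Comparison of scheduling techniques
Columns: Technique; Size of cluster; Advantage; Limitations; Environment

Techniques listed after the first row: Fair scheduler, Capacity scheduler, Resource Aware Scheduling,
Deadline constraint, LATE, SAMR

First row (technique name missing in the source):
  Technique: Job tracker pulls the oldest job, the one which came first, from the queue.
  Size of cluster: Small
  Advantage: Efficient and simple implementation; does not depend on priority or size of job.

SAMR:
  Technique: Uses historical information read from the node to estimate the executing task.
  Size of cluster: Small

Cells whose row could not be recovered:
  Advantage: Higher resource utilization and overall performance; decreased workload.
  Advantage: Decreased execution time; reduced delay; maximum number of tasks run on a cluster.
  Advantage: Scalable; nodes are robust; improved response time.
  Advantage: Scalable; decreased execution time; saves system resources.
  Limitations: Overburdening of high-capacity nodes.
  Environment: Both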
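The first row of the table describes a job tracker that always pulls the oldest job from the queue, regardless of priority or size. A minimal sketch of that first-come, first-served behaviour is given below. The names Job, FifoScheduler, submit and next_job are hypothetical and chosen only for illustration; they are not part of Hadoop's API, and the sketch ignores task slots, data locality, and node heterogeneity.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int  # recorded, but deliberately ignored by the scheduler
    size: int      # recorded, but deliberately ignored by the scheduler


class FifoScheduler:
    """Hypothetical stand-in for a job tracker with a single first-come, first-served queue."""

    def __init__(self):
        self._queue = deque()

    def submit(self, job):
        # New jobs always join the back of the queue.
        self._queue.append(job)

    def next_job(self):
        # The oldest job is pulled first; returns None when the queue is empty.
        return self._queue.popleft() if self._queue else None


scheduler = FifoScheduler()
scheduler.submit(Job("index-logs", priority=1, size=500))
scheduler.submit(Job("urgent-report", priority=9, size=5))

order = []
while (job := scheduler.next_job()) is not None:
    order.append(job.name)

print(order)  # ['index-logs', 'urgent-report']
```

The small, high-priority job still waits behind the large one that arrived earlier, which illustrates the table's remark that this technique does not depend on priority or size of job.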