AN ENHANCED HADOOP HEARTBEAT MECHANISM FOR MAPREDUCE TASK-2018
[Figure: Gantt-style charts of block execution on three nodes (Slave 1–3), comparing the Hadoop default placement with a best-case placement; x-axis: execution time (10 s–50 s).]
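The charts above contrast Hadoop's default placement with a best-case placement on nodes of unequal speed. As a rough illustration of why capacity-aware placement shortens the makespan — a simplified sketch with hypothetical helper names, where the speed vector mirrors the paper's capacity(1,3,11) setting and the block counts are illustrative — each schedule's finish time is the slowest node's completion time:

```python
# Makespan of a schedule where each node processes its assigned blocks
# serially: the job finishes when the slowest node finishes.
def makespan(assignment, speeds):
    return max(blocks / speed for blocks, speed in zip(assignment, speeds))

# Split `total` blocks in proportion to node speed (capacity-aware placement).
def proportional_assignment(total, speeds):
    shares = [total * s // sum(speeds) for s in speeds]
    shares[speeds.index(max(speeds))] += total - sum(shares)  # remainder to fastest
    return shares

speeds = [1, 3, 11]                              # relative capacities, as in capacity(1,3,11)
uniform = [20, 20, 20]                           # equal split of 60 blocks
balanced = proportional_assignment(60, speeds)   # -> [4, 12, 44]
print(makespan(uniform, speeds), makespan(balanced, speeds))  # 20.0 vs 4.0
```

With equal shares the slowest node dominates (20 time units), while the speed-proportional split lets all nodes finish together (4 time units) — the behavior the best-case chart depicts.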
[Figure: number of blocks moved by Hadoop, DDP, and HMTS-DC under capacity(1,2,7) and capacity(1,3,11).]
Fig. 10. Block movement with nodes of different capacity (static env).

[Figure: makespan in seconds of Hadoop, DDP, and HMTS-DC (total blocks: 300; replication: 2; dynamic env.); values shown: 3371, 3022, 2785, 2235, 1955, 1878.]

Fig. 12. Block movement of Hadoop, DDP, and HMTS-DC with capacity setting as in Fig. 4.14 (dynamic env).

the poorest, at 2763 s with a data movement of 34. In Figures 19 and 20, the setup is five slaves with the replication set to three. Since it represents a dynamic environment, DDP has the worst overall performance, with a makespan of 1596 and 4796 s, respectively (no data movement for DDP is recorded, as the replication for DDP is one). The proposed

respectively. In this section, HMTS-DC is analyzed to identify areas where task scheduling could be improved further. Based on these findings, an enhanced version of HMTS-DC, EHMTS-DC, is proposed by augmenting HMTS-DC with a historical run-time record. The experimental results show that EHMTS-DC performs better than HMTS-DC in terms of makespan and data transfer. Access to the historical run-time record of relative computational capacity allows EHMTS-DC to start reservation early; thus, more local data blocks can be reserved. EHMTS-DC effectively reduces data movement, and thus the makespan, as outlined in Table 3.

5.1 Prototype environment: Static and Dynamic

Two environments are created to test the task schedulers: (1) a static environment, where all compute nodes are dedicated nodes and no other programs/processes are run, with the exception of the map-reduce jobs assigned to them; (2) a dynamic environment, where non-dedicated compute nodes are used. In addition to running the map-reduce jobs assigned to them, these nodes run other programs/processes.

To evaluate the performance of the Hadoop scheduler, the hardware configuration and virtual machine described in Table 4 are used and the makespans of jobs with different
[Figure: makespan in seconds (2458, 2390, 1646, 1604) and number of blocks moved for HMTS-DC and EHMTS-DC under capacity(1,2,7) and capacity(1,3,11).]
Fig. 13. Makespan of HMTS-DC and EHMTS-DC with different capacities (static env).
Fig. 14. Blocks moved of HMTS-DC and EHMTS-DC with different capacities.

[Figure: makespan and number of blocks moved for HMTS-DC and EHMTS-DC under capacity(1,2,7) and capacity(1,3,11).]
Fig. 15. Makespan of HMTS-DC and EHMTS-DC with different capacities (dynamic env).
Fig. 16. Blocks moved of HMTS-DC and EHMTS-DC with different capacities (dynamic env).

[Figure: makespan in seconds (893, 800, 813) and number of blocks moved in the 5-slave cluster.]
Fig. 17. Makespan of Hadoop, DDP and HMTS-DC in 5-slave cluster with replication=3 (static).
Fig. 18. Block movement of Hadoop, DDP and HMTS-DC in 5-slave cluster with replication=3 (static).
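The schedulers compared in these figures all lean on the relative-capacity ratio described in this section: node speed is inferred from progress reported through heartbeats, and unprocessed local blocks are reserved in proportion to it. A minimal sketch of that re-assignment step — function and variable names are ours, not Hadoop's API, and the counts are illustrative:

```python
# Reserve a pool of unprocessed blocks in proportion to each node's observed
# speed, using tasks-completed counts reported via heartbeats as the ratio.
def reserve_blocks(unprocessed, completed):
    total = sum(completed)
    shares = [unprocessed * c // total for c in completed]   # proportional share
    shares[completed.index(max(completed))] += unprocessed - sum(shares)
    return shares

# Three slaves report 2, 6 and 22 completed tasks in the same interval:
print(reserve_blocks(30, [2, 6, 22]))  # -> [2, 6, 22]
```

More powerful nodes receive proportionally more of the remaining local blocks, which is the reservation behavior HMTS-DC and EHMTS-DC exploit.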
data sizes are recorded. To evaluate the DDP scheduler, the historical record of the relative computing speed, i.e., the ratio, of the compute nodes is benchmarked by running map-reduce jobs and recording the makespan of the job and the time taken by each node to complete the tasks assigned. Then, based on the ratio captured in the historical record, an appropriate number of data blocks is assigned to each compute node; more powerful nodes are assigned more blocks. To evaluate the HMTS-DC scheduler, jobs assigned to the compute
[Figure: makespan in seconds (4796, 3203, 2710, 1596, 1083, 843) and number of blocks moved in the 5-slave cluster.]
Fig. 19. Makespan of Hadoop, DDP and HMTS-DC in 5-slave cluster with replication=3 (dynamic env).
Fig. 20. Block movement of Hadoop, DDP and HMTS-DC in 5-slave cluster with replication=3 (dynamic env).
nodes are executed and Heartbeat is transmitted between JobTracker and TaskTracker. In Hadoop, heartbeats are sent to the JobTracker, containing information such as task status, task counters, and data read/write. Following this, based on the dynamic information captured from the heartbeat, a ratio expressing the relative computing power of each node is computed. Unprocessed local blocks within each compute node are re-assigned based on this ratio; more local blocks are reserved for more powerful nodes.

5.2.1 Static prototype environment

Tables 5, 6, and 7 provide the detailed results of the static environment experiment. In Table 6, the job to be processed by DDP comprises 60 blocks that are assigned manually to the compute nodes; only one set of data and no data replication is involved for DDP. For Hadoop and HMTS-DC (Tables 6 and 7), the total number of blocks allocated is 120, since data replication is set to two. Therefore, the number of blocks to be completed is half of the total blocks allocated, i.e., 120/2 = 60 blocks. Block allocation is dynamically assigned by Hadoop and HMTS-DC.

Figure 21 depicts the experiment outcome of the three task schedulers. The proposed HMTS-DC has an average makespan of 757 s, i.e., an improvement of approximately 15% over Hadoop. DDP has the best performance, 690 s, followed by HMTS-DC and Hadoop. DDP outperforms HMTS-DC because it is able to optimize the overall block ratio (in this case, 60 blocks) assigned to each compute node, whereas HMTS-DC only optimizes part of the local blocks (in this case, approximately 20 blocks) within each compute node. HMTS-DC outperforms Hadoop
[Figure: makespan in seconds of Hadoop, DDP, and HMTS-DC; values shown include 690 and 882 s.]
Fig. 21. Makespan (static env).
Fig. 22. Makespan (dynamic env).
FIFO because HMTS-DC is able to reserve some of the local tasks within the faster node. By doing so, HMTS-DC reduces the number of data blocks to be transferred from one compute node to another. In this experiment, from Tables 5, 6, and 7, Slave3 of HMTS-DC and Hadoop only manages to execute 38 and 39 of the 40 local tasks, respectively. Some local tasks in Slave3 are replicated in Slave1 or Slave2, and these tasks were executed on the slower nodes (either Slave1 or Slave2) instead of by Slave3. DDP is able to complete all 40 local blocks using the fastest node (Slave3), whereas HMTS-DC and Hadoop FIFO only finish 38 and 37 blocks, respectively, using the fastest node. In this experiment, DDP outperforms HMTS-DC in a static environment. This may not be the case if the computing resources are dynamic, as shown in the next experiment.

5.2.2 Dynamic Prototype Environment

In this experiment, the computational envi-