Improved Heuristic Job Scheduling Method To Enhance Throughput For Big Data Analytics
Abstract: Data-parallel computing platforms, such as Hadoop and Spark, are deployed in computing clusters for big data analytics. Because multiple users commonly share the same computing cluster, scheduling multiple jobs becomes a serious challenge. For a long time, the Shortest-Job-First (SJF) method has been considered the optimal way to minimize the average job completion time. However, the SJF method yields low system throughput when a small number of short jobs consume a large amount of resources, which in turn prolongs the average job completion time. We propose an improved heuristic job scheduling method, called the Densest-Job-Set-First (DJSF) method. The DJSF method schedules jobs by maximizing the number of completed jobs per unit time, aiming to decrease the average Job Completion Time (JCT) and improve the system throughput. We perform extensive simulations based on Google cluster data. Compared with the SJF method, the DJSF method decreases the average JCT by 23.19% and enhances the system throughput by 42.19%. Compared with Tetris, the proposed job packing method improves the job completion efficiency by 55.4%, so that computing platforms complete more jobs in a shorter time span.
Key words: big data; job scheduling; job throughput; job completion time; job completion efficiency
Fig. 1 Comparison of the SJF method and a heuristic job scheduling method. The heuristic job scheduling method decreases the average JCT by 32.07%. (a) Illustration of the SJF method: the SJF method prefers to schedule the shortest job, i.e., Job 1. However, Job 1 occupies all resources, and the other jobs are delayed. The average JCT is 7.36 s. (b) Illustration of a heuristic job scheduling method: because Jobs 2–5 occupy fewer resources, they are packed together and executed concurrently. The average JCT is 5 s.

Jobs 1–5. The JCT of Job 1, which is the shortest among all jobs, is 4 s. However, Job 1 consumes all resources. The other four jobs have longer JCT but request a small amount of resources. The JCT of the other jobs is 4.2 s. We adopt two heuristic methods, the SJF method and a heuristic job scheduling method, to schedule the batch of jobs. The job scheduling examples of the two methods are illustrated in Figs. 1a and 1b, respectively. Although the SJF method schedules the shortest job with the highest priority, the average JCT in Fig. 1a is still high. By contrast, the heuristic job scheduling method prefers to schedule the other jobs. Because Jobs 2–5 request fewer resources, these jobs are packed together. This condition provides a desirable opportunity to increase the number of parallel jobs and decrease the average JCT. Compared with the SJF method, the heuristic job scheduling method decreases the average JCT by 32.07%.

The heuristic method performs better because of its high job completion efficiency, which we define as the number of completed jobs per unit time, i.e., the ratio of the number of completed jobs to the time span. In Fig. 1b, we regard Jobs 2–5 as the set of jobs. The time span of the job set is 4.2 s. The job completion efficiency of the job set is 4/4.2, which is higher than that of Job 1. From the perspective of throughput, a high job completion efficiency indicates that the heuristic scheduler takes a short amount of time to complete a large number of jobs. Inspired by this observation, we propose an improved heuristic job scheduling method, which schedules jobs by prioritizing the job set that has the highest job completion efficiency. Intuitively, the job completion efficiency is regarded as an analogy of the physics term density. If the set of jobs has a short time span and a large number of jobs, then the job set is dense; otherwise, the job set is sparse. Thus, the heuristic method is called the Densest-Job-Set-First (DJSF) method. Our experiments demonstrate that the DJSF method can decrease the average JCT and increase the system throughput by improving the job completion efficiency.

Before the DJSF method schedules jobs, we need to pack jobs into sets of jobs. Existing packing schedulers, such as Tetris[6], use vectors to sketch the amount of resources requested by jobs and the available amount of resources on machines. Tetris matches a job to a machine if the dot product between the resources vector of the job and that of the machine is high. However, traditional packing methods aim to maximize resources usage but do not take the completion efficiency of the job set into consideration. As a result, long and resources-expensive jobs may be packed together with short and resources-cheap jobs. This condition leads to low job completion efficiency.

In addition, the job completion efficiency of the job set determines the performance of the DJSF method. To improve the completion efficiency of the job set, we formalize the job packing problem, which involves the minimization of resources fragments and the optimization of the job completion efficiency. Nevertheless, the optimization model is computationally expensive to solve. To solve the problem, we propose a new job packing method to pack jobs in a dense manner. The job packing method enhances the job completion efficiency of packed jobs in two ways. Firstly, we employ a JCT alignment technique to align the JCT of the job set. The JCT alignment method adopts the k-means clustering algorithm to group jobs by the length of JCT. With the JCT alignment, jobs that have roughly the same JCT are placed in the same group. We analyze the JCT distribution in the Google trace[7], which shows that the majority of jobs are very close in terms of JCT. This factor provides a desirable opportunity to align the length of the JCT. After the JCT alignment, grouped jobs are packed by the job packing method, aiming to improve the job completion efficiency. The job packing method prefers to pack jobs that request a small amount of resources into the job set. With the job packing method, the number of parallel jobs in the same job set increases. The workflow of our scheduler is shown in Fig. 2.

We design the improved heuristic job scheduling method in Fig. 2 as a pluggable module, which can be used in current resources managers, such as
346 Tsinghua Science and Technology, April 2022, 27(2): 344–357
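The job completion efficiency arithmetic from the Fig. 1 example can be checked directly (a minimal sketch; the durations are the ones quoted above):

```python
# Job completion efficiency (JCE): number of completed jobs per unit time.
def jce(num_jobs, time_span):
    return num_jobs / time_span

# Fig. 1 numbers: Job 1 alone finishes in 4 s; Jobs 2-5, packed together,
# share a 4.2 s time span.
assert jce(4, 4.2) > jce(1, 4.0)   # ~0.952 vs 0.25: the packed set is denser
```

The packed set completes roughly four times as many jobs per second as scheduling the single shortest job first.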
make JCT alignments. In Section 5, we evaluate the
N_2 (T_1 + T_2). In this case, the sum of JCT in J_1 and J_2 is

    N_1 T_1 + N_2 (T_1 + T_2)    (2)

Case 2: If a scheduler prefers to schedule the job set with a smaller job completion efficiency, then the scheduling preference is J_2 → J_1. The completion time of J_2 is T_2. The sum of JCT of all jobs in J_2 is approximately N_2 T_2. Similarly, the completion time of J_1 is T_1 + T_2. The sum of JCT in J_1 is approximately N_1 (T_1 + T_2). In this case, the sum of JCT in J_1 and J_2 is

    N_2 T_2 + N_1 (T_1 + T_2)    (3)

Because J_1 has a larger job completion efficiency than J_2, we have N_1/T_1 > N_2/T_2 and hence N_1 T_2 > N_2 T_1. Comparing Formula (2) with Formula (3), the sum of JCT in Case 1 is smaller than that in Case 2. We conclude that for any two job sets J_1 and J_2, the job set with the larger job completion efficiency should be preferred for scheduling to achieve a smaller average JCT.

We extend the above conclusion to a new case where three job sets are scheduled. To extend the conclusion, we introduce the following lemma.

Lemma 1: Given that any two job sets J_a and J_b can be executed serially, the two job sets are scheduled in a successive order. We regard J_a and J_b as a new job set, denoted by J_c. Let JCE_a and JCE_b denote the job completion efficiency of J_a and J_b, respectively. We have

    min{JCE_a, JCE_b} < JCE_c < max{JCE_a, JCE_b}

Proof: Let N_a and N_b denote the number of jobs in J_a and J_b, respectively. Let T_a and T_b denote the time span of J_a and J_b, respectively. We have JCE_a = N_a/T_a and JCE_b = N_b/T_b. For the job set J_c, the number of jobs in J_c is N_a + N_b, and the time span of J_c is T_a + T_b. Thus, the job completion efficiency of J_c is JCE_c = (N_a + N_b)/(T_a + T_b). Assume that the job completion efficiency of J_a is larger than that of J_b. Owing to JCE_a > JCE_b, i.e., N_a/T_a > N_b/T_b, we have N_a T_b > N_b T_a. Hence, we have (N_a + N_b) T_b > N_b (T_a + T_b). The following formula holds:

    (N_a + N_b)/(T_a + T_b) > N_b/T_b    (4)

Hence, JCE_c > JCE_b is proven. Similarly, because N_a T_b > N_b T_a, we have N_a (T_a + T_b) > (N_a + N_b) T_a, i.e.,

    N_a/T_a > (N_a + N_b)/(T_a + T_b)    (5)

Hence, JCE_a > JCE_c is proven.

Next, we expand Theorem 1 to the case where three job sets, i.e., J_1, J_2, and J_3, are scheduled. We sort them by the value of the job completion efficiency. Assume that JCE_1 > JCE_2 > JCE_3. According to Lemma 1, we regard J_2 and J_3 as a new set of jobs. Let J_2' denote the new job set, and let JCE_2' denote the job completion efficiency of J_2'. According to Lemma 1, we have JCE_2 > JCE_2' > JCE_3. Then the scheduling problem becomes the case in Theorem 1, where two job sets, J_1 and J_2', are scheduled. According to Theorem 1, we prefer to schedule J_1 because the job completion efficiency of J_1 is larger than those of the other job sets. Similarly, after scheduling J_1, we prefer to schedule J_2 rather than J_3. The final scheduling preference is J_1 → J_2 → J_3, which achieves the smallest average JCT.

To sum up, Theorem 1 can be recursively applied in multiple-job-set cases. We make the following general theorem.

Theorem 2: Given multiple job sets, the job set with the largest job completion efficiency should be scheduled with the highest priority to decrease the average JCT.

3.2 DJSF method

According to Theorem 2, we design an improved heuristic job scheduling method called the DJSF method. As the job completion efficiency is the ratio of the number of jobs to the length of the time span, we intuitively regard the job completion efficiency as the physics term density. If the set of jobs has a short time span and a large number of jobs, then the set of jobs is dense. Otherwise, the set of jobs is sparse if it has a few jobs and a long time span. Algorithm 1 shows the core idea of the DJSF method. The input of the DJSF method is a batch of job sets. The DJSF method calculates the job completion efficiency of each job set. The DJSF method prefers to allocate resources to the job set with

Algorithm 1 DJSF method
Input: The set of jobs j_1, j_2, ..., j_N.
Input: The available resources capacity C.
1: for the m-th job set, m = 1, ..., M do
2:     Calculate the job completion efficiency JCE_m.
3: Sort all job sets in decreasing order of their job completion efficiency.
4: for the m-th job set, m = 1, ..., M do
5:     if the remaining resources capacity C is sufficient then
6:         Allocate resources to J_m.
7:         Update the amount of remaining resources.
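Algorithm 1 can be sketched in a few lines of Python (an illustrative sketch, not the authors' implementation; the dictionary layout and the single scalar `demand`/`capacity` resource model are simplifying assumptions):

```python
# DJSF: rank job sets by job completion efficiency JCE = num_jobs / time_span
# (densest first), then allocate resources greedily while capacity remains.
def djsf_schedule(job_sets, capacity):
    ranked = sorted(job_sets,
                    key=lambda s: s['num_jobs'] / s['time_span'],
                    reverse=True)                 # lines 1-3 of Algorithm 1
    scheduled, remaining = [], capacity
    for s in ranked:                              # line 4
        if s['demand'] <= remaining:              # line 5: enough resources?
            scheduled.append(s)                   # line 6: allocate
            remaining -= s['demand']              # line 7: update resources
    return scheduled

sets = [{'num_jobs': 1, 'time_span': 4.0, 'demand': 10},   # sparse job set
        {'num_jobs': 4, 'time_span': 4.2, 'demand': 6}]    # dense job set
print([s['num_jobs'] for s in djsf_schedule(sets, capacity=10)])  # prints [4]
```

With capacity 10, the dense set (JCE ≈ 0.952) is scheduled first and the sparse set no longer fits, matching the Theorem 2 preference.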
efficiency.

We leverage the cumulative distribution curve of JCT to make a JCT alignment. Figure 5 shows an illustrative example of the cumulative distribution curve of JCT. All jobs are sorted by the length of JCT. Jobs that have the same JCT are represented as a dot on the curve. To make a JCT alignment, the jobs that are at the same point on the curve should be packed together.

Fig. 6 Number of jobs changes with real JCT (plotted on a log5 scale, in s).

Owing to Constraint (11), the sum of resources
Zhiyao Hu et al.: Improved Heuristic Job Scheduling Method to Enhance Throughput for Big Data Analytics 351
finds which group a data point belongs to. The k-means clustering algorithm is one of the simplest clustering techniques and is widely used in biometrics, medical imaging, and related fields. The k-means clustering algorithm determines the group that data points should be assigned to, rather than the user building the groups of data points based on heuristic rules.

We are motivated to leverage the k-means clustering algorithm[12] to group jobs with different JCT. Algorithm 2 shows the main steps of the JCT clustering algorithm. The k-means clustering based job packing method places a job in a group of jobs if the difference between the centroid of the group and the JCT of the job is at a minimum. Less variation in a group results in the jobs within the group having similar JCT. The k-means based JCT clustering algorithm is known to have a time complexity of O(N^2)[13].

In addition to the JCT alignment, the job packing method should increase the number of jobs packed in the job set. Given a group of jobs, the job packing method prefers to pack jobs that request a small amount of resources. This method aims to enhance the number of parallel jobs in the job set. In Fig. 5, n_1 is the number of jobs of which the JCT is smaller than X_1. Similarly, n_2 is the number of jobs of which the JCT is smaller than X_2. Assume that we pack the jobs of which the JCT is larger than X_1 but smaller than X_2 into a new job set. The number of jobs in the job set is n_2 − n_1. The time span of the job set is X_2. Thus, the job completion efficiency of the job set is (n_2 − n_1)/X_2. To enhance the value of the job completion efficiency, the job packing method maximizes the value of n_2 − n_1.

Algorithm 3 shows the main steps of the job packing method. After the JCT clustering method outputs the groups of jobs, the job packing method packs jobs into the job set. The job packing method calculates the average JCT of each group. Then the job group that has the smaller average JCT is preferred in Algorithm 3. Within each group, the job packing method sorts jobs by the amount of resources. Jobs that request a small amount of resources are packed into the job set with high priority. If the remaining resources are not enough for packing a job into the job set, then the job is packed into the next job set. When the last job is packed, the job packing method stops. Algorithm 3 has two for loops. The complexity of the job packing method is O(GN), where G is the number of groups.

Algorithm 2 k-means-based JCT clustering method
Input: A batch of jobs J = {j_1, j_2, ..., j_N}, the resources {R_1, R_2, ..., R_N}, the JCT {JCT_1, JCT_2, ..., JCT_N}, and the available resources capacity C.
1: Estimate the number of groups G = (Σ_{n=1}^{N} R_n)/C.
2: Initialize the centroid for each group.
5: Take all jobs in the i-th group.
6: Compute the average value of their JCT.
9: Clustering(J, G).
10: else if the centroids of all groups are not updated then
11:     Stop the algorithm.
12: return The groups of jobs.
function Clustering(J, G)
13: for the n-th job, n = 1, ..., N do
14:     for the i-th group, i = 1, ..., G do
15:         Get W_i, the centroid of the i-th group.
16:         Calculate the difference D_i = |W_i − JCT_n|.
17:     Calculate i' = arg min_{i=1,...,G} D_i.  /* Find the group with the minimal difference */
18:     Assign the n-th job to the i'-th group.

Algorithm 3 Job packing method
Input: A batch of jobs {j_1, j_2, ..., j_N}, the amount of resources {R_1, R_2, ..., R_N}, the JCT {JCT_1, JCT_2, ..., JCT_N}, and the available resources capacity C.
1: Record the remaining amount of resources, C_remain = C.
4: Call the k-means based JCT clustering method.
5: Calculate the average JCT of jobs in each group.
8: Get G_i, i.e., the number of jobs in the i-th group.
9: Get {j_1^i, ..., j_{G_i}^i}, i.e., all jobs in the i-th group.
10: Sort jobs in increasing order of their resources.
11: for the n-th job, n = 1, ..., G_i do
12:     Get R_n^i, i.e., the resources requested by j_n^i.
13:     if R_n^i <= C_remain then
14:         Get k, the number of job sets.
15:         Pack j_n^i into the k-th job set.
16:         Update C_remain = C_remain − R_n^i.
17:     else
18:         Pack j_n^i into the (k+1)-th job set.
19:         k = k + 1.
20:         Update C_remain = C − R_n^i.

5 Evaluation

We evaluate the performance of our method, Tetris, and
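The two steps of Algorithms 2 and 3 — grouping scalar JCTs with 1-D k-means, then greedily packing each group by resource demand — can be sketched as follows (an illustrative sketch under simplifying assumptions: scalar demands no larger than the capacity, and Lloyd-style centroid updates):

```python
import random

def kmeans_1d(values, k, iters=20, seed=0):
    """Group scalar JCT values into k clusters (1-D Lloyd's algorithm)."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)          # init from the data points
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:                       # assign to nearest centroid
            i = min(range(k), key=lambda j: abs(centroids[j] - v))
            groups[i].append(v)
        centroids = [sum(g) / len(g) if g else centroids[j]
                     for j, g in enumerate(groups)]
    return groups

def pack_group(jobs, capacity):
    """Greedily pack (jct, demand) jobs into job sets of bounded demand."""
    jobs = sorted(jobs, key=lambda j: j[1])    # smallest demand first
    job_sets, current, remaining = [], [], capacity
    for jct, demand in jobs:
        if demand > remaining:                 # open the next job set
            job_sets.append(current)
            current, remaining = [], capacity
        current.append((jct, demand))
        remaining -= demand
    if current:
        job_sets.append(current)
    return job_sets

jcts = [4.0, 4.1, 4.2, 4.3, 60.0, 61.0, 62.5]
short, long_ = sorted(kmeans_1d(jcts, k=2), key=lambda g: sum(g) / len(g))
```

Here the short-JCT and long-JCT jobs end up in separate groups, so cheap short jobs are packed together instead of sharing a set with expensive long jobs.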
the SJF method. We perform simulations on a computer equipped with an Intel(R) Core(TM) i7-4700MQ CPU at 2.40 GHz and 8 GB RAM. Our workloads are taken from the Google trace, which is widely accepted as a data-parallel benchmark[14–16].

5.1 Simulation settings

Workload: The Google cluster data provide detailed task durations, resources requirements, and machine constraints. We extract 1000 jobs from the Google trace. The 1000 jobs are submitted in one batch. Note that it is very slow to simulate the execution of 1000 jobs because the real durations in the Google trace are long. We set the simulated program to run 10 times faster than the trace. The JCT distribution of the 1000 jobs is shown in Fig. 7. The average JCT is 685.39 s.

Methods comparison: We compare our methods with two scheduling methods, Tetris and the SJF method. Tetris is a multi-resources packing method that enhances resources utilization. Then we evaluate the SJF method. The SJF method prefers to schedule the job with the least JCT. According to past research[6], Tetris performed better than the default scheduling algorithms in YARN and Mesos, including the FS, CS, and DRF methods, as shown in Fig. 3. The result shows that Tetris performs better than the FS, CS, and DRF methods, and thus, we compare Tetris with our heuristic method for simplicity.

5.2 Simulation result

We measure the completion time of 1000 jobs and show the JCT distribution of Tetris, the SJF method, and the DJSF method. In addition, we record the changing cluster state, including the system throughput, CPU utilization, memory, and disk space. We evaluate the job completion efficiency of Tetris and our job packing method. Then we evaluate the algorithm overhead incurred by Tetris, the SJF method, and the DJSF method.

JCT distribution of completed jobs: We record the JCT of the 1000 jobs completed by Tetris, the SJF method, and the DJSF method. The JCT distribution of Tetris is shown in Fig. 8a. Tetris completes a varying number of jobs as the time increases from 0 to over 3000 s. In the beginning, Tetris completes jobs at a high speed, which lasts for a short time and then slows down halfway. As a comparison, the SJF method completes jobs at a stable but low speed, as shown in Fig. 8b. The average JCT of Tetris and the SJF method is 1696.3 and 1695.4 s, respectively. The DJSF method achieves the highest job completion efficiency. In Fig. 8c, a considerable number of jobs are completed in the first 1000 s. We infer that the DJSF method prefers to schedule jobs in the job set that has a high job completion efficiency. After the first 1000 s, the DJSF method completes jobs at a low speed. We infer that the DJSF

Fig. 8 JCT distribution of 1000 jobs that are completed by (a) Tetris, (b) the SJF method, and (c) the DJSF method. The majority of jobs that are completed by the DJSF method have shorter JCT.
and the SJF method, respectively. A minority of the 1000 jobs suffer from a negative value of the JCT reduction. This finding indicates that the DJSF method takes a longer time to complete these jobs than Tetris and the SJF method. Nevertheless, the DJSF method achieves a good JCT reduction for the majority of jobs. The highest JCT reduction of the DJSF method against the other compared methods is up to 80%.

Fig. 9 JCT reduction of the DJSF method against (a) the SJF method and (b) Tetris.

System throughput: We evaluate the system throughput of the simulated Spark cluster. The system throughput P is calculated as follows:

    P = N/T    (13)

where T denotes the cumulative working time in the simulated Spark cluster and N denotes the cumulative number of jobs that are completed in the simulated Spark cluster. Of note, Eq. (13) is similar to the job completion efficiency in Eq. (1).

Figure 10 shows the system throughput when 1000 jobs are scheduled by Tetris, the SJF method, and the DJSF method, respectively. The system throughput of the SJF method is the lowest and hardly changes from the beginning to the end. Tetris achieves a higher system throughput than the SJF method in the first 1000 s. This finding indicates that Tetris can complete more jobs than the SJF method in the first 1000 s. The DJSF method achieves the highest system throughput in the first 3000 s.

Fig. 10 System throughput when the scheduling time increases.

Job completion efficiency of the job set: We compare the k-means job packing method with Tetris in terms of the job completion efficiency. To make the DJSF method more effective, we propose the k-means job packing method to pack jobs in a dense manner so that the job completion efficiency of the job set is improved. Figure 11 shows the job completion efficiency of all job sets.

Tetris packs all jobs into 125 job sets, whereas our packing method packs all jobs into 127 job sets. The job completion efficiency of Tetris changes from 0.023 to 0.058 because Tetris does not consider the job completion efficiency of job sets. In the job sets packed by Tetris, although the resources packing score is high, the job completion efficiency is low. As a comparison, the k-means job packing method maximizes the number of jobs that are packed together. Thus, the highest job completion efficiency is up to approximately 0.12. Nevertheless, the job completion efficiency of some individual job sets is low, approximately 0.04; long and resources-expensive jobs are packed in these job sets.

Fig. 11 Job completion efficiency of different job sets. The average job completion efficiency of Tetris and the k-means job packing method is 0.0426 and 0.0662, respectively.

Resource utilization: We measure the resources usage when the jobs are scheduled by Tetris, the SJF method, and the DJSF method. The bottleneck resource is memory.
Owing to the limited amount of memory, no more jobs can be executed concurrently. The amount of remaining CPU and disk space resources is sufficient in Figs. 12a and 12c. However, the memory resources are fully used in Fig. 12b.

Compared with Tetris and the SJF method, the DJSF method leads to a higher CPU and disk space resource consumption in the first 1000 s. Then the CPU and disk space resource consumption decreases for the remaining time. The DJSF method completes more jobs in the first 1000 s, and thus its resource consumption is higher. To sum up, the DJSF method makes full use of the available resources to increase the number of completed jobs at an early scheduling time. We regard this as an important cause of the DJSF method achieving the lowest average JCT.

Algorithm overhead: We evaluate the algorithm overhead incurred by Tetris, the SJF method, and our method. Our scheduling method incurs two main algorithm overheads, which are incurred by the job packing method and the DJSF method. Table 3 shows the algorithm overhead incurred by Tetris, the SJF method, and our method. The algorithm overhead incurred by the heuristic job scheduling method is far smaller than that incurred by the other methods.

The SJF method incurs the least algorithm overhead because the SJF method sorts the JCT only once. When free resources are available, the SJF method does not
Ernest was tightly bound to the application type, which limited the cross-application use of Ernest models. Bei et al.[19] proposed constructing two ensembles of performance models using a random-forest approach for the map and reduce stages, respectively. However, it is difficult to model the duration of each operation. Yu et al.[20] proposed a Hierarchical Model (HM), aiming to combine a number of individual sub-models in a hierarchical manner. The HM was agnostic to the execution of various operations. Hu et al.[21] designed a deep learning based prediction model, which employed a graph convolutional network to learn the execution of Spark jobs. To sum up, the above prediction models can be seamlessly applied to predict the JCT and combined with our job scheduling method. In the context of this paper, we regard job information, such as the JCT, as a priori knowledge.

We added new simulations to evaluate the performance of the DJSF method. In the experiment setting, each job has a real JCT (JCT_r) and a predicted JCT (JCT_p). The real JCT is extracted from the dataset. Given the prediction error ε, the predicted JCT is calculated as follows:

    JCT_p = JCT_r (1 − ε)    (14)

The prediction accuracy of the state-of-the-art prediction model is above 0.8. Thus, we set the value of ε in a range from 0.0001 to 0.2. We evaluate the average JCT of the different methods. The results are listed in Table 4. When the prediction error increases, the performance improvement of the DJSF method over the SJF method increases. We infer that the JCT alignment method alleviates the negative influence incurred by the prediction error. Before the DJSF method schedules jobs, the JCT alignment method groups jobs by the length of JCT. If the prediction error is small, then the jobs that have roughly the same JCT are still grouped together. When the DJSF method schedules these jobs, the prediction error slightly changes the scheduling order of jobs.

As a comparison, the SJF method is more sensitive to the prediction error because the SJF method determines the job scheduling order according to only the length of the JCT. The prediction error changes the job scheduling order in a straightforward way. The results show that the prediction error degrades the performance of the SJF method. The Tetris method is a job packing method, and thus Tetris is not impacted by the prediction error of the JCT.

Besides, several works involve the scheduling problem. Reducing the energy consumption of disk read/write requests in storage systems plays an important role in improving the overall energy efficiency of high-performance computing systems[22]. Static placement and dynamic management are two types of Virtual Machine (VM) management methods[23]. VirtCo was proposed to achieve joint coflow scheduling and virtual machine placement in cloud data centers[24]. The authors of Ref. [25] investigated the scalability of the request scheduling process in cloud computing and provided a theoretical definition of the scalability of this process. Virtual machine fault-tolerant placement for the data centers of cloud systems also involves the scheduling problem.

6.2 Impact of number of job groups

One of the main challenges in the k-means clustering algorithm is specifying the number of groups as an input parameter. Algorithm 2 is not capable of determining the appropriate number of groups. For the job packing case, we should make the set of jobs compact so that jobs are completed rapidly and resources are fully utilized. Based on this idea, we roughly estimate the number of groups as follows. We calculate the sum of resources requested by all jobs. Let Σ_{n=1}^{N} R_n denote the sum of resources. Assume that the number of groups, denoted by G, is equal to the sum of resources divided by the capacity of resources in the cluster. We have

    G = (Σ_{n=1}^{N} R_n)/C    (15)

We conduct an experiment on varying values of the number of job groups in Algorithm 2. The results are listed in Table 5. When the number of job groups increases from 1 to 10, the performance improvement of
Table 4 Impact of prediction errors.
Prediction error   JCT of Tetris (s)   JCT of SJF (s)   JCT of DJSF (s)   Improvement rate of DJSF over SJF   Improvement rate of DJSF over Tetris
0                  1696.3              1695.3           1302.2            0.232                               0.232
0.0001–0.001       1696.3              1698.0           1302.2            0.233                               0.232
0.001–0.01         1696.3              1793.4           1301.6            0.274                               0.233
0.01–0.2           1696.3              1787.2           1299.3            0.273                               0.234
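The sensitivity mechanism behind Eq. (14) and Table 4 can be sketched as follows. How ε is sampled per job is an assumption here (Table 4 reports error ranges), and the fixed short/long threshold stands in for the JCT alignment grouping:

```python
import random

def predict_jcts(real_jcts, lo, hi, seed=1):
    """Eq. (14): JCT_p = JCT_r * (1 - eps), eps drawn per job from [lo, hi]."""
    rng = random.Random(seed)
    return [t * (1 - rng.uniform(lo, hi)) for t in real_jcts]

real = [10.0, 10.5, 11.0, 200.0, 210.0]
noisy = predict_jcts(real, lo=0.01, hi=0.2)

# SJF orders individual jobs by predicted JCT, so close jobs (10.0 vs 10.5)
# may swap. A coarse short/long grouping, as in the JCT alignment, is stable:
short_real = {i for i, t in enumerate(real) if t < 100}
short_noisy = {i for i, t in enumerate(noisy) if t < 100}
assert short_real == short_noisy   # grouping unchanged by the prediction error
```

Per-job noise can reorder jobs with nearly equal JCT, which perturbs SJF, while group membership, and hence the DJSF schedule, is far less affected.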
the DJSF method over the other methods increases at first and then decreases. We infer that a large number of job groups decreases the performance of the job packing method. In previous experiments, we set the number of job groups to 6 according to the characteristics of the test workload. Nevertheless, the optimal setting of the number of job groups is 3. We infer that if the k-means clustering method generates too many groups, then the number of jobs in each job group decreases. This may be disadvantageous for the job packing method in picking and packing suitable jobs from a job group. To sum up, determining the optimal number of job groups is still an open problem. We suggest setting a smaller number of job groups than that given by the heuristic setting in Eq. (15).

7 Conclusion

In this paper, we propose three algorithms, the DJSF method, the job packing method, and the JCT clustering method, which are jointly used to schedule multiple jobs. The DJSF method aims to decrease the average JCT and overcome the disadvantage of the SJF method. We define the concept of the job completion efficiency. The job packing method aims to pack jobs into a set of jobs that has a small amount of resources fragments and a large job completion efficiency. The JCT clustering method adopts a machine learning technique, the k-means clustering algorithm, to make a JCT alignment. To sum up, the DJSF method achieves a small average JCT and enhances the system throughput.

References
[1] S. Singh and Y. Liu, A cloud service architecture for analyzing big monitoring data, Tsinghua Science and Technology, vol. 21, no. 1, pp. 55–70, 2016.
[2] J. Dean and S. Ghemawat, MapReduce: Simplified data processing on large clusters, in Proc. 6th Conf. Operating System Design & Implementation, San Francisco, CA, USA, 2004, pp. 137–150.
[3] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, Spark: Cluster computing with working sets, in Proc. 2nd USENIX Workshop on Hot Topics in Cloud Computing, Boston, MA, USA, 2010, pp. 1–10.
[4] J. C. Tang, M. Xu, S. J. Fu, and K. Huang, A scheduling optimization technique based on reuse in Spark to defend against APT attack, Tsinghua Science and Technology, vol. 23, no. 5, pp. 550–560, 2018.
[5] R. Grandl, M. Chowdhury, A. Akella, and G. Ananthanarayanan, Altruistic scheduling in multi-resource clusters, in Proc. 12th USENIX Conf. Operating Systems Design and Implementation, Savannah, GA, USA, 2016, pp. 65–80.
[6] R. Grandl, G. Ananthanarayanan, S. Kandula, S. Rao, and A. Akella, Multi-resource packing for cluster schedulers, in Proc. 2014 ACM Special Interest Group on Data Communication, Chicago, IL, USA, 2014, pp. 455–466.
[7] J. Wilkes, Google cluster data, https://github.com/google/cluster-data, 2020.
[8] B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker, and I. Stoica, Mesos: A platform for fine-grained resource sharing in the data center, in Proc. 8th USENIX Conf. Networked Systems Design and Implementation, Boston, MA, USA, 2011, pp. 429–483.
[9] V. K. Vavilapalli, A. C. Murthy, C. Douglas, S. Agarwal, M. Konar, R. Evans, T. Graves, J. Lowe, H. Shah, S. Seth, et al., Apache Hadoop YARN: Yet another resource negotiator, in Proc. 4th Annual Symp. Cloud Computing, Santa Clara, CA, USA, 2013, pp. 1–16.
[10] T. Bonald, L. Massoulié, A. Proutière, and J. Virtamo, A queueing analysis of max-min fairness, proportional fairness and balanced fairness, Queueing Syst., vol. 53, nos. 1&2, pp. 65–84, 2006.
[11] A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica, Dominant resource fairness: Fair allocation of multiple resource types, in Proc. 8th USENIX Conf. Networked Systems Design and Implementation, Boston, MA, USA, 2011, pp. 323–336.
[12] J. A. Hartigan and M. A. Wong, Algorithm AS 136: A K-means clustering algorithm, J. Roy. Stat. Soc. Ser. C (Appl. Stat.), vol. 28, no. 1, pp. 100–108, 1979.
[13] M. K. Pakhira, A linear time-complexity k-means algorithm using cluster shifting, in Proc. 2014 IEEE Int. Conf. Computational Intelligence and Communication Networks, Bhopal, India, 2014, pp. 1047–1051.
[14] J. O. Iglesias, L. Murphy, M. De Cauwer, D. Mehta, and B. O'Sullivan, A methodology for online consolidation of tasks through more accurate resource estimations, in Proc. 2014 IEEE/ACM 7th Int. Conf. Utility and Cloud Computing, London, UK, 2014, pp. 89–98.
[15] P. Janus and K. Rzadca, SLO-aware colocation of data center tasks based on instantaneous processor requirements, in Proc. 2017 Symp. Cloud Computing, Santa Clara, CA, USA, 2017, pp. 256–268.
[16] M. Carvalho, D. A. Menascé, and F. Brasileiro, Capacity planning for IaaS cloud providers offering multiple service classes, Future Gener. Comput. Syst., vol. 77, no. 4, pp. 97–111, 2017.
[17] Z. Y. Hu, D. S. Li, and D. K. Guo, Balance resource allocation for Spark jobs based on prediction of the optimal resources, Tsinghua Science and Technology, vol. 25, no. 4, pp. 487–497, 2020.
[18] S. Venkataraman, Z. H. Yang, M. Franklin, B. Recht, and I. Stoica, Ernest: Efficient performance prediction for large-scale advanced analytics, in Proc. 13th USENIX Conf. Networked Systems Design and Implementation, Santa Clara, CA, USA, 2016, pp. 363–378.
[19] Z. D. Bei, Z. B. Yu, H. L. Zhang, W. Xiong, C. Z. Xu, L. Eeckhout, and S. Z. Feng, RFHOC: A random-forest approach to auto-tuning Hadoop's configuration, IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 5, pp. 1470–1483, 2016.
[20] Z. B. Yu, Z. D. Bei, and X. H. Qian, Datasize-aware high dimensional configurations auto-tuning of in-memory cluster computing, in Proc. 23rd ACM Int. Conf. Architectural Support for Programming Languages and Operating Systems, Williamsburg, VA, USA, 2018, pp. 564–577.
[21] Z. Y. Hu, D. S. Li, D. X. Zhang, and Y. X. Chen, ReLoca: Optimize resource allocation for data-parallel jobs using deep learning, in Proc. IEEE Conf. Computer Communications, Toronto, Canada, 2020, pp. 1163–1171.
[22] Y. Dong, J. Chen, Y. Tang, J. J. Wu, H. Q. Wang, and E. Q. Zhou, Lazy scheduling based disk energy optimization method, Tsinghua Science and Technology, vol. 25, no. 2, pp. 203–216, 2020.
[23] W. Zhang, X. Chen, and J. H. Jiang, A multi-objective optimization method of initial virtual machine fault-tolerant placement for star topological data centers of cloud systems, Tsinghua Science and Technology, vol. 26, no. 1, pp. 95–111, 2021.
[24] D. Shen, J. Z. Luo, F. Dong, and J. X. Zhang, VirtCo: Joint coflow scheduling and virtual machine placement in cloud data centers, Tsinghua Science and Technology, vol. 24, no. 5, pp. 630–644, 2019.
[25] C. Xue, C. Lin, and J. Hu, Scalability analysis of request scheduling in cloud computing, Tsinghua Science and Technology, vol. 24, no. 3, pp. 249–261, 2019.
memory cluster computing, in Proc. 23rd ACM Int. Conf.