
Wireless Personal Communications (2023) 129:1175–1195

https://doi.org/10.1007/s11277-023-10182-0

Hybrid Gradient Descent Golden Eagle Optimization (HGDGEO) Algorithm-Based Efficient Heterogeneous Resource Scheduling for Big Data Processing on Clouds

N. Jagadish Kumar1 · C. Balasubramanian2

Accepted: 7 February 2023 / Published online: 21 February 2023


© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023

Abstract
Resource scheduling is indispensable for enhancing system performance during big data processing on clouds. It enables significant utilization of computing resources while facilitating resource scalability and on-demand services. The resources required to run different applications in cloud computing are highly heterogeneous. This heterogeneous resource demand introduces a resource gap in which some resource capacities are drained while others remain available on the same server, resulting in imbalanced resource utilization. This imbalance becomes more apparent as the computing resources grow more heterogeneous. An intelligent resource scheduling strategy therefore becomes essential to distribute resources for big data processing through a potential decision-making process focused on completing the required tasks over time. In this paper, a Hybrid Gradient Descent Golden Eagle Optimization (HGDGEO) algorithm-based efficient heterogeneous resource scheduling process is proposed for handling the challenges that arise during big data processing in the Hadoop heterogeneous cloud environment. HGDGEO is proposed as an adaptive resource scheduling strategy that handles the dynamic characteristics of resources and users' fluctuating demand during big data stream processing by mimicking the intelligence of golden eagles, which alternate their speed of turning at different spiral-trajectory stages of hunting. It handles big data processing through two adaptive parameters that concentrate on allocating resources optimally to suitable VMs in the shortest possible time, depending on their requirements. The simulation results of the HGDGEO algorithm confirm its predominance in terms of makespan, load balance and throughput compared with competitive resource scheduling algorithms.

Keywords Gradient descent algorithm · Golden eagle optimization · Big data processing ·
Hadoop cloud environment · Hunting spiral trajectory

* N. Jagadish Kumar
Jagadishiva@gmail.com
Extended author information available on the last page of the article

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

1 Introduction

In the recent past, big data processing has become essential for handling the massive analytics requirements introduced by the majority of scientific and business applications, including scientific exploration, demand forecasting, healthcare, fraud detection and banking [1]. For processing big data, frameworks such as Spark, Storm and Hadoop are considered the ideal options. In general, most large organizations deploy and execute private computing clusters that run more than one data processing architecture on top of them [2]. The deployment of big data processing clusters over public clouds is highly significant, since public cloud services can facilitate a potential platform, infrastructure, and software for data computing and storage [3]. However, scheduling big data jobs is a herculean task in a cloud-deployed cluster, as scheduling requirements vary depending on the types of data processing jobs, which are categorized into network-intensive, memory-intensive, and CPU-intensive tasks [4]. Further, jobs can be classified depending on the differences in resource demands realized in the cloud computing environment for sustaining stable and better performance [5, 6]. Moreover, formulating efficient and cost-effective scheduling strategies is difficult, as different kinds of Virtual Machine (VM) instances are available in the cloud environment [7].
In this paper, an efficient big data job scheduling scheme is proposed for minimizing the cost incurred in utilizing a cloud-deployed Apache Spark (AS) cluster with enhanced job performance. The proposed big data job scheduling algorithm considers AS as the target framework for demonstrating its potential, owing to the efficiency, scalability and versatility it offers during big data processing [8]. Moreover, AS has largely replaced the classical Hadoop-based platforms utilized in industry. It incorporates in-memory caching for speeding up the rate of application processing [9]. The number of executors required to execute a specific job determines the resource requirements of that Spark job. In this context, each individual executor represents a process utilizing a static portion of resources such as disk, memory and CPU [10]. But different big data jobs require changing executor sizes based on the workload types considered for processing. Hence, big data jobs exhibit different properties with respect to their dependence on the resources available in the cloud computing environment [8]. In particular, First In First Out (FIFO) is the default scheduling mechanism used in Spark job scheduling. A single job following FIFO might consume the complete resources of the cluster unless a resource utilization threshold is set [11]. On the other hand, the remaining resources in the cloud environment can be utilized for scheduling the subsequent jobs in the queue, depending on the resource threshold assigned by the user. The AS framework also provides a fair scheduler in addition to the FIFO scheduler for mitigating the resource contention realized among jobs [12]. By default, both schedulers place job executors in a Round Robin (RR) manner on worker nodes or VMs with the objective of improving performance through effective load balancing. However, placing executors in an RR manner results in wastage of resources across the complete set of available VMs whenever the cloud-deployed cluster is not fully loaded with jobs [13]. Even though Spark offers an option of consolidating executor placements, the cluster manager fails to consider the resource capacity and cost of different cloud VM instance types [14]. This limitation of the AS framework results in cost-inefficient decisions about executor placement. In addition, the majority of the existing scheduling works in the


literature have concentrated on Hadoop-based platforms. These existing mechanisms are not directly applicable to Spark job scheduling, as their architectural paradigms are completely dissimilar from in-memory computing architectures. Furthermore, only a marginal amount of work has been carried out on the scheduling problem associated with in-memory computing-based frameworks like AS [15]. Moreover, the majority of existing works assume the cluster setup to be homogeneous in characteristics, and thereby fail to make the scheduling process more cost-effective from the perspective of the cloud.
Consider, for example, a cluster that includes two homogeneous VMs, each with a capacity of 8 CPU cores. If a Spark job requires two executors with two cores each, then the total CPU core requirement is 4. But the majority of existing methodologies utilize both VMs to deploy these two executors, resulting in wastage of resources and increased cost of VM utilization [16]. If the scheduler instead considers the different VM instance types and VM pricing models in the cluster, then the executors of the jobs can be tightly packed into the most cost-efficient VMs. Only when there is a maximized load on the cluster are the instances with larger resource capacities and prices involved [17]. Hence, the scheduling of Spark jobs in the cloud-deployed cluster is formulated as a variant of the bin-packing problem. The proposed scheduling problem primarily concentrates on minimizing VM cost with the view to improve performance and guarantee maximized resource utilization.
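The consolidation argument above can be sketched numerically. The following Python snippet (illustrative names, not from the paper) compares the number of VMs a tight packing needs against a round-robin spread:

```python
def vms_needed(num_executors, cores_per_executor, vm_cores):
    """Minimum number of identical VMs needed when executors are packed
    tightly by CPU demand (memory is ignored in this toy example)."""
    total_cores = num_executors * cores_per_executor
    return -(-total_cores // vm_cores)  # ceiling division

# The example from the text: 2 executors x 2 cores on 8-core VMs.
packed = vms_needed(2, 2, 8)   # tight packing needs a single VM
round_robin = 2                # round-robin spreads over both VMs
```

With tight packing only one of the two 8-core VMs is used, halving the VM cost for the same job.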
In this paper, the Hybrid Gradient Descent Golden Eagle Optimization (HGDGEO) algorithm-based efficient heterogeneous resource scheduling process is proposed for handling the challenges that arise during big data processing in the Hadoop heterogeneous cloud environment. HGDGEO is proposed as an adaptive resource scheduling strategy that handles the dynamic characteristics of resources and users' fluctuating demand during big data stream processing by mimicking the intelligence of golden eagles, which alternate their speed of turning at different spiral-trajectory stages of hunting. It handles big data processing through two adaptive parameters that concentrate on allocating resources optimally to suitable VMs in the shortest possible time, depending on their requirements. The simulation results of the HGDGEO algorithm confirm its predominance in terms of makespan, load balance and throughput compared with competitive resource scheduling algorithms.
The remainder of the paper is structured as follows. Section 2 gives a comprehensive view of the existing works contributed to the literature in recent years, with their pros and cons. Section 3 presents the problem statement, the objective of the problem, and the role of GEOA in placing executors in VMs depending on the satisfaction of the placement constraints, together with the algorithm used for achieving the same. Section 4 demonstrates the simulation results and discussion of the proposed HGDGEO algorithm with proper justification of its performance. Section 5 concludes the paper with the major contributions of the proposed work and the future scope of enhancement.

2 Related Work

Initially, Mashayekhy et al. [18] proposed an energy-efficient framework for MapReduce applications that satisfies SLA requirements. This framework modelled the energy-based scheduling problem as an integer program for each MapReduce job. It adopted two heuristic algorithms, termed energy-aware MapReduce scheduling algorithms, for determining the allocation of map and reduce tasks to the machine slots. It was proposed for minimizing the energy utilized during application execution. Experimental validation of this framework was conducted on a Hadoop cluster to estimate the execution time and energy utilized for varying workloads derived from the HiBench benchmark suite. The results of these two heuristic algorithms confirmed their capability of identifying near-optimal job schedules with a reduction in energy consumption of up to 40% compared to the average energy incurred by a generic practice scheduler. Then, Lu et al. [19] proposed a Genetic Algorithm (GA)-based scheduling approach for improving the efficiency involved in the execution of big data analytics jobs. It included an estimation module that aids in predicting the clusters' performance during the execution of big data analytics jobs. It specifically utilized GA for attaining job scheduling for geo-distributed data. The experimental results of GA-based scheduling confirmed better effectiveness and accuracy during its evaluation on different cluster nodes and data centers considered from the Amazon EC2 platform.
Lim and Majumdar [20] proposed an effective resource management strategy for processing MapReduce jobs within their completion deadlines. It processes MapReduce jobs by achieving maximized user satisfaction and improved system performance. It was proposed with the capability of achieving high resource utilization, minimized response time and improved job throughput without missing execution deadlines. Hashem et al. [21] propounded a multi-objective optimization scheduling algorithm based on cost and completion time for executing MapReduce jobs. This scheduling algorithm took the factors of cost minimization and completion time into account during the process of multi-objective optimization. It adopted earliest-finish-time scheduling for job scheduling and resource allocation in the cloud environment. The experimental results of this scheduling algorithm confirmed better performance of approximately 25 and 21% in minimizing the cost and the execution time, respectively, compared to the Fair and FIFO-based scheduling approaches.

Further, Shao et al. [22] proposed a YARN-based energy-aware fair scheduling framework for potentially minimizing the cost of energy consumption while satisfying SLA requirements. This YARN-based scheduler possessed the capability of turning cluster nodes on or off and scheduling jobs to them for attaining energy efficiency. It allocated the required resources to MapReduce tasks based on an energy-aware dynamic capacity management strategy integrated with a deadline-driven policy. It confirmed better mean execution time for containers and minimized allocation time for user requests. It was modelled as a multi-dimensional knapsack problem with the merits of an energy-aware greedy algorithm for realizing the tasks essential for achieving fine-grained placement on energy-efficient nodes. It minimized energy costs by turning off nodes that stay idle for a threshold duration. Experimental validation of this energy-aware scheduling approach confirmed a better resource allocation rate and minimized execution time, independent of the big data application jobs entering the cloud environment. Hu et al. [23] contributed an intelligent cloud workflow management and scheduling approach using the JStorm platform for achieving real-time big data processing. It specifically adopted the benefits of a dynamic resource scheduling system termed D-JStorm for attaining an adaptive balance among the available cloud service resources and a two-tier scheduling process for cloud workflow tasks.
Seethalakshmi et al. [24] proposed an integrated Spider Monkey Optimization and Gradient Descent (SMOGD) local-search-based resource scheduling method for handling the challenges of big data processing in the Hadoop heterogeneous scenario. This resource scheduling approach achieved effective task scheduling and resource allocation in a balanced manner, such that load balancing of tasks over the VMs can be attained in a more reliable way. It was proposed with the capability of assigning the dynamic number of incoming tasks of the cloud environment to suitable VMs depending on the constraints and requirements necessary for placement. The results of SMOGD confirmed better performance in terms of makespan, load balancing and throughput, independent of the number of tasks entering the heterogeneous cloud environment. Islam et al. [25] proposed an efficient scheduling algorithm over a cloud-placed AS cluster for minimizing the resource utilization cost. This scheduling algorithm was proposed with the capability of prioritizing tasks depending on their execution deadlines in the cloud computing environment. It was proposed as an adaptive and online cluster scheduling algorithm that handled the degree of uncertainty in a more potent manner. It was deployed on top of Apache Mesos to determine the suitable placement of the executors associated with each task in the cloud computing environment. The results of this scheduling algorithm proved its efficacy in reducing the resource utilization cost by a margin of 32.21%, independent of the number of workloads incoming to the cloud computing scenario.

3 Proposed Work

In this section, initially the problem formulation is defined with the objective function considered for executor placement over the VMs for each individual task during big data processing on clouds, as in Fig. 1. It then presents a detailed view of the GEOA algorithm used for attaining executor placement in VMs, along with the significance it offers over existing approaches. It also presents the algorithm used for executor placement in the cloud environment.

3.1 Problem Formulation

The executor demands of a given job in an AS cluster are identical. When its resource demands are satisfied, every worker node (VM) possesses a collection of accessible resources, including memory and CPU cores, which can be used for placing executors from any job [26]. Thus, the main objective targets determining the placement of all executors of each submitted job over one or more accessible VMs. Moreover, the resource capacity of each VM must not be exceeded when one or more executors are placed in it during scheduling. In this context, cost is reduced with fewer VMs in use due to the compact assignment of executors [27]. Hence, this scheduling problem is modelled as a variant of the bin-packing problem. Here, CPU cores and memory are considered the resource requirements of an executor. Thus, each executor associated with a job can be considered a multi-dimensional volume that needs to be positioned in a specific VM (referred to as a bin) during scheduling. For instance, consider a job comprising K executors with CPU and memory requirements CPUReq(i) and MemReq(i), respectively, where ESet = {1, 2, 3, …, K} represents the index set of executors of the job. Further, there exist 'r' kinds of VMs with a two-dimensional resource capacity representing memory and CPU, each involving a static cost during utilization.

Fig. 1  Proposed work

The problem concentrates on selecting VMs and placing the executors into them so as to minimize the total cost while satisfying the resource constraints.

3.1.1 Objective 1 Minimization of Cost

The first objective concentrates on reducing the cost of scheduling any job across the whole cluster. In this case, the total cost represents the sum of the costs incurred in utilizing all the VMs in the implementation environment. Specifically, a binary decision variable $BDV_{(kr)}$, represented in Eq. (1), is used for indicating whether a VM '$k$' of category '$r$' is utilized or not:

$$BDV_{(kr)} = \begin{cases} 1 & \text{if the } k\text{th VM of type } r \text{ is used} \\ 0 & \text{otherwise} \end{cases} \quad (1)$$


3.1.2 Objective 2 Constraints of Resource Capacity

This objective focuses on checking that the total resource demand of the complete set of executors positioned in a specific VM does not surpass the total resource capacity of that VM. The resource constraints in terms of memory and CPU cores are presented in Eqs. (2) and (3):

$$\sum_{i \in ESet} EP_{ikr} \times \beta_i^{Mem} \le \delta_{kr}^{Mem} \times BDV_{(kr)} \quad (2)$$

$$\sum_{i \in ESet} EP_{ikr} \times \beta_i^{CPU} \le \delta_{kr}^{CPU} \times BDV_{(kr)} \quad (3)$$

where $EP_{ikr}, BDV_{(kr)} \in \{0, 1\}$.

3.1.3 Objective 3 Constraints of Executor Placement

This final objective verifies whether an executor can be positioned in any one of the available VMs, with its placement constraints specified in Eq. (5). In particular, a binary decision variable $EP_{ikr}$, represented in Eq. (4), is used for indicating whether an executor is placed on the VM '$k$' of category '$r$' or not:

$$EP_{ikr} = \begin{cases} 1 & \text{if the } i\text{th executor is positioned in the } k\text{th VM of type } r \\ 0 & \text{otherwise} \end{cases} \quad (4)$$

Finally, the problem aims at selecting VMs and placing the complete set of executors over the available VMs such that the total cost is minimized and the resource constraints are satisfied, as specified in Eqs. (5) and (6):

$$\text{Minimize } Cost = \sum_{k} \sum_{r \in VMType} VMPrice_{(r)} \times BDV_{(kr)} \quad (5)$$

subject to

$$\sum_{k} \sum_{r \in VMType} EP_{ikr} = 1 \quad \forall i \in ESet$$

$$\sum_{i \in ESet} EP_{ikr} \times \beta_i^{Mem} \le \delta_{kr}^{Mem} \times BDV_{(kr)} \quad (6)$$

$$\sum_{i \in ESet} EP_{ikr} \times \beta_i^{CPU} \le \delta_{kr}^{CPU} \times BDV_{(kr)}$$

where $EP_{ikr}, BDV_{(kr)} \in \{0, 1\}$.
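The capacity constraints of Eqs. (2) and (3) amount to a per-VM feasibility check. A minimal sketch in Python, assuming a simple dictionary representation for executors and VMs (field names are hypothetical, not from the paper):

```python
def capacity_satisfied(placed_executors, vm):
    """Check Eqs. (2)-(3): the total memory and CPU demand of all executors
    placed on a VM must not exceed that VM's capacity."""
    total_mem = sum(e["mem"] for e in placed_executors)  # left side of Eq. (2)
    total_cpu = sum(e["cpu"] for e in placed_executors)  # left side of Eq. (3)
    return total_mem <= vm["mem"] and total_cpu <= vm["cpu"]
```

A scheduler would call this check before committing an executor to a candidate VM, rejecting any assignment that would overflow either resource dimension.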


As mentioned, this problem of selecting VMs and placing the executors over the available VMs is modelled as a bin-packing problem. Bin packing is a combinatorial optimization problem that is known to be NP-hard. In this proposed work, the multi-dimensional bin-packing problem is solved through the benefits of HGDGEO. The proposed HGDGEO scheduler is designed to schedule jobs through the estimation of the resource demand of the current job and the availability of cluster resources. It also determines the most cost-efficient positioning of executors for each specific job under execution. The process of scheduling jobs by the baseline schedulers is time consuming as the problem size increases. In this context, metaheuristic optimization methods are ideal and suitable candidates for achieving faster placement of executors.

3.2 HGDGEO-Based Executor Placement

In this section, the mathematical formulation that mimics the movements of golden eagles in searching for their prey is explained. This searching behaviour exhibited by golden eagles is adopted for exploring the search space to determine the optimal executor placement (prey). The prey-searching process facilitated by the GEOA algorithm comprises a spiral motion, subsequently followed by the attack and cruise phases, which represent the steps of exploitation and exploration, respectively.

3.2.1 Spiral Motion in GEOA

The GEOA algorithm completely depends on the spiral movement, which has the capability of memorizing the best location visited by the search agents until the current iteration. The search agent (eagle) has the potential of performing the actions of attacking the prey and cruising towards the prey while searching for the optimal solution (prey) in the search space. In this context, the attack and cruise actions of GEOA, depicting the exploitation and exploration phases, are represented as follows.

3.3 Exploitation Phase (Attack Phase of GEOA)

The attack phase of GEOA is represented using a vector starting from the current position of the search agent (eagle) and terminating at the location of the identified prey stored in the memory of the search agent. The attack vector $\vec{A}_{Vector}$, utilized by the search agent (eagle) to determine the location of the prey, is computed based on Eq. (7):

$$\vec{A}_{Vector} = \vec{BL}_{Prey(Past)} - \vec{CP}_{SA} \quad (7)$$

where $\vec{CP}_{SA}$ and $\vec{BL}_{Prey(Past)}$ represent the current position of the $i$th search agent and the best position of the prey determined by that search agent until the previous iteration. This attack phase represents the exploitation phase of GEOA, since the attack vector is responsible for guiding the complete population of search agents towards their best visited locations.

3.3.1 Exploration Phase (Cruise Phase of GEOA)

The cruise phase of GEOA represents the exploration phase, which completely depends on the cruise vector; this in turn is computed based on the attack vector. The cruise vector depicts a tangent vector to the circle, perpendicular to the attack vector. It presents the linear rate at which the search agent moves relative to the movement of the prey in the search space. First, the equation of the tangent hyperplane is calculated in order to estimate the cruise vector, since the cruise vector lies inside the tangent hyperplane of the circle in the $m$-dimensional space. The tangent hyperplane equation in scalar form is presented as

$$d = \sum_{j=1}^{m} nv_{(j)}\, vc_{(j)} = nv_{(1)} vc_{(1)} + nv_{(2)} vc_{(2)} + \dots + nv_{(m)} vc_{(m)} \quad (8)$$

where $nv_{(j)}$ and $vc_{(j)}$ represent the $j$th elements of the normal vector and the variable vector of the hyperplane, respectively.
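Equation (8) is simply a dot product between the hyperplane's normal vector and a point lying on the hyperplane. As a minimal illustration (names are ours, not the paper's):

```python
def hyperplane_offset(normal, point):
    """Scalar form of Eq. (8): d = sum_j nv_j * vc_j, the offset of the
    hyperplane with the given normal vector passing through `point`."""
    return sum(n * x for n, x in zip(normal, point))
```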

3.3.2 New Position Movement in GEOA

The displacement of the search agents (eagles) completely depends on the attack and cruise vectors. In each iteration, the step vector associated with each search agent is defined based on Eq. (9):

$$\Delta Pos_{SA(i)} = \vec{rnd}_{(1)}\, A_{Coeff} \frac{\vec{A}_{Vector}}{\|\vec{A}_{Vector}\|} + \vec{rnd}_{(2)}\, CR_{Coeff} \frac{\vec{CR}_{Vector}}{\|\vec{CR}_{Vector}\|} \quad (9)$$

where $A_{Coeff}$ and $CR_{Coeff}$ represent the attack and cruise coefficients with respect to each iteration. These attack and cruise coefficients play an indispensable role in adjusting the movements of the search agents towards the optimal solution under the influence of the attack and cruise phases. Moreover, $\vec{rnd}_{(1)}$ and $\vec{rnd}_{(2)}$ refer to random vectors whose elements always lie in the interval [0, 1]. In addition, the Euclidean norms of the attack and cruise vectors, $\|\vec{A}_{Vector}\|$ and $\|\vec{CR}_{Vector}\|$, are computed based on Eqs. (10) and (11):

$$\|\vec{A}_{Vector}\| = \sqrt{\sum_{j=1}^{n} \left( av_{(j)} \right)^2} \quad (10)$$

$$\|\vec{CR}_{Vector}\| = \sqrt{\sum_{j=1}^{n} \left( crv_{(j)} \right)^2} \quad (11)$$

In this context, the position of a search agent in iteration $t+1$ is computed by simply adding the step vector, as represented in Eq. (12):

$$Pos_{SA}^{t+1} = Pos_{SA}^{t} + \Delta Pos_{SA}^{t} \quad (12)$$

When the fitness value of search agent '$i$' in its new position is better than the fitness value determined in its old position stored in memory, the memory is updated with the new position. Otherwise, the search agent still moves to the new position, but its memory remains intact. Each search agent (eagle) in each new iteration randomly selects another search agent from the entire population for updating its own position through the computation of the attack vector, cruise vector and step vector during the successive iteration. This process is executed until the termination condition is satisfied. In specific, the attack and cruise coefficients are used for controlling the step vector, which aids in balancing the exploration and exploitation phases.
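The update rule of Eqs. (7) and (9)-(12) can be sketched as follows. This is a simplified illustration, not the paper's implementation: the cruise vector is approximated by projecting a random vector orthogonally to the attack vector, which preserves its geometric role as a tangent direction without reproducing the full hyperplane construction:

```python
import numpy as np

def geo_step(pos, best_prey, p_attack, p_cruise, rng):
    """One GEO position update (simplified sketch of Eqs. (7), (9)-(12))."""
    attack = best_prey - pos                         # Eq. (7): attack vector
    a_norm = np.linalg.norm(attack)                  # Eq. (10)
    rand = rng.random(pos.shape)
    if a_norm > 0:
        # cruise: random vector projected orthogonally to the attack vector
        cruise = rand - (rand @ attack) / a_norm**2 * attack
    else:
        cruise = rand
    c_norm = np.linalg.norm(cruise) or 1.0           # Eq. (11), guard zero
    step = (rng.random() * p_attack * attack / (a_norm or 1.0)
            + rng.random() * p_cruise * cruise / c_norm)   # Eq. (9)
    return pos + step                                # Eq. (12)
```

With the cruise weight set to zero, the agent moves strictly along the attack direction, i.e. pure exploitation; a larger cruise weight adds a perpendicular (exploratory) component.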


3.4 Transition from Exploration to Exploitation Phase

In GEOA, exploration dominates the initial iterations while exploitation dominates the final iterations. In this context, the balance between exploration and exploitation is determined by the values of the attack ($P_{Attack}$) and cruise ($P_{Cruise}$) propensity coefficients. In specific, GEOA adopts $P_{Attack}$ and $P_{Cruise}$ for achieving the transition from exploration to exploitation. The algorithm starts with a low $P_{Attack}$ and a high $P_{Cruise}$ value. Over successive iterations, the value of $P_{Attack}$ is gradually increased while $P_{Cruise}$ is gradually decreased. The intermediate values of $P_{Attack}$ and $P_{Cruise}$ are computed through the linear transitions presented in Eqs. (13) and (14):

$$P_{Attack} = P_{Attack}^{(0)} + \frac{Iter_{Curr}}{Iter_{Max}} \left( P_{Attack}^{(Iter_{Max})} - P_{Attack}^{(0)} \right) \quad (13)$$

$$P_{Cruise} = P_{Cruise}^{(0)} - \frac{Iter_{Curr}}{Iter_{Max}} \left( P_{Cruise}^{(0)} - P_{Cruise}^{(Iter_{Max})} \right) \quad (14)$$

where $Iter_{Curr}$ and $Iter_{Max}$ represent the current and maximum iterations used for implementation. Further, $P_{Attack}^{(0)}$ and $P_{Attack}^{(Iter_{Max})}$ indicate the initial and final propensity degrees associated with the attack vector, and $P_{Cruise}^{(0)}$ and $P_{Cruise}^{(Iter_{Max})}$ the initial and final propensity degrees associated with the cruise vector. In specific, $P_{Attack}^{(0)}$ and $P_{Attack}^{(Iter_{Max})}$ are assigned the values 0.5 and 2, which means that the attack propensity is linearly increased from 0.5 to 2. Similarly, $P_{Cruise}^{(0)}$ and $P_{Cruise}^{(Iter_{Max})}$ are set to 1 and 0.5, which means that the cruise propensity is linearly decreased from 1 to 0.5.
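The linear transitions of Eqs. (13) and (14) reduce to a single interpolation helper. The sketch below uses the initial/final values stated above (0.5 to 2 for attack, 1 to 0.5 for cruise); function and variable names are ours:

```python
def propensity(iter_curr, iter_max, p_initial, p_final):
    """Linear transition of an attack/cruise coefficient, Eqs. (13)-(14)."""
    return p_initial + (iter_curr / iter_max) * (p_final - p_initial)

# Attack propensity grows 0.5 -> 2; cruise propensity shrinks 1 -> 0.5.
p_attack_mid = propensity(50, 100, 0.5, 2.0)
p_cruise_mid = propensity(50, 100, 1.0, 0.5)
```

At the halfway point of the run, the attack propensity has risen to 1.25 while the cruise propensity has fallen to 0.75, so the step vector of Eq. (9) progressively favours exploitation.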

Algorithm 1 GEOA-based executor placement

Input: The complete set of incoming tasks and the current task to be scheduled.
Output: A list of VMs (the placement list) over which the task executors are to be placed.

1. Executor_Place_Procedure (Task)
2.   Initially set List_Placement ← φ
3.   Sort the list of available VMs (List_VMs)
4.   For each VM(i) ∈ List_VMs do
5.     Until the conditions specified in Eqs. (3) and (4) are satisfied for the executor placement
6.       Update the availability of resources in the VM using the GEOA algorithm
7.       Add VM(i) into List_Placement
8.       If List_Placement.Size = E then
9.         Return List_Placement
10.      End
11.    End
12.  End
13.  If un-utilized VMs exist in the cluster then
14.    Use the constraints specified in Eqs. (3) and (4) to turn on the smallest NewVM using GEOA
15.    List_VMs(New) = List_VMs(Old) ∪ NewVM
16.    Return to Step 3
17.  End Until
18.  Return Failure
19. End
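Algorithm 1 can be sketched as a first-fit loop in Python. This is an illustrative reconstruction, not the authors' code: data shapes are hypothetical, and the GEOA-guided resource update is simplified to plain bookkeeping:

```python
def place_executors(executors, vms, vm_catalog):
    """First-fit sketch of Algorithm 1. `executors` is a list of (cpu, mem)
    demands, `vms` holds the running VMs, and `vm_catalog` lists the VM
    types that can be turned on. Field names are illustrative."""
    placement = []
    vms = sorted(vms, key=lambda v: v["price"])          # Step 3: sort VMs
    for cpu, mem in executors:
        vm = next((v for v in vms
                   if v["cpu_free"] >= cpu and v["mem_free"] >= mem), None)
        if vm is None:                                   # Steps 13-16: new VM
            fits = [c for c in vm_catalog
                    if c["cpu"] >= cpu and c["mem"] >= mem]
            if not fits:
                return None                              # Step 18: failure
            cheapest = min(fits, key=lambda c: c["price"])
            vm = {"id": "new-%d" % len(vms), "price": cheapest["price"],
                  "cpu_free": cheapest["cpu"], "mem_free": cheapest["mem"]}
            vms.append(vm)
        vm["cpu_free"] -= cpu                            # update availability
        vm["mem_free"] -= mem
        placement.append(vm["id"])
    return placement
```

Executors are packed into already-running VMs whenever the capacity constraints allow, and the smallest suitable new VM is turned on only when no running VM fits, mirroring the cost-minimizing intent of the algorithm.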

4 Experimental Results and Discussion

In this section, the details of the experimental setup, comprising the baseline schedulers, the benchmarked application and the cluster resource configuration, are presented. Then the performance evaluation results of the proposed HGDGEO scheme and the benchmarked algorithms, based on the parameters of scheduling overhead, deadline violations, job performance and cost, are demonstrated. In addition, a sensitivity investigation of the proposed scheme, conducted with respect to the system parameters, is depicted for determining the applicability of the propounded algorithm.

4.1 Baseline Schedulers

In general, the major limitation of many of the cluster schedulers that handle Spark jobs is that they fail to consider job placement at the executor level. The majority of the baseline schedulers only concentrate on selecting the total number of nodes (VMs) or resources required for each job when making scheduling decisions. But the proposed scheme operates at a fine-grained level by deciding executor placements during job scheduling. The baseline schedulers used for comparison with the proposed scheduling algorithm are explained as follows.

i. Morpheus In this baseline scheduler, the executor placement policy of Morpheus presented in [26] is adopted. It includes a low-cost packing policy during the process of placing executors. This low-cost packing policy determines the scarce resources demanded for each job's execution, depending on the current cluster load, based on Eq. (15). It sorts the jobs in ascending order of the scarce resources they demand during execution. Thus, the resources in the cluster remain well balanced over the complete scheduling process, such that a maximized number of jobs can be executed over a prolonged time. It is considered as a baseline scheduler for comparison as it includes a packing strategy during executor placement.
J_Cost = Max( (L_CPU + J_CPU) / T_CPU , (L_Mem + J_Mem) / T_Mem )        (15)

Moreover, the dynamic resource allocation feature of Spark is kept in the active mode
during the implementation of the proposed scheme and the baseline schedulers.

ii. Integer Linear Programming-based Executor Placing Scheduler (ILPEPS) This is the
scheduler which operates on the principles of integer programming and updates the
status of the cluster depending on the current availability of resources on each VM
[27]. It takes the current availability of cluster resources and the resource demand of
the current job's executors into account to dynamically generate the resource-capacity
constraints, the executor-placement constraints and the optimization

objective. It returns a List_Placement comprising the selected VMs over which the
executors can be constructed whenever a feasible solution is identified. Otherwise, a
failure is returned when it is not able to identify a possible solution. In particular, the
VMs that are already utilized by other jobs are set to 1 when the resource-availability
constraints are generated during job scheduling. The optimization objective is
determined by the cost of the VMs already utilized during executor placement. This
ILPEPS strategy automatically attempts to place a maximized number of executors
before using any new VM, for the purpose of cost optimization, whenever free
resources are obtainable in the already utilized VMs.
iii. Best Fit Decreasing Heuristic Placement of Executor (BFDHP) This scheduler utilizes
a greedy algorithm for determining the VMs over which the executors of a job can
be optimally placed [28]. It initially identifies the VMs in the cluster and sorts them
in ascending order of their resource availability. Then it verifies each VM to confirm
whether the resource availability of the current VM complies with the resource
requirement of the executors to be placed. If the requirement is satisfied, the resource
availability of the VM is updated and the VM is added to a placement list. It adopts
a greedy approach in which the current VM is used for placing as many executors as
feasible rather than moving to the next VM, such that executors can be tightly packed.
It thus utilizes only a limited number of VMs in the cluster during the placement of
executors. It returns the final placement list of executors once it has identified the
positioning for the executors associated with a specific job. Further, the smallest VM
which fulfils the constraints is turned on and added to the list of VMs when the cluster
is not fully utilized and the VMs in the VM_List are determined to be inadequate for
placing all the executors. Furthermore, the placement-determination steps are iterated
until a failure is returned or the cluster does not possess the necessitated resources for
placing all the executors associated with the current job.
iv. First In First Out (FIFO) scheduler This is the default scheduler inherent to Apache
Spark (AS) deployed on top of Apache Mesos [29]. It adopts a first-come-first-served
strategy for scheduling jobs. It attempts to pack the executors over a limited number
of VMs depending on the consolidation option of the scheduler, to prevent a
round-robin distribution of executors. This scheduler is considered as one of the
baselines for comparison as it is the most common option for a user with Spark jobs,
and many existing scheduling algorithms use it as the default strategy for placing
executors.
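The low-cost packing computation of Eq. (15) and the ascending sort used by the Morpheus policy above can be sketched as follows. This is an illustrative Python sketch; the function and field names (job_cpu, load_cpu, and so on) are our own assumptions rather than identifiers taken from [26].

```python
def job_scarce_resource_cost(job_cpu, job_mem, load_cpu, load_mem,
                             total_cpu, total_mem):
    # Eq. (15): the dominant fractional demand once the job's CPU and
    # memory requests are added to the current cluster load.
    return max((load_cpu + job_cpu) / total_cpu,
               (load_mem + job_mem) / total_mem)


def order_jobs_by_scarcity(jobs, load_cpu, load_mem, total_cpu, total_mem):
    # Morpheus-style low-cost packing: sort jobs in ascending order of
    # their scarce-resource cost before placing their executors.
    return sorted(jobs, key=lambda j: job_scarce_resource_cost(
        j["cpu"], j["mem"], load_cpu, load_mem, total_cpu, total_mem))
```

Sorting ascending means the cheapest (least scarce) jobs are placed first, which is what keeps the cluster's resource usage balanced over the whole scheduling process.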
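The constraint-and-objective structure that ILPEPS hands to an integer programming solver can be illustrated with a brute-force stand-in that enumerates executor-to-VM assignments. This sketch is for intuition only: a real implementation would use an ILP solver rather than exhaustive search, and the data layout here is assumed, not taken from [27].

```python
from itertools import product


def ilpeps_place(num_executors, exec_cpu, exec_mem, vms):
    """Try every executor-to-VM assignment, keep only those satisfying
    the per-VM resource-capacity constraints, and minimise the total
    cost of the VMs actually used. VMs already hosting other jobs can
    be given cost 0, so the optimiser prefers reusing them before
    turning on a new VM, mirroring the ILPEPS bias."""
    best, best_cost = None, float("inf")
    for assign in product(range(len(vms)), repeat=num_executors):
        used_cpu = [0.0] * len(vms)
        used_mem = [0.0] * len(vms)
        for v in assign:
            used_cpu[v] += exec_cpu
            used_mem[v] += exec_mem
        # resource-capacity constraints per VM
        if any(used_cpu[i] > vms[i]["cpu"] or used_mem[i] > vms[i]["mem"]
               for i in range(len(vms))):
            continue
        cost = sum(vms[i]["cost"] for i in set(assign))
        if cost < best_cost:
            best, best_cost = assign, cost
    return best, best_cost  # (None, inf) signals failure
```

The exhaustive search is O(V^E) in the number of VMs and executors, which is exactly why the real scheduler formulates this as an integer program instead.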
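The greedy packing behaviour of BFDHP can be sketched as below. The new-VM activation step (turning on the smallest VM that fulfils the constraints) is omitted for brevity, and all names are illustrative rather than drawn from [28].

```python
def bfdhp_place(num_executors, exec_cpu, exec_mem, vms):
    """Greedy sketch: VMs are sorted in ascending order of available
    resources, and each VM is packed with as many executors as fit
    before the next VM is tried. Returns the placement list (one VM
    index per executor) or None when the cluster cannot host them."""
    placement = []
    # ascending sort by remaining capacity, so small VMs fill up first
    order = sorted(range(len(vms)),
                   key=lambda i: (vms[i]["cpu"], vms[i]["mem"]))
    free = {i: dict(vms[i]) for i in order}   # work on copies
    remaining = num_executors
    for i in order:
        while (remaining and free[i]["cpu"] >= exec_cpu
               and free[i]["mem"] >= exec_mem):
            free[i]["cpu"] -= exec_cpu
            free[i]["mem"] -= exec_mem
            placement.append(i)
            remaining -= 1
    return placement if remaining == 0 else None
```

Packing the current VM exhaustively before moving on is what keeps the executors tightly packed on a limited number of VMs.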

4.2 Benchmarked Applications

The proposed and the baseline scheduling algorithms are evaluated based on the big data
benchmarking suite named BigDataBench. Specifically, the PageRank, Sort and
WordCount applications from BigDataBench are selected for the comparative
performance evaluation. These benchmarked applications are used for workload
generation, in which each individual job possesses a different input size varying from 1
to 20 GB, especially for Sort and WordCount, while applications like PageRank may take
5 to 10 iterations. In this experimental validation process, different application types are
mixed randomly for the purpose of generating heterogeneous workloads. The
experimentation is conducted using the Facebook Hadoop workload trace, from which
the job arrival time with respect to varying hours of a specific day is extracted. In the
case of a light-load hour, 50 jobs are used, while 100 jobs are used for a high-load hour.
Thus, the

job arrival rate under the high-load hour is always comparatively higher than under the
lightly loaded hour. Thus, most of the resources are heavily utilized in the high-load
hour, while the clusters are only lightly utilized in the light-load hour. Moreover, the job
profiles are initially obtained by executing each job independently in the cluster without
any interference from other jobs executing in the same environment. In this experimental
process, the job completion time is averaged over multiple runs, specifically 5 runs for
each job. In addition, the mean completion time of jobs is used as the hard deadline
during workload generation.
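The workload-generation procedure described above (50 jobs in a light-load hour, 100 in a high-load hour, 1–20 GB inputs for Sort/WordCount, 5–10 iterations for PageRank) can be sketched as follows; the field names and the use of a fixed seed are our own assumptions for reproducibility, not details from the paper's code.

```python
import random


def generate_workload(load="light", seed=42):
    """Build a randomly mixed list of job descriptors: 50 jobs for a
    light-load hour, 100 for a high-load hour. Sort and WordCount jobs
    carry an input size of 1-20 GB; PageRank jobs carry 5-10
    iterations, matching the heterogeneous mix described above."""
    rng = random.Random(seed)
    n_jobs = 50 if load == "light" else 100
    jobs = []
    for i in range(n_jobs):
        app = rng.choice(["PageRank", "Sort", "WordCount"])
        job = {"id": i, "app": app}
        if app == "PageRank":
            job["iterations"] = rng.randint(5, 10)
        else:
            job["input_gb"] = rng.randint(1, 20)
        jobs.append(job)
    return jobs
```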

4.3 Configuration of Cluster

The Mesos cluster is deployed over the Nectar cloud, with each cluster comprising three
kinds of VM instances. The cluster considered for the experimental process consists of
18 VMs with a total of 120 CPU cores and a memory capacity of 500 GB. Both AS of
version 2.3.1 and Apache Mesos of version 1.4.0 were installed on each VM. Among the
VMs, one large instance is used as the Mesos master and the remaining VMs are utilized
as agents. The external scheduler is plugged into the master node of Mesos, with Spark
supporting different input sources. In this scenario, the users have the option of selecting
whichever data sources they want to utilize. It inherits HDFS, a potential distributed
storage service facilitating high scalability and fault tolerance based on the phenomenon
of replication. Moreover, HDFS stores the replica of each storage block on three data
nodes. Thus, HDFS automatically creates replicas over the available VMs when any of
the above-mentioned VMs is turned off to conserve cost. However, when all three data
nodes storing a block's replicas are turned off, the storage block can be lost. Thus, in
the above-mentioned case, the VMs need to be switched between the ON and OFF states
gradually, permitting HDFS to generate replicas before all the data nodes are turned off.
In this experimentation, a network file system (NFS) with a 1 TB volume is mounted on
the master node for sharing storage space among all the Mesos agents. This scenario
minimizes the complexity involved in implementing the proposed approach. The
proposed scheme can ignore data loss during the off state of a VM, since the NFS server
executes on the master node, which never enters the off state. In this situation, the
performance overhead incurred in reading the input from the NFS server is negligible, as
this is done only once during job execution, and the intermediate outcomes are saved in
the local storage of each VM, which is completely handled by Spark. Bash scripting is
used for automating the cluster setup process, such that large-scale deployment can be
achieved using these scripts. In addition, the existing clusters can be scaled up when
more VMs are provided by the Cloud Service Provider (CSP).

4.4 Performance Evaluation of the Proposed Work Using Job Performance

In this experimental investigation, the proposed scheme and the baseline ILPEPS,
BFDHP, Morpheus and FIFO schedulers are compared using mean completion times
under the impact of light-load and heavy-load hours, respectively. It is identified that
network-oriented applications such as PageRank degrade the performance of the
BFDHP, Morpheus and FIFO schedulers, since these applications incur maximized
network communication during each iteration. On the other hand, it is noticed that the proposed and

ILPEPS schedulers perform well with respect to the Sort and WordCount applications on
par with the baseline BFDHP, Morpheus and FIFO schedulers (Fig. 2). The proposed
scheduler algorithm utilizes only a minimized number of VMs for placing the complete
set of executors. Moreover, only a minimized number of VMs with maximized capacity
in terms of memory and CPU cores are utilized. Thus, the total resource cost incurred by
the proposed scheme is comparatively lower than that of the baseline schedulers used for
comparison. In addition, the proposed scheme performs better than the baseline
schedulers with respect to the PageRank and mixed applications. It is identified that the
performance benefits under the mixed workload scenario completely depend on the ratio
of network-intensive applications, as all the algorithms perform identically with respect
to memory- or CPU-intensive applications. Under the high-load hour, the proposed
scheme improved the job completion time by margins of 13.21%, 16.92%, 19.18% and
22.18% over the baseline schedulers used for comparison during the use of the mixed
and PageRank applications, respectively. It is also confirmed that, under the light-load
hour, the proposed scheme improved the job completion time by margins of 15.18%,
17.96%, 19.86% and 21.64% over the baseline schedulers during the use of the mixed
and PageRank applications, respectively (Fig. 3).

4.5 Performance Evaluation of the Proposed Work Using Scheduling Overhead

In this performance evaluation, the proposed and the baseline schedulers are evaluated
based on their scheduling delays under heavy-load and light-load conditions. In this
scenario, the scheduling overhead is defined as the time taken for determining the
placements of the executors associated with a specific job. Tables 1 and 2 present the
mean scheduling delay incurred by the proposed scheme and the baseline ILPEPS,
BFDHP, Morpheus and FIFO schedulers under the impact of different types of
workloads in the heavy-load and light-load hours, respectively. It is identified that the
primitive FIFO is the fastest among the schedulers, with scheduling delays averaging
from 2.32 ms to 4.26 ms. Morpheus, BFDHP and ILPEPS are also comparatively fast, as
their mean scheduling delays range from 3.48 ms to 5.64 ms and from 4.98 ms to
7.12 ms, respectively. In contrast,

Fig. 2  Mean job completion time under light-loaded hours with respect to different benchmarked applications (y-axis: mean job completion time in seconds; x-axis: different benchmarked applications used for evaluation, 40–200; series: Proposed, ILPEPS, BFDHP, Morpheus, FIFO)

Fig. 3  Mean job completion time under heavily loaded hours with respect to different benchmarked applications (y-axis: mean job completion time in seconds; x-axis: different benchmarked applications used for evaluation, 40–200; series: Proposed, ILPEPS, BFDHP, Morpheus, FIFO)

Table 1  Mean scheduling delays incurred by different scheduler algorithms with heavy load

Schedulers under comparison   Page rank   Sort     WordCount   Mixed
Proposed                      0.68        2.76     0.77        1.43
ILPEPS                        0.61        2.38     0.72        1.32
BFDHP                         0.0046      0.0042   0.0055      0.0052
Morpheus                      0.0036      0.0043   0.0054      0.0043
FIFO                          0.0034      0.0038   0.0036      0.0041

Table 2  Mean scheduling delays incurred by different scheduler algorithms with light load

Schedulers under comparison   Page rank   Sort     WordCount   Mixed
Proposed                      0.86        3.27     3.58        2.33
ILPEPS                        0.78        3.12     3.41        2.18
BFDHP                         0.0071      0.0058   0.0053      0.0046
Morpheus                      0.0036      0.0047   0.0043      0.0055
FIFO                          0.0028      0.0033   0.0042      0.0048

the proposed scheme attempts to identify the most optimal executor placement for each
job in a cost-effective manner. The results also confirm that its mean scheduling delay
varies from 0.62 s to 3.12 s.
The mean scheduling delay of the proposed scheduler algorithm has the possibility of
causing deadline misses, since it incurred a mean scheduling delay of 1.21 s, rising to
4.12–5.28 min for some of the jobs. It is noticed that the proposed scheduling algorithm
has a deadline miss percentage greater than that of the baseline scheduler algorithms.
However, the degradation in performance with respect to the proposed scheduler is highly negligible

compared to the baseline scheduling algorithms used for comparison. Moreover, the
proposed scheme is more suitable for situations that do not require strict deadlines,
independent of the periodic and regular jobs considered for scheduling, as it facilitates a
better reduction in cost over a prolonged period.

4.6 Performance Evaluation Based on Cost Efficiency

This performance evaluation exhibits the suitability and applicability of the proposed
HGDGEO algorithm for different application types with the objective of minimizing the
cost incurred in utilizing a big data cluster. The status of the VMs in each second is
saved for computing the total cost incurred by an individual scheduler. Further, Eq. (5) is
used for computing the per-second costs incurred by each scheduler. Finally, the
per-second costs are cumulatively added over the entire makespan of the scheduling
process, as specified in Eq. (16):

T_Cost = Σ_{i ∈ T} Cost(i)        (16)

where Cost(i) and T represent the cost spent in the ith second and the scheduler's total
makespan, respectively.
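The cost accumulation of Eq. (16) can be expressed as a minimal sketch. The per_second_cost helper is an assumption about how the saved per-second VM status could be priced; it stands in for the paper's Eq. (5), which is not reproduced here.

```python
def per_second_cost(vm_on_states, vm_prices):
    """Illustrative per-second cost: sum the price of every VM that is
    in the ON state during that second (vm_on_states: list of bools)."""
    return sum(price for on, price in zip(vm_on_states, vm_prices) if on)


def total_cost(per_second_costs):
    """Eq. (16): the scheduler's total cost is the sum of the
    per-second costs over its whole makespan T."""
    return sum(per_second_costs)
```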
Figures 4 and 5 portray the costs incurred by the different scheduling algorithms for
varying kinds of workloads during the high-load and light-load hours. The proposed
HGDGEO algorithm uses bin packing with the objective of consolidating the executors
onto a minimal collection of VMs. Thus, the cost incurred by the proposed HGDGEO
algorithm is identified to be significantly reduced compared with the baseline schedulers
used for comparison. Furthermore, the proposed HGDGEO algorithm incurs a slightly
lower cost than the baseline ILPEPS, BFDHP, Morpheus and FIFO schedulers in all the
scenarios, as it aids in better placement of the executors of a job in a more cost-effective

Fig. 4  Mean job completion time under light-loaded hours with respect to different benchmarked applications (y-axis: mean job completion time in seconds; x-axis: different benchmarked applications used for evaluation, 40–200; series: Proposed, ILPEPS, BFDHP, Morpheus, FIFO)

Fig. 5  Mean job completion time under heavily loaded hours with respect to different benchmarked applications (y-axis: mean job completion time in seconds; x-axis: different benchmarked applications used for evaluation, 40–200; series: Proposed, ILPEPS, BFDHP, Morpheus, FIFO)

manner. The results portrayed in Fig. 4 confirmed that the proposed HGDGEO
algorithm facilitated potential cost minimization during the heavy-load hour. With
respect to the Sort and WordCount applications, the proposed HGDGEO algorithm
reduced the cost of cluster utilization by at least 23.42%, 26.38%, 29.14% and 31.98%
compared to the baseline schedulers. On the other hand, with respect to the PageRank
application, the proposed HGDGEO algorithm minimized the cost of resource utilization
by significant margins of 17.31% and 19.56% compared to the Morpheus and FIFO
schedulers, respectively. The proposed HGDGEO algorithm confines shuffle operations
to the intra-node level, improving job performance and thereby minimizing cost for
network-based applications. It is also identified to utilize a minimal number of VMs
during executor placement, to the expected level. In addition, with respect to the mixed
application, the proposed HGDGEO algorithm reduced the cost of resource utilization by
remarkable margins of 15.49% and 17.84% compared to the Morpheus and FIFO
schedulers, respectively.
The results portrayed in Fig. 5 proved the significance of the proposed HGDGEO
algorithm with respect to cost minimization during the light-load hour. With respect to
the Sort and WordCount applications, the proposed HGDGEO algorithm under the
light-load hour minimized the cost of cluster utilization by at least 20.31%, 22.18%,
24.62% and 27.19% compared to the baseline schedulers. However, the cost reduction
facilitated by the proposed HGDGEO algorithm under the high-load hour is lower than
that achieved under the light-load hour, as the cluster is already highly utilized to the
desired level in the high-load hour.
In addition, Figs. 6 and 7 portray the predominance of the proposed HGDGEO
algorithm evaluated using the cumulative VM cost under mixed workload during the
heavy-load and light-load hours. The cumulative VM cost incurred by the proposed
HGDGEO algorithm is comparatively lower under the mixed workload, since it adopted
a balanced degree of exploration and exploitation to the expected level. Moreover, the
proposed HGDGEO algorithm under the mixed workload reduced the cumulative VM
cost by at least 17.21%, 19.18%, 22.28% and 24.29% compared to the baseline schedulers.

Fig. 6  Cumulative VM cost under mixed workload during heavy-load hours (x-axis: different benchmarked applications used for evaluation, 40–200; series: Proposed, ILPEPS, BFDHP, Morpheus, FIFO)

Fig. 7  Cumulative VM cost under mixed workload during light-load hours (x-axis: different benchmarked applications used for evaluation, 40–200; series: Proposed, ILPEPS, BFDHP, Morpheus, FIFO)

5 Conclusion

The proposed HGDGEO algorithm achieved better scheduling for clusters deployed for
big data processing over the cloud in the presence of job heterogeneity and different VM
types. The proposed HGDGEO algorithm utilized the bin-packing phenomenon for
handling the scheduling problem and minimized the resource utilization cost with
improved job performance. It used a balanced trade-off between the rates of exploration
and exploitation and utilized a minimized number of VMs with maximized placement of
the executors associated with each job. It was implemented by constructing a prototype system on top of

Apache Mesos, extending it to adopt the requirements of the new scheduling strategies.
The experimental validation of the proposed HGDGEO algorithm and the baseline
schedulers was conducted using the PageRank, Sort, WordCount and mixed applications
derived from BigDataBench. The results of the proposed HGDGEO algorithm confirmed
a reduction in resource utilization cost of up to 23.68% in the cluster of cloud-deployed
Apache Spark (AS). Further, the proposed HGDGEO algorithm facilitated close-fitting
packing of executors over fewer VMs by a significant margin of 18.32%, better than the
benchmarked schemes used for comparison. In addition, the proposed HGDGEO
algorithm under the mixed workload reduced the cumulative VM cost by at least
17.21%, 19.18%, 22.28% and 24.29% compared to the baseline schedulers. As part of
future work, it is planned to improve the proposed HGDGEO algorithm with the factors
of job interdependency and cost, which are among the essential SLA requirements.

Funding The authors have not disclosed any funding.

Data Availability Enquiries about data availability should be directed to the authors.

Declarations
Competing Interests The authors have not disclosed any competing interests.

References
1. Tuli, S., Sandhu, R., & Buyya, R. (2020). Shared data-aware dynamic resource provisioning and task sched-
uling for data intensive applications on hybrid clouds using Aneka. Future Generation Computer Systems,
106, 595–606.
2. Souravlas, S., & Anastasiadou, S. (2020). Pipelined dynamic scheduling of big data streams. Applied Sci-
ences, 10(14), 4796.
3. Nivitha, K., & Pabitha, P. (2022). C-DRM: Coalesced P-TOPSIS entropy technique addressing
uncertainty in cloud service selection. Information Technology and Control, 51(3), 592–605. https://doi.org/10.5755/j01.itc.51.3.30881
4. Kang, Y., Pan, L., & Liu, S. (2022). An online algorithm for scheduling big data analysis jobs in cloud envi-
ronments. Knowledge-Based Systems, 245(4), 108628.
5. Jagatheswari, S., Praveen, R., & Chandra Priya, J. (2022). Improved grey relational analysis-based
TOPSIS method for cooperation enforcing scheme to guarantee quality of service in MANETs. Interna-
tional Journal of Information Technology, 14(2), 887–897. https://doi.org/10.1007/s41870-022-00865-5
6. Kaladevi, P., Janakiraman, S., Ramalingam, P., & Muthusankar, D. (2023). An improved ensemble clas-
sification-based secure two-stage bagging pruning technique for guaranteeing privacy preservation of DNA
sequences in electronic health records. Journal of Intelligent & Fuzzy Systems, 44(1), 149–166.
7. Mashayekhy, L., Nejad, M. M., Grosu, D., Lu, D., & Shi, W. (2014). Energy-aware scheduling of MapReduce
jobs. IEEE International Congress on Big Data, 3(4), 12–24.
8. Shenbaga Moorthy, R., & Pabitha, P. (2019). Optimal provisioning and scheduling of analytics as a
service in cloud computing. Transactions on Emerging Telecommunications Technologies, 30(9).
https://doi.org/10.1002/ett.3609
9. Sun, D., Zhang, G., Yang, S., Zheng, W., Khan, S. U., & Li, K. (2015). Re-stream: Real-time and energy-
efficient resource scheduling in big data stream computing environments. Information Sciences, 319(4),
92–112.
10. Kaur, N., & Sood, S. K. (2017). Dynamic resource allocation for big data streams based on data character-
istics (5Vs). International Journal of Network Management, 27(4), e1978.
11. Upadhyay, U., & Sikka, G. (2020). MRS-DP: Improving performance and resource utilization of big data
applications with deadlines and priorities. Big Data, 8(4), 323–331.
12. Yin, L., Zhou, J., & Sun, J. (2022). A stochastic algorithm for scheduling bag-of-tasks applications on
hybrid clouds under task duration variations. Journal of Systems and Software, 184(4), 111123.
13. Li, H., Fang, H., Dai, H., Zhou, T., Shi, W., Wang, J., & Xu, C. (2021). A cost-efficient scheduling algo-
rithm for streaming processing applications on cloud. Cluster Computing, 25(2), 781–803.

14. Shabestari, F., Rahmani, A. M., Navimipour, N. J., & Jabbehdari, S. (2019). A taxonomy of software-
based and hardware-based approaches for energy efficiency management in the Hadoop. Journal of Net-
work and Computer Applications, 126(4), 162–177.
15. Gokuldhev, M., & Singaravel, G. (2020). Local pollination-based moth search algorithm for task-schedul-
ing heterogeneous cloud environment. The Computer Journal, 65(2), 382–395.
16. Khallouli, W., & Huang, J. (2021). Cluster resource scheduling in cloud computing: Literature review and
research challenges. The Journal of Supercomputing, 78(5), 6898–6943.
17. Abualigah, L., Yousri, D., Abd Elaziz, M., Ewees, A. A., Al-qaness, M. A., & Gandomi, A. H. (2021).
Aquila optimizer: A novel meta-heuristic optimization algorithm. Computers & Industrial Engineering,
157(4), 107250.
18. Mohammadi-Balani, A., Dehghan Nayeri, M., Azar, A., & Taghizadeh-Yazdi, M. (2021). Golden eagle
optimizer: A nature-inspired metaheuristic algorithm. Computers & Industrial Engineering, 152(2),
107050.
19. Wang, S., Jia, H., Abualigah, L., Liu, Q., & Zheng, R. (2021). An improved hybrid Aquila optimizer and
Harris hawks algorithm for solving industrial engineering optimization problems. Processes, 9(9), 1551.
20. Mashayekhy, L., Nejad, M. M., Grosu, D., Zhang, Q., & Shi, W. (2015). Energy-aware scheduling
of MapReduce jobs for big data applications. IEEE Transactions on Parallel and Distributed Systems,
26(10), 2720–2733.
21. Lu, Q., Li, S., Zhang, W., & Zhang, L. (2016). A genetic algorithm-based job scheduling model for big
data analytics. EURASIP Journal on Wireless Communications and Networking, 2016(1), 89–98.
22. Lim, N., & Majumdar, S. (2017). Resource management for MapReduce jobs performing big data analyt-
ics. Big Data Management and Processing, 3(4), 105–134.
23. Hashem, I. A., Anuar, N. B., Marjani, M., Gani, A., Sangaiah, A. K., & Sakariyah, A. K. (2017). Multi-
objective scheduling of MapReduce jobs in big data processing. Multimedia Tools and Applications,
77(8), 9979–9994.
24. Shao, Y., Li, C., Gu, J., Zhang, J., & Luo, Y. (2018). Efficient jobs scheduling approach for big data appli-
cations. Computers & Industrial Engineering, 117(2018), 249–261.
25. Hu, Y., Wang, H., & Ma, W. (2020). Intelligent cloud workflow management and scheduling method for
big data applications. Journal of Cloud Computing, 9(1), 2251–2272.
26. Seethalakshmi, V., Govindasamy, V., & Akila, V. (2020). Hybrid gradient descent spider monkey opti-
mization (HGDSMO) algorithm for efficient resource scheduling for big data processing in heterog-
enous environment. Journal of Big Data, 7(1), 34–48.
27. Islam, M. T., Srirama, S. N., Karunasekera, S., & Buyya, R. (2020). Cost-efficient dynamic scheduling
of big data applications in Apache spark on cloud. Journal of Systems and Software, 162, 110515.
28. Abualigah, L., Diabat, A., & Elaziz, M. A. (2021). Intelligent workflow scheduling for big data appli-
cations in IoT cloud computing environments. Cluster Computing, 24(4), 2957–2976.
29. Zhao, Y., Calheiros, R. N., Gange, G., Bailey, J., & Sinnott, R. O. (2021). SLA-based profit optimi-
zation resource scheduling for big data analytics-as-a-Service platforms in cloud computing environ-
ments. IEEE Transactions on Cloud Computing, 9(3), 1236–1253.
30. Viswanathan, R., Jagatheswari, S., & Praveen, R. (2021). Fuzzy and position particle swarm
optimized routing in VANET. International Journal of Electrical and Computer Engineering
Systems, 12(4), 199–206. https://doi.org/10.32985/ijeces.12.4.3

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under
a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted
manuscript version of this article is solely governed by the terms of such publishing agreement and applicable
law.

N. Jagadish Kumar is currently working as an Assistant Professor in the
Department of Information Technology, Velammal Institute of Technology,
Chennai, India, and pursuing Ph.D. research under the Faculty of Information
and Communication Engineering at Anna University, Chennai, India. He completed
his Bachelor of Engineering degree in the Department of Computer Science and
Engineering from Prathyusha Engineering College, Chennai, Tamil Nadu, India, in
2005 and his Master of Engineering in the Department of Computer Technology
from Anna University – MIT Campus, Chennai, Tamil Nadu, India, in 2014. His
research areas include big data analytics, machine learning, cloud computing,
Internet of Things and networks. https://orcid.org/0000-0001-7677-9872.

C. Balasubramanian is currently working as a Professor in the Department of
Computer Science and Engineering, P.S.R. Engineering College, Sevalpatti,
Tamil Nadu, India. He received the M.E. degree in computer science and
engineering from Annamalai University, India, in 2005, and his Ph.D. degree in
Information and Communication Engineering from Anna University, Chennai, in
2011. His current research interests include data mining, big data analytics,
and image mining. https://orcid.org/0000-0001-9066-7287.

Authors and Affiliations

N. Jagadish Kumar1 · C. Balasubramanian2

C. Balasubramanian
balasubramanian@gmail.com

1 Department of Information Technology, Velammal Institute of Technology, Chennai 601204, Tamil Nadu, India
2 Department of Computer Science and Engineering, P.S.R. Engineering College, Sivakasi 626140, Tamil Nadu, India

regular basis or in any other manner not expressly permitted by these Terms, please contact Springer
Nature at

onlineservice@springernature.com

You might also like