Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 3

Comparative Analysis of Energy-Efficient Scheduling Algorithms for Big

Data Applications
ABSTRACT:

Nowadays, big data analytics has been widely applied in addressing the growing cybercrime threats.
However, energy consumption is explosive increasing with the fast growth of big data processing in anti-
cybercrime. In this paper, an energy-efficient framework for big data applications is proposed to reduce
energy consumption while satisfying deadline constrains. First, the problem of energy-efficient tasks
scheduling of a single Spark job is modeled as an Integer Program. We design an energy-efficient tasks
scheduling algorithm to minimize the energy consumption for big data application in Spark. To avoid
SLA violations for execution time, we propose an optimal task scheduling algorithm with deadline
constrains by trading off execution time and energy consumption. Experiments on a Spark cluster are
performed to determine the energy consumption and execution time for several workloads from the
HiBench benchmark suite. Our algorithms consume less energy on average than FIFO and FAIR under
deadlines. The optimal algorithm is able to find near optimal tasks schedules to trade off energy
consumed and response time benefit in small shuffle partitions.

Existing system:

The existing studies on Spark scheduling focused on minimizing the time between the arrival and the
completion time of a big data application [6] [8] [9] [10]. To increase performance and reduce costs,
Muhammed et al. [6] proposed a resource allocation framework of Spark to provide fine-grained resource
allocation for any type of big data applications. Resource waste occurs while a big data application runs
in all the nodes in a Spark cluster

Proposed System:

We design tasks scheduling algorithms that optimize the energy efficiency of running big data application
in Spark clusters, while satisfying the deadline constrains. In this algorithm, based on an energy
consumption model for Spark, a strategy table for the relationship between task and executor is designed
to reduce the energy consumption for spark applications. Compared with original scheduling algorithms
of Spark FIFO and FAIR, Our algorithm can satisfy the SLA by trading off the execution time and energy
consumption. It also effectively reduces the total energy consumption of Spark application under deadline
constrains.

CONCLUSION:

Spark as a unified engine for big data processing has been widely used in anti-cybercrime. While big data
applications are executed on large Spark clusters, energy consumption is a critical concern in data centers.
Energy-Efficient tasks scheduling under deadline constraint has become a key problem for Spark. In this
paper, we design two energy-efficient tasks scheduling algorithms in Spark for big data applications. Our
objective is to understand the tradeoff between energy consumption and execution time for several
different kinds of workload in benchmarks such as Sort, TeraSort, and PageRank. This work provides
some insight on the relationship between energy consumption and SLA guaranteeing.

REFERENCES:
[1] P. Dhaka and R. Johari, “CRIB: Cyber crime investigation, data archival and analysis using big data
tool,” in Computing, Communication and Automation (ICCCA), 2016 International Conference on,
Noida, India, 2016, pp. 117–121.

[2] O. M. Adedayo, “Big data and digital forensics,” in Cybercrime and Computer Forensic (ICCCF),
IEEE International Conference on, Vancouver, BC, Canada, 2016, pp. 1–7.

[3] A. Shalaginov, J. W. Johnsen, and K. Franke, “Cyber crime investigations in the era of big data,” in
Big Data (Big Data), 2017 IEEE International Conference on, Boston, MA, USA, 2017, pp. 3672–3676.

[4] M. Zaharia, R. S.Xin, and P. Wendell et al., “Apache spark: a unified engine for big data processing,”
Communications of the ACM, vol. 59, no. 11, pp. 56–65, Nov. 2016, DOI. 10.1145/2934664.

[5] M. Zaharia, M. Chowdhury, and T. Das et al., “Resilient distributed datasets: A fault-tolerant
abstraction for in-memory cluster computing,” in Proceedings of the 9th USENIX conference on
Networked Systems Design and Implementation, San jose, CA, USA, 2012, pp. 2–2.

[6] M. T. Islam, S. Karunasekera, and R. Buyya, “dSpark: Deadline-based Resource Allocation for Big
Data Applications in Apache Spark,” in eScience (e-Science), 2017 IEEE 13th International Conference
on, Auckland, New Zealand, 2017, pp. 89–98.

[7] A. Z. Zhang, “Scheduler module explain,” in Spark internals: design and implement principle of Spark
core, 1th ed. Beijing, China: China Machine Press, 2015, ch. 4, sec. 3, pp. 57–72.

[8] J. Chen, K. Li, and Z. Tang et al., “A parallel random forest algorithm for big data in a Spark cloud
computing environment,” IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 4, pp. 919–
933, Apr. 2017, DOI. 10.1109/TPDS.2016.2603511.

[9] H. Yang, X. Liu, and S. Chen et al., “Improving Spark performance with MPTE in heterogeneous
environments,” in Audio, Language and Image Processing (ICALIP), 2016 International Conference on,
Shanghai, China, 2016, pp. 28–33.

[10] A. Gounaris, G. Kougka, and R. Tous et al., “Dynamic configuration of partitioning in spark
applications,” IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 7, pp. 1891–1904, Jul.
2017, DOI. 10.1109/TPDS.2017.2647939.

[11] J. Huang, Y. Zhou, and Q. Duan et al., “Semantic Web Service Composition in Big Data
Environment,” in GLOBECOM 2017 - 2017 IEEE Global Communications Conference, Singapore,
Singapore, 2017, pp. 1-7.

[12] J. Huang, J. Zou, and C. Xing, “Competitions among Service Providers in Cloud Computing: A New
Economic Model,” IEEE Transactions on Network and Service Management, PP. 1–12, Apr. 2018, DOI.
10.1109/TNSM.2018.2825358.

[13] J. Huang, Q. Duan, and S. Guo et al., “Converged NetworkCloud Service Composition with End-to-
End Performance Guarantee,” IEEE Transactions on Cloud Computing, pp. 1–15, Oct. 2015, DOI.
10.1109/TCC.2015.2491939.
[14] R. Buyya, C. S. Yeo, and S. Venugopal et al., “Cloud computing and emerging IT platforms: Vision,
hype, and reality for delivering computing as the 5th utility,” uture Generation computer systems, vol. 25,
no. 6, pp. 599-616, Jun. 2009, DOI. 10.1016/j.future.2008.12.001.

[15] S. Srikantaiah, A. Kansal, and F. Zhao, “Energy Aware Consolidation for Cloud Computing,” in
Proceedings of the 2008 conference on Power aware computing and systems, San Diego, CA, USA, 2008,
pp. 10-10.

You might also like