Smilax
Panagiotis Giannakopoulos
School of Electrical and Computer Engineering, Technical University of Crete (TUC), Chania, Greece, e-mail: pgiannakopoulos1@isc.tuc.gr
Euripides G.M. Petrakis
School of Electrical and Computer Engineering, Technical University of Crete (TUC), Chania, Greece, e-mail: petrakis@intelligence.tuc.gr
alytics platforms. Initial ideas for a statistical machine learning model for the scaling of resources are discussed in [4]. Arabnejad et al. [2] compare two Reinforcement Learning (RL) autoscaling approaches, SARSA and Q-learning; their autoscaler dynamically resizes Web applications in order to meet quality of service requirements. Bibal Benifa and Dejey [3] propose the RLPAS algorithm, which applies RL using a neural network in order to reduce the time for convergence to an optimal policy. Rossi, Nardelli and Cardellini [5] propose RL solutions for controlling the horizontal and vertical elasticity of container-based applications in order to cope with varying workloads. None of these autoscalers adapts its scaling model to changes in the application's behavior at run-time.
The following solutions are all reactive. DS2 [7] enables automatic scaling of Apache Flink applications: a controller assesses the running application at operator level in order to detect possible bottlenecks in the data-flow (i.e., operators that slow down the whole application). In contrast to Smilax, which monitors and scales applications at job level (i.e., multiple operators or tasks may execute in a job), DS2 is designed to adjust the parallelism of each operator separately in order to maintain high throughput. Autopilot3 is a proprietary solution for the Ververica platform, designed to drive multiple high-throughput, low-latency stream processing applications on Apache Flink. There are also solutions which have been incorporated into the real-time analytics platforms of commercial cloud providers: Apache Heron4 is the stream processing engine of Twitter; Dataflow5 is a serverless autoscaling solution that supports automatic partitioning and re-balancing of input data streams to servers on the Google Cloud Platform.
3 Smilax Ecosystem
Apache Flink provides an extensive toolbox of operators for implementing transformations on data streams (e.g., filtering, updating state, aggregating). The data-flows or jobs (i.e., operations chained together) form directed graphs (Job Graphs) that start with one or more sources and end at one or more sinks. A Flink cluster consists of a Job Manager and a number of Task Managers (workers). The Job Manager controls the operation of the entire cluster: it schedules the workers, reacts to finished or failed tasks, load-balances the workload among Task Managers, and coordinates checkpoints and recovery from failures. The Task Managers are the machines (servers) which execute the tasks of a workflow. A task represents a chain of one or more operators that can be executed in a single thread or server. A task can be executed in parallel (on separate Task Managers); each parallel instance of a task is a subtask, and the number of subtasks running in parallel is the parallelism of that particular task.
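As a rough illustration of these notions (plain Python, not the Flink API; all names here are hypothetical), a task's parallelism is simply the number of its subtasks, each of which can be placed on a different Task Manager:

```python
from dataclasses import dataclass

@dataclass
class Task:
    """A chain of one or more operators executed in a single thread."""
    name: str
    parallelism: int  # number of subtasks running in parallel

    def subtasks(self):
        # Each parallel instance of the task is a subtask; Flink may
        # place each subtask on a different Task Manager (worker).
        return [f"{self.name}[{i}]" for i in range(self.parallelism)]

# A toy Job Graph: one source, a chained map/filter task, one sink
job_graph = [Task("source", 2), Task("map+filter", 4), Task("sink", 1)]

for task in job_graph:
    print(task.name, task.subtasks())
```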
3 https://docs.ververica.com/user_guide/application_operations/autopilot.html
4 https://incubator.apache.org/clutch/heron.html
5 https://cloud.google.com/dataflow
The number of allocated Task Managers varies over time and is regulated by the Smilax agent. The agent monitors the operation of all tasks and, depending on workload and performance, decides to change the parallelism of a task (i.e., to scale up or down). Flink is particularly flexible, but making the most out of it can become a challenging task that requires in-depth understanding of its underlying architecture (especially in the case of multiple workflows with many operators executing in multiple layers). For simplicity of the discussion, the following assumptions apply in Smilax: each Task Manager (worker) runs the entire Job Graph (workflow), which means that the number of allocated workers is identical to the parallelism of the job, and rescaling actions (e.g., adding or removing a worker) modify the parallelism of all operators of the job at the same time. Changing the parallelism of individual operators would require that Smilax monitor each operator separately and take scaling decisions based on the performance of each individual operator, as in DS2 [7]. If more than one workflow runs on the same Flink cluster, taking optimal scaling decisions for each individual workflow requires monitoring the performance of each workflow separately (i.e., a separate model must be built for each workflow).
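Under these assumptions, rescaling reduces to adjusting a single number. A minimal sketch (class and method names are illustrative, not Smilax's actual implementation):

```python
class FlinkJob:
    """Job-level scaling model: every worker runs the whole Job Graph,
    so job parallelism equals the number of allocated Task Managers."""

    def __init__(self, operators, workers=1):
        self.operators = operators   # operator names in the Job Graph
        self.workers = workers       # allocated Task Managers

    @property
    def parallelism(self):
        return self.workers  # identical by assumption

    def rescale(self, delta):
        # Adding or removing a worker changes the parallelism of *all*
        # operators at once (contrast with DS2's per-operator scaling).
        self.workers = max(1, self.workers + delta)
        return {op: self.parallelism for op in self.operators}

job = FlinkJob(["source", "map", "sink"], workers=2)
print(job.rescale(+1))  # every operator now has parallelism 3
```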
An application receives data records (or events) from streaming sources such as Apache Kafka6. Kafka queues data from application sources such as databases, sensors, mobile devices, cloud services, etc. Kafka organizes data streams in topics and reads them in parallel (i.e., events are appended to more than one partition defined for each topic). The incoming workload is monitored by inspecting the Kafka topics which are the data sources of the running job; the workload is the number of records per second the system receives. Kafka queues are empty if the system consumes (processes) the received data at a rate higher than the production rate; otherwise, the data remain in the queue (slow records). The average length of the Kafka queues is therefore an indicator of whether the system can keep up with the data production rate. The Prometheus service7 is responsible for monitoring the running applications. Prometheus retrieves the Kafka metrics by querying the HTTP endpoint of JMX8 (i.e., Prometheus cannot connect to Kafka directly). Apache Zookeeper9 is a coordination service for the Kafka cluster.
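The queue-length indicator can be computed directly from the two monitored quantities; a small sketch (function names and the guard for an idle system are illustrative, not Smilax's code):

```python
def slow_record_fraction(queue_length: float, workload: float) -> float:
    """Fraction of incoming records left waiting in the Kafka queues,
    i.e. queue-length / workload (both measured in records per second)."""
    if workload <= 0:
        return 0.0  # no incoming records, nothing can be slow
    return queue_length / workload

def sla_satisfied(queue_length: float, workload: float,
                  max_slow_fraction: float = 0.10) -> bool:
    # SLA used in this work: less than 10% of the records may remain
    # queued (equivalently, more than 90% are processed instantly).
    return slow_record_fraction(queue_length, workload) < max_slow_fraction

print(sla_satisfied(queue_length=50, workload=1000))   # 5% slow -> True
print(sla_satisfied(queue_length=200, workload=1000))  # 20% slow -> False
```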
The percentage of slow records is computed as queue-length/workload. In Smilax, quality of service is represented by an SLA metric defined as the percentage of slow records per second that a client (e.g., the application owner or a user) can accept. In this work, the threshold is 90% (i.e., less than 10% of the records may remain in the queue or, equivalently, more than 90% of the records are processed instantly). Smilax collects information from Kafka
6 https://kafka.apache.org
7 https://prometheus.io
8 https://docs.oracle.com/en/java/javase/15/jmx/
9 https://zookeeper.apache.org
the optimal parallelism of the system. This is also a two-stage process: (a) in the first stage, future values of the workload are predicted from past (i.e., recent) values by applying linear regression. Assuming that the rate of change will not vary in the near future, the workload is extrapolated along the slope of the regression line, which represents the rate of change of the workload (i.e., whether it increases, decreases or remains steady); the output is an array of 12 values, one for every 5 seconds over the next 60 seconds. (b) In the second stage, the model predicts the performance of the application (i.e., the percentage of slow records) for each predicted workload value and for each possible parallelism. The optimal parallelism Starget is the minimum parallelism which satisfies the SLA (i.e., the percentage of slow records per second is less than 10%). Algorithm 2 illustrates this process.
Algorithm 2 Scaling policy during optimal control
 1: procedure ProactiveScaler
 2:   w_future ← WorkloadPredictor()
 3:   parallelismSet ← [1 . . . n_max]
 4:   S_target ← n_max
 5:   for n_i ∈ parallelismSet do   ▷ select optimal parallelism
 6:     performancePoints ← Predict(n_i, w_future)
 7:     evaluation ← CheckViolation(performancePoints)
 8:     if evaluation then
 9:       S_target ← n_i
10:       break
11:   S_new ← Hysteresis(S_target)
12:   scale(S_new)
13: procedure WorkloadPredictor
14:   w_past ← GetPastWorkload(10 mins)
15:   slope ← LinearRegression(w_past)
16:   w_future ← slope.predict(1 min)
      return w_future
17: procedure CheckViolation(performancePoints)
18:   for each point_i ∈ performancePoints do
19:     if point_i ≥ SLA then return false
      return true
20: procedure Hysteresis(S_target)
21:   S_old ← CurrentNumberOfTaskManagers
22:   if S_target > S_old then
23:     S_new ← S_old + α · (S_target − S_old)
24:   else if S_target < S_old then
25:     S_new ← S_old + β · (S_target − S_old)
26:   else
27:     S_new ← S_old
      return S_new
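A runnable sketch of Algorithm 2's building blocks in plain Python follows. The performance model passed in as predict, the damping constants alpha and beta, and the assumption that past workload samples arrive every 5 seconds are all placeholders for illustration, not Smilax's actual implementation:

```python
def workload_predictor(past, horizon_s=60, step_s=5):
    """Fit a least-squares line to recent workload samples and extrapolate
    it forward: one predicted value every 5 s for the next 60 s (12 values).
    Assumes past samples are also spaced step_s apart."""
    n = len(past)
    xs = range(n)
    mean_x, mean_y = (n - 1) / 2, sum(past) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, past))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [intercept + slope * (n - 1 + k)
            for k in range(1, horizon_s // step_s + 1)]

def check_violation(performance_points, sla=0.10):
    """True iff every predicted slow-record fraction stays below the SLA
    (mirrors CheckViolation: any point at or above the SLA rejects)."""
    return all(p < sla for p in performance_points)

def hysteresis(s_target, s_old, alpha=0.5, beta=0.25):
    """Damped scaling: move only part of the way toward the target.
    alpha (scale-up) and beta (scale-down) are assumed constants."""
    if s_target > s_old:
        return round(s_old + alpha * (s_target - s_old))
    if s_target < s_old:
        return round(s_old + beta * (s_target - s_old))
    return s_old

def proactive_scaler(past_workload, predict, n_max, s_old):
    """Pick the minimum parallelism whose predicted performance meets
    the SLA for all future workload points, then damp the change."""
    w_future = workload_predictor(past_workload)
    s_target = n_max
    for n_i in range(1, n_max + 1):
        if check_violation(predict(n_i, w_future)):
            s_target = n_i
            break
    return hysteresis(s_target, s_old)
```

The hysteresis step is what keeps the controller from oscillating: rather than jumping straight to the computed target, it closes only a fraction of the gap on each decision, with separate factors for scaling up and down.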