Robust Resource Scaling of Containerized Microservices With Probabilistic Machine Learning
Authorized licensed use limited to: UNIVERSIDADE FEDERAL DE MINAS GERAIS. Downloaded on November 29,2023 at 00:09:19 UTC from IEEE Xplore. Restrictions apply.
dynamically adjust the number of VMs assigned to a cloud application. Wajahat et al. [26] presented an application-agnostic, neural network-based auto-scaler for minimizing SLA violations of diverse applications. Delimitrou et al. [4] used collaborative filtering to estimate the impact of resource scale-out (more servers) and scale-up (more resources per server) on application performance. Iqbal et al. [8] used supervised learning to predict the workload arrival rate and combined it with the observed response time of the last k intervals to make resource provisioning decisions. Yang et al. [29] proposed model-based reinforcement learning for microservice resource allocation. However, these works have mainly focused on applying deterministic machine learning models, which provide point estimates only, without expressing the uncertainty associated with the prediction. Recent work using Bayesian Neural Networks [32] is able to estimate the uncertainty in predictions for deep regression models. However, it is challenging to quickly adapt such models to drastic variations in workload and performance interference patterns. In contrast, our work presents a robust resource scaling system that utilizes an adaptive performance model and directly incorporates predictive uncertainty into the resource allocation problem.

III. RSCALE DESIGN AND IMPLEMENTATION

In this section, we present the key design and implementation of RScale, a resource scaling system that aims to provide a robust performance guarantee for containerized microservices.

A. RScale Architecture

[Figure 3: RScale Architecture, showing the microservice containers, the auto-scaler, the adaptive model, the performance monitor, the container engines, the VMs, the Libvirt API, and the hypervisor.]

Fig. 3 shows the interaction between the various components of RScale. The performance monitor is responsible for periodically collecting data from the application, container, and VM layers at each physical server that hosts the VMs belonging to an application owner (cloud user). The mapping of containers to VMs and of VMs to physical servers can be obtained via commonly available APIs provided by the container orchestration engine (Kubernetes) and the cloud management software (OpenStack), respectively. Probabilistic performance models based on Gaussian Process regression for the respective microservice workflows are adapted based on observed data samples. Finally, the auto-scaler component is responsible for adding or removing containers for various microservices to meet the end-to-end tail latency targets of microservice workflows, based on the optimization problem formulated in Section III-B.

B. Performance Modeling with Gaussian Process

GP regression is a non-parametric probabilistic modeling technique. In contrast to a deterministic ML-based performance modeling approach, which aims to approximate a particular function that can represent the relationship between the independent variables and the predicted variable, GP regression utilizes the concepts of Gaussian Processes and Bayesian inference to find a distribution over all the possible functions that are consistent with the observed data. Hence, this probabilistic modeling approach can directly provide confidence bounds on its predictions, which is critical for making robust resource management decisions in an uncertain cloud environment.

Model Formulation. As with all Bayesian methods, GP regression begins with a prior distribution over functions and updates it as data points are observed, producing a posterior distribution over functions. The prior distribution is defined by a GP, a collection of random variables, any finite number of which have (consistent) joint Gaussian distributions. For every input x there is an associated random variable f(x), which is the value of the stochastic function f at that location. In this context of modeling containerized microservices, x represents a vector of performance metrics collected from the application layer, container layer, and VM layer at a particular sampling interval. f(x) represents the distribution of the end-to-end tail latency of a particular workflow for the given set of input data denoted by x.

GP assumes that p(f(x_1), ..., f(x_N)) is jointly Gaussian, with a mean function m(x) and a covariance function \Sigma(x) = k(x_i, x_j), where k is a positive definite kernel. The key idea is that if x_i and x_j are deemed by the kernel to be similar, then we expect the outputs of the function at those points to be similar, too. A commonly used kernel function is the squared-exponential kernel, given by k_{SE}(x, x') = \sigma^2 \exp\!\left(-\frac{(x - x')^2}{2l^2}\right). It calculates the squared distance between points and converts it into a measure of similarity, controlled by the hyperparameters \sigma and l.

Let f be the function values that have been observed (training set output) for a training input set X, and let f_* be the set of function values to be predicted (test set output) corresponding to a test input set X_*. The joint distribution of the function values can be expressed as:

\begin{bmatrix} f \\ f_* \end{bmatrix} \sim \mathcal{N}\!\left( \begin{bmatrix} m(X) \\ m(X_*) \end{bmatrix}, \begin{bmatrix} K(X, X) & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix} \right) \quad (1)

where m(X) and m(X_*) are the training and testing mean functions. Usually, for notational simplicity, the prior means
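To make the formulation above concrete, the following sketch (our illustration, not the paper's code; synthetic one-dimensional data) builds the squared-exponential covariance blocks of the joint distribution in Eq. (1) and conditions on the observed values via a Cholesky factorization to obtain the posterior mean and a 95% confidence bound:

```python
import numpy as np

def k_se(a, b, sigma=1.0, length=1.0):
    """Squared-exponential kernel: sigma^2 * exp(-(x - x')^2 / (2 l^2))."""
    d = a[:, None] - b[None, :]
    return sigma**2 * np.exp(-d**2 / (2 * length**2))

def gp_posterior(X, f, X_star, noise=1e-6):
    """Posterior mean and std of f* | f, obtained by conditioning the joint Gaussian."""
    K = k_se(X, X) + noise * np.eye(len(X))   # K(X, X), jittered for stability
    K_s = k_se(X_star, X)                     # K(X*, X)
    K_ss = k_se(X_star, X_star)               # K(X*, X*)
    L = np.linalg.cholesky(K)                 # Cholesky factor of K(X, X)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, f))
    mean = K_s @ alpha                        # K(X*, X) K(X, X)^-1 f
    v = np.linalg.solve(L, K_s.T)
    cov = K_ss - v.T @ v                      # K(X*, X*) - K(X*, X) K(X, X)^-1 K(X, X*)
    std = np.sqrt(np.clip(np.diag(cov), 0, None))
    return mean, std

X = np.array([0.0, 1.0, 2.0, 3.0])            # training inputs
f = np.sin(X)                                 # observed function values
mean, std = gp_posterior(X, f, np.array([1.5]))
lo, hi = mean - 1.96 * std, mean + 1.96 * std # 95% confidence bound
```

The uncertainty collapses near observed points and widens between them, which is exactly the property RScale exploits for robust scaling decisions.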
Table I: Kernel Selection using log marginal likelihood.

kernel | LML (Catalogue) | LML (Cart)
RBF | -3.153E+03 | -2.153E+03
RationalQuadratic | -2.995E+03 | -2.052E+03
Matern | -3.036E+03 | -2.086E+03
DotProduct | -1.988E+15 | -1.706E+15
RBF + RationalQuadratic | -2.971E+03 | -2.027E+03
RBF + Matern | -2.974E+03 | -2.027E+03
RationalQuadratic + Matern | -2.977E+03 | -2.027E+03

Table II: Notation for Resource Scaling Optimization.

Symbol | Description
S | Set of microservices relevant to the target workflows
R | Set of container resource types (CPU, network)
SLO_j^{target} | Tail latency target of workflow j
x_{ir} | Average utilization of resource r in microservice i
\bar{m}_j(x) | Posterior mean function of workflow j from Eq. (2)
\bar{\sigma}_j(x) | Posterior standard deviation of workflow j from Eq. (2)
S_j | Set of microservices related to workflow j

are assumed to be constant and zero. This is also consistent with the practice of data normalization in machine learning. If there are n training points and n_* test points, then K(X, X_*) denotes the n × n_* matrix of the covariances evaluated at all pairs of training and test points, and similarly for the other entries K(X, X), K(X_*, X_*), and K(X_*, X). Based on Bayesian inference, it is possible to obtain the conditional distribution of f_* given f, as shown in Eq. (2):

f_* \mid f \sim \mathcal{N}(\bar{m}, \bar{\Sigma}) = \mathcal{N}\big( K(X_*, X)\,K(X, X)^{-1} f,\; K(X_*, X_*) - K(X_*, X)\,K(X, X)^{-1} K(X, X_*) \big) \quad (2)

This is the posterior distribution over target functions for a specific set of test cases X_*. The mean of this distribution, \bar{m}, will be used to predict the end-to-end tail latency of a microservice workflow. The standard deviation \bar{\sigma}, derived through Cholesky decomposition [5] of the variance \bar{\Sigma}, will be used to obtain the confidence bound on the prediction.

Kernel Selection. One of the key design choices for the generalization properties of a GP model is the choice of the kernel function. A multitude of possible families of kernel functions exists, such as the Matern kernel, the Rational Quadratic kernel, etc. New kernels can be derived by additive and multiplicative combinations of existing kernels [15]. We systematically compare the log marginal likelihood (LML) of various kernels on our training data in Table I to select the best kernel for the GP models. We use the sum kernel RBF + RationalQuadratic for our GP model, corresponding to the highest LML value for both the Catalogue and Cart workflows.

GP Model Fitting. For a set of n observations, the hyperparameters associated with the kernel functions are estimated and optimized during the fitting of the GP model. This is done by maximizing the log marginal likelihood, log p(f | X, θ), i.e., the probability of the data given the hyperparameters denoted by θ, as shown in Eq. (3):

L = \log p(f \mid X, \theta) = -\frac{1}{2} \log |K(X, X)| - \frac{1}{2} f^{T} (K(X, X))^{-1} f - \frac{n}{2} \log(2\pi) \quad (3)

C. Robust Resource Scaling

Although existing cloud platforms^4 provide mechanisms for auto-scaling microservices, they expect application owners to specify thresholds for various microservice load metrics to enable auto-scaling features. For example, the auto-scaling feature in Kubernetes determines the allocation of containers to a microservice by using the formula:

desiredReplicas = \max_r \left( currentReplicas \times \frac{currentMetric_r}{desiredMetric_r} \right) \quad (4)

If desiredMetric_r is specified as an average CPU utilization of 40% and currentMetric_r for CPU utilization is 50% for a particular microservice, the ratio is 1.25 (50%/40%); since the desired replica count is rounded up, a microservice running one container will have its allocation doubled. If multiple metrics are specified, this calculation is done for each metric, and the largest of the desired replica counts is chosen. Any scaling is executed only if the ratio of currentMetric_r to desiredMetric_r drops below 0.9 or rises above 1.1 (a 10% tolerance). It is challenging and burdensome for application owners to determine the resource utilization thresholds for various microservices in order to meet the application's end-to-end performance target. Setting inappropriate thresholds may lead to overprovisioning or underprovisioning of resources. We now describe how RScale can enable cloud platforms to automatically determine these thresholds based on user-provided performance SLO targets.

1) Problem Formation: Consider that SLO targets in terms of the end-to-end tail latency for a set of workflows are specified. For a given workload condition, we aim to find the highest resource utilization values of the relevant microservices at which the given SLO targets will not be violated. These utilization values are calculated periodically and set as the thresholds (desiredMetricValue) for making resource scaling decisions. These dynamic thresholds help in determining which microservices should be scaled, and how many containers should be allocated to each microservice based on Eq. (4). This approach aims to avoid resource overprovisioning while providing a performance guarantee to the given workflows.

We formulate resource scaling as a constrained optimization problem as follows:

\max \sum_{i \in S} \sum_{r \in R} x_{ir} \quad (5)

\text{s.t. } \forall j: \bar{m}_j(x) + \kappa\,\bar{\sigma}_j(x) \le SLO_j^{target} \quad (6)

\forall i, r: x_{ir} \ge 0 \quad (7)

x = ((x_{ir})_{r \in R})_{i \in S_j} \quad (8)

^4 https://aws.amazon.com/ecs/
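The replica calculation of Eq. (4), with its 10% tolerance band, can be sketched as a small helper (our illustration of the formula as we read it, not Kubernetes code; the desired count is rounded up):

```python
import math

def desired_replicas(current_replicas, metrics):
    """Per Eq. (4): for each resource type, scale the current replica count by the
    ratio of current to desired metric value (rounded up), skipping metrics whose
    ratio lies within the 0.9-1.1 tolerance band; return the largest count."""
    counts = []
    for current, desired in metrics.values():
        ratio = current / desired
        if 0.9 <= ratio <= 1.1:
            counts.append(current_replicas)   # within tolerance: no scaling
        else:
            counts.append(math.ceil(current_replicas * ratio))
    return max(counts)

# One pod at 50% CPU against a 40% threshold: ratio 1.25, so the pod is doubled.
print(desired_replicas(1, {"cpu": (50.0, 40.0)}))  # -> 2
```

With multiple metrics, each produces its own candidate count and the maximum wins, matching the max_r in Eq. (4).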
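The kernel comparison of Table I can be mirrored with scikit-learn, which exposes the log marginal likelihood of Eq. (3) on a fitted model. This sketch uses synthetic data and a subset of the candidate kernels, so the numbers will not match the table:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, RationalQuadratic, Matern

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(60, 2))                  # stand-in for utilization metrics
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(60)   # stand-in for tail latency

candidates = {
    "RBF": RBF(),
    "RationalQuadratic": RationalQuadratic(),
    "Matern": Matern(),
    "RBF + RationalQuadratic": RBF() + RationalQuadratic(),  # sum kernel
}
scores = {}
for name, kernel in candidates.items():
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
    scores[name] = gp.log_marginal_likelihood_value_  # LML of the fitted model
best = max(scores, key=scores.get)                    # kernel with the highest LML
```

Fitting already maximizes the LML over each kernel's hyperparameters, so the comparison is between optimized kernels, as in Table I.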
Algorithm 1: Online Adaptation and Resource Scaling
1: collect multi-layered data at sampling interval T;
2: insert new sample into dataset D;

[Figure 4: Prediction accuracy (MAPE and R² score) of the Catalogue-NN, Catalogue-GP, Cart-NN, and Cart-GP models, each with the Pod_CPU+Pod_Net feature set and with the Pod_CPU+Pod_Net+VM_CPU+VM_Net+VM_CPI feature set.]
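The overhead analysis later in the paper refers to data collection (lines 1-3), model fitting, prediction, and optimization (lines 4-6), and scaling (line 7) in Algorithm 1. One iteration of such a loop might look like the following sketch, with hypothetical callables standing in for RScale's components:

```python
def control_step(collect, fit, optimize, scale, dataset):
    """One sketched iteration of an Algorithm-1-style loop: collect a multi-layered
    sample, adapt the model, solve for utilization thresholds, and scale.
    All four callables are hypothetical stand-ins, not RScale APIs."""
    dataset.append(collect())        # lines 1-2: collect sample, insert into D
    model = fit(dataset)             # adapt the GP model to the newest data
    thresholds = optimize(model)     # solve Eqs. (5)-(8) for utilization thresholds
    scale(thresholds)                # act on the thresholds via Eq. (4)
    return thresholds
```

In the paper's deployment, each such iteration must complete within the 30-second sampling interval.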
Figure 5: Distribution of predicted tail latency for the Cart workflow obtained using the Bootstrap method, and its associated overheads for various data sizes. (a) Bootstrap overhead (NN). (b) Data size is 200: μ = 276.79, σ = 18.89. (c) Data size is 300: μ = 278.90, σ = 14.32. (d) Data size is 400: μ = 276.81, σ = 8.20. [Panels (b)-(d) show the probability density with a 95% prediction interval.]

Figure 6: Comparing predictive uncertainty in GP and NN models (with bootstrapping) for the Cart workflow. Measured tail latency is 280 ms. (a) Overheads in GP and NN models with bootstrapping. (b) Predicted tail latency with 95% prediction intervals for the NN and GP models.

The Locust^5 load generator emulates a number of concurrent clients that generate HTTP-based REST API calls. To introduce performance interference, the memory-intensive STREAM [13] benchmark and the network-intensive Iperf^6 benchmark were colocated with the microservices. The level of interference was varied by changing the number of containers for each interfering workload.

B. Machine Learning Models

We compare our GP regression based probabilistic model with a multi-layer perceptron based Neural Network (NN), which is a representative deterministic model with superior prediction accuracy. The input features of each model include the number of concurrent clients, pod-level resource metrics (the average CPU utilization and network throughput of the load-balanced pods for each microservice), and VM-level resource metrics (the CPU utilization, network throughput, and CPI of the VMs that host the pods). We use the scikit-learn library's randomized lasso [15] technique to reduce our feature space and avoid potential over-fitting issues [14]. The hyperparameters of the NN models are tuned by scikit-learn's GridSearchCV tuner. The prediction accuracy of the NN model is highly sensitive to the number of hidden layers and the number of neurons in each hidden layer. Hence, we tuned these parameters through an exhaustive search over various combinations of the input feature space and the targeted workflow for the prediction of end-to-end tail latency. The optimal number of hidden layers for our NN model is three, and the optimal numbers of neurons in these three hidden layers are (5, 3, 3) for the Catalogue workflow and (3, 4, 6) for the Cart workflow.

C. Prediction Accuracy

We evaluate the prediction accuracy of the NN and GP models with two different input feature spaces. The first input feature space includes only pod-level resource metrics, and the second uses both pod-level and VM-level metrics. We also evaluate the models with 10-fold cross-validation on the collected dataset and compare the mean absolute percentage error (MAPE) and the coefficient of determination, R². An R² of 1 indicates that the regression predictions perfectly fit the data. Fig. 4 shows that the GP regression models for the Catalogue and Cart workflows provided better prediction accuracy than the NN models. To collect the training data, we conduct extensive experiments on our testbed by varying the number of concurrent clients and the performance interference levels experienced by different microservices in the Robot Shop benchmark. We also vary the number of containers (pods) allocated to the microservices, and measure the end-to-end tail latency of various workflows as reported by the Locust tool.

D. Estimation of Predictive Uncertainty

Estimation of uncertainty for NN. Making robust resource management decisions in the face of the changing dynamics of the cloud environment requires the ability to quantify the predictive uncertainty of performance models. However, deterministic performance models, such as the popular machine learning (ML) based models applied in recent works [21, 26, 29], provide point estimates only, without confidence bounds. Although bootstrapping methods can calculate the confidence bounds on each prediction made by a deterministic model, they incur large overheads. Here, we evaluate the bootstrap technique using the NN model to estimate the uncertainty in the prediction of tail latency for the Cart workflow, when facing a workload of 55 concurrent clients and interference on the Web microservice from the STREAM benchmark. The NN model is trained with 80% of randomly sampled

^5 https://locust.io
^6 https://iperf.fr/
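The bootstrap procedure just described (resample 80% of the data, refit, predict, repeat 200 times, then read off the 2.5th and 97.5th percentiles) can be sketched as follows; a linear fit on synthetic data stands in for the paper's NN:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 1))
y = 30 * X[:, 0] + rng.normal(0, 20, size=300)   # synthetic latency-like response
x_query = 5.0                                    # point at which to predict

preds = []
for _ in range(200):                             # 200 bootstrap iterations
    idx = rng.integers(0, len(X), size=int(0.8 * len(X)))  # 80% resample
    slope, intercept = np.polyfit(X[idx, 0], y[idx], 1)    # stand-in for the NN fit
    preds.append(slope * x_query + intercept)

lo, hi = np.percentile(preds, [2.5, 97.5])       # 95% confidence interval
mu, sigma = np.mean(preds), np.std(preds)
```

The overhead is evident even in this toy: every interval costs 200 full refits, whereas a GP yields the interval from a single fitted model.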
Figure 7: Prediction errors during online adaptation of the GP model to changing interference patterns. (a) Case 1: The initial model is trained with a dataset that includes interference from a memory-intensive workload; the model is adapted online and tested with a dataset that includes interference from a network-intensive workload. (b) Case 2: The initial model is trained with a dataset that includes interference in the Web and Cart microservices; the model is adapted online and tested with a dataset that includes interference in the Catalogue microservice. [Both panels plot MAPE over training intervals for the Catalogue and Cart workflows (10 samples per interval) against the baseline for each workflow.]

Figure 8: Optimization of resource utilization thresholds for efficient resource scaling with a workload of 30 concurrent clients and an SLO target of 170 ms for the 95th percentile latency of the Catalogue workflow. (a) Current vs. desired CPU utilization of various microservices using the NN and GP models. (b) Current vs. desired network throughput of various microservices using the NN and GP models.

Figure 9: Optimization of resource utilization thresholds for efficient resource scaling with a workload of 30 concurrent clients and an SLO target of 220 ms for the 95th percentile latency of the Cart workflow. (a) Current vs. desired CPU utilization of various microservices using the NN and GP models. (b) Current vs. desired network throughput of various microservices using the NN and GP models.

training data and used to predict the tail latency. This process is repeated for 200 iterations to obtain a distribution of the predicted tail latency. The 95% confidence interval obtained from this distribution provides the predictive uncertainty of the model. We found that using 200 iterations for the bootstrap provided the optimal estimation accuracy for a given data size, and iterations beyond 200 only increased the overhead. Due to space limitations, we only show the results obtained by using 200 bootstrap iterations for various data sizes, as shown in Fig. 5. We observe that the predictive uncertainty reduces with increasing data size. However, there is a significant overhead of bootstrapping for all data sizes.

Comparison of uncertainty between NN and GP. Fig. 6 compares the predictive uncertainty and the associated estimation overheads of the GP and NN models. In contrast to NN models, which require bootstrapping to estimate predictive uncertainty, GP models directly provide confidence bounds on each prediction. Fig. 6a compares the time taken by these two methods to estimate the 95% prediction interval. We observe that when the data size is 200, the NN model with bootstrapping incurs 4X more overhead than the GP model in estimating predictive uncertainty. This is despite the fact that we include the time taken for model fitting as a part of the GP model's estimation overhead. However, model fitting may not be needed every time for the GP to make a prediction. This can further reduce its overhead of uncertainty estimation. Fig. 6b shows that the GP model is able to achieve smaller uncertainty in its predictions than the NN model for various data sizes.

E. Model Adaptiveness

To evaluate the adaptiveness of GP regression models, we conduct experiments for two cases that require online adaptation of the performance models in the face of varying interference patterns. As shown in Fig. 7, the prediction error of the GP models converges quickly and stays close to the baseline value as the model is adapted over the training intervals. Here, the baseline is the prediction error measured in the offline evaluation of the model with the offline training dataset. The adaptiveness of a GP regression model can be attributed to its ability to learn more from less data and to its lazy learning technique, which delays the generalization of the training data until the next inference interval, thus enabling local approximation of the target function.

F. Performance Guarantee

We now evaluate the effectiveness of RScale's resource scaling optimization in meeting the end-to-end tail latency targets of microservice workflows. For this purpose, we specify performance SLO targets of 170 ms and 220 ms for the Catalogue and Cart workflows respectively, when a workload of 30 concurrent clients is applied to the Robot Shop benchmark. Furthermore, we compare the impact of the resource scaling optimization based on the NN and GP regression models respectively. Fig. 8 compares the current (measured) resource utilization of the microservices relevant to the Catalogue workflow and their desired resource utilization values, when only one pod is allocated to each microservice. In the case of NN model based optimization, the current CPU utilization of microservices including Catalogue, Ratings, and MongoDB is less than their desired CPU utilization. Furthermore, the ratios of the current network read/write throughput to the desired network read/write throughput for
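The direct confidence bounds discussed above take a single call in scikit-learn; a minimal sketch on synthetic data (not the paper's dataset or hyperparameters), using the sum kernel selected in Table I:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, RationalQuadratic

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(200)

# One fit; every subsequent predict returns both the posterior mean and
# standard deviation -- no bootstrap loop is needed.
gp = GaussianProcessRegressor(kernel=RBF() + RationalQuadratic(),
                              alpha=1e-3, normalize_y=True).fit(X, y)
mean, std = gp.predict(np.array([[5.0]]), return_std=True)
lo, hi = mean - 1.96 * std, mean + 1.96 * std   # 95% prediction interval
```

This is the operational difference behind Fig. 6a: the bootstrapped NN pays 200 refits per interval, while the GP pays one fit plus one prediction.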
Figure 10: 95th percentile latency of microservice workflows with resource scaling optimization based on NN and GP. (a) Catalogue workflow. (b) Cart workflow.

Figure 11: Impact of training data size on GP. (a) Fitting time. (b) Prediction time. [Both panels plot the Catalogue and Cart workflows for training data sets of 100 to 800 samples.]

Figure 12: Optimization time for GP. [Optimization time for the Catalogue and Cart workflows for training data sets of 100 to 800 samples.]

microservices including Web, Ratings, and Catalogue are within the tolerance range from 0.9 to 1.1. Therefore, the NN model based resource optimization does not suggest any change in resource allocation. Whereas in the case of GP model based optimization, the ratio of current to desired CPU utilization and the ratio of current to desired network read/write throughput for the Catalogue microservice are 1.3 and 2.8, respectively. Hence, the desired number of replicas for Catalogue is three according to Eq. (4). The desired replicas stay the same as the current replicas for Ratings, MongoDB, and Web. Thus, the optimal resource scaling option for the GP model based optimization is to add two additional pods to the Catalogue microservice. Similarly, Fig. 9 shows that the optimal resource scaling option relevant to the Cart workflow for NN is to allocate an additional pod to Web and another new pod to Cart, and for GP it is to allocate an additional pod to Web and three new pods to Cart.

Finally, combining the two different scaling options for the Catalogue workflow and the Cart workflow, the optimal resource scaling option to satisfy both SLO targets at the same time according to NN is to add one pod each to Web, Cart, and Catalogue. On the other hand, the optimal solution according to GP is to add one pod to Web and three pods to Cart and Catalogue, respectively. Fig. 10 shows that when the microservices are scaled based on the NN's optimization model, the measured 95th percentile latency for the Catalogue and Cart workflows violates the SLO target. This is due to the fact that the resource scaling optimization was based on the NN model's point estimates, without accounting for predictive uncertainty.

GP models employ a lazy learning technique that delays the generalization of the training data until the inference (prediction) is made. This means that they need to consider the training data each time they make a prediction. Hence, the size of the training data determines both the time taken to fit the GP models and the time taken to make a prediction, as shown in Fig. 11.

Fig. 12 shows the impact of the training data size on the time taken to solve the resource scaling optimization problem given by Eqs. (5)-(8). We observe that the optimization problem takes more time to converge in the case of the Cart workflow, since it has a much wider range of tail latency (100 ms to 1000 ms) compared to that of the Catalogue workflow (50 ms to 250 ms) for a given range of resource utilization values. Based on our overhead analysis, we limit the training data size to 200 samples so that the overhead of data collection (lines 1-3 in Algorithm 1), the overhead of model fitting, prediction, and optimization (lines 4-6 in Algorithm 1), and the overhead of scaling (line 7 in Algorithm 1) remain smaller than the data sampling interval of 30 seconds.

H. Provisioning under varying load intensity

[Figure 13: Varying load intensity for the Cart workflow (number of concurrent clients over time).]

We now illustrate the provisioning of resources under varying load intensity.
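The timings in Fig. 12 come from repeatedly solving Eqs. (5)-(8). A minimal sketch of that constrained optimization with SciPy's SLSQP solver; the `posterior` function below is a hypothetical stand-in for the fitted GP's m̄_j and σ̄_j, and the dimensions are illustrative (two services, two resource types):

```python
import numpy as np
from scipy.optimize import minimize

KAPPA = 1.96        # confidence multiplier kappa in Eq. (6); assumed value
SLO_TARGET = 170.0  # ms, e.g. the Catalogue workflow target

def posterior(x):
    """Hypothetical stand-in for the GP posterior (mean, std) of tail latency:
    latency and uncertainty both grow with total utilization."""
    mean = 40.0 + 1.2 * np.sum(x)
    std = 2.0 + 0.05 * np.sum(x)
    return mean, std

def neg_total_utilization(x):            # Eq. (5): maximize the sum of x_ir
    return -np.sum(x)

def slo_slack(x):                        # Eq. (6): SLO - (mean + kappa * std) >= 0
    mean, std = posterior(x)
    return SLO_TARGET - (mean + KAPPA * std)

x0 = np.full(4, 10.0)                    # initial utilizations x_ir
res = minimize(neg_total_utilization, x0, method="SLSQP",
               constraints=[{"type": "ineq", "fun": slo_slack}],
               bounds=[(0, 100)] * 4)    # Eq. (7): x_ir >= 0, capped at 100%
thresholds = res.x                       # desired utilization thresholds for Eq. (4)
```

The solution drives total utilization up until the risk-adjusted latency m̄ + κσ̄ touches the SLO target, which is exactly how the κσ̄ term buys robustness over a point-estimate model.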
Figure 14: Resource provisioning under dynamic workload for the Cart workflow. (a) 95th percentile latency without interference. (b) 95th percentile latency with interference from iPerf. [Each panel plots the predicted and measured latency, the SLO target, and the 95% CI over sampling intervals, annotated with scaling actions such as "add a pod to Cart", "add two pods to Cart", and "remove one pod from Cart and Catalogue" (without interference), and "add three pods to Cart, one pod to Web and Catalogue", "add one pod to Cart, Web, and Catalogue", "remove one pod from Cart", and "remove one pod from Cart and Web" (with interference).]

As shown in Fig. 14a, resources are removed at sampling intervals 6 and 8 in order to avoid overprovisioning of resources. Fig. 14b shows similar results in the presence of performance interference from iPerf colocated with the microservices relevant to the Cart workflow. We observe that RScale allocates more resources and removes fewer resources when performance interference exists in the system.

I. Provisioning under varying interference

We evaluate the effectiveness of RScale in providing a performance guarantee to the Catalogue and Cart workflows under varying interference patterns. In this experiment, the initial GP models are trained with a dataset including interference from the memory-intensive STREAM workload. As shown in Fig. 15, the interfering workload is changed from STREAM to the network-intensive iPerf after sampling interval 2. As a result, the measured tail latency exceeds the SLO target at sampling interval 3. RScale is able to adapt the model quickly and scale the appropriate microservices at sampling interval 4. Finally, the measured tail latency meets the SLO target at sampling interval 5.

Figure 15: Resource provisioning under varying interference. (a) Catalogue workflow. (b) Cart workflow. [Each panel plots the measured latency, predicted latency, SLO target, and 95% CI over sampling intervals.]

V. CONCLUSION

In this paper, we designed and implemented RScale, a robust resource scaling system that provides an end-to-end performance guarantee for containerized microservices deployed in multi-tenant clouds. Unlike existing works, RScale ensures system robustness by explicitly incorporating the predictive uncertainty estimated from probabilistic performance models into the resource allocation problem. Experimental evaluation demonstrates that RScale can meet the end-to-end tail latency targets of microservice workflows even in the face of multi-tenant performance interference and changing system dynamics. In the future, we plan to develop an interference-aware container scheduling technique that aims to minimize the resource contention experienced by containerized microservices when they are placed on a cluster of VMs. We will further extend our work to include diverse microservice-based applications with different resource bottlenecks.

ACKNOWLEDGMENT

Results presented in this paper were obtained using the Chameleon testbed supported by the National Science Foundation. The research is supported by NSF grant CNS-1911012.

REFERENCES

[1] A. Balalaie, A. Heydarnoori, and P. Jamshidi. Microservices architecture enables devops: Migration to a cloud-native architecture. IEEE Software, 33(3):42–52, May 2016.
[2] J. Brownlee. Statistical Methods for Machine Learning: Discover how to Transform Data into Knowledge with Python. Machine Learning Mastery, 2018.
[3] X. Chen, L. Rupprecht, R. Osman, P. Pietzuch, F. Franciosi, and W. Knottenbelt. Cloudscope: Diagnosing and managing performance interference in multi-tenant clouds. In Proc. of the IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, MASCOTS '15, pages 164–173, 2015.
[4] C. Delimitrou and C. Kozyrakis. Quasar: Resource-efficient and QoS-aware cluster management. SIGARCH Comput. Archit. News, 42(1):127–144, Feb. 2014.
[5] D. Dereniowski and M. Kubale. Cholesky factorization of matrices in parallel and ranking of graphs. In R. Wyrzykowski, J. Dongarra, M. Paprzycki, and J. Waśniewski, editors, Parallel Processing and Applied Mathematics, pages 985–992, Berlin, Heidelberg, 2004. Springer.
[6] N. Dragoni, S. Giallorenzo, A. L. Lafuente, M. Mazzara, F. Montesi, R. Mustafin, and L. Safina. Microservices: Yesterday, Today, and Tomorrow, pages 195–216. Springer International Publishing, Cham, 2017.
[7] M. Fazio, A. Celesti, R. Ranjan, C. Liu, L. Chen, and M. Villari. Open issues in scheduling microservices in the cloud. IEEE Cloud Computing, 3:81–88, 09 2016.
[8] W. Iqbal, A. Erradi, M. Abdullah, and A. Mahmood. Predictive auto-scaling of multi-tier applications using performance varying cloud resources. IEEE Transactions on Cloud Computing, pages 1–1, 2019.
[9] V. Jalaparti, P. Bodik, S. Kandula, I. Menache, M. Rybalkin, and C. Yan. Speeding up distributed request-response workflows. SIGCOMM Comput. Commun. Rev., 43(4):219–230, Aug. 2013.
[10] D. Jiang, G. Pierre, and C.-H. Chi. Autonomous resource provisioning for multi-service web applications. In 19th International World-Wide Web Conference (WWW), pages 471–480, 01 2010.
[11] P. Lama and X. Zhou. Autonomic provisioning with self-adaptive neural fuzzy control for percentile-based delay guarantee. ACM Transactions on Autonomous and Adaptive Systems, 8:9:1–, 07 2013.
[12] J. Li, N. K. Sharma, D. R. K. Ports, and S. D. Gribble. Tales of the tail: Hardware, OS, and application-level sources of tail latency. In Proc. of the ACM Symposium on Cloud Computing, SOCC '14, pages 1–14, 2014.
[13] J. McCalpin. Memory bandwidth and machine balance in high performance computers. IEEE Technical Committee on Computer Architecture Newsletter, pages 19–25, 12 1995.
[14] N. Meinshausen and P. Bühlmann. Stability selection. Journal of the Royal Statistical Society Series B, 72:417–473, 09 2010.
[15] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[16] F. A. Potra and S. J. Wright. Interior-point methods. Journal of Computational and Applied Mathematics, 124(1):281–302, 2000. Numerical Analysis 2000, Vol. IV: Optimization and Nonlinear Equations.
[17] J. Rahman and P. Lama. Predicting the end-to-end tail latency of containerized microservices in the cloud. In Proc. of the IEEE International Conference on Cloud Engineering (IC2E), pages 200–210, 06 2019.
[18] J. Rao and C.-Z. Xu. Online measurement of the capacity of multi-tier websites using hardware performance counters. In Proc. of the IEEE International Conference on Distributed Computing Systems (ICDCS), 2008.
[19] C. Rasmussen and C. Williams. Gaussian Processes for Machine Learning. MIT Press, 01 2005.
[20] R. Singh, U. Sharma, E. Cecchet, and P. Shenoy. Autonomic mix-aware provisioning for non-stationary data center workloads. In Proc. of the 7th International Conference on Autonomic Computing, ICAC '10, 01 2010.
[21] S. Subbiah, J. Wilkes, X. Gu, H. Nguyen, and Z. Shen. Agile: Elastic distributed resource scaling for infrastructure-as-a-service. In Proc. of the 10th International Conference on Autonomic Computing (ICAC 13), 2013.
[22] L. Suresh, P. Bodik, I. Menache, M. Canini, and F. Ciucu. Distributed resource management across process boundaries. In Proc. of the ACM Symposium on Cloud Computing (SoCC), 2017.
[23] B. Urgaonkar, G. Pacifici, P. Shenoy, M. Spreitzer, and A. Tantawi. An analytical model for multi-tier internet services and its applications. Sigmetrics Performance Evaluation Review, 33:291–302, 06 2005.
[24] B. Urgaonkar, P. Shenoy, A. Chandra, P. Goyal, and T. Wood. Agile dynamic provisioning of multi-tier internet applications. ACM Transactions on Autonomous and Adaptive Systems, 3, 03 2008.
[25] M. Villamizar, O. Garcés, L. Ochoa, H. Castro, L. Salamanca, M. Verano, R. Casallas, S. Gil, C. Valencia, A. Zambrano, and M. Lang. Infrastructure cost comparison of running web applications in the cloud using AWS Lambda and monolithic and microservice architectures. In 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pages 179–182, 2016.
[26] M. Wajahat, A. Gandhi, A. Karve, and A. Kochut. Using machine learning for black-box autoscaling. In Proc. of the IEEE International Green and Sustainable Computing Conference (IGSC), pages 1–8, 01 2016.
[27] L. Wang, J. Xu, H. Duran-Limon, and M. Zhao. QoS-driven cloud resource management through fuzzy model predictive control. In Proc. of the IEEE International Conference on Autonomic Computing, pages 81–90, 07 2015.
[28] Y. Xu, Z. Musgrave, B. Noble, and M. Bailey. Bobtail: Avoiding long tails in the cloud. In Proc. of the USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2013.
[29] Z. Yang, P. Nguyen, H. Jin, and K. Nahrstedt. Miras: Model-based reinforcement learning for microservice resource allocation over scientific workflows. In Proc. of the IEEE International Conference on Distributed Computing Systems (ICDCS), 2019.
[30] Q. Zhang, L. Cherkasova, and E. Smirni. A regression-based analytic model for dynamic resource provisioning of multi-tier applications. In Fourth International Conference on Autonomic Computing, ICAC '07, pages 27–27, 07 2007.
[31] Y. Zhang, D. Meisner, J. Mars, and L. Tang. Treadmill: Attributing the source of tail latency through precise load testing and statistical inference. SIGARCH Comput. Archit. News, 44(3):456–468, June 2016.
[32] L. Zhu and N. Laptev. Deep and confident prediction for time series at Uber. In IEEE International Conference on Data Mining Workshops. IEEE, 2017.