
Training cycle for engineers in Telecommunications

Option :
SYSTIC

End of study project report

Topic :

« Programming urgent applications in the


Edge-to-Cloud continuum »

Done by :
Hazem CHAABI
Supervisors :
Ms. Rim Barrak – Sup’Com
Ms. Helene Coullon – IMT-Atlantique
M. Daniel Balouek-Thomert – University of Utah

Work proposed and carried out in collaboration with

Academic year : 2021/2022

Ecole Supérieure des Communications de Tunis


SUP’COM
2083 Cité Technologique des Communications - Elghazala - Ariana - Tunisie
Tél. +216 70 240 900 – site web : www.supcom.mincom.tn
Acknowledgment
Summary
Contents

Summary

List of Figures

List of Tables

General Introduction

Chapter 1 General Presentation
1.1 Introduction
1.2 Presentation of the Host Organisation
1.3 Presentation of the Project
1.3.1 Problem statement
1.3.2 Internship final goals
1.4 Conclusion

Chapter 2 Background
2.1 Introduction
2.2 Edge-to-Cloud Continuum
2.3 Urgent Computing
2.4 Dynamic Reconfiguration
2.5 Use case: Canny Edge Detection
2.6 Software Product Line
2.7 Feature Model
2.8 Monolith VS Micro-services
2.9 Software Environment
2.10 Conclusion

Chapter 3 Related Work
3.1 Introduction
3.2 Variability Management Modelling
3.3 Re-Configuration
3.4 Conclusion

Chapter 4 Scientific Contribution
4.1 Introduction
4.2 Features model and Workflow
4.3 Decision model
4.4 Dynamic Reconfiguration
4.5 Conclusion

Chapter 5 Implementation and Evaluation
5.1 Introduction
5.1.1 Hardware Environment
5.2 Use-case Implementation
5.3 Building the Decision Model
5.3.1 Data collection
5.3.2 Data analysis
5.3.3 Data pre-processing
5.3.4 Modelling the machine learning model
5.4 Dynamic Reconfiguration
5.4.1 Creation of the images of the micro-services
5.4.2 Deployment automation
5.5 Evaluation
5.6 Conclusion

General Conclusion

Bibliography
List of Figures

1.1 Logos of IMT-Atlantique and the University of Utah
2.1 Edge-to-Cloud Continuum
2.2 Example of a Feature Model
2.3 Monolith Architecture to micro-services
3.1 Learning method on the video generator
4.1 Steps for building the decision model
5.1 Before and after applying the Canny Edge Detection algorithm on an image
5.2 The CED workflow micro-services architecture
5.3 The feature model of the CED micro-services
5.4 Box plot of the execution time of CED in Edge and Cloud paradigms
5.5 Scatter plot showing the computing time in each of the computing paradigms, colored by the used smoothening filter
5.6 Train-test split
5.7 Predicting the computing paradigm parameter
5.8 Predicting the operator parameter
5.9 Predicting the smoothening filter parameter
5.10 Sequence diagram of the dynamic deployment scenario
5.11 XGBoost accuracy on the operators, smoothening filter and computing paradigm parameters
List of Tables

5.1 Computer characteristics
5.2 A portion of the collected database
5.3 An encoded portion of the collected database
5.4 Comparison between the output of the decision model and the expected output

Abbreviations List

• CED = Canny Edge Detection.

• G5k = Grid’5000.

• K8s = Kubernetes.

• SPL = Software Product Line.
General Introduction

With the emergence of distributed infrastructures, the Cloud computing paradigm is
increasingly moving towards a full continuum from IoT devices and sensors to the
centralized Cloud, with Edge (edge of the network) and Fog computing (core network) in
between. At the same time, distributed applications are also evolving. Urgent computing
tackles services that need time-critical decisions. It aims to improve quality of life,
monitor civil infrastructures, react to natural disasters and extreme events, and
accelerate science (e.g., autonomous cars, disaster response, precision medicine).
These services are usually sensitive to latency and response time and are among the top
candidates for the IoT-to-Cloud computing continuum [1].

In this internship, we consider a new breed of urgent smart services using the IoT-to-Cloud
continuum, combined with recent advancements in Artificial Intelligence and
Big Data Analytics. First, these services and applications need large computing power
to perform well, while usually being restricted in moving data from the edge
of the network to the Cloud [2]. Second, they demand system
support to program reactions that arise at run-time, particularly when the
capacities and capabilities of the target infrastructure are unknown at design time [3].

It is challenging to run such services with guaranteed performance on the continuum.

From the application viewpoint, various types of events associated with the data to be
processed drive its configuration (i.e., what should it run?) [4]. From the infrastructure angle,
objectives related to the urgency of the results or to resource usage impact the
placement of functions across the IoT-to-Cloud continuum (i.e., where should it run?)
[5].

The report is organized as follows:

• In the first chapter, we describe the general frame of the project by introducing the
host organisation and then presenting the project overview.

• Throughout the second chapter, we present the background of this internship by
explaining important concepts such as urgent computing, dynamic reconfiguration,
software product lines, micro-services and Canny Edge Detection, and by giving the
set of tools that we used. These concepts will help clarify the following chapters.

• In the third chapter, we will present the existing Related Work.

• In the fourth chapter, we present our Scientific Contribution to the addressed
problem.

• In the fifth and last chapter, we detail the technical implementation of our
contribution along with an evaluation of the solution.
Chapter 1

General Presentation

1.1 Introduction
This end-of-studies project was a collaboration between IMT-Atlantique Nantes and the
University of Utah. This chapter first presents the host organisations in which I did
my internship. In the second section, we present the need for our project and specify
the goals of our work.

1.2 Presentation of the Host Organisation


IMT-Atlantique is a leading general engineering school and an internationally recognized
research center. IMT Atlantique is one of the top 10 engineering schools in France
and one of the top 400 universities in the world according to the THE World University
Ranking. It is a leading general engineering school of the Ministry of Industry and Digital
Technologies in France, the first ”Mines-Telecom” school of the Institut Mines-Télécom,
created on 1 January 2017 from the merger of Mines Nantes and Télécom
Bretagne. It is a school with first-rate research potential, internationally recognised for its
research (present in 5 disciplines in the Shanghai, QS and THE rankings).
It is a school with a strong presence in its territories, to whose development it contributes,
and a school aware of its environmental and social responsibilities: it obtained the
Sustainable Development and Social Responsibility label in 2019 for 4 years.

In this internship, I was a member of the LS2N lab, a joint research unit that
pools the digital research strengths of three higher education institutions (University of
Nantes, Centrale Nantes, IMT-Atlantique Nantes).


I was also a member of the STACK team, a joint team of the Automation and
Computer Science Department of IMT Atlantique and of the research center in Rennes of
Inria (the French national institute for research in digital science and technology).
STACK is also a team of the Laboratoire des Sciences du Numérique de Nantes (LS2N).

The Scientific Computing and Imaging (SCI) Institute at the University of Utah is an
internationally recognized leader in visualization, scientific computing, and image
analysis applied to a broad range of domains. The SCI Institute brings together faculty in
bioengineering, computer science, mathematics, and electrical engineering in applying
advanced computing technologies to challenges in a variety of domains, including biology
and medicine. The SCI Institute includes 19 faculty members and over 200 other
scientists, administrative support staff, and graduate and undergraduate students.

The overarching goals of the SCI Institute’s scientific computing research are to
create new techniques, tools, and systems by which scientists may solve problems
affecting various aspects of human life.

Figure 1.1: Logos of IMT-Atlantique and the University of Utah.

1.3 Presentation of the Project

1.3.1 Problem statement


Urgent Science describes decision-making driven by data and events to mitigate
negative impacts under strict time constraints. Urgent applications are expected not only
to execute fast but also to react fast. For example, some catastrophes can be avoided
with an early alert, such as:

• Disaster response and management (e.g., earthquake aftermath)



• Climate crisis (e.g., predicting wildfire)

• Disease spreading (e.g., Ebola, COVID-19)

Upon detecting an urgent alert, in most cases we need to perform further complex
calculations (e.g., executing a machine learning model) in order to confirm that it is not a
false positive. As the IoT devices that detected the alert are generally not powerful enough
to perform these complex calculations, dynamic reconfiguration can be the solution, as it
allows us to move from one configuration to another while the application is running.

1.3.2 Internship final goals


In this project, we aim to develop a solution to dynamically reconfigure urgent
applications implemented as workflows in the Edge-to-Cloud continuum, using machine
learning to take the decisions about the needed reconfiguration.

1.4 Conclusion
This chapter described the host organisation and the project framework.
In the next chapter, we provide a literature review to establish the theoretical framework
of the presented work.
Chapter 2

Background

2.1 Introduction
This chapter outlines the relevant fundamentals required to implement the proposed work.
We first define the Edge-to-Cloud Continuum, Urgent Computing and Dynamic
Reconfiguration, which are necessary concepts to understand our work. After that, we present
our use case application, which can be viewed as a software product line. Then, we
introduce Feature Models and give the benefits of micro-service architectures. Finally,
we detail the software environment used to develop our solution.

2.2 Edge-to-Cloud Continuum concept


The Edge-to-Cloud continuum describes the management of distributed computing and
network infrastructures to meet the capacity, security and cost-effectiveness requirements
of an edge ecosystem.
Each component of the Edge-to-Cloud continuum has its benefits and
drawbacks, which is why combining these components can be beneficial to applications
that require both good performance and low latency (e.g., urgent applications).


Figure 2.1: Edge-to-Cloud Continuum.

Source: [6]

2.3 Urgent Computing


Urgent computing is the backbone technology of early warning systems (EWS) for
surveillance, abnormality detection, forecasting and disaster avoidance. It addresses
services that require time-sensitive decisions to enhance the quality of life, to monitor civil
infrastructure, to act in the event of natural disasters and extreme incidents, and to
accelerate science (e.g., self-driving cars and precision medicine). This type of application
requires dynamic reconfiguration during execution in order to run optimally.

2.4 Dynamic Reconfiguration


Dynamic reconfiguration refers to the process of adding, removing, or moving resources,
or changing their internal configuration, in the network without disabling the primary
node involved.

2.5 Use case: Canny Edge Detection


To make a dynamically re-configurable urgent application, we chose Canny Edge
Detection as our use case. So, what is CED? In 1983, John Canny of MIT introduced the
CED technique. It is the most commonly used and popular edge detection tool: Canny
is a more efficient method of edge extraction than other currently available methods,
and it gives good results. The Canny operator can manage a variety of edge image data
and effectively remove noise [7].

This is a multi-stage algorithm:

• Step 1: Image smoothing


As edge detection is sensitive to noise in the image, the first step is to use a
smoothening filter (e.g., ”Gaussian filter”, ”Median filter” or ”Anisotropic filter”)
to remove high-frequency noise from the image.

• Step 2: Image gradient


The gradient of the smoothed image is then computed with an operator (e.g., ”Sobel
operator”, ”Robert Cross operator” or ”Prewitt operator”).

• Step 3: Non-maximum suppression


After determining the gradient direction, the image is thoroughly checked
to eliminate any pixels that may not be part of an edge. To
achieve this, each pixel is compared to its two neighbours along the gradient
direction. If the pixel is larger than both neighbours, it is kept unchanged;
otherwise, it is not a local maximum and is zeroed.

• Step 4: Edge tracking by hysteresis


This phase decides which edges are real and which are not. For this we need two
threshold levels, a minimum value (m) and a maximum value (M). Edges with a
gradient intensity greater than M are considered sure edges, while those with a
gradient intensity less than m are rejected.
Those that fall between the two levels are categorized as edges or non-edges
depending on their connectivity: they are considered edges if they are
connected to pixels of a ”sure edge”; otherwise, they too are rejected.

CED is thus a chain of features, executed in a manner in which every feature has a
predecessor that produces an output necessary for the execution of that feature.

2.6 Software Product Line


A software product line is a set of software-intensive systems sharing a common,
managed set of properties that meet the particular needs of a specific market sector or
mission, and that are developed from a common set of base assets in a prescribed manner.

2.7 Feature Model


A feature model represents the information of all achievable products of a software
product line in terms of features and the relationships between them. A feature model is
defined as a hierarchically arranged set of features composed of:

• relationships between a parent feature and its child features;

• cross-tree constraints, which are mostly inclusion or exclusion statements of the
form: if some feature F is included, then features A and B must also be included
(or excluded).

Figure 2.2 shows a simplified feature model inspired by mobile phone production.
The model depicts how features are used to specify and develop software for mobile
phones.
The program loaded in a phone depends on the features that the phone supports.
According to the model, all phones must include support for calls and for displaying
information on either a basic, coloured or high-resolution screen. Moreover, the software
for mobile phones may optionally include support for GPS and for multimedia tools
such as a video camera or an MP3 player.

Figure 2.2: Example of a Feature Model.

Source: [8]

We group together as baseline feature models those that provide the following
relationships between features:

• Mandatory. A child feature has an obligatory connection to its parent when the
child is included in all products in which its parent feature is present. For
example, each mobile phone must provide call support.

• Optional. A child feature has an optional relation to its parent when the child can
be optionally included in all of the products in which the parent feature appears.
In the example, mobile phones may have GPS support as an optional feature.

• Alternative. A set of child features is in alternative relationship with its parent


when only one child feature can be selected while its parent feature is part of the
product. In this example, mobile phones may include support for a basic screen, a
coloured screen or a high resolution one, but only one of them can be selected.

• Or. A set of child features has an or relationship with its parent when one or
several of them can be included in the products in which the parent feature is
found. In Figure 2.2, when Media is selected, Camera, MP3 or both can be
included.

A feature model may also contain cross-tree constraints among features. These constraints
are usually of the form:

• Requires. If a feature A requires a feature B, the presence of A in a product
implies the presence of B in that product. Mobile phones with a camera must
support a high-resolution screen.

• Excludes. If a certain feature A excludes a certain feature B, the two features


cannot be included in the same product. GPS and basic screen are two
incompatible features.
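As an illustration, the relationships and cross-tree constraints of the mobile-phone example can be encoded as a simple validity check over a set of selected features. This is only a sketch: the feature names and the encoding are hypothetical choices that mirror Figure 2.2, not an actual feature-model tool.

```python
# Hypothetical encoding of the mobile-phone feature model of Figure 2.2.
# A configuration is the set of selected feature names.
SCREENS = {"Basic", "Colour", "HighResolution"}  # alternative group

def is_valid(config: set) -> bool:
    if "Calls" not in config:                    # mandatory: all phones support calls
        return False
    if len(config & SCREENS) != 1:               # alternative: exactly one screen type
        return False
    if "Camera" in config and "HighResolution" not in config:
        return False                             # requires: Camera -> HighResolution
    if "GPS" in config and "Basic" in config:    # excludes: GPS and Basic screen
        return False
    return True

# A phone with calls and a colour screen is a valid product,
# while GPS together with a basic screen violates the excludes constraint.
assert is_valid({"Calls", "Colour", "MP3"})
assert not is_valid({"Calls", "Basic", "GPS"})
```

In practice such checks are delegated to dedicated tooling (e.g., the FAMILIAR language discussed in Chapter 3), but the principle is the same: every product of the line must satisfy all relationships and cross-tree constraints.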

2.8 Monolith VS Micro-services


The decomposition of monoliths into micro-services is essential for dynamic
reconfiguration, given the poor adaptability of monoliths to reconfiguration. In [9], the
authors state that ”monolithic architectures represent a single large application composed
of tightly interdependent and non reusable components”, which confirms the need to
migrate to a micro-services-based solution to gain more freedom of customization, as
micro-services combine small, single-functionality, loosely linked services to build up
more advanced functionalities.

To achieve this migration from a monolith to micro-services, we started by making a
feature model of the monolith in order to identify the features representing it.
After that, each feature becomes a micro-service or a set of micro-services (depending
on the complexity of the feature). Also, in our case, the workflow of the services
needs to be defined because of their dependencies on each other.

Figure 2.3: Monolith Architecture to micro-services.



2.9 Software Environment


To develop the solution, a wide range of software and technologies are used.

• Programming languages:

◦ Python: a high-level programming language designed to be easy to read and
simple to implement. It is open source and is often used to create web
applications and dynamic web content.
◦ Bash: the default shell on most GNU/Linux systems.

• Programming tools:

◦ PyCharm: one of the best-known Python Integrated Development
Environments (IDEs), developed by the Czech company JetBrains.
◦ Google Colab: allows you to write and run Python code in the browser with
easy sharing, free access to GPUs and no configuration required.
◦ Jupyter Notebook: a web-based open source application which you can use
to create and share documents containing code, equations, graphs and text.

• The used libraries:

◦ OpenCV: an open source computer vision and machine learning library. It
was built to provide a common infrastructure for computer vision
applications and to accelerate the use of machine perception in commercial
products.
◦ NumPy: allows numerical computations to be performed with Python. It
introduces easy management of arrays of numbers.
◦ Scikit-learn: an open source machine learning library for the Python
programming language. It includes several algorithms for classification,
regression and clustering.
◦ Flask: a light-weight Python web framework that provides a set of tools and
features to build web applications in Python.

• DevOps tools:

◦ Docker: allows you to build container images and manage them.



◦ Kubernetes: a portable, extensible, open source platform for managing
containerized workloads and services, that facilitates both declarative
configuration and automation.

2.10 Conclusion
In this chapter, several basic concepts necessary for the implementation of the project
have been outlined.
Chapter 3

Related Work

3.1 Introduction
In this chapter, we state the related work in the field of variability management,
as well as the solutions proposed in the literature to reconfigure applications.

3.2 Variability Management Modelling


Variability management is about creating and managing multiple applications as well as
the variability within and between these applications. The main purpose of variability
management is to present an overview of the variability across the whole product line, to
define the dependencies across variations and to coordinate the overall variability across
multiple domain assets. Several approaches have been proposed to model the variability
of micro-services.

For instance, Naily et al. introduce a framework to engineer connected micro-services
as a software product line. They propose the ABS micro-services framework for
developing software based on micro-services with the Software Product Line
Engineering (SPLE) concept, aiming to reduce the effort of adapting to changes in
requirements by using the rigorous variability management approach offered by SPLE.

Meanwhile, in [11], Acher mentions that, according to a survey, Feature Models are the
most used variability management approach in industry. Later, in his HDR, he
introduced FAMILIAR (FeAture Model scrIpt Language for manIpulation and Automatic
Reasoning), a domain-specific language with a textual syntax that permits operations
on multiple feature models and their configurations.

3.3 Re-Configuration
Previous studies in the literature have investigated the performance of different AI
methods for automatically re-configuring applications, motivated by the high number
of parameters that applications have nowadays.

In [11], Acher introduced his approach to constrain variability models. As the example
in Figure 3.1 shows, he first gathers a training dataset using different configurations
of the variability model. After that, he labels that dataset with Boolean values
(accepted/not accepted output). Then, the labeled data is passed through a machine
learning decision tree model to learn the patterns for classifying the images as accepted
or not, and therefore to derive new constraints. Combining the variability model with
the new constraints gives a new model that only produces the accepted images.

Figure 3.1: Learning method on the video generator.

Source: [11]

In [12], the authors present a system that uses machine learning classifiers to ensure
high accuracy in offloading (choosing which computing paradigm to run on). The
proposed solution is based on a contextual database for training and testing
classification algorithms.

3.4 Conclusion
In this chapter, we presented how variability management is modeled; in our case,
we combine feature models with workflows because we are programming an urgent
application. We also mentioned the methods used in the literature to reconfigure
applications.
Chapter 4

Scientific Contribution

4.1 Introduction
In this chapter, we give an overview of our main contribution: combining feature
models with workflows, taking re-configuration decisions using machine learning, and
using those decisions to dynamically re-configure our application.

4.2 Features model and Workflow


Every application can be expressed as a feature model. But, in the case of urgent
computing, we ought to combine it with the workflow. This combination introduces new
constraints for the feature model, as the components of a workflow are linked and each
of its functions requires as input the output of its predecessor.

4.3 Decision model


To make the decision of which configuration should be executed, we could hard-code
every scenario that could happen and choose a specific configuration for it. But that
takes a lot of coding time, errors can be made, and in some cases it can be very
challenging or practically impossible. For example, the Linux kernel has 15,000+ options
and most of them can take 3 values: ”yes”, ”no”, or ”module”. Overall, there may be
more than 10^5000 possible variants of Linux (the estimated number of atoms in the
universe is 10^80) [11].
A solution is to automate this using machine learning approaches. First, we need
to build the dataset. The process of building the dataset is quite simple: we run
several different experiments, collecting and saving the parameters used
along with some other features (e.g., execution time). For example, in our use case, we
collected the variability parameters of the application (operators and smoothening filters
+ their internal parameters) along with the execution time, the quality of the output
(good, medium, and bad) and the computing paradigm (”Edge” and ”Cloud”). Note that
the bigger the collected dataset, the better the decision model will be.
After that, we need to prepare the data for training. The data preprocessing
differs from one use case to another, but it is generally one or a combination of these
techniques:

• Data cleaning.

• Dimensionality reduction.

• Feature engineering.

• Data transformation.

• Data balancing.

• Data sampling.

The last step of the data preprocessing is sampling: we divide the collected dataset
into at least two parts, training data fed to the machine learning decision model so
that it learns the patterns, and test data used to validate the performance of the
trained model.
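The sampling step can be sketched with scikit-learn, which we use in Chapter 5. The toy DataFrame below stands in for the collected dataset: its column names and values are illustrative, not the report's exact schema.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the collected dataset: variability parameters,
# execution time, and the computing paradigm to predict.
data = pd.DataFrame({
    "operator":  ["sobel", "prewitt", "sobel", "robert", "prewitt", "sobel"],
    "exec_time": [0.8, 1.1, 0.7, 1.4, 1.0, 0.9],
    "paradigm":  ["Edge", "Cloud", "Edge", "Cloud", "Cloud", "Edge"],
})

# One-hot encode the categorical variability parameters.
X = pd.get_dummies(data[["operator", "exec_time"]])
y = data["paradigm"]

# Hold out a third of the data to validate the trained decision model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)
```

With 6 rows and `test_size=0.33`, 4 rows go to training and 2 to testing; on the real dataset the same call simply scales up.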
The next step is to build the decision model. First, we have to choose a metric for our
model to optimize (e.g., ”accuracy”, ”F1 score”, ”recall”); the metric must be adequate
to the type of the dataset, the distribution of the different classes and the use case. After
that, we need to pick a machine learning model. This part is very challenging due to the
huge number of algorithms that can be used (e.g., ”decision tree”, ”random forest”,
”neural networks”). So, the solution is to train as many models as possible,
evaluate their performances using the test data, sort them with the chosen metric and
pick the most efficient one, or combine the top candidates for better generalization.
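This model-selection strategy can be sketched as follows with scikit-learn. Synthetic data stands in for the collected dataset, and the three candidate models and the accuracy metric are examples rather than the report's final choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the collected, preprocessed dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Train several candidate models and score each on the held-out test set.
candidates = {
    "decision tree":  DecisionTreeClassifier(random_state=0),
    "random forest":  RandomForestClassifier(random_state=0),
    "neural network": MLPClassifier(max_iter=1000, random_state=0),
}
scores = {}
for name, model in candidates.items():
    model.fit(X_tr, y_tr)
    scores[name] = accuracy_score(y_te, model.predict(X_te))

# Keep the model with the best value of the chosen metric.
best = max(scores, key=scores.get)
```

Swapping `accuracy_score` for another metric (F1 score, recall) only changes the scoring line, which is why fixing the metric first matters.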

Figure 4.1: Steps for building the decision model.

4.4 Dynamic Reconfiguration


To re-configure our application, we use the output of the decision model of Section 4.3
(the predicted configurations) to automatically re-configure it while fully respecting the
specification provided by the decision model.

4.5 Conclusion
In this chapter, we detailed our contribution. First, we presented why we have to
combine feature models with workflows to represent urgent applications, then why we
need machine learning to take the re-configuration decision. Finally, we specified
our approach to re-configuring applications with the output of the decision model.
Chapter 5

Implementation and Evaluation

5.1 Introduction
This chapter first presents the working environment. Then, we implement our use case.
Following that, we detail the process of building the dataset, from gathering the data
to analysing and preprocessing it. Next, we present the steps to train our decision
model. After that, we go through how we made our solution dynamically
re-configurable. Finally, we evaluate our work.

5.1.1 Hardware Environment


The hardware used to develop and validate the functionality of the proposed solution is
composed of:

- A computer with the following characteristics:

Model                  HP Pavilion
Processor              Intel(R) Core(TM) i5-9300H CPU @ 2.40GHz
RAM                    16 GB
Data Storage           512 GB SSD
Operating System (OS)  Windows 11

Table 5.1: Computer characteristics.

- The testbed Grid’5000 [13]: a scalable and flexible testbed for experimental research
in all areas of computing, with a focus on parallel and distributed computing, including
the Cloud, Big Data and AI. Its key features are:

• It provides access to a large amount of resources: 15,000 cores and 800 compute nodes
grouped in homogeneous clusters, featuring various technologies: PMEM, GPU, SSD, NVMe,
10G and 25G Ethernet, InfiniBand, Omni-Path.

• It is highly reconfigurable and controllable: researchers can experiment with a fully
customized software stack thanks to bare-metal deployment features, and can isolate
their experiments at the networking layer.

• It offers advanced monitoring and measurement features for collecting networking and
power-consumption traces, providing a deep understanding of experiments.

• It is designed to support Open Science and reproducible research, with full
traceability of infrastructure and software changes on the testbed.

5.2 Use-case Implementation


To implement our solution, we took CED as a use case. We started by programming the
Canny Edge Detection [7] application from scratch, because the version already built
into OpenCV [14] is implemented as a monolith, so we had to break it down into a set of
micro-services to obtain the kind of variability we need. To make sure that our
micro-services version of CED works correctly, we compared, pixel by pixel, the output
images of OpenCV's Canny Edge Detection with ours: the results match perfectly.
Figure 5.1 shows an example of input/output of the CED algorithm.

Figure 5.1: Before and after applying Canny Edge Detection algorithm on an image.

Later, we divided our implementation of Canny Edge Detection into several functions,
each implemented as a micro-service that performs one specific task. These
micro-services are presented in Figure 5.2. We designed the architecture of the
micro-services version of Canny Edge Detection so that it offers as much variability
control as possible, so that it can serve us in our upcoming work.
In the proposed architecture (Figure 5.2), after reading the input file, the use of a
smoothening filter is optional, but all of the other steps in this workflow are
mandatory. They are …
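As an illustration of this variability, a minimal sketch of the variable part of the pipeline follows (assuming standard gradient kernels from the literature and scipy in place of our from-scratch implementation; the anisotropic filter and the later mandatory Canny stages are omitted for brevity):

```python
# Sketch of a variable CED front-end: the smoothing stage is optional,
# the gradient operator is selectable; subsequent stages are fixed.
import numpy as np
from scipy.ndimage import convolve, gaussian_filter, median_filter

OPERATORS = {  # horizontal gradient kernels (the vertical one is the transpose)
    "Sobel":   np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]),
    "Prewitt": np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]),
    "Robert":  np.array([[1, 0], [0, -1]]),
}

def gradient_magnitude(img, operator="Sobel", smoothing=None):
    """Apply the optional smoothing stage, then the selected operator."""
    if smoothing == "Gaussian":
        img = gaussian_filter(img, sigma=1.0)
    elif smoothing == "Median":
        img = median_filter(img, size=3)
    # smoothing=None skips the optional stage entirely.
    k = OPERATORS[operator]
    gx = convolve(img.astype(float), k)
    gy = convolve(img.astype(float), k.T)
    return np.hypot(gx, gy)

img = np.zeros((8, 8))
img[:, 4:] = 255.0  # synthetic vertical step edge
mag = gradient_magnitude(img, operator="Sobel", smoothing="Gaussian")
print(mag.shape)
```

Each branch of this variability (filter choice, operator choice) maps to one micro-service in the architecture of Figure 5.2.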

Figure 5.2: The CED workflow micro-services architecture.



Figure 5.3: The feature model of the CED micro-services.

After building the micro-services, it was time to containerize them so that we can
transfer them between environments without being affected by environment-specific
setups, and also for Kubernetes (K8s) orchestration later on.

5.3 Building the Decision Model

5.3.1 Data collection


A critical component of building a machine learning system is the creation of a
high-quality dataset. We have three CED parameters to predict individually: the
smoothening filter (Gaussian filter, Anisotropic filter, Median filter), the operator
(Sobel operator, Robert Cross operator, Prewitt operator) and the computing paradigm
(Edge, Cloud). Those parameters are predicted mainly from the execution time and the
quality of the output.
To create all these data, we used Grid’5000, which gave us access to a huge amount of
resources. As almost all of the machines provided by G5K have high-performance
hardware, we decided to underclock them (i.e., reduce the CPU speed) to 1.2 GHz to
simulate the processors of Edge devices. The collection process goes as follows:

• Reserving machines on G5K.

• Pulling our CED program from GitHub to the reserved machines.

• Executing all the parameter combinations of the CED program on each of the reserved
machines.

• Collecting the execution time and the quality of the output for each run.
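The steps above can be sketched as the following collection loop (a simplified local stand-in: run_ced is a hypothetical wrapper around one CED execution and its quality score is mocked; on G5K each combination is instead executed on the reserved machines):

```python
# Execute every parameter combination, recording the execution time
# and the output quality of each run.
import itertools
import time

FILTERS = ["Gaussian", "Anisotropic", "Median"]
OPERATORS = ["Sobel", "Robert", "Prewitt"]
PARADIGMS = ["Edge", "Cloud"]

def run_ced(smoothing, operator, paradigm):
    """Hypothetical stand-in for one CED run; returns a mocked quality score."""
    return {"Gaussian": 2, "Anisotropic": 1, "Median": 3}[smoothing]

rows = []
for smoothing, operator, paradigm in itertools.product(
        FILTERS, OPERATORS, PARADIGMS):
    start = time.perf_counter()
    quality = run_ced(smoothing, operator, paradigm)
    elapsed = time.perf_counter() - start
    # One row per run, matching the columns of Table 5.2.
    rows.append((elapsed, quality, paradigm, operator, smoothing))

print(len(rows))  # 3 filters x 3 operators x 2 paradigms = 18 runs
```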

Time   Quality   Computing paradigm   Operator   Smoothening filter
37     2         Edge                 Robert     Gaussian
53     2         Cloud                Prewitt    Median
27     1         Edge                 Robert     Anisotropic
30     3         Cloud                Sobel      Median

Table 5.2: A portion of the collected database.

5.3.2 Data analysis


In order to get better insights into the collected dataset, we plotted some of the
features. In Figure 5.4, we can see that CED generally takes more computing time on the
Edge, which is expected because the Cloud hardware is more powerful than the Edge
hardware. But in some cases the Edge is faster than the Cloud, and that only happens
when we produce low-quality output on the Edge and high-quality output in the Cloud.
We can also see in Figure 5.5 that the smoothening filter parameter has an important
effect on the computing time. The Median filter takes more computing time, which is
predictable because that filter value is necessary for producing high-quality outputs.

Figure 5.4: Box plot of the execution time of CED in Edge and Cloud paradigms.

Figure 5.5: Scatter plot showing the computing time in each of the computing
paradigms, colored by the smoothening filter used.

5.3.3 Data pre-processing


Machine learning algorithms require that input and output variables be numbers. This
means that we must encode the computing-paradigm, operator and filter-parameter columns
(see Table 5.3) to numbers before we fit and evaluate the models. We used a popular
encoding technique called Label Encoding. This approach is very simple: it converts
each distinct value in a column to a number.
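A minimal sketch of this encoding step with scikit-learn's LabelEncoder (the column values are taken from Table 5.2; note that LabelEncoder assigns codes in alphabetical order, so the exact numbers may differ from ours):

```python
from sklearn.preprocessing import LabelEncoder

# Sample column from the collected dataset.
paradigms = ["Edge", "Cloud", "Edge", "Cloud"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(paradigms)  # each distinct value -> an integer

print(list(encoded))           # [1, 0, 1, 0] ("Cloud" < "Edge" alphabetically)
print(list(encoder.classes_))  # ['Cloud', 'Edge']
```

The same encoder is fitted once per categorical column (computing paradigm, operator, filter), and inverse_transform recovers the original labels when the model's numeric predictions must be mapped back to configuration names.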

Time   Quality   Computing paradigm   Operator   Smoothening filter
37     2         1                    1          1
53     2         0                    0          2
27     1         1                    1          0
30     3         0                    2          2

Table 5.3: An encoded portion of the collected database.

Before moving on to model training, we have to split our dataset into train and test
sets (the train/test split is a model validation procedure that allows us to evaluate
the performance of a model on new data). We split our data with the train_test_split
function of the Python library scikit-learn into 70% training data, leaving 30% for
testing, as Figure 5.6 shows.

Figure 5.6: Train/test split.
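The split described above can be reproduced as follows (shown on a placeholder array; the 70/30 ratio is the one we used, the random seed is an illustrative choice):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(100, 1)  # placeholder feature matrix
y = np.arange(100) % 2              # placeholder labels

# 70% of the rows go to training, the remaining 30% to testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)

print(len(X_train), len(X_test))  # 70 30
```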

5.3.4 Training the machine learning models


In order to build our decision model, we compared the best-known performing models, and
we did this comparison for each of the predicted parameters (computing paradigm,
operator and filter parameter). Figure 5.7 shows the accuracy obtained when predicting
the computing paradigm, using time, quality, operator and filter parameter as inputs,
for each of the trained models.

Figure 5.7: Predicting the computing paradigm parameter.

Figures 5.8 and 5.9 show a performance comparison between the classification models
while predicting the operator parameter and the smoothening filter parameter respectively.

Figure 5.8: Predicting the operator parameter.



Figure 5.9: Predicting the smoothening filter parameter.

As shown in Figures 5.7, 5.8 and 5.9, XGBoost is the best-performing model overall, so
based on this experimental result we decided to save it in the three prediction cases
and make it the decision-making model for the next steps.
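Training and saving one model per predicted parameter can be sketched as follows (scikit-learn's GradientBoostingClassifier is used here as a drop-in stand-in for XGBoost, on synthetic data; our implementation uses the xgboost library on the encoded dataset of Section 5.3.3):

```python
import pickle
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# One model per predicted CED parameter.
TARGETS = ["computing_paradigm", "operator", "smoothening_filter"]

models = {}
for target in TARGETS:
    # Synthetic stand-in for the encoded (features, target-column) pair.
    X, y = make_classification(n_samples=200, random_state=0)
    model = GradientBoostingClassifier(random_state=0).fit(X, y)
    # Serialize the trained model so the reconfiguration step can load it.
    blob = pickle.dumps(model)
    models[target] = pickle.loads(blob)

print(sorted(models))
```

In the real pipeline the serialized models are written to disk once and loaded by the reconfiguration service, so predictions do not require retraining.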

5.4 Dynamic Reconfiguration

5.4.1 Creation of the images of the micro-services


We started by creating separate images for our micro-services. For each micro-service,
we need to:

• create a requirements.txt file that contains all the dependencies, with their
versions, needed to execute the micro-service;

• create a Dockerfile containing all the instructions needed to build the image;

• build the image with the docker build command.

After that, we create a Docker Compose file in which we specify, for each
micro-service, its image and the ports to be used, so that the micro-services can
communicate with each other.
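As an illustration, a Dockerfile for one of these micro-services might look like the following (the base image, port, and entry-point script name are hypothetical, not the exact ones from our repository):

```dockerfile
# Hypothetical Dockerfile for a single CED micro-service.
FROM python:3.9-slim
WORKDIR /app
# Install the pinned dependencies listed in requirements.txt.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy the micro-service code into the image.
COPY . .
# Port on which the micro-service listens (also declared in docker-compose).
EXPOSE 5000
CMD ["python", "gaussian_filter_service.py"]
```

The image is then built with docker build (e.g., docker build -t ced/gaussian-filter .), and the Docker Compose file maps each such image to its exposed port.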

5.4.2 Deployment automation


To automate the deployment process of our micro-services, we used Skaffold to
automatically create the Kubernetes YAML files needed to create the containers and
expose the ports specified in our Docker Compose file. After that, we run the skaffold
dev command to deploy our containers and continuously watch for changes in the files
required to build the images. Upon detecting a modification, Skaffold automatically
deletes the container that needs to be changed and replaces it with a new one that
includes the required modifications.
As shown in Figure 5.10, a request is first made to the decision model to predict the
optimal configuration. If the model output is different from the deployed
configuration, we update the configuration of the micro-service(s) that need to be
changed. Then, Skaffold detects the modifications and automatically replaces the
deployed containers.
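The reconfiguration check of Figure 5.10 can be sketched as follows (the decision model is stubbed out here; in our implementation it is the saved XGBoost model, and the updated configuration file is what Skaffold watches):

```python
def predict_configuration(time_budget, quality, paradigm):
    """Stub for the decision model; returns a (filter, operator) pair."""
    return ("Median", "Sobel")

def reconfigure(current, time_budget, quality, paradigm):
    """Compare the predicted configuration with the deployed one and
    return the updated configuration only if something changed."""
    flt, op = predict_configuration(time_budget, quality, paradigm)
    predicted = {"filter": flt, "operator": op}
    if predicted != current:
        # Persisting this dict to the watched config file is what Skaffold
        # detects, triggering the automatic redeployment of the containers.
        return predicted
    return current  # nothing to change; no redeployment is triggered

deployed = {"filter": "Gaussian", "operator": "Robert"}
updated = reconfigure(deployed, time_budget=60, quality="medium",
                      paradigm="Cloud")
print(updated)  # {'filter': 'Median', 'operator': 'Sobel'}
```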

Figure 5.10: Sequence diagram of the dynamic deployment scenario.



5.5 Evaluation
The common evaluation for this type of problem is to evaluate the performance of the
decision model [15].
The chosen decision model (XGBoost) gave us a minimum accuracy of 65% when predicting
the operator and smoothening filter parameters, and an accuracy of 85% when predicting
the computing paradigm. Note that in urgent applications, the minimum allowed
confidence (accuracy in our case) is 60%.

Figure 5.11: XGBoost accuracy on the operator, smoothening filter and computing
paradigm parameters.

Input parameters (time,        Output (filter, operator,       Expected output (filter,
quality, computing paradigm)   time)                           operator, time)

(20, bad, Cloud)       (’Anisotropic’, ’Robert’, 13)   (’Anisotropic’, ’Robert’, 20)
(55, medium, Cloud)    (’Median’, ’Prewitt’, 48)       (’Median’, ’Robert’, 55)
(60, medium, Cloud)    (’Median’, ’Robert’, 52)        (’Median’, ’Robert’, 60)
(80, good, Cloud)      (’Median’, ’Sobel’, 52)         (’Median’, ’Sobel’, 75)

Table 5.4: Comparison between the output of the decision and the expected output.

As Table 5.4 shows, the output of the decision model is sometimes different from what
we expect. However, the quality of the output image is the same as requested, and the
requested execution time is respected in most cases.

5.6 Conclusion
In this chapter, we have successfully implemented a dynamically re-configurable
application that takes an image as input and outputs the edges it contains. For the
reconfiguration decisions, we built a model that takes the desired execution time of
the application (a deadline), the desired quality of the output image and the computing
paradigm, and predicts the optimal configurations. Those predicted configurations are
detected and deployed automatically, without any manual intervention.
General Conclusion

With current technological advancements, programming urgent applications is a must,
given that it improves human life; optimising these applications will lead to a more
secure life and even bring us a step closer to long-awaited technologies (e.g.,
autonomous vehicles and e-medicine).

In this graduation internship, we have successfully designed an approach, based on
machine learning, to dynamically deploy urgent applications. This approach can save a
lot of time otherwise spent writing hard-coded re-configuration scenarios, which may
even be impossible to write when the number of configurations is large.

During this project, we first combined feature models with workflows to model urgent
applications. Then, we built a dataset that contains different execution scenarios. Next,
with the collected data we have built a machine learning model to predict the optimal
configurations to make during a re-configuration. Finally, we used the predicted configu-
rations to re-configure the deployed urgent application.

We have achieved:

• Operator prediction: Accuracy = 68%.

• Smoothening filter prediction: Accuracy = 65%.

• Computing paradigm prediction: Accuracy = 85%.

However, this work can still be improved. For example, we could explore other decision
models (e.g., reinforcement learning). Also, we could build a language to model feature
models and workflows.

Bibliography

[1] Daniel Balouek-Thomert, Ivan Rodero, and Manish Parashar. Harnessing the computing
continuum for urgent science. ACM SIGMETRICS Performance Evaluation Review, 2020.

[2] Kevin Fauvel, Daniel Balouek-Thomert, Diego Melgar, Pedro Silva, Anthony Simonet,
Gabriel Antoniu, Alexandru Costan, Véronique Masson, Manish Parashar, Ivan Rodero, and
Alexandre Termier. A distributed multi-sensor machine learning approach to earthquake
early warning. Proceedings of the AAAI Conference on Artificial Intelligence, 2020.

[3] Eduard Gibert Renart, Daniel Balouek-Thomert, and Manish Parashar. An edge-based
framework for enabling data-driven pipelines for iot systems. 2019.

[4] Maverick Chardet, Hélène Coullon, and Simon Robillard. Toward safe and efficient
reconfiguration with concerto. Science of Computer Programming, 2021.

[5] Emile Cadorel, Hélène Coullon, and Jean-Marc Menaud. Online multi-user workflow
scheduling algorithm for fairness and energy optimization. 2020.

[6] Edge-to-cloud continuum. https://tinyurl.com/Edge-to-cloud.

[7] Mohd Ansari, Diksha Kurchaniya, and Manish Dixit. A comprehensive analysis of image
edge detection techniques. International Journal of Multimedia and Ubiquitous
Engineering, 12(11):1–12, 2017. doi: 10.14257/ijmue.2017.12.11.01.

[8] Paolo Arcaini, Angelo Gargantini, and Paolo Vavassori. Generating tests for detecting
faults in feature models. 2015 IEEE 8th International Conference on Software Testing,
Verification and Validation, ICST 2015 - Proceedings, 2015.

[9] André Carrusca, Maria Cecília Gomes, and João Leitão. Microservices Management on
Cloud/Edge Environments. Springer International Publishing, 2020.

[10] Moh Naily, Maya Setyautami, Radu Muschevici, and Ade Azurat. A Framework for
Modelling Variable Microservices as Software Product Lines. 2018. doi:
10.1007/978-3-319-74781-1_18.


[11] Mathieu Acher. Modelling, Reverse Engineering, and Learning Software Variability.
PhD thesis, Université de Rennes 1, 2021.

[12] Warley Junior, Eduardo Oliveira, Albertinin Santos, and Kelvin Dias. A
context-sensitive offloading system using machine-learning classification algorithms
for mobile cloud environment. Future Generation Computer Systems, 2019.

[13] Grid’5000. https://www.grid5000.fr.

[14] Opencv. https://opencv.org.

[15] Hongzhi Guo, Jiajia Liu, and Jianfeng Lv. Toward intelligent task offloading at the edge.
IEEE Network, 2020.
