
Research Seminar Report

Framework to Monitor Docker Containers Using Prometheus and Grafana

Submitted by: Rafay Kashif


Course of Study: Research in Computer and Systems Engineering
Field of Study: Computer Science
Academic Supervisor: Dr.-Ing. Detlef Streitferdt
Contents:

Abstract
Introduction
Methodology
    Docker-Compose
    NodeJS Sample Application
Challenges
Results
    Resource usage at startup
        Running Natively
        Docker
    Resource usage while uploading a file
        Running natively
        Docker
    Resource usage when downloading a file
        Running Natively
        Docker
Analysis
Conclusion
References
Abstract:
This research paper presents a comprehensive comparison between locally run applications
and those deployed within Docker containers. The study evaluates performance, scalability,
and development workflow, and how each is affected by the use (or non-use) of Docker.
Through testing and analysis across different workloads, we measured key metrics such as
execution time, resource utilization, throughput, and memory usage. The findings highlight
the trade-offs between local execution and applications running in Docker, emphasizing the
advantages of Docker in terms of reproducibility, consistency across environments, and
simplified deployment. The insights gathered from this research should help developers and
system administrators make informed decisions when choosing between local and Dockerized
deployments, depending on which approach works better for them.

Introduction:
Containerization has been a growing trend in the software development community in
recent years, and containers and containerization technologies have revolutionized the
way software is developed, deployed, and scaled. As with other new trends in the tech
industry, many technologies compete to capture as much of the market as possible and
stand out as the number one in their space; in the containerization space, Docker has
been the most successful thanks to its compatibility, portability, and resource
utilization. Docker allows developers to encapsulate code in lightweight, isolated
containers that behave consistently across all kinds of diverse environments, a
phenomenon practically unheard of before containerization. While containerization has
many benefits, it is important to understand the difference between running applications
in containers and running them natively on the host system.

This paper compares running applications in Docker with running them natively, by
carrying out research and test cases around both approaches and comparing the patterns
and outputs observed. In turn, it may help developers decide which implementation better
suits their use case.

The paper first looks at the basics of running applications natively and inside
containers, with a deeper dive into how both work. It also highlights some of the
important features that technologies like Docker offer and how they increase the
productivity and efficiency of software development.

The resource usage of both approaches is then compared, with detailed findings and
different outcomes depending on the workload, which provides the main talking points on
which method should be preferred. Alongside this, the methodology used to implement and
carry out the comparisons is described, together with the challenges faced in
implementing them.

At the end of the paper, the analysis is put into perspective and used to draw a
conclusion about what can be learned from the research, so that the paper can serve as
a reference point for someone making a decision regarding their own implementation.

Software development plays a major part in our lives today: the speed and efficiency
with which software is developed defines how fast we move forward as a society, since
our lives are so centred around software. Looking back at the history of software
development, in the early days it was mostly about having an idea and getting the
product to market, but as the tech sector has matured, the execution and the processes
followed during development have become just as important as the idea or the software
itself. Such requirements have forced us to put time and effort into optimizing these
processes. One such effort has led to the concept of containerization. Before it
existed, one of the major issues was the compatibility of software across platforms and
machines. For example, one developer would make changes and push them, only to find out
later that, because of a small incompatibility in dependencies, the whole application
failed to run on a fellow developer's machine, leading to delays. Containerization, or
in this case Docker, frees us from worrying about these issues: code can easily be
packaged into a container image and run on any other machine without compatibility
problems, as long as Docker is installed there. This is one of the many
quality-of-life features Docker enables, along with making it easy to run multiple
applications, or multiple instances of one application, on a single machine.

Running an application locally has drawbacks: it can be hard to run multiple
applications on a single machine, and resource sharing can be a big hassle. This was
the standard method for a long time, until better options came along, but it still
holds a few advantages, one of which is that there is no resource overhead from a
third-party runtime such as Docker consuming valuable machine resources. In this paper
we analyze whether the resource overhead of running such software is worth the
trade-off for the advantages that something like Docker provides.

Methodology:
The methodology used to carry out the analysis in this paper was a two-part process. I
decided on two separate approaches because they cover two different situations and
points of view, which gives a clearer picture of the accuracy of the analysis.

The first approach served both as a situation I wanted a point of view on and as a
proof of concept, in the sense that I could carry it out with little time invested in
building tools, using the tools already at my disposal to confirm that my ideas were on
the right track. The testing was done on some common, off-the-shelf software and
container images to see how they would behave under the different circumstances of the
use cases.
Once that was done, I moved to the second approach, which was to build tooling around
the requirements to get more precise and accurate data points, confirming the analysis
from the first approach with more numbers and therefore more confidence. The tools I
built for this consisted of two major parts. I developed a small Node.js application
that would run natively. To measure the resource usage of this application, I attached
the express-status-monitor middleware, which constantly monitors the application and
presents the metrics in a dashboard. Once this was set up, I ran a number of test
scenarios against the application and monitored the resource usage. After collecting
the results of the native run, I wrote a Dockerfile for the application so that it
could run on Docker; this way the same application ran inside and outside of Docker,
making the comparison fair. As part of the seminar, I was also instructed to build a
separate framework for monitoring Docker containers, so before running tests on the
containerized application I built a framework, itself running in Docker, out of
well-known monitoring tools: Prometheus, Grafana, cAdvisor, Skedler Reports, and Node
Exporter. These tools were launched using the convenient docker-compose feature, which
reads a single file describing multiple containers and their configuration and runs
them together.

The file describes a multi-container setup: Prometheus, a monitoring system with its
configuration and data directories mounted, accessible on the local system's port 9090;
Node Exporter, which collects metrics from the host system and exposes them on port
9100; cAdvisor, which gathers resource usage and performance metrics from running
containers, accessible on port 8080; and Grafana, the visualization and dashboarding
tool, accessible on port 3000, for which I also included a mechanism for persistent
data storage and which depends on the data collected by Prometheus and the reports
service. All these tools run on the same Docker network, as they are integrated and
talk to each other at all times.

Docker-Compose
version: '3'
services:
  # Metrics collection and storage; scrape targets are defined in prometheus.yml
  prometheus:
    image: prom/prometheus:latest
    user: root
    container_name: prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./prometheus_db:/var/lib/prometheus
      - ./prometheus_db:/prometheus
      - ./prometheus_db:/etc/prometheus
      - ./alert.rules:/etc/prometheus/alert.rules
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--web.route-prefix=/'
      - '--storage.tsdb.retention.time=200h'
      - '--web.enable-lifecycle'
    restart: unless-stopped
    ports:
      - '9090:9090'
    networks:
      - monitor-net
  # Host-level metrics (CPU, memory, disk), exposed on port 9100
  node-exporter:
    image: prom/node-exporter
    ports:
      - '9100:9100'
    networks:
      - monitor-net
  # Per-container resource metrics; needs read access to host and Docker state
  cadvisor:
    image: gcr.io/cadvisor/cadvisor
    ports:
      - '8080:8080'
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    networks:
      - monitor-net
  # Dashboards; grafana_db is mounted so dashboards survive container restarts
  grafana:
    image: grafana/grafana
    user: "0"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=grafana123
    volumes:
      - ./grafana_db:/var/lib/grafana
    depends_on:
      - prometheus
    ports:
      - '3000:3000'
    networks:
      - monitor-net
  # Skedler Reports for scheduled report generation
  reports:
    image: skedler/reports:latest
    container_name: reports
    privileged: true
    cap_add:
      - SYS_ADMIN
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
      - reportdata:/var/lib/skedler
      - ./reporting.yml:/opt/skedler/config/reporting.yml
    ports:
      - '3001:3001'
    networks:
      - monitor-net
volumes:
  reportdata:
networks:
  monitor-net:
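
The prometheus.yml file mounted above is not reproduced in the report. A minimal sketch
of what it would need to contain for this setup, with an assumed scrape interval and
illustrative job names (the target hostnames resolve via the shared monitor-net
network), could look like this:

global:
  scrape_interval: 15s          # assumed interval; how often targets are scraped

scrape_configs:
  - job_name: 'prometheus'      # Prometheus scraping its own metrics
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node-exporter'   # host metrics
    static_configs:
      - targets: ['node-exporter:9100']
  - job_name: 'cadvisor'        # container metrics
    static_configs:
      - targets: ['cadvisor:8080']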

NodeJS Sample Application


const express = require('express');
const multer = require('multer');

const app = express();

// Uploaded files are stored in the local uploads/ directory
const upload = multer({ dest: 'uploads/' });

// express-status-monitor exposes a live resource-usage dashboard at /status
app.use(require('express-status-monitor')());
app.use(express.static('public'));

// Upload endpoint: multer stores the file before this handler runs
app.post('/upload', upload.single('file'), (req, res) => {
  res.json({ message: 'File uploaded successfully' });
});

// Download endpoint: streams a file from uploads/ back to the client
app.get('/download/:filename', (req, res) => {
  const fileName = req.params.filename;
  const file = `./uploads/${fileName}`;
  res.download(file, fileName);
});

app.listen(3000, () => {
  console.log('Server listening on port 3000');
});
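
The Dockerfile mentioned in the methodology is not reproduced in the report. A minimal
sketch for this application, assuming a node:18-alpine base image and that the code
above is saved as app.js, could look like this:

# Hypothetical Dockerfile; base image and file names are assumptions
FROM node:18-alpine
WORKDIR /app
# Copy the manifest first so the dependency layer is cached between builds
COPY package*.json ./
RUN npm install
# Copy the rest of the application source
COPY . .
EXPOSE 3000
CMD ["node", "app.js"]

The image could then be built with docker build -t node-sample . and started with
docker run -p 3300:3000 node-sample; a host port other than 3000 is chosen here because
Grafana already occupies host port 3000 in the compose setup above.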

Challenges:
Let's approach challenges from two sides. The first is the challenges you are bound to
come across when deciding to move to containers instead of running applications
locally. There is definitely a learning curve to running applications in containers;
as a relatively new technology, many things that are straightforward today were not
trivial five years ago. Applications designed to run in containers are developed very
differently from those that are not, and there is yet more to learn once they are
developed that way. Learning how to write a Dockerfile that creates the image that
eventually runs as a container is another tricky step that has to be learnt. So before
declaring anything, it should be kept in mind that containerization comes not only with
a higher resource cost but also with a learning cost that takes time.

The second part of this section addresses the challenges I faced when running these
tests and scenarios. In the initial part, when running tests on common software, it was
quite difficult to get an accurate measurement of the workloads: natively on Linux
there are not many built-in tools that enable detailed monitoring of workloads, and in
Docker the "docker stats" command is limited in the output and detail it provides. I
then faced issues developing the small Node.js application used for the tests, as
Node.js was completely new to me, but by the end I was at least familiar with its
basics, and it became a nice learning exercise. The final and biggest challenge was
developing the framework to monitor applications running in Docker, as it required many
different tools running in sync. I decided on a docker-compose file for this, because
with all the tools running in containers it was a perfect fit.

Results:
In this section I present the results obtained from the testing. First, let's look at
the initial findings on the resource usage of the Docker runtime itself. The Docker
documentation states that the disk capacity required by Docker depends entirely on the
size and number of containers that need to be run. For smaller applications a couple of
GB is enough, but if you intend to run large applications on Docker, anywhere from
50-100 GB serves well. The Docker runtime requires a minimum of 4 GB of RAM, although
Docker recommends that 16 GB be available for it to run smoothly.

Initially, we tested whether a Docker container could be granted access to 100% of the
resources available on the machine. To do that, we ran a test that stressed the CPU,
first natively, to see what usage numbers the machine itself could reach, and then
inside a Docker container to compare the output. In both cases the Docker container was
able to use 100% of the machine's CPU when it was available, which means that by
default there is no upper limit on CPU usage for Docker containers.
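
As an illustration, such a stress test could be run with a tool like stress-ng; the
exact tool and commands used in the experiments are not recorded in the report, so this
is only a sketch:

# Natively: load every CPU core for 60 seconds, observe usage with top/htop
stress-ng --cpu 0 --timeout 60s

# In a container: same load, observe usage with "docker stats"
docker run --rm alpine sh -c "apk add --no-cache stress-ng && stress-ng --cpu 0 --timeout 60s"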

We then moved on to the performance impact of the Docker runtime itself on the system.
The disk usage of a Docker container without its actual application depends on many
factors, configuration settings and background processes among them. The container also
inherits the footprint of its base image, which contributes to its memory usage. Our
findings were:

The memory usage of a Docker container without the application is a fraction of that of
even large applications. The container layer itself starts out small, but as the
application runs and accumulates data it grows, and can even reach several GB if the
image is large enough.
Example:
For an nginx image of size 142 MB, the container size is 1.09 kB.
For an image of size 20.2 GB, the container size at start time is 51.4 MB.
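
Figures like these can be read with standard Docker commands:

docker images nginx   # SIZE column shows the image size
docker ps -s          # SIZE column shows the container's writable layer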

Memory usage shows a similar picture and likewise depends on many factors. Testing two
applications of different sizes, we observed that the memory usage of the Docker
containers was lower than that of the same applications running natively outside of
containers. This is impressive and shows why Docker is such a powerful tool.
We can also make an educated guess at how much of the memory usage comes from the
container itself, without the application: an nginx container uses 6.7 MiB of memory
and a redis container uses 2.7 MiB. Considering the sizes of these images, it is clear
that the memory overhead is not static and varies from image to image, but we can
safely assume it is less than one MiB. In regular use, such a memory overhead should
not affect the usage of the application.
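
Per-container memory figures such as these can be read with docker stats:

docker stats --no-stream   # one-shot snapshot of CPU % and memory usage per container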

The bare-bones test of the resource usage of Postgres importing data is shown in the
screenshots below.
[Figure: Postgres data import, resource usage in Docker]
[Figure: Postgres data import, resource usage running natively]

Moving on to the more detailed tests made with the self-developed application, the
application was tested in the following scenarios (example commands for driving them
are sketched after the list):
● Resource usage at startup
● Resource usage while uploading a file
● Resource usage when downloading a file
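
For illustration, the upload and download scenarios could be driven with curl; the file
name test.bin is hypothetical, as the report does not record the exact client used:

# Upload a local file to the sample application
curl -F "file=@test.bin" http://localhost:3000/upload

# Download a stored file again (note that multer saves uploads under
# randomly generated names, so the name must match a file in uploads/)
curl -O http://localhost:3000/download/test.bin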
Resource usage at startup:
Running Natively:
[Figure: startup resource usage, native]
Docker:
[Figure: startup resource usage, Docker]

We can see from these screenshots that when running natively, the CPU usage briefly
spikes to 35% at startup before settling down at 1-2%, whereas when running the
application in Docker it spikes to only 14%, but for a longer duration, and stabilizes
more slowly.

As for memory, the application running natively uses approximately 60 MB at startup,
whereas the application running in Docker uses about 15 MB.
Resource usage while uploading a file:
Running natively:
[Figure: file upload resource usage, native]
Docker:
[Figure: file upload resource usage, Docker]

When uploading an approximately 150 MB file to the application, we see the resource
usage above. Running natively, the CPU usage spikes to 15%, whereas the same operation
in Docker reaches only 7%, but again for a longer duration. As for RAM, the natively
running app with the file uploaded uses 74 MB, while the application in Docker consumes
around 170 MB. This is the first time the application running in Docker consumes more
RAM than its natively running counterpart.
Resource usage when downloading a file
Running Natively:
[Figure: file download resource usage, native]
Docker:
[Figure: file download resource usage, Docker]

When downloading the same file that was uploaded in the previous step, the native
application's resource usage spikes to 50% while the Docker application spikes to 35%,
keeping up with the previous trend. RAM usage is also in line with the previous tests:
it increases slightly for the natively running app, to 85 MB, and for the Docker app it
rises to around 190-200 MB.

Analysis:
Looking at the results gathered in the previous section, we can draw some analysis from
them. The results follow clear patterns, which makes it easier to reach conclusions
than it would have been if the patterns were unclear. First of all, the CPU usage of
the application in Docker is generally lower than that of the natively running app;
this could be due to the container image being lightweight, and might also be caused by
Docker's ability to allocate resources dynamically.

However, when running workloads, the observed spike in resource usage usually lasts for
a longer timeframe, which suggests that Docker regulates resource usage and prefers
lower consumption even if the workload takes longer to run. It could also partially be
down to higher latency when running in Docker, but that was not substantial, at least
in our case.

In terms of RAM, we observed that, similar to CPU usage, consumption at idle and at
startup is lower when the application runs in Docker, but after the application has run
for a while and some workloads have been computed, RAM usage goes up substantially and
overtakes that of the application running natively. This is not a small overhead
either: for workloads where RAM usage was around 80 MB natively, Docker took up roughly
100 extra MB at 180-190 MB. This is a major takeaway from this research and one of the
major factors in the decision of whether to use Docker or stick to native applications.

Conclusion:
Having discussed all the related topics in detail and gained insight into the findings
of the research, we can now draw a conclusion. As discussed multiple times in this
paper, there are always drawbacks to any approach and any new technology. Docker is not
the perfect solution, far from it, but what it does give us is flexibility. Its
negatives, one of them being the cost of higher resource usage, do in fact exist in one
way or another, but on the other hand it comes with many advantages that far outweigh
its disadvantages, which is why we have seen such a massive rush of developers and
companies adopting containerization in their everyday processes. This is living proof
that people are prepared to pay a higher price in terms of infrastructure costs (when
it comes to big operations) for the flexibility and portability that a technology such
as containerization provides.
References:
Docker Documentation
Grafana Documentation
Prometheus monitoring with Docker Compose
Running Node.js applications on Docker
cAdvisor for monitoring containers
Moving native applications to Docker
When to use Docker vs. when to avoid it
Docker containers vs. native applications
Monitoring setups in Docker
