
https://github.com/danielfm/prometheus-for-developers
Production metrics with Prometheus
Prometheus is an open source monitoring system and time-series database (TSDB).
Prometheus Server
The Prometheus server is the component responsible for periodically collecting and
storing metrics from various targets (e.g. the services you want to collect metrics
from).
Prometheus also provides a basic Web UI for running queries on the stored
data, as well as integrations with popular visualization tools, such as Grafana.
Push vs Pull
Metrics Endpoint
By default, Prometheus pulls (scrapes) metrics over HTTP from the /metrics endpoint exposed by each target.
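To make the pull model concrete, here is a minimal sketch of a target that serves a /metrics endpoint using only the Python standard library. The metric name app_requests_total, the REQUEST_COUNT variable, and the port are assumptions made for illustration; a real service would normally use a Prometheus client library rather than render the text format by hand.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

REQUEST_COUNT = 0  # hypothetical in-process counter, for illustration only


def render_metrics(request_count):
    """Render one counter in the Prometheus text exposition format."""
    return (
        "# HELP app_requests_total Total requests handled.\n"
        "# TYPE app_requests_total counter\n"
        f"app_requests_total {request_count}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; every other path returns 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNT).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real service would log requests


def serve(port=8000):
    """Block forever, serving /metrics on the given (assumed) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint; Prometheus would then be configured to scrape that host and port at a regular interval.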
Prometheus provides a facility for defining alerting rules that, when
triggered, will notify Alertmanager, which is the component that takes care of
deduplicating, grouping, and routing alerts to the correct receiver integration.
Configuring Alertmanager to send notifications to PagerDuty or Slack
Instrumenting Your Applications
Measuring Request Durations
We can measure request durations with percentiles or averages.
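As a rough illustration of the two approaches, the sketch below computes an average from a histogram's _sum and _count values and estimates a percentile from cumulative buckets by linear interpolation. The function names and the sample bucket layout are assumptions; the interpolation is a simplified version of the idea behind PromQL's histogram_quantile, not a reimplementation of it.

```python
import math


def average_duration(duration_sum, duration_count):
    """Mean request duration from a histogram's _sum and _count values."""
    return duration_sum / duration_count if duration_count else 0.0


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with a +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: report the last finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

With buckets [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)], the 95th percentile falls halfway through the 0.5s to 1.0s bucket. The estimate is only as precise as the bucket boundaries allow, which is why buckets are usually chosen around the latency thresholds you care about.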
Measuring Throughput
If you are using a histogram to measure request duration, you can
use the <basename>_count timeseries to measure throughput without having to
introduce another metric.
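The idea is that <basename>_count is a counter of observed requests, so its rate of increase is the request throughput. The sketch below shows that arithmetic on a few hypothetical (timestamp, value) samples. It is a simplification: PromQL's rate() also extrapolates to the edges of the time window, which this function does not attempt.

```python
def per_second_rate(samples):
    """Average per-second increase of a counter.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    A drop in value is treated as a counter reset (restart from zero).
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0
```

For example, a counter that goes from 100 to 220 over 120 seconds gives a throughput of one request per second.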
Measuring Memory/CPU Usage
Measuring SLOs and Error Budgets
SLOs, or Service Level Objectives, are one of the main tools
employed by Site Reliability Engineers (SREs) for making data-driven decisions
about reliability.
SLOs are based on SLIs, or Service Level Indicators, which are
the key metrics that define how well (or how poorly) a given service is operating.
Common SLIs would be the number of failed requests, the number of
requests slower than some threshold, etc.
Availability
The proportion of successful requests; any HTTP status
other than 500-599 is considered successful
Latency
The proportion of requests with duration less than or equal
to 100ms
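The two SLIs above can be sketched as simple proportions over a list of request records. The (status, duration_ms) record shape and the function names are assumptions for illustration; in practice these ratios would be computed from Prometheus counters and histogram buckets rather than from raw request logs.

```python
def _proportion(good, total):
    # Assumption: with no requests at all, nothing has violated the SLI.
    return good / total if total else 1.0


def availability_sli(requests):
    """Proportion of requests whose HTTP status is outside 500-599."""
    good = sum(1 for status, _ in requests if not 500 <= status <= 599)
    return _proportion(good, len(requests))


def latency_sli(requests, threshold_ms=100):
    """Proportion of requests that took threshold_ms or less."""
    good = sum(1 for _, duration_ms in requests if duration_ms <= threshold_ms)
    return _proportion(good, len(requests))
```

Note that a 404 counts as successful for availability under this definition, since only 5xx responses are treated as failures.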
The difference between 100% and the SLO is what we call the Error
Budget.
For example, the error budget for a 95% SLO is 5%:
if the application receives 1,000 requests during the SLO window,
up to 50 of them can fail and we still meet our SLO.
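The arithmetic behind the error budget can be written down as a short sketch. The function names are hypothetical, and Fraction is used so that percentages such as 99.9 do not pick up floating-point rounding errors.

```python
from fractions import Fraction


def error_budget(slo_percent):
    """Error budget as an exact fraction of requests (100% minus the SLO)."""
    return (100 - Fraction(str(slo_percent))) / 100


def allowed_failures(total_requests, slo_percent):
    """Number of requests that may fail in the window, rounded down."""
    return int(total_requests * error_budget(slo_percent))
```

Calling allowed_failures(1000, 95) reproduces the example in the text: 50 requests may fail within the SLO window.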
Monitoring Applications Without a Metrics Endpoint
Prometheus needs every application to expose a /metrics HTTP endpoint
that it can scrape.
To monitor something that does not provide a Prometheus metrics endpoint,
such as a MySQL instance, we use exporters.
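Conceptually, an exporter reads the application's native statistics and translates them into the Prometheus text format on the application's behalf. The sketch below shows only that translation step for (name, value) rows such as those MySQL returns for SHOW GLOBAL STATUS. The function name, the metric prefix, and the input shape are assumptions for illustration; this is not how any particular exporter is implemented.

```python
def to_exposition(status_rows, prefix="mysql_global_status_"):
    """Translate (name, value) status rows into Prometheus text format.

    Rows whose value is not numeric are skipped, because Prometheus
    samples must be numbers.
    """
    lines = []
    for name, value in status_rows:
        try:
            number = float(value)
        except ValueError:
            continue
        metric = prefix + name.lower()
        lines.append(f"# TYPE {metric} untyped")
        lines.append(f"{metric} {number}")
    return "".join(line + "\n" for line in lines)
```

An exporter would serve this output on its own /metrics endpoint, and Prometheus would scrape the exporter instead of the database.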

https://www.replex.io/blog/kubernetes-in-production-the-ultimate-guide-to-monitoring-resource-metrics-with-grafana
Setting up Grafana
Grafana is installed as part of the Prometheus operator Helm chart.
Install the Prometheus operator
helm install --name prom-operator stable/prometheus-operator --namespace monitoring
This will install the Prometheus operator in the namespace monitoring.
You can see the Grafana instance running in this namespace using:
kubectl --namespace monitoring get pods

Creating Grafana Dashboard for Kubernetes Resource Metrics


To create a dashboard click on the Home button in the top right corner of the
Grafana home screen, select New Dashboard from the drop-down list and click on the
Graph icon.
This will create a new dashboard with a placeholder panel.
Create the Prometheus Data Source Variable
Before we update the panel, however, we first have to create the
Prometheus data source variable.
We can do this by clicking on the cog icon (settings) in the top right
corner of the Grafana dashboard and navigating to variables in the settings panel
to the left.
Click on Add variable and fill in the fields for name and type.
Grafana Kubernetes Dashboard Layout
We are basing the Dashboard layout on Kubernetes abstractions including
pods, nodes, namespaces and clusters.
The dashboard will have separate sections for each abstraction with
individual usage, request and utilization metrics.
To create a new section click on Add Panel in the top right
corner of the dashboard and then click on Row.
Edit the Row title by clicking on the cog icon next to it and
rename it Pod.
Do the same for Node, Namespace and Cluster.
CPU Template Panel
Axes Tab
Legend Tab
Display Tab
Memory Template Panel
Axes Tab
Legend Tab
Display Tab
Adding Pod Level Resource Metrics to the Grafana Kubernetes Dashboard
Pod level CPU Usage
sum(rate(container_cpu_usage_seconds_total{container_name!="POD",pod_name!=""}[5m])) by (pod_name)
CPU Requests
sum(kube_pod_container_resource_requests_cpu_cores) by (pod)
Memory Usage
sum(container_memory_usage_bytes{container_name!="POD",container_name!=""}) by (pod_name)
Memory Requests
sum(kube_pod_container_resource_requests_memory_bytes) by (pod)
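Before pasting the pod level queries into Grafana panels, it can help to check them against Prometheus directly. The sketch below builds a URL for the instant-query endpoint of the Prometheus HTTP API (/api/v1/query) for each query. The base URL http://localhost:9090 and the dictionary keys are assumptions; adjust them to wherever your Prometheus server is reachable.

```python
from urllib.parse import urlencode

# The pod level queries from this section, keyed by hypothetical panel names.
POD_QUERIES = {
    "cpu_usage": 'sum(rate(container_cpu_usage_seconds_total{container_name!="POD",pod_name!=""}[5m])) by (pod_name)',
    "cpu_requests": "sum(kube_pod_container_resource_requests_cpu_cores) by (pod)",
    "memory_usage": 'sum(container_memory_usage_bytes{container_name!="POD",container_name!=""}) by (pod_name)',
    "memory_requests": "sum(kube_pod_container_resource_requests_memory_bytes) by (pod)",
}


def instant_query_url(base_url, promql):
    """Build a URL for an instant query, URL-encoding the PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def pod_query_urls(base_url="http://localhost:9090"):
    """Return one query URL per pod level metric."""
    return {name: instant_query_url(base_url, q) for name, q in POD_QUERIES.items()}
```

Fetching any of these URLs returns a JSON document with one result per pod, which is the same data a Grafana panel would plot over time.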
Adding Node Level Resource Metrics to the Grafana Kubernetes Dashboard
Adding Namespace Level Resource Metrics to the Grafana Kubernetes Dashboard
Adding Cluster Level Resource Metrics to the Grafana Kubernetes Dashboard
