MEAP Edition
Manning Early Access Program
Cloud Observability in Action
Version 10

Copyright 2023 Manning Publications

For more information on this and other Manning titles go to


manning.com

welcome
Thank you kindly for purchasing the Cloud Observability in Action MEAP! I’d love to hear what you like and find useful, but also what is missing and what you find unclear or hard to understand. I am writing this book to capture and share my 20+ years of experience in data engineering and (since 2015) with cloud native systems. I assume you bring some background in cloud native systems (Kubernetes, Lambda, etc.) and are wondering how to go about observability.

Observability is the capability to continuously generate and discover actionable insights based on signals from the (cloud native) system under observation, with the goal of influencing the system. Our approach to the topic is Return-on-Investment driven: we look at costs and benefits, from the sources, to telemetry including agents, to the signal destinations (back-ends such as time series datastores as well as front-ends like Grafana).

Throughout the book, I’m using open source tooling including but not limited to OpenTelemetry (collector), Prometheus, Loki, Jaeger, and Grafana to demonstrate the different concepts and enable you to experiment with them, without any costs other than your time.

I’m super excited that you decided to walk this journey with me for a while, and I’m looking forward to hearing from you, either via the forum or Twitter (@mhausenblas, my DMs are open, no need to follow me if you want to have a convo).

Again, thanks a bunch for your interest in Cloud Observability in Action and for purchasing the MEAP!

If you have any questions, comments, or suggestions, please share them in Manning’s liveBook Discussion Forum.

Cheers,
— Michael “mh9” Hausenblas

brief contents
1 End-to-end Observability Example
2 Signal Types
3 Sources
4 Agents & Instrumentation
5 Back-end Destinations
6 Front-end Destinations
7 Cloud Operations
8 Distributed Tracing
9 Developer Observability
10 Service Level Objectives
11 Signal Correlation
Appendix

1 End-to-end Observability Example

This chapter covers

What we mean by observability
Why observability matters
An end-to-end example of observability
Challenges of cloud native systems and how observability can help

In cloud native environments, such as public cloud offerings like AWS or on-premises infrastructure (for example, a Kubernetes cluster), you typically deal with many moving parts. These range from the infrastructure layer, including compute (such as VMs or containers) and databases, to the application code that you own.

Depending on your role and the environment, you may be responsible for any number of the pieces in the puzzle. Let’s have a look at a concrete example: consider a serverless Kubernetes environment in a cloud provider. In this case, both the Kubernetes control plane and the data plane (the worker nodes) are managed for you, which means that, in terms of operations, you can focus on your application code.

No matter what part you’re responsible for, you want to know what’s going on so that you can react to, and ideally even proactively manage, situations such as a sudden usage spike (because the marketing department launched a 25% off campaign without telling you) or a third-party integration failing and impacting your application. The scope of components you own or can directly influence determines what you should be focusing on in terms of observability.

The bottom line is: you don’t want to fly blind. What exactly this means, in the context of cloud native systems, is what we will explore in this chapter in a hands-on manner. While it’s important to see things in action, as we progress we will also try to capture the gist of the concepts via more formal means, including definitions.

I assume that you are familiar with cloud native environments. This includes cloud provider services (I’m using AWS to demonstrate the ideas here), container technologies including Docker and Kubernetes, as well as Function-as-a-Service (FaaS) offerings, especially AWS Lambda. In case you want to read up, here are some suggestions:

Kubernetes in Action by Marko Lukša (Manning)


AWS Lambda in Action by Danilo Poccia (Manning)

Further, I recommend Software Telemetry by Jamie Riedesel (Manning), which is complementary to this book and provides useful deep dives into certain observability aspects we won’t cover in detail here.

In the context of this book we focus on cloud native environments. We mainly use open source observability tooling so that you can try out everything without license costs. However, it is important to note that while we use OSS tooling to show the concepts in action, the concepts are universally applicable. That is, in a professional environment you should always consider offloading parts or all of the tooling to the managed offerings of your cloud provider of choice or, equally, to the offerings of observability vendors such as Datadog, Splunk, or Dynatrace.

Before we get into cloud native environments and what observability means there, let’s step back a bit and look at it from a conceptual level.

1.1 What is Observability?


What is observability and why should you care? When we say observability, we mean trying to understand the internal system state by measuring data available from the outside. Typically, we do this to be able to act upon it.

Before we get to a more formal definition of observability, let’s review a few core concepts we
will be using throughout the book:

System
Short for System under Observation (SuO). This is the cloud native platform (and the applications running on it) you care about and are responsible for.
Signals
Information observable from the outside of a system. There are different signal types
(most common ones are logs, metrics, and traces) and they originate at sources.
Sources
Part of the infrastructure or application layer, such as a microservice, a device, or the operating system. They typically have to be instrumented to emit signals.
Agents
Responsible for signal collection and routing.


Destinations
Where you consume signals, for different reasons and use cases. This includes visualizations (dashboards, for example), alerting, long-term storage (for regulatory purposes), and analytics (finding new usages for an app).
Telemetry
The process of collecting signals from sources, routing/pre-processing them via agents, and ingesting them into destinations.

Figure 1.1 provides a visual depiction of observability. The motivation is to gather signals from a SuO (represented by a collection of sources) via agents to destinations, for consumption by either a human or an app, with the goal of changing the SuO.

Figure 1.1 Observability overview

Observability in this sense represents, in essence, a feedback loop. A human user might, for example, restart a service based on the gathered information. In the case of an app, this could be a cluster autoscaler that adds worker nodes based on the measured system utilization.
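To make this feedback loop concrete, here is a minimal, hypothetical sketch in Go: getUtilization and scaleTo are made-up stand-ins for a metrics query and your platform’s scaling API, not real APIs.

package main

import (
    "math/rand"
    "time"
)

// Hypothetical stand-ins: getUtilization would query a metrics
// back-end, scaleTo would call the platform's scaling API.
func getUtilization() float64 { return rand.Float64() }

var replicas = 1

func scaleTo(n int) { replicas = n }

func main() {
    for {
        // Measure: derive an actionable insight from a signal.
        u := getUtilization()
        // Act: influence the system under observation.
        switch {
        case u > 0.8:
            scaleTo(replicas + 1)
        case u < 0.2 && replicas > 1:
            scaleTo(replicas - 1)
        }
        time.Sleep(30 * time.Second)
    }
}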

The important aspect of observability is actionable insights. Simply providing an error message
in a log line or having a dashboard with fancy graphics is not sufficient.

SIDEBAR A formal definition of observability


Observability is the capability to continuously generate and discover actionable insights, based on signals from the system under observation, with the goal of influencing the system.

If you want to learn more about observability use cases, signals, and methods, I recommend perusing the Observability Whitepaper by the Cloud Native Computing Foundation’s Technical Advisory Group for Observability, more informally also called CNCF TAG o11y.

But how do you know which signals are relevant, and how do you make the most of them? Now that we have defined the core terms, let’s move on to different roles, their goals, and their tasks to understand how observability can assist each of them.

1.2 Roles and Goals


Building products nowadays is a team sport. Except in the case of a small tool, typically a variety of different job functions or roles work together to realize a software-based offering. The different roles involved in the software creation and operation process focus on different tasks along the application life cycle:

Developers. Developers write code, test and document it, and make artifacts such as a container image, a binary, or a script (for interpreted languages such as Python) available for deployment. Observability can help them iterate faster (finding bugs via logs and traces), optimize runtime behavior (identifying slow code paths via traces), and support operations (via metrics and logs).
Release and test engineers. A release engineer can own CI and/or CD pipelines and needs visibility into build, test, and deploy processes. In this context, some organizations have a dedicated tester (or QA testing) role that has similar requirements as release engineers.
DevOps roles. Some teams have devops roles, partially responsible for code, often owning operations (being on call), and establishing and enforcing good practices. Similar considerations as with developers apply concerning observability; in addition, metrics are typically useful for either dashboards or alerts. A specific devops role is that of an SRE (Site Reliability Engineer), who coaches developers or supports other devops roles, helping them to meet certain operational goals (we will come back to that in the last chapter on SLOs and SLIs).
Infrastructure and platform operations. In non-cloud native environments, these are usually called system administrators or sysadmins. They own the platform (IaaS or containers) and provide (self-service) access to the platform for developers and devops roles. They are usually interested in low-level signals such as resource utilization and cost-related ones.
Non-engineering roles. A product owner is often called a Product (or Project) Manager and owns a feature, a service, or an entire application, depending on the scope and size. These roles usually focus on end-user facing indicators as well as on performance and usability. Another non-technical role is that of a business stakeholder. They represent the part of the organization that specifies the requirements and usually focus on financial indicators such as usage, revenue, or units sold. For this role, technical indicators are usually irrelevant.

Table 1.1 shows example observability flows across the roles we discussed in this section.


Table 1.1 Example flows per role

Role | Example flow
developer | uses traces to figure out which microservice along the request path is slow
developer | uses continuous profiling to speed up the hot path of a microservice
devops | receives an alert about a component being down, then uses metrics plotted in a dashboard to understand the alert in context
devops | uses metrics to narrow down components causing high latency, then jumps to traces to drill into a specific request path, then looks at logs of one microservice serving on that request path for details (correlation)
SRE | tracks goals and their fulfilment, advising developers, via dashboards
platform operator | uses a cluster-level dashboard to track autoscaling over time
release engineer | uses a dashboard to monitor build progress of a microservice artifact (such as a container image), for example to prevent performance regressions
product manager | uses a CD dashboard to monitor deployment progress of microservices
business stakeholder | uses a dashboard to track weekly revenue and SLA violations

Let’s now have a look at a concrete example of a cloud native system, the signal sources, and
telemetry.

1.3 Example Microservices App


We will be using Weaveworks' Sock Shop microservices application to explore observability. Figure 1.2 shows its overall architecture. The Sock Shop example app is a simple demo app built using different languages and frameworks (Spring Boot, Go kit, and Node.js) and is packaged using OCI-compliant container images. This makes it widely usable, from container orchestrators such as Kubernetes, to Docker Compose, to ECS, but also in FaaS offerings such as AWS Lambda. We will show the example app running in Kubernetes.


Figure 1.2 The example microservices app we use for observability exploration

To set up the example microservices app, you need a Kubernetes cluster; then follow the steps in Listing 1.1.

Listing 1.1 Setting up the Sock Shop example app


$ git clone https://github.com/microservices-demo/microservices-demo.git && \
  cd microservices-demo/deploy/kubernetes/

$ kubectl create namespace sock-shop

$ kubectl apply -f complete-demo.yaml

$ kubectl -n sock-shop port-forward svc/front-end 8888:80

Clone the example app repo and change into the directory where the Kubernetes deployment YAML docs are located.
Create a Kubernetes namespace that acts as a logical home for our app.
Create all necessary Kubernetes resources (deployments, services, etc.) that together make up the app. If you want to keep an eye on it while waiting until all resources are ready, you can (in a different terminal session) use the watch "kubectl -n sock-shop get deploy,pod,svc" command.
Once all pods are ready, you can make the front-end accessible on your local machine using the port-forward command.


If you successfully set up the Sock Shop app, you can access the entry point (the front-end service) in your browser: head over to http://localhost:8888 and you should see something akin to what is depicted in Figure 1.3.

Figure 1.3 Screenshot of the Sock Shop example microservices app front-end

Next, you want to generate some traffic, so log in (upper right corner) with the username user and the password password and add some items to your shopping cart.

Now, what about observability? Let’s have a look at a concrete example, end-to-end.

The Sock Shop example app emits various signals. Let’s take a closer look at two signal types,
logs and metrics, to explore what exactly is emitted and what would be necessary to introduce
observability for a given role.

In terms of the logs signal type, Sock Shop services emit a variety of logs. One example, looking at the orders service, is as follows (output edited to fit):

$ kubectl logs deploy/orders


2021-10-25 14:33:10.649 INFO [bootstrap,,,] 7 --- [ main] s.c.a.
AnnotationConfigApplicationContext : Refreshing org.springframework.context.
annotation.AnnotationConfigApplicationContext@51521cc1: startup date
[Mon Oct 25 14:33:10 GMT 2021]; root of context hierarchy
2021-10-25 14:33:16.838 INFO [bootstrap,,,] 7 --- [ main] tratio
Delegate$BeanPostProcessorChecker : Bean 'configurationPropertiesRebinderAu
toConfiguration' of type [class org.springframework.cloud.autoconfigure.Con
figurationPropertiesRebinderAutoConfiguration$$EnhancerBySpringCGLIB$$b894f
39] is not eligible for getting processed by all BeanPostProcessors (for ex
ample: not eligible for auto-proxying)
...

In this case:

The source is the orders service.


The signal type is logs.

NOTE Everyone uses logs (in contrast to metrics or even traces), yet there is far less standardization and interoperability in this signal type compared to others. While OpenTelemetry is, at the time of writing, stable (GA) for traces and almost stable for metrics, we’re still working in the community to achieve a similar level of coverage for logs.

To make the logs actionable, for example, for a developer to fix a bug:

We would need an agent (like Fluent Bit) to route the logs to a destination.
We would need a destination such as OpenSearch or CloudWatch.

The services also expose metrics in the Prometheus exposition format, for example, looking at
the front-end service that we forwarded earlier on to the local machine (output edited to fit):

$ curl localhost:8888/metrics
# HELP process_cpu_user_seconds_total Total user CPU time spent in seconds.
# TYPE process_cpu_user_seconds_total counter
process_cpu_user_seconds_total 156.37216999999978
...

In this case:


The source is the front-end service.


The signal type is metrics.

NOTE When talking with customers, I notice that many are facing the same challenge: feeling the need to collect and retain all metrics, because "we might need it". Try to prune as much as possible on ingest and check if metrics that land in destinations are actually used. Questions you can ask yourself:

Is that metric used in an alert? In a dashboard?
Can you think of a use case where you could derive insights from the metric? Maybe an aggregate is sufficient?
Would a trace be a better choice here?

To make the metrics actionable, for example, for an SRE to assess indicators:

We would need an agent (such as Prometheus or the OpenTelemetry Collector) to scrape the metrics.
We would need a destination such as Prometheus or, for long-term storage and federation, Cortex or Thanos.

For now, we will not consider traces since they require a little more setup (on both the telemetry side and the destination side).

We’ve had an initial exposure to logs and metrics. Before we go deeper into cloud observability throughout the rest of this book, let’s finish this chapter by looking at the challenges you will need to keep in mind when dealing with cloud native systems and how observability can help address them.

1.4 Challenges and How Observability Helps


Cloud native environments and the apps running on them come with a number of challenges. In the following, we discuss the most pressing ones and see how observability can help.

In general (ignoring the details of packaging and scheduling, such as a Kubernetes pod vs. a Lambda function), cloud native systems have the following characteristics:

They are distributed in nature. Most of the communication is not in-process, and maybe not even on the same machine (for example, using IPC mechanisms), but takes place via the network.
Due to the distributed nature, not only the What but also the Where is crucial. Think of a number of microservices along the request path serving a specific user. You want to be able to understand end-to-end what’s going on and also be able to drill down into each of the services.
The volatility of the services is higher compared to a monolith running on, say, a VM. That is, pods (and with them their IP addresses) come and go, new versions of a Lambda function might be pushed out every hour, and so on.

Let’s now look at how observability can help address the challenges found in and with cloud native systems. With cloud native systems, you don’t want to fly blind: having an automated way to collect all relevant signals and using them as input for decisions, that is, actionable insights, is the ideal.

Return on Investment: In order for you to know the right level of investment (time, money) into observability, you need to know about the costs first:

Instrumentation costs (developer time, code, etc.), which are ongoing.
Signal retention (for example, log storage), which can be one-off but usually falls under the pay-as-you-go model.
Overhead of agents and instrumentation, which manifests in compute and memory resource usage on top of what the service itself uses. This should be in the low single-digit percentage range of CPU and/or RAM utilization. For example, the AWS OpenTelemetry Collector provides a report on the footprint across different signal types.

The upside (gain) of the investment into observability is harder to calculate than the costs. It mainly depends on the role (see 1.2). For example, you can measure the Mean Time To Recovery (MTTR) for a given service before and after instrumentation. But there are also many human-related factors (such as on-call stress or work-life balance) that count towards the gain.

Signal Correlation: Given that different signal types are usually good for different tasks, and that not every signal type is always captured and/or available, you want to be able to correlate them (Figure 1.4) to get a more complete picture.


Figure 1.4 Signal correlation

The following example might help you understand what we mean when we say correlating
signals:

1. You’re on-call, get paged, and fire up your laptop. You look at a Grafana dashboard that was linked from the Prometheus alert you received.
2. From the dashboard you can gather that the utilization of certain services spiked a couple of times in the past hours. You have a hunch that it may be related to the recent upgrade of a service.
3. You use the trace ID from an exemplar (embedded in the Prometheus metrics) as an entry point to have a closer look at the involved services, using the distributed tracing tool Jaeger.
4. In Jaeger you see the duration distribution and can drill into the logs of the services to verify your hypothesis.

In the above scenario you manually correlated the different signal types (metric → traces → logs) and were able to confirm your hypothesis quickly and effectively. A simple form of correlation is a time-based one: something you get out of the box in many front-ends, such as Grafana, without additional effort.


Portability: Avoid lock-in by choosing open standards and open source. This means that you can use the same mechanisms and tooling in different environments (cloud providers or on-premises infrastructure) without having to change much of your setup. For example, if you instrument your applications using OpenTelemetry, you can switch from one destination to another without having to change your application or your agents.
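As a rough sketch of what this looks like in Go, assuming the OpenTelemetry OTLP trace exporter: the destination endpoint comes from configuration, so swapping destinations does not touch application code.

package main

import (
    "context"
    "log"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
    ctx := context.Background()
    // The OTLP exporter picks up its endpoint from configuration
    // (for example the OTEL_EXPORTER_OTLP_ENDPOINT environment
    // variable), so switching destinations is a config change only.
    exp, err := otlptracehttp.New(ctx)
    if err != nil {
        log.Fatal(err)
    }
    tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exp))
    defer func() { _ = tp.Shutdown(ctx) }()
    otel.SetTracerProvider(tp)

    // Application code only ever uses the vendor-neutral API.
    tr := otel.Tracer("echo")
    _, span := tr.Start(ctx, "demo-operation")
    span.End()
}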

1.5 Summary

Defined telemetry (sources, agents, destinations) and observability (getting actionable insights to change the system under observation).
Made clear why observability matters in the context of cloud native systems.
Examined different roles (developers, devops, non-technical ones), their typical tasks, and their needs in terms of signal types.
Had a look at a concrete example setup of a cloud native app (a containerized microservices app in Kubernetes), its signals, and what observability could mean in this context.
Reviewed the challenges of cloud native systems and how observability can help address them.

2 Signal Types

This chapter covers

What different signal types are
When and how to use a certain signal type
The costs and benefits of each signal type
How to figure out which signals are relevant
How to determine signal baselines

In the context of observability, signals play a central role: this is the intel we’re basing our
decisions on.

As discussed in 1.1, the System under Observation (SuO), for example a Kubernetes application or a serverless app based on Lambda functions, emits signals from various sources. You as a human, or equally a piece of software, then use these signals to make decisions. There is a one-to-many relation between the SuO and sources. Your SuO typically involves compute, storage, and network, and each of these represents a signal source. We will cover the sources in detail in Chapter 3.

In terms of signals (and sources), we can differentiate between two cases:

1. Sources, such as code that you own, that you can instrument yourself (that is, you decide where and what kind of signals to emit).
2. Sources you don’t own, meaning any signals will be pre-defined. Your task is then limited to selecting which signals to consume. This might be because:
The code is a dependency of your code (a library, package, or module).
The component’s source code is not available (for example, a binary file).
The component only offers a (networked) API. Any managed service from your cloud provider of choice falls into this category.

No matter whether you own the code and can decide what signal types to emit, or you are working with a component you don’t own, at some point you want to collect the signals and send them to a central place for processing and storage. This is the telemetry part.

In this chapter we will review the signal types used most often, how to instrument and collect each, and the costs and benefits of doing so. With observability you want to take a Return-on-Investment (ROI) driven approach. In other words, you need to understand the costs of each signal type and what it enables you to do.

2.1 Reference Example


There are different signals one could take into account from a SuO; however, in practice the three most commonly used are logs, metrics, and traces. Figure 2.1 shows these common signal types in action, visually represented in Grafana, a popular open source front-end app that supports a range of data sources and visualizations. The signals are automatically time-correlated, that is, a selected time range is applied to all signals of the dashboard in Grafana.

Figure 2.1 Visualizing different signal types in a Grafana dashboard

1. Example log lines for a selected service.


2. Example metrics (service invocations grouped by HTTP response codes) for a service.
3. Example traces for a service (showing response times, HTTP method, client info, etc.).

In this chapter we’re using a running reference example, a simple Go program exposing an HTTP interface (Listing 2.1), to see the three common signal types in action. As we progress, we instrument our reference example with different signals, with the goal of consuming and interacting with the signals similarly to what is depicted in Figure 2.1.

Listing 2.1 Example Go program, to be instrumented


package main

import (
    "fmt"
    "log"
    "math/rand"
    "net/http"
    "time"
)

func handleEcho(w http.ResponseWriter, r *http.Request) {
    message := r.URL.Query().Get("message")
    if rand.Intn(100) > 90 {
        w.WriteHeader(http.StatusInternalServerError)
        return
    }
    fmt.Fprint(w, message)
}

func main() {
    rand.Seed(time.Now().UnixNano())
    http.HandleFunc("/echo", handleEcho)
    log.Fatal(http.ListenAndServe(":4242", nil))
}

Register the handler for the /echo path.
Launch the HTTP server, serving on port 4242 on all available network interfaces.
Retrieve the payload to echo from the query parameter message. The result is that we can interact with the service by issuing an HTTP GET to /echo?message.
Simulate a 10% error case.
Return the exact payload we retrieved in the other 90% of the cases.

To build the echo service, do as shown in Listing 2.2.

Listing 2.2 Building our example Go program


$ go build -o echosvc .

Builds the executable, which is then available in the current directory as ./echosvc.

In order to invoke the echo service via its HTTP API, you would do the following:

$ curl localhost:4242/echo?message=test
test

By default I will use Grafana as the destination since it allows you to visualize and interact with a wide array of data sources in a uniform manner. This is just so that you can reproduce everything for free, based on open source.

TIP In order to see the signals in Grafana going forward, you will need to put some load onto our echo service. You could either do something like the following directly from the shell (calling the endpoint every second):

while true; do curl -s -o /dev/null localhost:4242/echo; sleep 1; done

… or, alternatively, something fancier: consider using a full-blown load testing tool like salvo or even k6.

Concerning the consumption of the signals (destination back-ends and front-ends, such as
Grafana): we will cover the "buy vs. build" discussion in later chapters where we talk about
long-term storage options, dashboarding, and alerts.

Now that we’ve laid out our reference example, let’s first talk about a way to assess the costs of instrumentation before we move on to the actual instrumentation itself.

2.2 Assessing Instrumentation Costs


To assess the costs of instrumentation per signal type, we will use a simple measure, the
"business-logic to instrumentation" (B2I) ratio:

SIDEBAR The B2I ratio


To calculate the B2I ratio, determine the number of lines of code (LOC) before adding instrumentation (code for emitting signals of a signal type) and then determine the LOC after the instrumentation. The B2I ratio is then:

B2I = LOC_AFTER_INSTRUMENTATION / LOC_BEFORE_INSTRUMENTATION

In an ideal world the B2I ratio would be 1, representing zero instrumentation costs in the code. However, in reality, the more LOC you dedicate to instrumentation, the higher the B2I ratio. For example, if your code has 3800 LOC and you added 400 LOC for instrumentation (say, to emit logs and metrics), then you’d end up with a B2I ratio of 1.105, from (3800 + 400) / 3800.

Two things are worth mentioning concerning B2I:

It’s up to you whether you want to consider telemetry libraries, such as an OpenTelemetry SDK, in your B2I ratio calculation. From a pragmatic point of view, focusing on your own code (excluding these dependencies) might make the most sense. This is also what I’m going to do in the remainder of the chapter.
The B2I ratio is a simple measure and on its own is by no means able to serve as an indicator of how well your code is instrumented or tell you if your code is "over-instrumented". Goodhart’s law applies; in other words, use the B2I ratio as a rough guidance in a regression validation rather than as an absolute north star.
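As a trivial worked sketch, here is the calculation from above expressed in Go:

package main

import "fmt"

// b2i computes the business-logic to instrumentation ratio from the
// LOC counts before and after instrumentation.
func b2i(locBefore, locAfter int) float64 {
    return float64(locAfter) / float64(locBefore)
}

func main() {
    fmt.Printf("B2I = %.3f\n", b2i(3800, 3800+400)) // prints: B2I = 1.105
}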

With the theory and overview out of the way, let’s now review the three basic signal types (logs,
metrics, and traces). In the remainder of the chapter we will, for each signal type:

1. Define the signal characteristics and semantics.


2. Discuss how to instrument our example echo service.
3. Set up the telemetry to collect and route the signal.
4. Analyze costs (in terms of resource usage and efforts) and benefits of the signal.
5. Explain how the respective signal can contribute to an observability solution.

2.3 Logs
Let’s begin by focusing on logs. Logs are signals with a textual payload that capture an event. An event is an umbrella term for anything of interest that happens in the code, from the happy path ("completed upload") to unexpected or erroneous behavior ("failed to store items in shopping basket").

Logs are timestamped and should be structured, ideally including some context (for example
labels) that can be used for correlation with other signals. Usually, humans consume logs, either
directly by searching for an event or indirectly, via correlation.

NOTE There are many ways to refer to a single log event, from log line (more traditional) to log item to log entry, and possibly others I have not come across yet. To keep things simple, we will consistently use event here, but these terms are, by and large, interchangeable.

In the context of logs we usually differentiate between levels, which represent the nature of a log event. Different programming languages, libraries, and frameworks have established conventions over time; however, there are some common semantics, as follows:

Fine-grained log events that are relevant for developing or fixing a bug (DEBUG).
Production-type log events:
On the "happy path", signifying expected behavior, usually INFO.
Unexpected behavior, such as WARNING or ERROR.

There are a number of log standards and formats available, from line-oriented ones, such as
Syslog (as per RFC 5424) or the Apache Common Log Format (CLF), to JSON log formats
(most modern libraries support this) to binary ones, for example, the Journal File Format found
in the context of systemd. To learn more, check out Graylog’s Logs Format Guide.

Some good practices in the context of logs are:

Consider what exactly to log, with the goal of providing actionable insights (context is king).
Log to stdout rather than to a fixed location such as a file. This enables log agents to collect and route your log events. See also the respective entry in the twelve-factor methodology for some background on this.
Use labels to enhance the value of a log event. For example, you can use the component name or type, the kind of source, or the type of operation as labels to aid filtering for events.

TIP If you’re wondering how you can "future-proof" your strategies around logs: in 2021 Ben Sigelman (co-founder of Lightstep, known for his work on OpenTelemetry and Google Dapper as well as Monarch) published a Twitter thread called Why Tracing Might Replace (Almost) All Logging, where he argues that, in cloud native systems, tracing will replace logging for the brunt of the workload: instrumentation about the transactions themselves (as opposed to logging about infrastructure or other resources).
This makes sense, at least for a large number of use cases for logs in a transactional setup, and I want to encourage you to consider this kind of thinking and apply it to your environment, if you’re in a position to do so. Naturally, a lot of environments are more on the brown-field than on the green-field end of the spectrum, so it can be hard, or take a lot of time and energy, to establish this tracing-centered (vs. logs-centered) approach.

2.3.1 Instrumentation
Let’s now see logs in action. For this, we instrument our echo service so that it emits log events; Listing 2.3 shows our echo service instrumented with logs.


Listing 2.3 Example Go program, instrumented to emit structured logs


package main

import (
    "fmt"
    "math/rand"
    "net/http"
    "time"

    log "github.com/sirupsen/logrus"
)

func handleEcho(w http.ResponseWriter, r *http.Request) {
    message := r.URL.Query().Get("message")
    log.WithFields(log.Fields{
        "service": "echo",
    }).Info("Got input ", message)
    if rand.Intn(100) > 90 {
        log.WithFields(log.Fields{
            "service": "echo",
        }).Error("Something really bad happened :(")
        w.WriteHeader(http.StatusInternalServerError)
        return
    }
    fmt.Fprint(w, message)
}

func main() {
    rand.Seed(time.Now().UnixNano())
    log.SetFormatter(&log.JSONFormatter{})
    http.HandleFunc("/echo", handleEcho)
    log.Fatal(http.ListenAndServe(":4242", nil))
}

Configure the logger to emit log lines formatted as JSON. Also note that the logger we’re using here, logrus, defaults to writing to stderr, something we have to consider when we consume the logs later.
No matter what, always emit an INFO log line here that confirms the input value we got via the query parameter message. Note the "service": "echo" in the lines above (and again in the error case below): this provides context for the event, allowing you to pinpoint the service.
Emit an ERROR log line in the error case.

To build the service you also need to define its dependencies via a file called go.mod (shown in
Listing 2.4).

Listing 2.4 Example Go program dependencies file go.mod


module co11yia/ch2/logs

go 1.16

require github.com/sirupsen/logrus v1.8.1

Now, to pull in the echo service dependencies, run the following command:


$ go mod vendor

Complete the build by executing go build as explained in Listing 2.2.

2.3.2 Telemetry
We are now in a position to execute the echo service locally and send the log events it emits during its execution to certain destinations, such as the screen and a file:
$ ./echosvc 2>&1 | \
  tee echo.log

Execute echosvc and redirect its stderr output to stdout.
Use the tee command to show the log lines on the terminal and, at the same time, append them to a file called echo.log. This file is what we are going to use in the next step as the input to ship to a destination.
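With the service under load, each request produces JSON events along the following lines (the timestamps and message values shown here are illustrative):

{"level":"info","msg":"Got input test","service":"echo","time":"2021-11-02T10:00:00+01:00"}
{"level":"error","msg":"Something really bad happened :(","service":"echo","time":"2021-11-02T10:00:01+01:00"}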

Next, we will use some open source tools provided by Grafana Labs to route and collect the logs: we capture the logs of our echo service with Promtail and ship them to Loki (as a data source in Grafana) as the destination. Using Grafana as the front-end, you can then view and query the logs as you desire.

To set up our log scenario, use the Docker Compose manifest file shown in Listing 2.5; it
launches everything needed, from the destination to the frontend.

Listing 2.5 Docker Compose manifest docker-compose.yaml to stand up Promtail/Loki/Grafana

version: "3"
networks:
  loki:
services:
  loki:
    image: grafana/loki:2.4.0
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
    networks:
      - loki
  promtail:
    image: grafana/promtail:2.4.0
    volumes:
      - ./promtail-config.yaml:/etc/promtail/config.yml
      - ./echo.log:/var/log/echo.log
    command: -config.file=/etc/promtail/config.yml
    networks:
      - loki
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    networks:
      - loki


Defines the location of the Promtail config file. We map the host-local file promtail-config.yaml into the container at /etc/promtail/config.yml, where Promtail expects to find it.
Maps the host-local file echo.log (which our echo service logs to) into the container at /var/log/echo.log. We will use that in Listing 2.6 to define the log processing and routing pipeline.

TIP While using Docker Compose to develop locally or test apps is certainly
handy, for a production setup you want to use Kubernetes. For example, you
can install the Promtail/Loki/Grafana stack via Helm charts or using Tanka.

Now that we have set up the framework for launching the logs destination (Loki), the telemetry agent (Promtail), and the front-end (Grafana) via our Docker Compose file (Listing 2.5), we need one more thing in place: we have to configure Promtail, our log router/ingester. The Docker Compose manifest references (in line 15) this Promtail configuration file, which we call promtail-config.yaml (Listing 2.6).

Listing 2.6 Promtail configuration file promtail-config.yaml

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: app
    static_configs:
      - targets:
          - localhost
        labels:
          job: echosvc
          __path__: /var/log/echo.log
    pipeline_stages:
      - json:
          expressions:
            level: level
            payload: msg
            timestamp: time
            service: service
      - labels:
          level:
          service:

This defines the input for Promtail. Here we tell it to ingest logs from a file called /var/log/echo.log (which we mounted from the host via a volume in line 16 of the Docker Compose manifest).
Here comes the first part of the Promtail pipeline. It maps the input JSON fields that our echo service emits (such as service or level; the right-hand side here holds the values) to labels we want to filter for in Loki, for example service or level (same names here, but you can, and often have to, normalize keys, and this is a good place to do so).
Here we define the labels that we want to index/expose.

Assuming that you stored the content of Listing 2.5 in a file called docker-compose.yaml, it’s straightforward to bring up the stack using:

$ docker-compose -f docker-compose.yaml up
Creating network "logs_loki" with the default driver
Pulling loki (grafana/loki:2.4.0)...
2.4.0: Pulling from grafana/loki
...
Status: Downloaded newer image for grafana/promtail:2.4.0
Creating logs_loki_1 ... done
Creating logs_promtail_1 ... done
Creating logs_grafana_1 ... done
Attaching to logs_loki_1, logs_grafana_1, logs_promtail_1
...

Now you can head over to http://localhost:3000/ in your browser to log into Grafana. Use the following defaults (note: skip the password change request after you enter the credentials):

username: admin
password: admin

Next, configure the Loki data source as shown in Figure 2.2 (note that the screenshot has been edited so that it focuses on the relevant parts; in reality you would see many more options on this screen). The one and only config you need to set in this step is the data source URL, which is http://loki:3100 (thanks to our Docker Compose setup).


Figure 2.2 Adding and configuring the Loki data source in Grafana

1. The data source URL (Loki HTTP API) that you have to set.
2. Verify that everything works with the "Save & test" button.

Now you can head over to the "Explore" tab (in the left-hand menu, it looks like a compass).
Select Loki as the data source and enter {job="echosvc"} into the top input area with the title
"Log labels". If you then select a certain time range you should see something akin to what is
shown in Figure 2.3.


Figure 2.3 Exploring logs in Loki

1. Filter for logs by entering an expression; for example, show logs from our echosvc using {service="echo"}.
2. The logs visually represented as a time series, grouped by log level.
3. Details of a log line at error level; click on it to view the details.

In Figure 2.3, the Loki data source explore view, note two things:

It’s quick and simple to filter for certain log levels, such as the level="error" shown here.
Scoping to a particular service of an app is also straightforward: select the appropriate service label value.

TIP To see what is running, use docker-compose top (in the directory that contains the docker-compose.yaml file) or docker ps (anywhere). Also, to tear down everything, removing all containers again, use the docker-compose -f docker-compose.yaml down command.

With this we’ve completed our practical logs scenario, and we can move on to a discussion of the value that logs bring to the table and what the costs (in terms of resources and efforts) concerning logs look like.

2.3.3 Costs and Benefits


We now take a closer look at the costs and benefits of the logs signal type. In Table 2.1 I put together an overview of the costs.

Table 2.1 Overview of the costs of logs

stage | overhead | considerations
instrumentation | small; simple to add a log line | widely adopted and supported by programming languages
agent | some runtime overhead; lots of options, few standards for formats (Syslog, OTLP) | runtime footprint, ingestion performance, reliability, security
destination | long-term storage can be expensive | data temperature (access recency or frequency) can guide retention

INSTRUMENTATION COSTS OF LOGS

The costs of logs are relatively low for instrumentation, and instrumentation is well understood and widely supported: all programming languages and their libraries allow you to emit logs. The readability, and with it the maintainability, of our code has not suffered greatly from adding the log lines.

Let’s have a concrete look at the instrumentation costs from a developer’s point of view first. The original code in Listing 2.1 has 24 lines of code, and the code instrumented with logs in Listing 2.3 has 32 lines of code. Taking the two LOC numbers together, we can calculate the B2I ratio and arrive at a value of 1.33, which is not too bad.

TELEMETRY COSTS OF LOGS

There are some costs around collecting and routing logs to a central location, which is typically necessary in a distributed system. Depending on the option you are using here, it may cause ingestion (network traffic) costs.

When selecting a certain agent, you want to ask yourself:

What is the runtime footprint (CPU and memory consumption)?
What is the ingestion performance; for example, how many log lines per second can the agent handle?
In terms of reliability: is there guaranteed delivery of every log line (if this is necessary at all)?
What are the security considerations, from encryption on the wire (TLS) to multi-tenancy support?

TIP If you’re unsure about how to measure log agent performance, have a look at
what we did in Centralized Container Logging with Fluent Bit, where my AWS
colleague Wesley Pettit and I discuss a concrete case.

Examples of log routers are:

The Cloud Native Computing Foundation (CNCF) projects Fluentd and Fluent Bit.
The CNCF project OpenTelemetry, which has an (experimental) collector for logs.
Cloud provider-specific agents, such as the Amazon CloudWatch agent, the Azure Monitor agent, or the Google Cloud operations suite agents.
Proprietary agents, such as Elastic’s Logstash and Filebeat, the Datadog Agent, or the Splunk Forwarder.

To avoid excessive costs in this phase, and also in the destinations, you may wish to prune logs here already. Some pruning strategies you might want to consider are:

Sampling, that is, dropping a certain percentage of the events, based on a heuristic.
Filtering based on type, for example, dropping DEBUG level logs in prod.

There are commercial offerings, such as Cribl, that can help define and apply pruning strategies at scale, across providers.
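As a minimal, hypothetical sketch of the two pruning strategies above in Go (Event and keep are made-up names; real agents implement this as part of their pipeline configuration):

package main

import (
    "fmt"
    "math/rand"
)

// Event is a simplified stand-in for a structured log event.
type Event struct {
    Level   string
    Message string
}

// keep decides whether to forward an event to the destination: it
// filters out all DEBUG events and samples the remainder at the
// given rate (0.0 to 1.0).
func keep(e Event, sampleRate float64) bool {
    if e.Level == "debug" {
        return false // filtering based on type
    }
    return rand.Float64() < sampleRate // sampling
}

func main() {
    events := []Event{
        {Level: "debug", Message: "cache miss"},
        {Level: "info", Message: "Got input test"},
        {Level: "error", Message: "Something really bad happened :("},
    }
    for _, e := range events {
        if keep(e, 0.5) { // keep roughly half of the non-debug events
            fmt.Println(e.Level, e.Message)
        }
    }
}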

DESTINATION COSTS OF LOGS

Once logs from different microservices arrive at a central location (the destination, such as Splunk, CloudWatch, OpenSearch, or Datadog), the main cost type is that of storage. At scale, logs can be very expensive, so usually you don’t want to keep everything around forever. You want to decide the retention period of logs based on the data temperature:

1. Hot data is your working set, that is, logs you’re using in the current time frame to troubleshoot or develop. It has to be readily accessible and fast to query. Usually that spans from days to weeks.
2. Warm data is something you might need, but you’re OK if it takes some time to access and/or query. The retention period here might be months or even years.
3. Cold data is all the (log) data you don’t actively use. You may, however, need to keep certain log lines around for regulatory reasons. You want to store that kind of log data in cheap, reliable, durable object storage such as Amazon S3 Glacier. It may take minutes or hours to access the data, but given that you seldom, if at all, use it, that’s a trade-off you can live with.

BENEFITS OF LOGS
Now that you have an idea about the costs of logs, let’s briefly talk about the benefits. First off, logs are universally adopted and understood. Every developer knows how to emit log lines, and there’s fantastic support for logs in every programming language, often in the standard library or in the wider ecosystem (third-party packages, modules, or libraries).

The second main benefit of logs is that, due to their textual nature, a human can usually interpret their meaning. There is still a requirement for a developer to be able to write actionable log lines (formatting, context, etc.). Beyond that, all you need is to find a log line and you can interpret it. There are two approaches to finding relevant log lines:

Traditionally, every log line (or: log item) was full-text indexed. Systems like Elasticsearch or OpenSearch do that, as do many commercial offerings. This is great, since you can search for everything; however, it comes with scalability and cost challenges.
Recently, label-based indexing has become popular. Rather than indexing the entire text of a log line, you only index the labels (or: tags) that provide the context, such as this is from service foo and it is about an error condition. The most popular example of this approach is Grafana Loki, inspired by Prometheus.

With this we’ve reached the end of our logs cost and benefit discussion, so let’s move on to the use cases for logs in the context of observability.

2.3.4 Observability with Logs


What can we do with logs and how is what we just saw in the hands-on part related to
observability?

As a developer, you can use logs:

For debugging an existing code base (for which you either own the source code or need to reverse-engineer it).
To perform testing (such as soak tests, performance tests, or even chaos engineering) and see where and how your microservices break apart.

Further, in a devops or SRE role, logs:

Help to identify the service and code responsible for a bug or outage, with the goal of 1. stopping the bleeding (restarting it or taking it offline), and 2. fixing the issue in the identified code.
Provide context, for example, who made which change when (also for regulatory or security-related reasons). The log4j vulnerability (CVE-2021-44228 and CVE-2021-45046) we saw at the end of 2021 is a good example of a security-motivated use case.

Next up: metrics.

2.4 Metrics
Metrics are numerical signals, typically sampled at regular time intervals. A metric usually consists of a name, a value, and some metadata. The name identifies the metric (for example, temperature), the value is the observation we’re interested in (say, 2.5), and the metadata provides context (e.g., room_42 for location-related context). The unit of the observation can be encoded in the name or provided as metadata. Last but not least, the metric is timestamped, either explicitly by the producer of the metric or at ingest (server-side).

Traditionally, semantics would often be encoded in a hierarchical manner, but with the advent of Kubernetes and Prometheus, the label-based, flat organization scheme became more mainstream. The label-based scheme is more flexible and extensible. The following example demonstrates the difference:

room_42.sensor_window.temperature 2.5

temperature[location="room_42", sensor="window"] 2.5

©Manning Publications Co. To comment go to liveBook


https://livebook.manning.com/#!/book/cloud-observability-in-action/discussion
28

A metric using a hierarchical scheme.
The same metric, using a label-based scheme. If one wanted to add another aspect, such as the position, with the label-based scheme it’s as easy as adding a new label (e.g., position="top"), whereas with a hierarchical scheme one would need to reconsider the entire name.

We usually differentiate between low-level or systems metrics and application-level or business metrics. The former are resource usage metrics like CPU utilization or allocated main memory; the latter could be things like requests handled per second or revenue generated per user. In general, low-level metrics can often be gleaned automatically, whereas application-level metrics require a human to assist in their definition and exposure.

Another aspect of metrics, again popularized by Prometheus, is the pull vs. push approach. The gist is: in a cloud native setup, where one deals with containers or functions that are short-lived, discovery is better centralized, and the central entity collects the metrics (or scrapes them, as Prometheus folks would say; technically it means calling an HTTP API endpoint to retrieve metrics) rather than having the applications push the metrics to a central place.

In terms of which metrics are important to consider, the community has come up with a number of approaches and recommendations, most notably:

The USE (Utilization, Saturation, Errors) method is great for low-level metrics and is often used in the context of performance engineering and root cause analysis.
The RED (Rate, Errors, Duration) method focuses on the number of requests served, the number of failed requests, and how long requests take; this is useful for many application-level cases (see the sketch after this list).
The "Four Golden Signals of Monitoring" are latency, traffic, errors, and saturation, similar to RED and USE combined.
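To make the RED method concrete, here is a minimal sketch using the Prometheus Go client; the metric names, the port, and the assumption of a 200 response are illustrative, not prescriptive:

package main

import (
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
    // Rate and Errors: a counter partitioned by HTTP status code.
    requests = promauto.NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total HTTP requests by status code.",
        },
        []string{"code"},
    )
    // Duration: a histogram of request latencies.
    latency = promauto.NewHistogram(
        prometheus.HistogramOpts{
            Name:    "http_request_duration_seconds",
            Help:    "HTTP request latency in seconds.",
            Buckets: prometheus.DefBuckets,
        },
    )
)

// instrument wraps a handler and records RED metrics. For brevity it
// assumes a 200 response; a real implementation would wrap the
// ResponseWriter to capture the actual status code.
func instrument(next http.HandlerFunc) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        next(w, r)
        requests.WithLabelValues("200").Inc()
        latency.Observe(time.Since(start).Seconds())
    }
}

func main() {
    http.HandleFunc("/ping", instrument(func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("pong"))
    }))
    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":4243", nil)
}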

From a consumption point of view, we can say that metrics are often graphed (in dashboards) or used as triggers for alerts in the context of monitoring and analytics. But metrics are also useful, and indeed used, as an input for programs. For example, a Horizontal Pod Autoscaler (HPA) in Kubernetes may scale the number of replicas up or down based on some custom metric, such as the number of requests handled by a service.

2.4.1 Instrumentation
Let’s see how we can extend our echo service to emit metrics. The Listing 2.7 shows the echo
service instrumented with metrics. To be precise, we’re using a Prometheus library for the
instrumentation.


Listing 2.7 Example Go program, instrumented to emit Prometheus metrics


package main

import (
	"fmt"
	"math/rand"
	"net/http"
	"strconv"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	log "github.com/sirupsen/logrus"
)

func handleEcho(w http.ResponseWriter, r *http.Request) {
	message := r.URL.Query().Get("message")
	log.WithFields(log.Fields{
		"service": "echo",
	}).Info("Got input ", message)
	if rand.Intn(100) > 90 {
		log.WithFields(log.Fields{
			"service": "echo",
		}).Error("Something really bad happened :(")
		invokes.WithLabelValues(strconv.Itoa(http.StatusInternalServerError)).Inc()
		w.WriteHeader(500)
		return
	}
	invokes.WithLabelValues(strconv.Itoa(http.StatusOK)).Inc()
	fmt.Fprintf(w, message)
}

var (
	registry = prometheus.NewRegistry()
	invokes  = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "echo_total",
			Help: "Total invocations of the echo service endpoint.",
		},
		[]string{"http_status_code"},
	)
)

func main() {
	rand.Seed(time.Now().UnixNano())
	registry.MustRegister(invokes)
	log.SetFormatter(&log.JSONFormatter{})
	http.HandleFunc("/echo", handleEcho)
	http.Handle("/metrics", promhttp.HandlerFor(
		registry,
		promhttp.HandlerOpts{
			EnableOpenMetrics: true,
		},
	))
	log.Fatal(http.ListenAndServe(":4242", nil))
}

Sets up Prometheus registry.


Defines our metric.
Registers our metric.
Defines the /metrics endpoint handler.

Increments our metric with the context-specific label carrying the HTTP status
code.
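
If you want to try Listing 2.7 yourself, one way to build it, assuming a recent Go toolchain and the code saved as main.go in an otherwise empty directory:

$ go mod init echosvc
$ go mod tidy
$ go build -o echosvc .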

2.4.2 Telemetry
We are now in a position to run the echo service locally and expose the Prometheus metrics:
$ ./echosvc

We can scrape the metrics of our echo service using Prometheus, which in turn can act as a data source in Grafana, our destination.
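
With the service running, you can sanity-check the exposition endpoint by hand before wiring up Prometheus. The following shows what you can expect; the counter values will of course differ on your machine:

$ curl -s localhost:4242/metrics | grep echo_total
# HELP echo_total Total invocations of the echo service endpoint.
# TYPE echo_total counter
echo_total{http_status_code="200"} 12
echo_total{http_status_code="500"} 1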

To set up the destination, use the Docker Compose manifest file shown in Listing 2.8 (which
references Listing 2.9) to set up Prometheus and Grafana.

Listing 2.8 Docker Compose manifest docker-compose.yaml to stand up Prometheus and Grafana
version: "3"
networks:
  prom:
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prom-config.yaml:/etc/prometheus/prometheus.yml
    networks:
      - prom
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    networks:
      - prom

Similar to what we've seen for logs, we also need to configure the telemetry agent for metrics. In this case we do this by defining the Prometheus scrape config, as shown in the prom-config.yaml file in Listing 2.9.

Listing 2.9 Prometheus configuration file prom-config.yaml


global:
  scrape_interval: 5s
  evaluation_interval: 5s
  external_labels:
    monitor: 'lab'
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'lab'
    static_configs:
      - targets: ['172.17.0.1:4242']

Next, add Prometheus as a data source in Grafana as shown in Figure 2.4, using http://prometheus:9090/ as the URL.

Figure 2.4 Adding and configuring the Prometheus data source in Grafana

1. The data source URL (Prometheus HTTP API) that you have to set.
2. Verify if everything works with the "Save & test" button.

Now you can again use the "Explore" tab: select Prometheus as the data source and you should see something similar to what is shown in Figure 2.5 if you enter the following PromQL query (the error rate of our service):

sum (rate(echo_total{http_status_code="500"}[5m]))
/
sum(rate(echo_total[5m])) * 100

The above PromQL query is the canonical way to capture the error rate. By dividing the number of errors (HTTP status code 500) by the total number of requests, you have a flexible yet complete way to capture errors. In this case the error rate should on average be around 10%, since that's the failure rate we hardcoded into the service in Listing 2.7.

Figure 2.5 Exploring metrics with Prometheus

1. Filter for metrics by entering a PromQL expression, for example, error rate.
2. Graph visualization of the resulting time series.

2.4.3 Costs and Benefits


Let's now have a closer look at the costs and benefits of the metrics signal type. In Table 2.2 I've put together an overview of the costs around metrics.

Table 2.2 Overview of the costs of metrics


stage            overhead                                     considerations
instrumentation  auto-instrumentation if possible             increasingly supported in programming languages
agent            Prometheus, OpenTelemetry collector,         what to collect, auto-pruning
                 OpenMetrics as emerging standard
destination      cardinality, long-term storage, federation   what to ingest, data temperature for retention

INSTRUMENTATION COSTS OF METRICS


Concerning the costs of instrumenting code with metrics, we can distinguish two cases:


1. Using auto-instrumentation usually requires little effort (such as including a library and/or configuring it), for example with Prometheus or OpenTelemetry, and increasingly also with eBPF-based approaches.
2. Manual instrumentation is, unsurprisingly, more effort than auto-instrumentation but also offers more control over exactly which metrics you want to expose and in which manner. Typically, this kind of instrumentation is a one-off effort, and for a class of metrics (business-level metrics) it is the only viable option.

In the case of manual instrumentation, the readability, and with it the maintainability, of the code can suffer.

Let's now have a concrete look at the instrumentation costs from a developer's point of view: the original code in Listing 2.1 has 24 lines of code, and the code instrumented with metrics in Listing 2.7 has 56 lines of code overall.

Now, in order to isolate the metrics-caused instrumentation costs from the total (logs+metrics) instrumentation costs, we subtract the 8 lines of code from the logs instrumentation and arrive at 48 lines of code.

Taking these two LOC numbers together, we can calculate the B2I ratio and arrive, for metrics, at a value of 2 (48/24). This B2I value is slightly higher than what we had with logs (1.33), but still not worrisome.

If you want to learn more about good practices concerning metrics instrumentation I recommend
you peruse the following resources:

CNCF SIG Instrumentation guidance on Instrumenting Kubernetes.


CNCF TAG observability white paper input on Guidelines for developers on how to
implement new metrics.

TELEMETRY COSTS OF METRICS


Similar considerations around agents apply for metrics as we've discussed for logs. In addition, there is the challenge of cardinality, that is, the number of different values possible in the dimensions of a metric, which can contribute to a (dramatic) increase in costs.

Examples of metrics agents are:

CNCF Prometheus, especially in agent mode.


The CNCF OpenTelemetry collector and more specifically collectors from
OpenTelemetry distributions, for example the AWS Distro for OpenTelemetry Collector.


DESTINATION COSTS OF METRICS


One of the most challenging things with metrics is what we typically call cardinality explosion. What does this mean? Well, in many cases, a dimension of a metric has only a small set of possible values; for example, the number of HTTP status codes is limited to a couple of dozen:
myapp_http_requests_total{code="200", handler="/helloworld"} --> 42
myapp_http_requests_total{code="404", handler="/cryptomining"} --> 5
...

The above doesn't cause any issues; time series databases are well able to handle this, from both a storage and a query perspective. But what if you decide to encode the user ID into a label as follows:

myapp_http_requests_total{code="200", handler="/ex1", uid="9876" } --> 567

Congratulations! You have just introduced the potential for cardinality explosion. Why? Well, imagine your service becomes popular and you end up having to deal with tens or hundreds of thousands or maybe even millions of users.

There can be as many values for the uid label as you have users, and that is very expensive for the time series database to deal with, both in terms of storage and in terms of query performance, if it can cope at all (usually you will see your queries getting slower in such a case). To illustrate the scale: a couple of dozen status codes combined with a million user IDs already yields tens of millions of distinct time series. There's really only one solution: keep your dimensions under control.

Another source of destination costs arises when you have many sources. Say you have a fleet of Kubernetes clusters and want to query the metrics centrally. Now you have to consider how to go about long-term storage and federation of your metrics, something we will get back to in more detail later in the book.
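
One common building block here is Prometheus remote write, which continuously forwards samples to a central long-term store. A minimal sketch of the relevant prom-config.yaml section, with a placeholder endpoint URL:

remote_write:
  - url: https://central-metrics.example.com/api/v1/write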

BENEFITS OF METRICS
The very nature of metrics, their numeric values, enables all sorts of automation. For example, you don't have to manually track a metric by looking at a dashboard; instead, you define a condition (the value goes above a threshold for at least three minutes, for example) and get a notification if and when this is the case. That kind of automation is usually called alerting.
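
To make this concrete, here is a sketch of what such a condition could look like as a Prometheus alerting rule, reusing the error-rate query from earlier; the threshold and duration are arbitrary choices:

groups:
  - name: echo
    rules:
      - alert: EchoHighErrorRate
        expr: |
          sum(rate(echo_total{http_status_code="500"}[5m]))
            /
          sum(rate(echo_total[5m])) * 100 > 15
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "echo service error rate above 15% for 3 minutes"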

Then there's the wide area of establishing a baseline (what does the metric "normally" look like; what range/max/min does it have?). One can employ fancy methods here as well, and there's a whole area, called anomaly detection, dedicated to figuring out what does not look normal. Again, we get back to this topic in greater detail later in the book.

Popular destinations for consuming metrics include:

Front-ends, from low-level/debug offerings such as the Prometheus UI or PromLens.

Universal data source front-ends such as Grafana.

Commercial/integrated offerings including CloudWatch, New Relic, Datadog, etc.


Object storage such as an S3 bucket, for later analysis.

With this we’ve reached the end of our metrics cost & benefit discussion, so let’s move on to
metrics use cases for observability.

2.4.4 Observability with Metrics


What can we do with metrics and how is what we just saw related to observability?

In a devops or SRE role:

You can, based on metrics, establish baselines for signals. Further, you can define anomalies and expectations via alerts and get notified if certain values are exceeded (this is more of a traditional monitoring use case for metrics, since the questions are known up-front).
You can track metrics over time and study histograms; this can help with capacity planning or scalability conversations.
Metrics allow you to define operational, tactical goals, usually called Service Level Objectives (SLOs), for example "serve less than 12% errors, over a period of a week". To assess if the goal has been met, one uses a measurement, usually called a Service Level Indicator (SLI), such as the error rate (the ratio of errors to overall requests); see the recording rule sketch after this list.
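
A sketch of what the error-rate SLI could look like as a Prometheus recording rule, so that it is cheap to evaluate against the SLO; the rule name follows the common level:metric:operations naming convention and is a placeholder:

groups:
  - name: slo
    rules:
      - record: echo:error_rate:ratio_rate1h
        expr: |
          sum(rate(echo_total{http_status_code="500"}[1h]))
            /
          sum(rate(echo_total[1h]))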

Now, let's see how to deal with traces.

2.5 Traces
When you hear "traces", you may have to disambiguate first, that is, ask what someone really means when referring to the term. In general, we can differentiate between two cases: local or distributed:

In the case of local "traces", the more widely used and established term is profiles, and we use tracing and sampling (taking a profile) interchangeably here. This is about capturing certain characteristics of the execution of a process in the context of a single machine, such as kernel-level events or user space function calls, along with resource usage (CPU/memory related); using strace to see syscalls is one example (see the sketch after this list). If you go a step further and do this sampling over a period of time, you end up with what is called continuous profiling, which is, for example, what the eBPF-based Parca project offers.
Distributed traces are about propagating a context across a number of microservices along a request path, in the context of a distributed system such as a containerized microservices app. After the request is completed, you can stitch together the separate executions (spans), turning them into a trace.
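
As a quick illustration of the local case, strace can attach to a running process and print a summary of its syscalls; the PID here is a placeholder:

$ strace -c -p 12345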

In the following we will focus on distributed traces; we will have a closer look at the continuous profiling space, its use cases, and offerings later in the book.


2.5.1 Distributed Traces


In a nutshell, distributed tracing is all about capturing and visualizing how a request moves
through your microservices, potentially across different machines and involving different
protocols such as HTTP or gRPC.

Let's have a look at a concrete example, shown in Figure 2.6: on the left-hand side you see a collection of microservices that together serve a request. This tree or graph representation is the call graph (alternatively called a service map), which starts with the front-end service FE with which the user directly interacts. Said service FE then calls three other services (A, B, and C), of which one (C) in turn invokes two other services (D and E).

If we now visually represent the overall invocation time as a horizontal bar (the longer the bar, the longer the execution time of the service), we can represent the request handling across the microservices in a different way, usually called a waterfall view.

NOTE The term "waterfall" visualization is neither new nor unique to distributed tracing. If you look at the in-browser developer tooling available nowadays, you will find that it uses a similar representation to show how long the browser spends handling events such as DOM element handling or networking-related tasks such as downloading stylesheets or images.

You see the result of this waterfall rendering on the right-hand side of Figure 2.6.

Figure 2.6 Distributed tracing concept


1. Example of a microservices call graph with a service called FE as the top-level, entry
point for a user.
2. Waterfall rendering of the microservices calls (again, with the FE service as the top-level,
user-facing service).

In general, we use the following terms to refer to the different parts here:

The trace represents the end-to-end processing of the request as it moves through the participating microservices. A trace can have multiple spans.
A span represents the execution in one particular microservice along the request path, with the root span as the first span in a trace (in our example, that would be FE) and child spans as the subsequent (nested) ones, such as A or D.

But how does this work? When a request enters the system, in our case via the service FE, a unique identifier (the trace ID) is generated. Then, depending on the protocol the services use to communicate (typically HTTP or gRPC), this trace ID, along with some payload, is sent on to the children (FE, for example, sends it to A, B, and C, and C sends it to D and E). This is what we call context propagation across processes in a networked setup.
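
To make the propagation concrete: OpenTelemetry defaults to the W3C Trace Context format, in which case the context travels as an HTTP header along the following lines (the values shown are illustrative):

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01

The four dash-separated fields are the version, the trace ID, the parent span ID, and the trace flags (01 meaning "sampled").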

The distributed tracing solution abstracts away the technical details of how the propagation is implemented (such as via HTTP headers), takes care of stitching all the spans together, and delivers them to a central place where you can then view, query, and analyze the traces.

Combining distributed traces with what logs (section 2.3) offer yields some interesting insights:

You can look up the active trace context, pinpointing interesting sources quickly.
If you attach a log event to the span, you can jump directly from the waterfall/timing visualization to the relevant log message of a component. This means you don't have to search for log messages and can get away with less indexing overhead; see the sketch after this list.
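
For example, with the tracing setup from Listing 2.10 below in scope, attaching the trace ID to a log event could look like this sketch (it assumes span is the active span in handleEcho):

log.WithFields(log.Fields{
	"service":  "echo",
	"trace_id": span.SpanContext().TraceID().String(), // correlate log and trace
}).Info("Got input ", message)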

2.5.2 Instrumentation
The Listing 2.10 shows our echo service instrumented with traces.


Listing 2.10 Example Go program, instrumented to emit traces


package main

import (
	"context"
	"fmt"
	"math/rand"
	"net/http"
	"strconv"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	log "github.com/sirupsen/logrus"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/jaeger"
	"go.opentelemetry.io/otel/sdk/resource"
	tracesdk "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.7.0"
)

// tracerProvider returns an OpenTelemetry tracer provider configured
// to export spans to Jaeger, based on:
// https://github.com/open-telemetry/opentelemetry-go/blob/main/example/jaeger/main.go
func tracerProvider(url string) (*tracesdk.TracerProvider, error) {
	environment := "test"
	service := "echosvc"
	exp, err := jaeger.New(jaeger.WithCollectorEndpoint(jaeger.WithEndpoint(url)))
	if err != nil {
		return nil, err
	}
	tp := tracesdk.NewTracerProvider(
		tracesdk.WithBatcher(exp),
		tracesdk.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceNameKey.String(service),
			attribute.String("environment", environment),
		)),
	)
	return tp, nil
}

func handleEcho(w http.ResponseWriter, r *http.Request) {
	message := r.URL.Query().Get("message")
	log.WithFields(log.Fields{
		"service": "echo",
	}).Info("Got input ", message)
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	tr := otel.Tracer("http-api")
	_, span := tr.Start(ctx, "echo")
	defer span.End()
	if rand.Intn(100) > 90 {
		log.WithFields(log.Fields{
			"service": "echo",
		}).Error("Something really bad happened :(")
		invokes.WithLabelValues(strconv.Itoa(http.StatusInternalServerError)).Inc()
		span.SetAttributes(attribute.Key("http-status-code").
			String(strconv.Itoa(http.StatusInternalServerError)))
		w.WriteHeader(500)
		return
	}
	invokes.WithLabelValues(strconv.Itoa(http.StatusOK)).Inc()
	span.SetAttributes(attribute.Key("http-status-code").
		String(strconv.Itoa(http.StatusOK)))
	fmt.Fprintf(w, message)
}

var (
	registry = prometheus.NewRegistry()
	invokes  = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "echo_total",
			Help: "Total invocations of the echo service endpoint.",
		},
		[]string{"http_status_code"},
	)
)

func main() {
	rand.Seed(time.Now().UnixNano())
	registry.MustRegister(invokes)
	log.SetFormatter(&log.JSONFormatter{})
	http.HandleFunc("/echo", handleEcho)
	http.Handle("/metrics", promhttp.HandlerFor(
		registry,
		promhttp.HandlerOpts{
			EnableOpenMetrics: true,
		},
	))
	tp, err := tracerProvider("http://localhost:14268/api/traces")
	if err != nil {
		log.Fatal(err)
	}
	otel.SetTracerProvider(tp)
	log.Fatal(http.ListenAndServe(":4242", nil))
}
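
One thing this minimal listing glosses over is flushing: the batch span processor buffers spans, so spans emitted right before the process exits can get lost. Here is a sketch of how one could handle this in a service with a clean shutdown path (note that deferred functions do not run after log.Fatal, so this only helps if main returns normally):

// Flush any buffered spans before the process exits; the
// five-second timeout is an arbitrary choice.
defer func() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := tp.Shutdown(ctx); err != nil {
		log.Error(err)
	}
}()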


2.5.3 Telemetry
We can now run the echo service locally and emit traces like so:
$ ./echosvc

We can ingest the traces of our echo service using Jaeger, which we then use as the data source in Grafana, our destination.

To set up the destination, use the Docker Compose manifest shown in Listing 2.11, which stands up Jaeger and Grafana.

Listing 2.11 Docker Compose manifest docker-compose.yaml to stand up Jaeger/Grafana


version: "3"
networks:
  jaeger:
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "6831:6831/udp"
      - "16686:16686"
      - "14268:14268"
    networks:
      - jaeger
  grafana:
    image: grafana/grafana:8.2.2
    ports:
      - "3000:3000"
    networks:
      - jaeger

Next, add Jaeger as a data source in Grafana as shown in Figure 2.7, using http://jaeger:16686 as the URL.


Figure 2.7 Adding and configuring the Jaeger data source in Grafana

1. The data source URL (Jaeger HTTP API) that you have to set.
2. Make sure to enable the "Node Graph" option.
3. Verify if everything works with the "Save & test" button.

Now you can again use the "Explore" tab: select Jaeger as the data source, switch to the "Search" tab (in the "Query type" field), select the echosvc service, and if you now select one of the traces by clicking on a "Trace ID", you should see something similar to what is shown in Figure 2.8 (in the "Trace View", right-hand side).
