Event Sourcing


This chapter is going to be focused on the many intricacies and simplicities (they
are often confused!) of Event Sourcing.
• What is Event Sourcing?
• What is the history of it? Is it new?
• Why do people use it?
• What are trade-offs of modelling things in this way?
Event Sourcing is certainly a bit unusual in that it is a pattern which has been
around for many decades with countless systems built upon it, but it was rarely
talked about. It was a completely niche subject until relatively recently. It was
“software dark matter”. Literally it did not even have a name.
Many people associate me with Event Sourcing, often even crediting me as the
creator of building systems in such a way. This could not be further from the truth. I
worked on an Event Sourced system as my first job coming out of university.
The system in question was dealing with gambling. We had transactions which
went to a log and things which followed this log representing information by
reading those transactions, representing the information in shared memory which
could then be read. If you squint at it just right, it worked in an almost identical
way to what people generally discuss with Event Sourcing today.
Event Sourcing is not new; it is in fact very old. It used to be quite common to
build up systems in a style similar to what people call “Event Sourcing” today.
Somewhere around the 1990s the style became less popular to build systems
with because, as we will get into later, your database did it internally.
To give the reader a bit of a laugh, in trying to track back the history of
Event Sourcing I have even found people using similar ideas before I was born.
Certainly I was rather unlikely to have been involved with the creation of
something prior to my being born or perhaps the event stream which involves
“me” is far more complex than I currently understand it to be.
So given all of that, let’s throw on our “archaeologist hats” and jump right in.
Event Sourcing says that current state is a left-fold of previous behaviours.
This statement, while absolutely true and conveniently succinct, is sadly
also slightly above useless in terms of actually understanding/explaining things.
It would instead almost certainly be simpler for us to break down the problem
into multiple stages.
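Before breaking it down, the left-fold itself is worth seeing once in code. The sketch below is illustrative only; the event shapes, names, and amounts are all invented for this example.

```python
from functools import reduce

# Hypothetical account events; the names and shapes are invented for this sketch.
events = [
    {"type": "Deposited", "amount": 100},
    {"type": "Withdrawn", "amount": 30},
    {"type": "Deposited", "amount": 50},
]

def apply(state, event):
    """Fold a single past behaviour into the current state."""
    if event["type"] == "Deposited":
        return state + event["amount"]
    if event["type"] == "Withdrawn":
        return state - event["amount"]
    return state  # unknown events leave the state untouched

# "Current state is a left-fold of previous behaviours."
balance = reduce(apply, events, 0)
print(balance)  # 120
```

The rest of the chapter is, in a sense, an unpacking of what that one `reduce` call implies.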

What is an Event?
The definition of an Event is perhaps bizarrely a quite simple thing and simultaneously
something I have seen many people make some drastic mistakes with,
mistakes with long-lived repercussions associated to them.
An event is a fact which has occurred at a point in time.
Alice accepted package AC-378495 at 07:55:42.
Bob departed in car #64031 to the store AC8392 at 18:17:22.
Charles deactivated item #829376 as it was no longer available at 12:33:31.
There are some important details here which affect many other things. The first
is that when we discuss an Event, the action has completed.
Events are always in the past tense. If we were to discuss other languages, they
are in the “completed past”; in French, as an example, they would be written in the
passé composé. The things discussed are not “ongoing actions” which may be
continuing into the future; they are an action which has completed.
There are times this linguistic difference becomes important. Do we model
things as an “XYZBatchJobRunning”, or do we have an “XYZBatchJobStarted”
and an “XYZBatchJobCompleted”? The first describes something which “is
occurring”, not something which “has occurred”. When describing events we
describe something which has occurred.
That something may be occurring can be derived off the events which
have occurred.
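To make that concrete, here is a small sketch (the event names follow the Started/Completed convention above but are otherwise invented) of deriving an “is running” view purely from events which have occurred:

```python
def batch_job_running(events, job_id):
    """Derive whether a job *is running* from facts that *have occurred*."""
    running = False
    for event in events:
        if event["job_id"] != job_id:
            continue
        if event["type"] == "XYZBatchJobStarted":
            running = True
        elif event["type"] == "XYZBatchJobCompleted":
            running = False
    return running

log = [
    {"type": "XYZBatchJobStarted", "job_id": 7},
    {"type": "XYZBatchJobCompleted", "job_id": 7},
    {"type": "XYZBatchJobStarted", "job_id": 9},
]
print(batch_job_running(log, 7))  # False: started, then completed
print(batch_job_running(log, 9))  # True: started, not yet completed
```

Note that nothing in the log itself says “running”; the ongoing condition is an interpretation layered over completed facts.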
There are numerous cases where this linguistic difference can confuse people.
If you run into the situation where the action is not something which happens
immediately as an atomic action, introduce multiple events (started/ended/etc.)
to avoid being opaque. Every event is an atomic action which has completed.
We need this in order to be able to rationalize about the system as a whole . . .
Even a single place breaking this rule adds exponential complexity in terms of
actually understanding things and should be avoided.
Be very wary of any action which states something occurring as opposed to
having occurred. A good question to ask is . . . “What if somebody pulls out
the power cable?”. Power cables do occasionally come out of computer systems,
and this can cause a slight problem.
Given that an Event is a “fact which has occurred at a point in time”, then what is
Event Sourcing?

Event Sourcing
Event Sourcing says that all of our current state is derived off of the events that
we store.
That is it. That is the entirety of Event Sourcing. Literally it is trivial . . . until
it is not. It only states that every piece of state that we have is derived off of
this series of events which are structured into (a) log/logs.

It does not matter if we are discussing say a piece of state such as a domain
object in memory or whether we are discussing a piece of state such as a table
off in a database. ALL of that state is directly derived off of the log of events.
Beyond this, any of that state can be thrown away at any moment and rebuilt
from scratch by replaying those same events, as it is by definition derived off
of the log of events. The state is in fact a direct “interpretation” of that
series of events. Said differently, ALL of that state is transient.
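A minimal sketch of that transience, reusing the item events from earlier (the event shapes are my own invention for illustration): the projection can be discarded and rebuilt at will, and the result is identical because it is a pure function of the log.

```python
def project(events):
    """Interpret the log into a piece of state (here, the set of active items)."""
    active = set()
    for event in events:
        if event["type"] == "ItemActivated":
            active.add(event["item"])
        elif event["type"] == "ItemDeactivated":
            active.discard(event["item"])
    return active

log = [
    {"type": "ItemActivated", "item": 829376},
    {"type": "ItemActivated", "item": 829377},
    {"type": "ItemDeactivated", "item": 829376},  # Charles deactivated it
]

state = project(log)   # build the state
state = None           # throw it away entirely . . .
state = project(log)   # . . . and rebuild it from scratch, identically
print(state)  # {829377}
```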
We have a further benefit in that we can redefine what that “interpretation”
from the log of events to a piece of state looks like. We can even come up with
new ideas of what that transformation should be and send them back in time.
There is a very solid laugh to be had here about sending your current self back
to 1993 to reinterpret Ace of Base - All That She Wants.
That song is stuck in your head now, isn’t it? It did the same to me
while writing this.
While today you may consider sales to be grouped by the customer you may
later realize that they are, in fact, not “single customers” but instead “groups of
customers” which are assigned to the same organization. Such a change would
be quite difficult to manage inside a relational database but can be quite easy to
deal with in an event sourced system. We can in fact just build out a different
read model to represent things in the new way without breaking existing things.
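As a sketch of that, two interpretations of the same facts can live side by side; the sale events and the customer-to-organization mapping below are assumptions invented for the example.

```python
from collections import defaultdict

# Hypothetical sale events and an org mapping discovered later.
sales = [
    {"customer": "alice", "amount": 10},
    {"customer": "bob", "amount": 25},
    {"customer": "alice", "amount": 5},
]
org_of = {"alice": "acme", "bob": "acme"}

def sales_by_customer(events):
    """The original interpretation: totals per customer."""
    totals = defaultdict(int)
    for e in events:
        totals[e["customer"]] += e["amount"]
    return dict(totals)

def sales_by_organization(events):
    """A *new* interpretation of the same old facts, 'sent back in time'."""
    totals = defaultdict(int)
    for e in events:
        totals[org_of[e["customer"]]] += e["amount"]
    return dict(totals)

print(sales_by_customer(sales))      # {'alice': 15, 'bob': 25}
print(sales_by_organization(sales))  # {'acme': 40}
```

Neither read model had to exist when the events were written; both are built after the fact, and neither breaks the other.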
The ability to reinterpret your previous data in the new . . . perhaps interesting
. . . way is highly valuable. In fact, it is quite common dealing with Event
Sourced systems to create such a new model just to answer a question. If the
answer is not interesting you then just throw it away. This type of interaction
initially sounds very odd to those who have had a largely data-model centric
viewpoint over time, but it is in fact. . . Tuesday.
This building of a model and throwing it away does a great job of showing how we
treat Read Models in such systems. They are transitory. It is very common to
build one just to answer a single question and then . . . throw it away. Once we
have the answer to the question we no longer need to keep it running. This is
highly valuable in many businesses as we can literally answer . . . any question
which might be asked. It may take us some time to answer a given question (I have
seen some pretty complex projections!) but we can answer it.
There is however a secondary trait here. We are not required to throw them
away. Perhaps we are working on a financial system and the people associated
are quite proficient in Excel . . . who could even imagine financial people being
proficient in Excel?! We could build out a denormalized Read Model that they
then connect to with . . . Excel. There are no reports etc. off of it; the only use of
this Read Model is for them to be able to connect to it with Excel.
Not only does this work but traders/finance people . . . love it. This is literally
their dream of how to interact with such a system in many cases. Part of the
reason why is that they do not want to share with others how they look at the
data. It is important to consider that in many financial houses people are not
only competing against people in other financial houses but also with the people
in their own house.
Due to this “detail” very often people working in such roles do not even want
to tell YOU how they look at data. Literally, how they look at data is their
competitive advantage over others. Can they trust you? If YOU know how they
look at data, who else might you tell over an after work pint one night?
This desire to guard “how I look at data” is not limited to a small group such as
traders. There are numerous other industries which operate in similar ways. This
ability to provide “private models” can be a massive feature in many business
environments and can further be a key point in “selling” the ideas to people
early in the process. Try converting your current data into an event log and
building out some basic tooling so that they can access it. This does not mean to
actually Event Source the system; reverse engineer things first.
Very often this will be enough. Bonus points for making it a near real time sync
where they can do things on live data as it arrives. Once they understand the
value of it, the rest becomes quite simple.
Let’s get back again to what Event Sourcing actually is. Allow me to quote
myself “Event Sourcing says that all of our current state is derived off of the
events that we store.”
How do we actually reach this point?
Event Sourcing is saying that we will store . . . facts. We will write these facts
to an Event Log and then replay those facts to get back our current state.
Previously there were three facts listed.
Alice accepted package AC-378495 at 07:55:42.
Bob departed in car #64031 to the store AC8392 at 18:17:22.
Charles deactivated item #829376 as it was no longer available at 12:33:31.
These are not interesting as they are not directly related to each other from
almost any viewpoint someone may have. Let’s try another set of facts.
Jane accepted return of car #64031 at location #28293 at 19:27:12.
Bob cleaned car #64031 at location #28293 at 22:45:12.
Janice moved car #64031 at location #28293 to ready position 14 at 22:47:38.
Jane rented car #64031 at ready position 14 to REDACTED PRIVATE INFORMATION.
What is the current status of car #64031? This might even be a bit of a tough
question; it might even depend on . . . who is asking?
It could be that the redacted information is in fact just encrypted, and only
certain people are able to negotiate for a key in order to be able to read it. This
is in fact a very common pattern to use with event streaming in general as I want
to put out things efficiently. I would much prefer to send out all the information
over, say, a multicast feed to 10,000 users with some information encrypted than
to deal with 2,000 separate feeds.
You can see this precise pattern of encrypting information then putting it
over a multicast feed used in many financial markets. Sometimes even private
information will come across the public market feed. This information is however
encrypted, and a key is required to figure out what the information actually is.
Not all information is encrypted using the same key. It is much less expensive to
manage one feed with varying data encrypted in different ways than to manage
many feeds.
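The shape of that pattern can be sketched as follows. To stay self-contained this uses a toy XOR “cipher” purely as a stand-in; a real feed would use a proper authenticated cipher (e.g. AES-GCM), and every name and value here is invented for the example.

```python
import base64

def toy_encrypt(key: bytes, plaintext: str) -> str:
    # Stand-in only: XOR is NOT real encryption; use an authenticated cipher.
    data = plaintext.encode()
    xored = bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
    return base64.b64encode(xored).decode()

def toy_decrypt(key: bytes, ciphertext: str) -> str:
    data = base64.b64decode(ciphertext)
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data)).decode()

# A key negotiated only by consumers authorized to read the private field.
renter_key = b"key-negotiated-by-authorized-readers"

# One event goes out to *everyone* on the feed; only key holders can read
# the private field. Other fields stay in the clear.
event = {
    "type": "CarRented",
    "car": 64031,
    "renter": toy_encrypt(renter_key, "Dave Smith"),
}

# A consumer without the key sees only ciphertext; one with it can decode.
print(toy_decrypt(renter_key, event["renter"]))  # Dave Smith
```

The point is structural: one feed, many consumers, with selective readability handled by keys rather than by maintaining a separate feed per consumer.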
This pattern can as well be used in many other event sourced systems in terms
of moving around data. It is often much easier to deal with one thing than with
five hundred. So long as you encrypt the data properly you can in fact maintain
a single feed going out that is “selective read” to the consumers who are reading
it.
This is however an advanced use case and one you should not just “jump right
into”. Maintaining keys etc with such a feed is much more difficult than other
mechanisms which can be used, but it is a viable mechanism that can be used if
you run into it. Utilizing it allows you to push off the work onto the networking
infrastructure as opposed to making it part of your project.
Event Sourcing at its core is the storing of Events and then replaying them.
It is not an unusual pattern; it has been used in many transactional systems
since the early 70s. It is the “old new thing” [1], which is definitely worth reading.

Deletion
One question which often comes up when discussing Event Sourced systems is
how do you delete data? While this may initially seem to be a trivial detail it is
in fact anything but. This is one of the first discussions you should be having
about any Event Sourced system.
It sounds quite a bit off that one of the first discussions you should be having
is about how to delete data, not what data you should be storing, but doing so
will help you quickly focus on data life cycles.
Data life cycles are important in any system but they are especially important
in Event Sourced systems.
What is funny is that when discussing deletion in Event Sourced systems the
first rule is . . . Don’t. I am not only referring to not deleting data; discussing
why you are not deleting it also leads to some fun situations.
“Wait . . . WAT? What do you mean, don’t delete data?! There will be a huge
amount of it!” “So what?” “But there might be terabytes of data and . . . ”
“So?” “Huh?” “Does the data need to be ‘active’? We are discussing the wrong
thing.” “So you are saying that we should instead be talking about how to back
things up?” “No, I am suggesting that what needs to be discussed is lifecycles
of data. Some events are associated to things which are quite short in terms of
lifecycle; other things are associated to things with very long lifecycles. Making
a differentiation between the two is important.” “Ok.” “Beyond this, in many
domains we find natural temporal boundaries in the system which we should
be looking to exploit.” “What is an example of a ‘natural temporal boundary’?”
“End of Day, End of Year, and Shift Change are all good examples of a natural
temporal boundary.” “OK, that makes sense, but what on Earth does that have to
do with deleting data?” “Because you don’t delete data . . . you instead change
the entire eventstore in production. Data which was live at the boundary “moves
off” and only data which is still relevant gets brought forward to the other side of
the boundary. Think of it like doing a migration on all of your time boundaries.”
“A migration?!” “Why not?! It also becomes a quite useful timepoint when
further discussing releases.” “Wait, so every release is a ‘migration’?” “Yes. You
understand!” “Actually, I think I am beginning to.”
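The boundary “migration” in that conversation can be sketched very simply (the events and the relevance rule below are invented for illustration): at the boundary, still-relevant data is carried forward into the new log and everything else moves off.

```python
def roll_over(old_log, is_still_relevant):
    """'Migrate' at a natural temporal boundary (e.g. End of Day):
    relevant events are brought forward; the rest 'move off' to an archive."""
    new_log = [e for e in old_log if is_still_relevant(e)]
    archive = [e for e in old_log if not is_still_relevant(e)]
    return new_log, archive

log = [
    {"type": "OrderPlaced", "order": 1, "settled": True},
    {"type": "OrderPlaced", "order": 2, "settled": False},
]

# At End of Day only unsettled orders remain relevant (an illustrative rule;
# your domain defines what 'still relevant' means at each boundary).
live, archived = roll_over(log, lambda e: not e["settled"])
print(len(live), len(archived))  # 1 1
```

Nothing is “deleted” in the destructive sense; the archive remains, but the working log only carries what is still live on the other side of the boundary.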
Dealing with events is quite different than dealing with a structural model in
terms of versioning. It will take some time to “come to grips” with the differences
between the models. If you try to treat events the same way you would structured
data, it is like trying to “french fry when you mean to pizza”. [3]

Multiple Streams
One thing which gets very weird with Event Sourced models is that they are
different. They get especially different when you start talking about attempting
to write to multiple streams concurrently.
This is, on its face, a completely normal thing to do. I would like to remove an
item from bin #27271 and add that item to bin #28322.
#Simples
Is it?!
In order to complete this operation two streams will need to be written to.
Streams are much like “documents” in a document database. If you want to
scale, they are generally your partition point.
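A sketch of why this is less simple than it looks (the in-memory “event store” here is invented for illustration): each stream is appended to on its own, and nothing in the model makes the two appends atomic together.

```python
# Each stream, like a document, is its own unit of consistency and the
# natural partition point when scaling.
streams: dict[str, list[dict]] = {}

def append(stream_id, event):
    """Append one event to one stream; atomicity ends at the stream boundary."""
    streams.setdefault(stream_id, []).append(event)

# Moving an item between bins means writing to *two* streams. If the power
# cable comes out between these two lines, the item has left one bin and
# never arrived in the other.
append("bin-27271", {"type": "ItemRemoved", "item": 555})
append("bin-28322", {"type": "ItemAdded", "item": 555})

print(len(streams["bin-27271"]), len(streams["bin-28322"]))  # 1 1
```

This is exactly the gap that cross-stream coordination techniques exist to close, which is why writing to multiple streams concurrently deserves its own discussion.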

Summary
Events are a relatively simple concept. They are a fact which occurred at a point
in time. They are represented by an immutable piece of data showing this fact.
Event Sourcing is the replaying of these Events in order to get back a piece of
structure representing how you interpret these facts. The interpretation of these
facts can change over time.
There might even concurrently be multiple ways of interpreting the set of
facts which exist. There should be no worries at all about this occurring. The
interpretations have separate life-cycles from the software which is producing
the events.
Every piece of state in the system is directly interpreted off the series of events.
As every piece of state in the system is directly interpreted off the series of events,
every piece of state can be interpreted to any point in time that the system has
existed. There is further an assured audit log for every piece of state in the
system.
You will never be able to predict the questions that people will ask you in the
future. The ability to take a new question and bring it back in time to any point
in time that the system existed is highly valuable.
The value of the data is also why you quite rarely want to delete it. The managing
of data in production can be a quite domain-specific problem, but look for natural
time boundaries within your data. The ability to “segment” the log of events
based upon natural boundaries within the data itself is the best way of handling
things.
Also look at these time boundaries when attempting to figure out what your
Bounded Context/service boundaries might be. Seeing a context/service with
multiple time boundaries inside of it is a quite strong smell that you are likely
dealing with multiple contexts. More often than not you will find that there are
different groups who are looking at the data in different ways.
The fact that events make up a log inherently leads to many of these things.
It was not at all uncommon historically for things to be structured in this way.
Before SQL databases became the best thing since sliced bread, systems were
commonly implemented as an event log.
Nothing being discussed here around Event Sourcing is a new concept. These
ideas have existed and have been successfully implemented in numerous systems
for many . . . decades. They are only considered “new” today by people who
have not seen them previously. The majority of systems used to be written in
the same way.
For us older folks . . .
“See, this is our time to dance. It is our way of celebrating life. It’s the way it
was in the beginning. It’s the way it’s always been.” [2]
[1] https://devblogs.microsoft.com/oldnewthing/
[2] Ross, H. (1984). Footloose. Paramount Pictures.
[3] Parker, T., & Stone, M. (2002). South Park, s. 6 ep. 2, “Asspen”.
[4] Scott, T. (1986). Top Gun. Paramount Pictures.
