Professional Documents
Culture Documents
p70 Helland
p70 Helland
data ONLY
Life
Distributed
beyond
Transactions
An apostate’s
opinion
PAT HELLAND
T
ransactions are amazingly powerful mechanisms,
and I’ve spent the majority of my almost-40-year
career working on them. In 1982, I first worked
to provide transactions on the Tandem NonStop
System. This system had a mean time between
failures measured in years4 and included a geographically
distributed two-phase commit offering excellent
availability for strongly consistent transactions.
New innovations, including Google’s Spanner,2 offer
GOALS
This article focuses on how an application developer can
build a successful scalable enterprise application when
he or she has only a local database or transaction system
available. Availability is not addressed, merely scale and
correctness.
SOME ASSUMPTIONS
Consider the following three assumptions, which are
asserted and not justified. Assume these are true based on
experience.
1
FIGURE 1: Two-layered application
application
upper layer
of code scale-agnostic code
scale-agnostic API
lower layer
of code scale-aware code
Transactional Scopes
Lots of academic work has been done on the notion of
providing strongly consistent transactions over distributed
systems. This includes 2PC (two-phase commit),1 Paxos,5
and recently Raft.6 Classic 2PC will block when a machine
fails unless the coordinator and participants in the
transaction are fault tolerant in their own right, such as
the Tandem NonStop System. Paxos and Raft do not block
with node failures but do extra work coordinating, much
like Tandem’s system.
These algorithms can be described as providing strongly
consistent transactions over distributed systems. Their goal
is to allow arbitrary atomic updates to data spread over
OPINIONS TO BE JUSTIFIED
The nice thing about writing an opinion piece is that you can
express wild opinions. Here are a few that this article tries
to justify.
2
FIGURE 2: Data for an application comprises many entities
application
ENTITIES
This section examines the nature of entities in greater
depth.
3
cannot make assumptions about the location of the entity
because that would not be scale agnostic.
As shown in figure 3, entities are spread across
transactional scopes using either hashing or key-range
partitioning.
3
What in the past has been managed
The scale-agnostic automatically as alternate indices
application program must now be managed manually by the
can’t atomically update an application. Workflow-style updates via
entity and its alternate index. asynchronous messaging are all that is
The upper-layer scale-agnostic left. When you read data from alternate
application must be designed indices, you must understand that it is
to understand that alternate potentially out of sync with the entity
indices may be out of sync with itself. Alternate indices are now harder.
the entity accessed with its This is a fact of life in the big cruel world of
primary index (i.e., entity key). huge systems.
As shown in figure 4, different
keys (primary entity key MESSAGING ACROSS ENTITIES
versus alternate keys) cannot This section considers connecting
be collocated or updated independent entities using messages.
atomically. It examines naming, transactions and
messages, message-delivery semantics,
and the impact of repartitioning entities.
4
FIGURE 4: Under scale, alternate index entries land elsewhere
entity
indexed by
unique primary key
entity keys
indexed by
1st alternate key
entity keys
indexed by
2nd alternate key
5
FIGURE 5: Messages flow from one entity to another
sending-entity receiving-entity
transaction-T1 transaction-T2
Natural Idempotence
To accomplish idempotence, it is essential that the
message does not cause substantive side effects. Some
messages provoke no substantive work any time they are
processed. These are naturally idempotent.
A message that only reads some data from an entity is
naturally idempotent. What if the processing of a message
does change the entity but not in a substantive way? Those,
too, would be naturally idempotent.
Now it gets harder. The work implied by some messages
actually causes substantive changes. These messages are
not naturally idempotent. The application must include
mechanisms to ensure that these messages, too, are
idempotent. This means remembering in some fashion
that the message has been processed so that subsequent
attempts make no substantive change.
The next section considers the processing of messages
that are not naturally idempotent.
3
Consider the
processing of an
order consisting of many
items for purchase. Reserving
inventory for shipment of each
separate item will be a separate
application, you need to be very clear
about relationships. You can’t just do
a query to figure out what is related.
Everything must be formally knit together
using a web of two-party relationships.
The knitting is done with the entity keys.
activity. There will be an entity Because the partner is some distance
for the order and separate away, you have to formally manage your
entities for each item managed understanding of the partner’s state as
by the warehouse. Transactions new knowledge when the partner arrives.
cannot be assumed across The local information known about
these entities. a distant partner is referred to as an
Within the order, each activity. See figure 7.
inventory item will be
separately managed. The Ensuring At-Most-Once Acceptance
via Activities
messaging protocol must be
Processing messages that are not
separately managed. The per-
naturally idempotent requires ensuring
inventory-item data contained
that each message is processed at most
within the order entity is an
once (i.e., the substantive impact of the
activity. While it is not named
message must happen at most once). To
as such, this pattern frequently
do this, some unique characteristic of
exists in large-scale apps.
the message must be remembered to
entity-x
message A
7
message B
entity-B entity-A
entity-C
entity-D
Uncertainty at a Distance
The absence of distributed transactions means the
acceptance of uncertainty when attempting to come to
decisions across different entities. It is unavoidable that
decisions across distributed systems involve accepting
uncertainty for a while. When distributed transactions can
be used, that uncertainty is manifested in the locks held
on data and is managed by the transaction manager. In a
system that cannot count on distributed transactions, the
management of uncertainty must be implemented in the
business logic. The uncertainty of the outcome is held in
3
Uncertainty
Think about the style
Entities sometimes accept uncertainty
of interactions
as they interact with other entities.
common across businesses.
This uncertainty must be managed on
Contracts between businesses
a partner-by-partner basis and can be
include time commitments,
visualized as being reified in the activity
cancellation clauses, reserved
state for the partner. Many times,
resources, and much more.
uncertainty is represented by relationship.
Uncertainty is wrapped up in
It is necessary to track it by partner. The
the behavior of the business
activity tracks each partner as it advances
functionality. While more
into a new state.
complicated to implement than
using distributed transactions,
Performing Tentative Business
it is how the real world works...
Operations
Again, this is simply an
To reach an agreement across entities,
argument for workflow.
one entity asks another to accept
uncertainty. One entity sends another
a request that may be canceled later. This is called a
tentative operation. At the end of this step, one entity
agrees to abide by the wishes of the other.
3
Tentative Operations, Confirmation,
If an ordering system and Cancellation
reserves inventory Essential to a tentative operation is the
from a warehouse, the right to cancel. If the requesting entity
warehouse allocates the decides not to move forward, it issues
inventory without knowing a cancellation operation. If it decides
if it will be used. That is to move ahead, it issues a confirmation
accepting uncertainty. Later, operation.
the warehouse finds out if When an entity agrees to perform
the reserved inventory will a tentative operation, it agrees to let
be needed. This resolves the another entity decide the outcome.
uncertainty. It accepts uncertainty, which adds to
The warehouse inventory its confusion. As confirmations and
manager must keep relationship cancellations arrive, the uncertainty
data for each order encumbering decreases, reducing confusion. It is
its items. As it connects normal to proceed through life with ever
items and orders, these will increasing and decreasing uncertainty as
be organized by item. Each old problems get resolved and new ones
item keeps information about arrive in your lap.
outstanding orders for that item. Again, this is simply workflow, but it is
Each activity within the item fine-grained workflow with entities as the
(one per order) manages the participants.
uncertainty of the order.
Uncertainty and Almost-Infinite Scaling
The management of uncertainty usually
revolves around two-party agreements. There may be
multiple two-party agreements. These are knit together as
a web of fine-grained two-party agreements using entity
keys as the links and activities to track the known state of a
distant partner.
3
When considering almost-infinite
Consider a house scaling, it is interesting to think about two-
purchase and the party relationships. By building up from
relationships with the escrow two-party tentative/cancel/confirm (just
company. The buyer enters into like traditional workflow), you can see the
an agreement of trust with basis for achieving distributed agreement.
the escrow company, as do the Just as in the escrow company (see
seller, mortgage company, and sidebar), many entities may participate
all other parties involved in the in an agreement through composition.
transaction. Because the relationships are two-party,
When you go to sign papers the simple concept of an activity as “stuff
to buy a house, you do not I remember about that partner” becomes
know the outcome of the deal. a basis for managing enormous systems—
You accept that, until escrow even when the data is stored in entities
closes, you are uncertain. The and you don’t know where the entity lives.
only party with control over the You must assume it is far away. Still, it
decision-making is the escrow can be programmed in a scale-agnostic
company. way. Real-world almost-infinite scale
This is a hub-and-spoke applications would love the luxury of a
collection of two-party global transactional scope. Unfortunately,
relationships that are used to this is not readily available to most of
get a large set of parties to us without introducing fragility when a
agree without use of distributed system fails.
transactions. Instead, the management of the
uncertainty of the tentative work is
passed off into the hands of the developer of the scale-
agnostic application. It must be handled as reserved
inventory, allocations against credit lines, and other
application-specific concepts.
CONCLUSIONS
As usual, the computer industry is in flux.
Today, new design pressures are being foisted onto
programmers who simply want to solve business
problems. Their realities are taking them into a world
of almost-infinite scaling and forcing them into design
problems largely unrelated to the real business at hand.
This is true today, and it was true when this article was
first published in 2007. Unfortunately, programmers
striving to solve business goals such as e-commerce,
supply-chain management, financial, and health-care
applications increasingly need to think about scaling
without distributed transactions. Most developers simply
don’t have access to robust systems offering scalable
distributed transactions.
We are at a juncture where the patterns for building
these applications can be seen, but no one is yet applying
these patterns consistently. This article argues that these
nascent patterns can be applied more consistently in the
handcrafted development of applications designed for
almost-infinite scaling.
This article has introduced and named a couple of
formalisms emerging in large-scale applications:
• Entities are collections of named (keyed) data that may be
atomically updated within the entity but never atomically
updated across entities. • Activities consist of the collection
of state within the entities used to manage messaging
relationships with a single partner entity. Workflow to
reach decisions functions within activities within entities.
It is the fine-grained nature of workflow that is surprising
when looking at almost-infinite scaling.
References
1. B
ernstein, P. A., Hadzilacos, V., Goodman, N. 1987.
Concurrency Control and Recovery in Database Systems.
Boston, MA: Addison-Wesley.
2. Corbett, J. C., et al. 2012. Spanner: Google’s globally
distributed database. Tenth Usenix Symposium on
Operating Systems Design and Implementation (OSDI).
3. Dean, J., Ghemawat, S. 2004. MapReduce: simplified
data processing on large clusters. Sixth Symposium on
Operating Systems Design and Implementation (OSDI).
4. Gray, J. 1990. A census of Tandem System availability
between 1985 and 1990. IEEE Transactions on Reliability
39(4): 409-418.
5. L amport, L. 1998. The part-time parliament. ACM
Transactions on Computer Systems 16(2): 133-169.
6. Ongaro, D., Ousterhout, J. 2014. In search of an
understandable consensus algorithm (extended version);
https://raft.github.io/raft.pdf.
7. W
achter, H., Reuter, A. 1992. The ConTract Model. In
Database Transaction Models for Advanced Applications:
219-263. San Francisco, CA: Morgan Kaufmann.
Related articles
A Conversation with Bruce Lindsay
Designing for failure may be the key to success
http://queue.acm.org/detail.cfm?id=1036486
Condos and Clouds
Constraints in an environment empower the services.
By Pat Helland
http://queue.acm.org/detail.cfm?id=2398392
Microsoft’s Protocol Documentation Program:
Interoperability Testing at Scale
A discussion with Nico Kicillof, Wolfgang Grieskamp,
and Bob Binder
http://queue.acm.org/detail.cfm?id=1996412