
Process-as-a-Service: FaaSt Stateful Computing with Optimized Data Planes

Marcin Copik (ETH Zürich)    Alexandru Calotoiu (ETH Zürich)    Rodrigo Bruno (INESC-ID, IST, ULisboa)
Roman Böhringer (ETH Zürich)    Torsten Hoefler (ETH Zürich)

Abstract

Fine-grained and ephemeral functions power many new applications that benefit from the elastic scaling and lower computing costs of serverless platforms. However, they are hampered by expensive and limited communication, and high invocation latency. Functions cannot keep track of state across invocations and must rely on remote storage, making workflows and applications with dependencies between tasks, or relying on communication between workers, difficult to implement using Function-as-a-Service (FaaS) computing.

To continue the serverless revolution, we introduce the concept of Process-as-a-Service (PraaS) and show how established operating system abstractions can be adapted to model and implement dynamically provisioned cloud computing workers. We present a new serverless data plane that improves invocation performance by up to 32 times while preserving the ephemeral and elastic nature of serverless workers. Finally, we build a unified and portable communication interface for serverless, enabling optimized peer-to-peer communication that allows workers to solve problems in parallel to save costs.

1 Introduction

In less than a decade, Function-as-a-Service (FaaS) has established itself as one of the fundamental programming models in the cloud. In FaaS, users invoke short-running and stateless functions and benefit from pay-as-you-go billing and lower costs, while cloud providers gain more efficient resource usage and opportunities to reuse idle hardware [1-3]. Serverless functions, the computing model FaaS is based on, have been used in web applications, media processing, data analytics, machine learning, and scientific computing [4-9]. Even though FaaS has achieved remarkable success in decreasing the costs of stateless computations and microservices, the wide adoption of serverless computing is hindered by the limitations of its execution model [4, 10-12].

As an example of a trending workload in serverless systems, we consider distributed machine learning solutions [4, 13-16]. There, each invoked function must compute new gradients using a subset of data. While the stateless nature of FaaS simplifies deployment and resource management, combining the new weights and using them in the next round of invocations incurs major performance overheads: in each round of updates, the output must be written to and read from external cloud storage, and each new round requires an invocation that goes through the entire cloud control plane.

Without direct communication channels, a service like serverless federated machine learning is difficult to implement and inefficient. The lack of state management in ephemeral functions [10, 12] can partially be addressed with workarounds such as storage-based shared state, transactions, and logs [17-20]. However, these solutions rely on manually managed non-serverless storage instances, and they negatively impact the execution times, throughput, and cost of serverless systems [12, 20]. The performance of such systems could be vastly improved by taking advantage of locally available memory rather than constantly needing to access remote, slow storage, and by having the capability to bypass the control plane for repeated invocations.

We propose a new paradigm in the serverless world: Process-as-a-Service (PraaS). We incorporate state seamlessly, preserve the ephemeral nature of FaaS computations, and overhaul the serverless system architecture. PraaS is heavily inspired by classical Operating System (OS) design and transfers concepts that stood the test of time into the context of granular cloud computing. Specifically, this new model allows us to address three of the most significant constraints of current FaaS systems: the lack of consistent and low-latency state, a complex control plane involved in each invocation, and the lack of a unified and portable communication interface.

PraaS is a new step in the evolution of cloud computing that provides a better balance between the performance of persistent allocations and the elasticity of ephemeral workers (Fig. 1). It introduces the concept of a session, an execution environment that retains local state across invocations and that exposes a communication channel so a user can send data and requests directly.

                    IaaS                       CaaS                   PraaS                  FaaS
Computation Unit    Virtual machine            Container              Process                Function
External Interface  SSH, TCP, HTTP, RPC, RDMA  SSH, TCP, HTTP, RPC    HTTP, TCP              HTTP
Lifetime            Months                     Days, hours            Minutes, hours         Seconds
State Duration      Persistent                 Transient              Transient              Ephemeral
State Location      Local disk, memory         Memory, cloud storage  Memory, cloud storage  Cloud storage
Provisioning        Manual, minutes            Semi-automatic, secs   Automatic, msecs       Automatic, msecs
Compute Resources   Persistent                 Persistent             Ephemeral              Ephemeral
Billing             Provisioned                Provisioned            Pay-as-you-go          Pay-as-you-go

Figure 1: Evolution of computing platforms in the cloud: PraaS enables state persistence for ephemeral workers.

Processes open the possibility for direct warm function invocation. In traditional FaaS platforms, every invocation traverses the centralized (and multi-step) allocation and routing logic in the control plane (Sec. 2.1). Moreover, serverless functions are mostly short-running [21], and the increasing availability of fast network transport puts further pressure on optimizing the serverless control plane involved in every invocation. In PraaS, communication channels exposed by sessions allow us to expose a direct connection to a function executing within a session. Thus, we propose a process data plane that offers low-latency warm executions over a direct connection to the function sandbox. Instead of applying optimizations to decrease the control overheads, we remove the control plane overheads from the data path entirely [22] by submitting the invocation payload directly to a process instance. The new data plane we propose enables parallel and granular computing with serverless functions [23, 24], where the relative influence of invocation overheads is very high.

A further benefit of PraaS is the ability to move data directly between sessions within a single process. This allows users to deploy large-scale applications such as distributed ML that can communicate directly by accessing state memory. Until now, inter-function communication has been constrained, forcing users to implement standard communication patterns on top of cloud storage [4, 8] and introducing additional costs, non-negligible latencies, or requiring non-serverless or even manually managed storage services. Inspired by partitioned global address space (PGAS), we propose a global memory model for serverless functions backed by reliable and persistent storage. We provide a communication model for serverless that adjusts the requirements of direct messaging to ephemeral functions and enables fast communication in a portable and abstract interface, effectively replacing the functionality traditionally provided by the control plane: direct invocations using PraaS are 32 times faster than using AWS Lambda, for both small and large payload sizes.

To summarize, we make the following contributions:

• The new concept of serverless processes that provides an efficient model for stateful FaaS functions.
• An optimized serverless data plane with fast and direct invocations for scalable workflows and high-performance FaaS.
• A global memory model for serverless functions that permits optimized function-to-function communication and provides an interface portable across systems and cloud providers.

2 Motivation

Function-as-a-Service (FaaS) enhanced clouds with a fine-grained and elastic programming model, and FaaS has found its way into the major cloud systems [25-28]. Functions are stateless, and invocations cannot rely on resources and data resulting from previous executions. Instead of using persistent and user-controlled virtual machines and containers, function instances are placed dynamically in cloud-managed sandboxes, e.g., containers and lightweight virtual machines [29]. A cold invocation requires the allocation of a new sandbox, adding significant overhead to the invocation latency [11, 30]. Consecutive warm invocations can benefit from reusing sandboxes allocated by a preceding cold invocation, achieving lower invocation latency. Therefore, cloud systems attempt to retain virtual machines and containers after the invocation, employing complex and sophisticated retention and prewarming strategies [31-34], and trading off higher memory consumption for faster executions. In addition, the flexible pay-as-you-go billing is another significant advantage of serverless systems: users are charged only for the computation time and resources used. However, along with the benefits serverless computing provides, it does have some disadvantages, such as higher latencies due to complex control planes, poor locality of data due to the nonexistence of local state, and high communication costs [10, 11, 35, 36].

[Figure 2: FaaS control plane: each invocation includes the management overhead (M). Diagram: cloud events trigger (T) invocations; a gateway and load balancer accept HTTP connections (H); central management and metadata (M) route execution to warm (W) or cold (C) sandboxes on function servers; the output returns via the gateway (O).]

2.1 Serverless Control Plane

The serverless execution model requires that function executors are allocated dynamically and that users do not provision cloud resources. In practice, most modern FaaS systems implement this as a centralized management and routing system that moves user data to a function instance. Figure 2 presents a high-level overview of the serverless control plane. It includes an abstraction of a REST API and a gateway with a persistent network address, and uses an HTTP connection (H) to hide the on-the-fly selection and allocation of function executors. Cloud events and gateway requests trigger (T) function invocations. The function's input is forwarded to the central management (M), responsible for authorization, allocation of resources, and routing execution to the selected server. In AWS Lambda, the control logic is responsible for authorizing requests, managing sandbox instances, and placing the execution on a cloud server [29]. In OpenWhisk [37], the critical path of function execution is even longer: each invocation includes a frontend webserver, controller, database, load balancer, and a message queue [38]. Finally, the function's data moves to a warm (W) or cold (C) sandbox. When the execution is finished, the function's output returns to the user via the gateway (O).

The many steps of control logic add double-digit millisecond latency to each invocation and require copying the user payload multiple times, even though serverless systems optimize cold startups and subsequent invocations reuse the same warm sandbox when available. Overheads of the control plane can dominate the execution time and are much higher than the network transmission time needed to transfer input arguments [11]. The long and complex invocation path is even more visible in distributed applications and serverless workflows that invoke functions many times with large payloads. The optimization effort of serverless workflows is guided towards reusing function instances and provisioning dedicated stateful executors [39-42], but does not offer a solution that can be generalized. Alternative approaches include decentralized scheduling and direct allocations [3, 39], but they are limited to systems optimized for serverless workflows and RDMA acceleration.

Observation: Every serverless invocation incurs overheads for authentication, resource management, and data copying across cloud services. However, the function placement does not change in many warm invocations, making the control logic unnecessary.

2.2 Serverless State

The stateless nature of functions makes them easy to schedule for cloud providers, but it puts a significant constraint on the programmability of FaaS systems. Computing resources are allocated with ephemeral memory storage that cannot be guaranteed to persist across invocations. Since many applications require retaining state between invocations, stateful functions place their state in remote cloud storage [18-20]. While the function's state is located in storage far away from the compute resources, fetching and updating state adds dozens of milliseconds of latency to the execution (Table 1), resulting in significant performance overheads [19, 20].

Storage Type                        1 B    1 MB   10 MB
Persistent storage (AWS S3)         10.3   20.4   102.4
Key-value storage (AWS DynamoDB)    34     n/a    n/a
In-memory cache (AWS ElastiCache)   0.9    24.6   84.6

Table 1: Access time [ms] to remote cloud storage from an AWS Lambda function with 2 GiB RAM.

Relying on remote storage to implement stateful functions requires every invocation to incur storage overheads, even if function sandboxes are kept warm. With the current FaaS model, it is not possible to remove remote storage access from the data path.

Observation: Restricting serverless computing to stateless functions puts unnecessary constraints on user applications and prevents data locality. Stateful functions help mitigate some of these challenges, but their efficient implementation is hindered in the current FaaS model.

2.3 Serverless Communication

Communication in FaaS has always been constrained. The message-passing paradigm cannot be applied directly to ephemeral functions, as workers' lifetime is not defined, and functions require mailboxes to store and
access messages. Unfortunately, the widely-used communication pattern with persistent cloud storage adds non-trivial latencies and costs [10]. While direct network communication between functions could alleviate the performance issues (Fig. 3), functions do not offer the abstractions needed to implement communication on groups of functions with dynamic group membership. Furthermore, in typical FaaS deployments, functions operate behind NAT and cannot accept incoming connections. UDP and TCP hole punching [43, 44] must be applied to establish communication between function instances. Not only is this a complex process, but the connections can only be created between two active functions: the programming model of FaaS does not provide the mailboxes needed to send messages reliably.

[Figure 3: Latency and throughput for peer-to-peer TCP communication between Lambda functions. Plot: communication time (ms) and throughput (MB/s) as a function of data size (KB).]

Observation: Cheap and scalable communication is necessary to build parallel and distributed applications, but serverless programming models lack an efficient fabric and interface for communication.

Summary: FaaS can be much more than just a platform for irregular and lightweight workloads, and the serverless programming model aligns well with the requirements of massively parallel and granular computing [23, 24]. Nevertheless, serverless systems must first overcome critical limitations: a complex control plane involved in every invocation, the lack of a fast and cheap function state, and expensive, storage-based communication.

3 PraaS: Process as a Service

In this section, we present the Process-as-a-Service model, starting with the core design overview and then providing details for its features and components. A process (Figure 4) is a natural extension of ephemeral functions with a decoupled and transient state. A process can support multiple users, isolated from each other if necessary, associated with one tenant. Multiple sessions within PraaS can cooperate using globally accessible process memory, using methods described in Sec. 4.1.

PraaS Concept           Inspiration
Process                 POSIX process model.
Function                Thread in a process.
State                   POSIX process memory.
Session persistence     Swapping memory pages.
Communication channel   Socket in a process.
Communication model     Remote memory access, TCP.
Data plane              Network data plane in Arrakis [22], kernel bypass in RDMA.

Table 2: In PraaS, we show how the major concepts of systems design can be used to lift the process model into a distributed and serverless space.

[Figure 4: The process model in PraaS: ephemeral functions are executed in the context of sessions with shared and transient state and communication channels for quick user access. Diagram: the control plane, a PraaS process with a subprocess and session controller, and sessions containing functions, memory objects, session memory, and communication channels (markers 1-3 are referenced in the text).]

PraaS functions implement the semantics of ephemeral workers known from FaaS systems. Our goal is to allow quick connections and offer a simple way to manage a function's state. To achieve this goal, each process function invocation is associated with a process session. When process sessions are allocated within a process, they are assigned a unique identifier, a communication channel, and a user-defined amount of memory. After the first invocation is hosted in a session, the communication channel is returned to the user (1). Subsequent invocations can then bypass the control plane and use the PraaS data plane (2). This allows functions to use data stored by previous function invocations in the session state, and to communicate with other concurrent functions running in the same session (3).

3.1 Session and State

In FaaS, in order to increase performance, the aim is to retain containers and decrease the frequency of cold startups. Similarly, in PraaS, we enable opportunistic optimizations with session-based direct invocations while allowing cloud providers to scale resources transparently.
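The session lifecycle described above can be sketched in a few lines of Python. This is a minimal simulation of the semantics only: state that survives across invocations within one session, and a control plane that is contacted only on the first invocation. All class and function names are our own illustrative choices, not an actual PraaS API.

```python
# Minimal simulation of PraaS session semantics. Illustrative only:
# none of these names come from an actual PraaS API.
import uuid


class Session:
    """A session: unique identifier, a channel, and transient local state."""

    def __init__(self, memory_limit: int):
        self.id = str(uuid.uuid4())       # unique session identifier
        self.memory_limit = memory_limit  # user-defined amount of session memory
        self.state = {}                   # local state shared across invocations

    def invoke(self, func, payload):
        # Direct (data-plane) invocation: no control plane on this path.
        return func(self.state, payload)


class ControlPlane:
    """Contacted only on the first invocation; hands the session back to the user."""

    def __init__(self):
        self.sessions = {}

    def first_invoke(self, func, payload, memory_limit=64):
        session = Session(memory_limit)   # authorize, allocate, route: done once
        self.sessions[session.id] = session
        result = session.invoke(func, payload)
        return session, result            # the user keeps the channel (session)


def counter(state, payload):
    """A stateful function: accumulates a value in the session state."""
    state["count"] = state.get("count", 0) + payload
    return state["count"]


control = ControlPlane()
session, first = control.first_invoke(counter, 1)  # cold path, via control plane
second = session.invoke(counter, 1)                # warm path, direct connection
third = session.invoke(counter, 1)                 # state persisted: returns 3
```

In a real deployment, the direct path would be a network connection to the session's communication channel rather than a method call, and swapping would serialize the session state to cloud storage under the session ID.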
Sessions encompass a unique identifier, the communication channel used to route invocations to already warm containers, and the transient state memory that is available to functions. This memory is guaranteed to be locally accessible to functions invoked in the context of the session, ensuring fast access. For this reason, session memory is statically allocated at creation. Sessions can dynamically allocate additional process memory on demand (Sec. 4.1).

Isolation and Sharing: Sessions allow PraaS to handle workloads associated with different users and workflows in a single process instance. Each session has its own private memory that is isolated from other sessions (each session has its own page table). When a function is invoked, it can only access memory within its session, as shown in Figure 4; thus, functions within a session can share memory and cooperate. The user is responsible for avoiding data races (e.g., by locking). Functions cannot access the memory of a different session directly, but they can cooperate with functions within other sessions by using global process memory, which we detail in Sec. 4.1.

Sessions can handle multiple invocations simultaneously, but are bound by the memory and compute limitations of the system they run on. Cloud providers can limit the number of simultaneous invocations an individual session can start, for example, in relation to the memory the session has allocated. To allow larger workflows, we discuss scalable management and communication of multiple sessions within a process in Sec. 4.1.

Example: We consider the widely-used pattern of using FaaS where persistent and IaaS-based applications offload computational workload to cheap and elastic serverless accelerators [3, 45]. A user can employ the same session to invoke functions, and these functions will access the same shared memory pool. The session model in PraaS allows consecutive or concurrent invocations to always access the warm and initialized application state. Furthermore, sessions enable enhanced data sharing, e.g., reusing a large machine learning model across many inference functions, allowing for local communication and decreasing management overhead.

Swapping: Sessions are not persistent: their lifetime is limited, and they can be removed at any point by the cloud provider. However, sessions are virtualized: their memory is shared across invocations within a single session, and it does not perish once the sandbox (a container or a virtual machine) is removed.

Instead, sessions are swapped out to persistent cloud storage, and the standard policy is to swap sessions out after a period of inactivity. Within a process, selecting different sessions allows users to isolate their private data from other users, even if they belong to the same tenant.

Furthermore, a process is considered active when it has active sessions, and we will not evict a process until all of its sessions are swapped out. Sessions provide a natural way of defining process lifetime and selecting process containers for eviction. By default, when a session is swapped out to storage by the session controller, all current state information is written to persistent cloud storage under a unique session ID. If a session is no longer required, the user can also completely remove a session along with any associated state information.

Users have the option to create user-defined checkpointing handlers that define how a session's state should be stored into persistent and reliable cloud storage under the session ID to minimize the storage required.

When a function is invoked, the user can include the session identification to recover from a saved state. When a new process is allocated, function invocations can be queued along with their session IDs, such that the sandbox initialization can be overlapped with downloading the state from storage, helping to hide the startup latency for stateful functions.

3.2 Function Invocation Handling

PraaS functions exist in the context of sessions, and it is the session they are associated with that determines how they are handled. If the session is active, the user connects directly to the correct communication channel, bypassing the control plane altogether and ensuring a zero-copy path from the user to function memory.

If the process is not running that session, the control plane is involved: the system then checks whether state information matching that session already exists in storage. If yes, the session is swapped in, and the invocation can use all information stored in the session. Otherwise, a new session is allocated. In both of these instances, the communication channel information that allows fast subsequent invocations is forwarded to the user. Multiple functions can share a communication channel within a session via multiplexing.

Local dispatch: We allow users to specify a direct dependency on the result of one or many previous invocations in each execution request. The execution request is stalled when there is a match between specified dependencies and active invocations. Co-locating invocations is an important optimization technique in serverless workflows [41], and the local encoding of dependencies avoids interactions with workflow orchestrators. Furthermore, dependencies allow for offloading function invocations while preserving a first-in, first-out order of invocations. To implement FIFO executions in the current serverless model, users must manually wait for
Control Plane lect and optionally allocate resources, and redirect users’
1 PraaS Process payload to the function executor. When many execution
Subprocess 0 2 3 requests are redirected to the same warm container, the
Session Controller repeated control operations are redundant and could then
Session 0 Function
be removed from consecutive invocations. Therefore, for
4 Communication as long as the authorization stays valid, the invocation
Channel can bypass the control operations entirely and move data
directly from the user to the function executor in a pro-
Figure 5: PraaS data plane: bypassing the control plane cess.
allows for zero-copy invocations. In PraaS, the first time a function is invoked ( 1 ), the
control plane notifies the session controller on an avail-
the previous invocation or use a FIFO queue with func-
able subprocess to allocate a session ( 2 ), and the server-
tion triggers. Both options incur additional latencies and
less system performs all configuration and authorization
costs. Meanwhile, PraaS function dependencies allow
tasks at the point. The user receives connection informa-
dispatching a pipeline of functions with minimal invo-
tion to be able to easily and quickly reconnect ( 3 ). Each
cation latency, keep function code integrity intact, and
consecutive function in the same session is invoked via a
avoid unnecessary fusions.
direct connection to the process ( 4 ), as long as the ses-
Multi-source functions Serverless functions can be sion stays alive. Function invocation received data from
seen as unary operations, but the practice has shown that the user to the session’s state memory, and the zero-copy
many aggregation and workflow functions receive input approach helps to achieve high throughput on larger pay-
from more than one function predecessor. loads. Thus, invocation latency is bounded only by the
In PraaS, we provide native support for functions with network fabric and the performance of function thread
a variable number of arguments. Since the session state allocation.
is decoupled from function workers, the state can effi-
ciently receive and store function arguments from multi-
ple producers. Thus, PraaS can start a multi-source func- 3.4 Scaling Sessions
tion as soon as all dependencies are fulfilled. The multi- The PraaS process is a virtual, distributed entity. To en-
source functions can be used in serverless workflows to sure a practical, efficient implementation we introduce
efficiently implement data flow between functions. subprocesses, representing the actual containers used to
Example Aggregation functions are used to imple- host sessions. Subprocesses are invisible to users, and
ment the reduce and allreduce functionalities in server- are automatically managed by the cloud provider. The
less workloads [4, 8, 46]. Each aggregation function re- cloud provider can spawn additional subprocesses to
ceives input from more than one predecessor function. In handle an increased load if the number of concurrent
FaaS, users must use workflow orchestrators and cloud sessions within a process is high, can remove subpro-
storage triggers to handle aggregation, as functions can- cesses with no active sessions, and even compact ses-
not be invoked simultaneously. FaaS lacks support for sions from multiple subprocesses unto one subprocess to
more complex logic, i.e., functions are invoked on each improve resource utilization. For this purpose, the pro-
new file in the storage, and they must manually check cess also contains a global controller keeping track of
if all arguments have already been placed in the storage. subprocesses along with the sessions they hosted.
In PraaS, users define an n-ary reduce function instead. Migration & Compaction. Sessions distributed
When clients invoke the reduce function through a sin- across subprocesses to allow for better load balancing
gle session, they attach the same invocation identifica- and resource retrieval by the cloud provider. Clients with
tion, and the process controller counts the delivered ar- active connections to a session being migrated receive a
guments. The function is invoked as soon as all input packet with migration details, and they can establish a
arguments arrive. The reduction is computed within a connection to the new communication channel.
single process, saving the costs and latencies introduced
by cloud storage.
4 Serverless Communication in PraaS
3.3 PraaS Data Plane
Serverless computing has been adopted to support dis-
PraaS helps to minimize FaaS latencies with the new tributed applications and workflows, but the limitations
concept of a serverless data plane optimized for low- of the FaaS programming model require complex and
latency and high-throughput invocations (Fig. 5). A domain-specific optimizations in scheduling and com-
full serverless invocation needs to authorize requests, se- munication [47–49]. Most advances in serverless com-

6
puting focus on optimizing north-south communication Global Memory Controller

rather than east-west communication. PraaS Process 2


Subprocess 0 Subprocess 1
FaaS applications include a few communication pat- Local Memory Controller Local Memory Controller
terns - message-passing, usually implemented as propa- Process Memory
0
Process Memory
1 Object Object
gation of a function’s output to the input of a consecutive
Session 0 Session 1 3 Session 2
invocation [49]; an aggregation of the outputs from many Function Function Function
functions [46, 50], and using data cached in nearby in-
stances for bypassing storage and faster access [42, 51].
Figure 6: PraaS Global Memory.
In contrast, PraaS provides global memory, an effi-
cient and direct communication model for serverless ap-
Communication Storage-based communication pat-
plications (Sec. 4.1) bypassing cloud storage but backed
terns are slow and expensive [10, 53, 54], but the
up by it. Our communication model does not concentrate
function–to–function communication is difficult to main-
on process invocations since they have a limited lifetime,
tain in FaaS due to the ephemeral nature of workers.
but it is focused on data movement operations, allow-
ing dynamically reshapable applications to benefit from Sessions can communicate with other sessions by
peer–to–peer communication. reading and writing memory objects in the global pro-
cess memory ( 1 ). To establish such a communication
channel, a session requests the location of an object from
4.1 Global Memory the global communication controller ( 2 ). The global
In PraaS, we provide a global communication abstraction layer to enable optimized session-to-session data exchange within a process and to hide the complexity of serverless communication. This communication layer is inspired by MPI Remote Memory Access [52]. We present an overview of the global memory system in Figure 6. Apart from the private memory of a session, which is guaranteed to reside on the subprocess hosting that session, sessions can also write to global memory. The system provides no guarantees on which subprocess a region of global memory is allocated. The global controller stores the location of global memory objects, and the global memory is backed up by persistent cloud storage. Memory objects are accessed through a memory interface that hides whether data operations are conducted with storage or with peer-to-peer communication.

Global controller The global memory layout implements the message-passing paradigm in serverless by replacing storage operations with direct communication. We include a global controller service as an external, persistent component of the system (Fig. 6), and make it responsible for monitoring the location of global memory objects across subprocesses and for relaying requests from sessions to the local memory controllers within subprocesses.

Interface Process functions interact with the global process memory using a simple interface that resembles memory addressing operations in OS-level processes. In particular, read and write operations accept an identifier from a global namespace (think of the address space in OS-level processes). It is also possible to allocate and de-allocate global memory using an interface similar to malloc and free.

When a session accesses a global memory object hosted on another subprocess, the communication controller instructs the local memory controller on the subprocess where the object is hosted, and the originating session, to create the communication channel (3). Therefore, such operations do not require the sender to push updates to external services, nor the receiver to poll for changes. Since global memory objects identify communication targets, we establish message-passing without identifying and enumerating ephemeral and unreliable functions. Furthermore, this communication replaces storage read operations with sharing objects between process sessions, since p2p communication achieves higher bandwidths and lower costs.

Example We consider a group of functions that compute partial results, apply a global reduction to synchronize their progress, and continue with the next batch of tasks, as is the case in distributed machine learning training [4]. In FaaS, functions must resort to storage to communicate results. Furthermore, the lack of state means that either functions are invoked anew for each iteration, and users accept the penalty of reinitialization, or functions are artificially kept alive to preserve data locality, incurring costs from unnecessary waiting in the communication round. The PraaS model provides an alternative that is more affordable and more efficient at the same time: functions store objects corresponding to their partial results in the shared session or process memory, allowing them to move reduction data directly to the destination, and they stop incurring computing charges immediately afterward. However, since session state is not immediately evicted, functions benefit from a warm environment and better data locality in the next iteration.
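The reduction example above maps naturally onto the read/write interface described in this section. The sketch below is illustrative only: the `GlobalMemory` class and its method names are hypothetical stand-ins, since the paper specifies the semantics (reads and writes on a global namespace, with malloc/free-style allocation) but not a concrete API.

```python
# Hypothetical sketch of the PraaS global memory interface; all names
# are invented for illustration, not taken from the actual system.

class GlobalMemory:
    """A global namespace of objects; which subprocess hosts each
    object is hidden from sessions, like addresses in an OS process."""

    def __init__(self):
        self._objects = {}  # global identifier -> object payload

    def write(self, name, value):
        # The system, not the session, decides object placement.
        self._objects[name] = value

    def read(self, name):
        return self._objects[name]

    def free(self, name):
        # De-allocation in the spirit of free().
        del self._objects[name]


def reduce_partials(mem, worker_ids):
    """Combine per-worker gradients without an external storage
    round-trip: each worker wrote 'grad/<id>' into global memory."""
    total = sum(mem.read(f"grad/{w}") for w in worker_ids)
    mem.write("grad/global", total)
    return total
```

In each iteration, workers would write their partial results under per-worker identifiers and read back `grad/global`, instead of uploading results to object storage and downloading them again.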
Dynamic subprocess membership In PraaS, changes to the subprocess configuration can be triggered by the cloud provider to scale the system up or down according to demand. Each change must be atomic to ensure the validity of exchanged messages. The global memory controller stores where global memory objects are located across all subprocesses, and this information is updated when the overall subprocess configuration changes. We define this as an epoch change, which results in a two-step operation. The controller announces to all subprocesses that a configuration change is necessary, and the subprocesses notify their sessions. Once all outstanding communication of the last epoch is handled, the sessions notify their subprocesses. Then, the subprocesses notify the controller, the configuration is changed, and the subprocesses and their sessions are notified of the location of all subprocess sessions. Thus, each change in status begins a new epoch, and the global memory controller distributes epoch change announcements to all participating subprocesses. The change notification allows applications to readjust messaging policies gracefully and avoid data failures in the presence of ephemeral workers.

5 PraaS in Practice

In this section, we discuss how our proposed design can be implemented in practice, focusing on four major aspects: the container serving our runtime, the control plane, how global memory works, and the client library.

5.1 Control Plane

The main responsibility of the control plane is to accept user requests and manage processes and sessions. The outward-facing part of the system is an HTTP frontend with the following functionality:
• allocate session Creates a new session. The allocation must specify a maximum amount of memory the session can use. The session is hosted on a subprocess with enough resources to support it;
• retrieve session Loads session state from storage;
• invoke Allocates a new session or retrieves an existing one and passes the function invocation payload;
• support for process management, such as creating a new process or deleting one.
Processes, sessions, and invocations are all uniquely identified and form a hierarchical structure: invocations are hosted on sessions, sessions are hosted on subprocesses, and a group of subprocesses forms a process. The control plane relies on an in-memory Redis cache to store information about which sessions belong to which process, as well as which sessions are live and which are swapped.

Execution backend Subprocesses are managed by the system and scaled up and down as the load on the PraaS system varies. To run and manage the containers running these subprocesses, we deploy executor servers that are connected to the control plane. The executor server fulfills requests to create additional subprocesses if the system load is high, or requests to compact sessions from multiple subprocesses with a low load. The subprocesses can be allocated to servers in a round-robin fashion, but other, optimized allocation policies can be implemented. Our system does not impose any restrictions in this regard, so low-latency schedulers like the one in Lambda can be supported [29]. The executor server can take the form of either a VM or a serverless function.

Subprocess The subprocess runs a session controller that accepts requests from the control plane and manages local sessions. When receiving an allocation request, it forks a new session. It also reports sessions being swapped out or deleted to the control plane. The subprocess also runs the local memory controller, which we detail in Sec. 5.3.

5.2 Session

The main components of a session are its locally available memory, which can be swapped out to cloud storage (AWS S3 in our implementation), and its data plane TCP connection. The invocation payload is received directly into memory and can be accessed by the invoked function. To run user code, a thread pool dispatches the invoked functions. Each session has an upper cap on the number of active functions, which can be set by the cloud provider, for example in relation to the memory used, limiting the thread pool to N threads. Functions queued beyond this number wait until resources become available.

Sessions make usage metrics available to the session controller via session memory. The controller uses these data plane invocation metrics to decide when sessions are no longer in use and can be swapped out. A session also shuts down when the user closes their connection to it.

Code Deployment We propose that users define containers with both the code and the libraries necessary for deployment in the PraaS system. This container is extended by our monitoring system and runs with restricted permissions, similar to Lambda C++ containers. Only the PraaS controller runs with root privileges.
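The session execution model of Sec. 5.2, where payloads are received into session memory and a thread pool capped at N dispatches functions while excess invocations queue, can be sketched as follows. The class and function names are illustrative assumptions, not the PraaS implementation.

```python
# Sketch of a session's bounded thread-pool dispatch (illustrative;
# class and method names are not the actual PraaS interfaces).
from concurrent.futures import ThreadPoolExecutor
import threading

class Session:
    def __init__(self, max_functions):
        self.memory = {}  # session-local state, kept across invocations
        self._lock = threading.Lock()
        # Cap on concurrently active functions; invocations submitted
        # beyond the cap queue until a pool thread becomes available.
        self._pool = ThreadPoolExecutor(max_workers=max_functions)

    def invoke(self, func, payload):
        # The payload lands directly in session memory territory; the
        # function then runs on a pool thread with access to that state.
        def run():
            with self._lock:
                return func(self.memory, payload)
        return self._pool.submit(run)

def accumulate(memory, payload):
    # Example invoked function: updates state retained by the session.
    memory["total"] = memory.get("total", 0) + payload
    return memory["total"]
```

Submitting five invocations to a session capped at two threads runs at most two functions concurrently; the remaining three wait in the queue, matching the behavior described above.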
5.3 Global Memory

To implement a global view of memory physically distributed across subprocesses, each subprocess runs a local memory controller that acts as a simple server answering sessions' requests for create, put, get, and delete operations on objects. While the interface is simple, the mechanism behind it is more involved: when a session requests an object, it first requests it from the local memory controller. If the object is not on the same subprocess, a request to the global memory controller is made, and a NAT hole-punching operation begins. The global memory controller maintains a page table linking objects to subprocesses, composed of tuples of the form {Object name, Location}. The global memory controller informs the local memory controller owning the object to finalize the connection using NAT hole punching on its side. Once the connection exists, future communication can occur without involving the global memory controller.

5.4 Billing and Accounting

Similarly to current FaaS and IaaS platforms, PraaS allows users to create sessions with a predefined amount of memory and CPU resources. In addition to specifying the resources for local sessions, users also need to indicate how large the process memory should be. To make this task simpler, PraaS allows users to specify process memory increments per session, so that the total process memory grows linearly with the number of sessions. Separating session memory from process memory allocation results in better resource utilization and reduces the amount paid for unused resources [11]. Finally, to support session swapping, PraaS also charges users for the storage required to save swapped sessions.

Invocation accounting is performed by subprocesses, which collect usage metrics for computation, memory usage, and I/O across their lifetime. We use the Docker container metrics for this purpose on the subprocesses and collate them across all subprocesses to allow for both remote observability and granular billing.

5.5 Client Library

The client library implements the user end of the HTTP requests that manage processes and sessions; binary data is transmitted as payload over TCP connections. Clients can serialize the data using a procedure of their choice. A further feature of the client library is the possibility for users to implement handlers that are executed if operations time out or if objects or sessions are not available.

5.6 Implementing PraaS

We implement PraaS as a group of extensions on top of existing CaaS and FaaS platforms to facilitate wide adoption in existing cloud systems. Cloud providers automatically manage a fleet of process containers running the dedicated PraaS worker (Sec. 5.1). In particular, we demonstrate PraaS with a prototype implementation on Knative [55], a serverless platform deployed with Kubernetes [56] (Sec. 5.6). However, our design with modified containers and scheduling policies applies to other platforms as well, and the implementation can be ported seamlessly to other systems.

The changes to existing platforms encompass multiple aspects. The scheduling must be adjusted to use an in-memory cache that checks process and session IDs to make decisions. The subprocess scaling decision must take into account the data plane metrics provided by the sessions and must not eliminate subprocesses with active sessions. k8s supports scaling down by removing selected containers through the mechanism of deletion cost. Finally, our control plane can be deployed to provide the session-process API, and it uses the Kubernetes API to allocate containers when needed.

6 Evaluation

In this section, we evaluate PraaS. In our experiments, we always repeat each measurement at least 100 times, and we plot the median along with a 99% non-parametric CI. In each part of the evaluation, we focus on providing answers to relevant questions about PraaS.

6.1 Data Plane

Does the data plane improve invocation latency?
Our approach allows fast, direct invocations by using an optimized serverless data plane. To evaluate this claim, we compare the round-trip time for processing an invocation with various payload sizes in PraaS to processing the same invocation on AWS Lambda. The benchmark is executed from a t3.medium VM and invokes warm AWS Lambda and PraaS functions. The results in Fig. 7 show a consistent, significant benefit for using PraaS, as AWS Lambda is approximately 32 times slower. AWS Lambda overheads increase quite significantly with payload size, which suggests multiple copies of the payload on the critical path, making large-payload invocations increasingly slow. Furthermore, we show that PraaS has effectively no overhead by providing the TCP baseline, gathered using netperf, for communicating the payload itself between the user and the executor server. Bypassing the control plane therefore provides a major improvement in round-trip time for function invocations.

[Figure 7: PraaS vs FaaS: bypassing control plane. Round-trip time (ms) over message sizes from 1 B to 4 MiB; warm AWS Lambda is up to 32x slower, while PraaS achieves near-native TCP performance thanks to bypassing the control plane.]

[Figure 8: PraaS vs FaaS: communication between functions. Time (ms) for Lambda function I/O, S3 storage, and PraaS session memory over payload sizes from 1 B to 4 MiB; storage performs poorly for non-trivial data movement, while PraaS functions are bounded by TCP and shared-memory performance.]

[Figure 9: Overhead of session allocation and swapping. Time (ms) for clean allocation, swapping (disk, S3), and restoring (disk, S3) across session sizes.]

6.2 Fast function communication

Does the use of session memory for communication between functions improve latency?
An important concept in serverless workflows is chaining functions to pass the output of one as the input to another. We claim that using session memory is a fast way to allow functions to communicate. We evaluate this by launching a function with varied input sizes and passing its result to another function, which in turn returns data to the user. For this purpose, DynamoDB is expensive and too limited (400 kB maximum element size), so it cannot be considered an option. The remaining options are to either resort to accessing S3 storage or to use Lambda's function I/O. Figure 8 shows that using PraaS session memory for function communication is two orders of magnitude faster than using S3 storage for any data size, and about ten times faster than Lambda function I/O. The results are not surprising, and they underline the need for fast, local communication rather than reliance on cloud storage solutions for function communication. Fast communication between functions using session memory as a medium is much faster than existing alternatives.

6.3 Session Management

Does swapping sessions incur high overheads?
To be practically useful, our system for swapping sessions in and out of storage must be efficient. To evaluate it, we measure the time needed to allocate a clean session, to restore a swapped session, and to swap out a running session. We swap out sessions either to S3 storage or to a gp2 instance of Elastic Block Storage (a general-purpose SSD) attached to EC2. In the latter version, the cloud provider can swap the session to the disk attached to the virtual machine to remove the overhead of swapping from the critical path, and store the swapped session to S3 storage asynchronously. S3 has very high durability, while EBS has an annual failure rate of 0.1-0.2% [57]. This has the further benefit that sessions can be restored quickly from the disk. Restoring a session has a lower latency than allocating a container [58, 59] and can be done concurrently, effectively hiding the time required to load the state from memory. As can be seen in Fig. 9, restoring a session from disk takes the same time as allocating a fresh session, and even restoring from S3 is done in less than 150 ms, even for the largest session sizes. Swapping out a session also only takes between 25 and 200 ms, depending on the session size. Therefore, session swapping does not introduce significant overheads.

6.4 Global Memory

Does communication over global process memory improve performance over using S3 storage?
We evaluate the communication between sessions using global process memory by comparing its performance to Lambda functions communicating over S3, over Redis, or even over directly established communication channels. All measurements were performed on a Lambda instance with 2 GB of memory. For PraaS, we measure a function performing get and put operations of the same size. We distinguish between the warm local scenario, where the data is on the local memory controller, and the warm remote scenario, where the data is on another subprocess on a different machine. The median ping to the other machine was 350 microseconds.

The results for both bandwidth (Fig. 10a) and latency (Fig. 10b) show clear benefits for using PraaS. The only solution that shows similar results is the direct p2p connection between Lambda functions. We expect it to be faster than S3 and Redis, as it does not involve additional services and requires just one copy of the data. However, this solution
does not fit the serverless paradigm: functions must be online at the time of the transfer and must be aware of each other's existence, making practical use difficult. Furthermore, we confirm the previous result that Lambda shows significant I/O variability [11, 35]. PraaS provides fast, efficient communication even across physically separated machines.

[Figure 10: Latency and bandwidth for communication between Lambda functions and the global memory put-get operations in PraaS. (a) Bandwidth comparison (MB/s); (b) latency comparison (ms). Both panels compare S3, Redis, and P2P against PraaS warm local and warm remote for data sizes from 10⁻³ to 10⁴ KB.]

[Figure 11: Serverless functions in PraaS. End-to-end time (ms) on AWS Lambda and PraaS for small and large inputs across instance configurations: (a) thumbnail generation; (b) image recognition.]

6.5 Costs

What are the system costs?
PraaS users must be charged for keeping session state alive, and the cost should depend linearly on the lifetime of an active session. The only comparable feature on modern commercial FaaS systems is a provisioned function instance, known as provisioned concurrency on AWS Lambda and as the premium plan for Azure Functions. Cloud providers guarantee ready function instances to decrease the frequency of cold startups. While arguably such functions are not serverless, such instances can be treated as a limited substitute for warm and low-latency state.¹ In addition to paying for using computing resources, users are charged an active state fee Fm that depends on the memory size and the duration of resource provisioning.

To estimate the cost of keeping PraaS sessions alive in memory, we use the memory pricing of other cloud services as a proxy for the cost to the cloud operator. First, we compare the cost Vm of changing from a compute-optimized to a memory-optimized virtual machine instance, and divide it by the size of the gained memory, which estimates the additional cost of one gigabyte of DRAM. We compare c6g and x2gd instances on AWS and the Fs and Eadsv5 series on Azure, and we find that Vm is almost the same for each instance size. Then, we select the cost Cm of allocating additional memory when deploying PraaS on managed container systems, considering AWS Fargate and Azure Container Instances. By comparing the memory costs Vm, Cm of a PraaS deployment to the cost of provisioned storage Fm on FaaS, we estimate the cost decreases GV and GC of moving session state from provisioned FaaS to VMs and containers, respectively. The results presented in Table 3 show that PraaS sessions can be offered at a lower cost, by up to 76%, and the estimation covers the cloud provider's costs and profit included in the price of a VM instance. Furthermore, the memory-optimized instances come with additional SSD storage, which could be used to implement a low-latency tier for swapped sessions at no additional cost.

          Vm                   Cm           Fm           GV               GC
AWS       3.53 · 10⁻³          4.45 · 10⁻³  1.5 · 10⁻²   76.43%           70.37%
Azure     3.87 · 10⁻³ ± 0.005  4.45 · 10⁻³  1.14 · 10⁻²  66.05% ± 0.05    60.96%

Table 3: Decreased costs of PraaS sessions in comparison to FaaS provisioned instances.

¹ Provisioned concurrency instances on AWS can be recycled and reinitialized, which makes state persistence difficult, if not impossible, to implement in practice.
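The savings in Table 3 follow from a simple relation: the relative cost decrease of holding a gigabyte of session state off provisioned FaaS is G = (Fm − X)/Fm, with X = Vm for VMs and X = Cm for containers. This formula is our reading of the table, not stated explicitly in the text, but a short check with the rounded per-GB costs from Table 3 reproduces the reported percentages to within rounding error:

```python
# Recomputing the cost decreases of Table 3 from the per-GB memory
# costs. The relation G = (Fm - X) / Fm is inferred from the table,
# not stated explicitly in the paper.

def cost_decrease(provisioned_faas, alternative):
    """Relative saving of moving 1 GB of state off provisioned FaaS."""
    return (provisioned_faas - alternative) / provisioned_faas

# Per-GB costs from Table 3 (Vm: VM, Cm: container, Fm: provisioned FaaS).
aws   = {"Vm": 3.53e-3, "Cm": 4.45e-3, "Fm": 1.5e-2}
azure = {"Vm": 3.87e-3, "Cm": 4.45e-3, "Fm": 1.14e-2}

gv_aws   = cost_decrease(aws["Fm"], aws["Vm"])      # Table 3: 76.43%
gc_aws   = cost_decrease(aws["Fm"], aws["Cm"])      # Table 3: 70.37%
gv_azure = cost_decrease(azure["Fm"], azure["Vm"])  # Table 3: 66.05%
gc_azure = cost_decrease(azure["Fm"], azure["Cm"])  # Table 3: 60.96%
```

The Azure values match the table almost exactly; the AWS values agree to within about 0.04 percentage points, consistent with rounding of the published per-GB costs.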
6.6 Functions

Can FaaS functions run on PraaS?
We select the thumbnail generator as an example of general-purpose image processing, and the image-recognition benchmark performing ResNet-50 prediction as an example of integrating deep-learning inference into applications; both come from the SeBS benchmark [11]. We implement the thumbnail generation with OpenCV 4.5 and evaluate the function with two images, a 97 kB and a 3.6 MB one. For the AWS Lambda function, we need to submit the binary image data as a base64-encoded string in the POST request, which adds significant overheads due to encoding and conversions. On the other hand, PraaS functions benefit from a payload format that is not constrained by cloud API requirements. The image recognition is implemented with the help of the PyTorch C++ API, using OpenCV 4.5, libtorch 1.9, and torchvision 0.1. We evaluate the function with two inputs, a 53 kB image and a 230 kB one. The results (Fig. 11a and 11b) show that PraaS outperforms all AWS instances we compared against.

6.7 Discussion

Given the features PraaS provides, it could be used as a platform for implementing distributed logistic regression with gradient-averaging stochastic gradient descent in both an asynchronous and a synchronous fashion. The performance of the system could improve compared to the current implementation [60], due to the use of direct communication. The PraaS model should also make it easier to express asynchronous computations. Finally, PraaS could dynamically change the number of functions being invoked at a given time, while LambdaML [60] relies on a fixed number of functions throughout the execution.

7 Related Work

Stateful Functions open the spectrum of applications that can benefit from FaaS by allowing functions to keep state, even if disaggregated. Researchers have built stateful functions on top of key-value stores specialized for serverless [18, 61] and on elastic ephemeral caches [51, 62, 63], which combine different placement strategies to manage cost and performance. For many applications, however, statefulness is not enough, as functions can fail at any time. Hence, systems such as Beldi [19] and Boki [20] have been proposed to help developers build consistent and fault-tolerant systems atop ephemeral functions. Instead of offering yet another storage option to persist function state or proposing a new technique to handle faults, PraaS strikes a new balance between IaaS and FaaS by proposing the concept of sessions that retain state. Within a process, sessions can exchange messages. PraaS eliminates the need for state synchronization in stateful functions by partitioning state into ownership domains managed by sessions. Sessions can be swapped at any time, but their state is not lost.

Function Orchestration and Data Locality are also being extensively studied. Systems such as Speedo [64] and Nightcore [65] optimize function orchestration by either accelerating the control plane [64] or completely skipping it [65] for internal function invocations. Other systems have looked into optimizing the data path by comparing different function communication strategies and automatically adapting deployment decisions [66], or by avoiding data movement altogether by allowing multiple functions to share the execution environment over time [67]. PraaS, on the other hand, proposes that users should be able to connect directly to sessions and thus completely avoid the control plane after the first initialization step. In addition, by partitioning state among sessions, process functions can exchange data directly among themselves instead of relying on the external storage proposed by previous systems [7, 8]. Although it takes advantage of direct communication between functions whenever possible, PraaS offers users a global memory space abstraction instead of directly exposing network primitives to developers [68].

Serverless Sandboxes utilize specialized virtualization engines [58, 69] that drastically reduce the startup time and memory footprint when compared to traditional virtual machine managers. However, to continue improving the scalability and elasticity of serverless applications, soft-isolation systems [70-72] have been proposed to co-execute multiple invocations inside the same OS process. PraaS is, in part, inspired by such systems, as it allows multiple functions to execute concurrently inside a single session. By doing so, resource redundancy is reduced and new opportunities for local communication arise. However, PraaS's design does not preclude other, orthogonal optimization techniques, such as image pre-initialization [69, 73-75] to optimize session startup time and memory footprint, or unikernels [76-78] to optimize process startup and memory footprint.

8 Conclusions

This work proposes PraaS, a new cloud computing abstraction that brings persistent state to ephemeral workers. By taking advantage of processes and sessions, applications can now benefit from low-latency state, fast invocations that bypass the control plane, and fast communication between sessions. We showed that PraaS is an efficient abstraction for achieving stateful functions with direct connections and fast communication options.
References [10] Eric Jonas, Johann Schleier-Smith, Vikram Sreekanti, Chia-che
Tsai, Anurag Khandelwal, Qifan Pu, Vaishaal Shankar, Joao Car-
[1] Yanqi Zhang, Íñigo Goiri, Gohar Irfan Chaudhry, Rodrigo Fon- reira, Karl Krauth, Neeraja Jayant Yadwadkar, Joseph E. Gonza-
seca, Sameh Elnikety, Christina Delimitrou, and Ricardo Bian- lez, Raluca Ada Popa, Ion Stoica, and David A. Patterson. Cloud
chini. Faster and cheaper serverless computing on harvested re- programming simplified: A berkeley view on serverless comput-
sources. In Proceedings of the ACM SIGOPS 28th Symposium ing. CoRR, abs/1902.03383, 2019. URL http://arxiv.org/
on Operating Systems Principles, SOSP ’21, page 724–739, New abs/1902.03383.
York, NY, USA, 2021. Association for Computing Machinery.
ISBN 9781450387095. doi: 10.1145/3477132.3483580. URL [11] Marcin Copik, Grzegorz Kwasniewski, Maciej Besta, Michal
https://doi.org/10.1145/3477132.3483580. Podstawski, and Torsten Hoefler. Sebs: A serverless bench-
mark suite for function-as-a-service computing. In Proceed-
[2] Amoghavarsha Suresh and Anshul Gandhi. Servermore: Oppor- ings of the 22nd International Middleware Conference, Middle-
tunistic execution of serverless functions in the cloud. SoCC ware ’21. Association for Computing Machinery, 2021. doi: 10.
’21, page 570–584, New York, NY, USA, 2021. Association 1145/3464298.3476133. URL https://doi.org/10.1145/
for Computing Machinery. ISBN 9781450386388. doi: 10. 3464298.3476133.
1145/3472883.3486979. URL https://doi.org/10.1145/
3472883.3486979. [12] Johann Schleier-Smith, Vikram Sreekanti, Anurag Khandelwal,
Joao Carreira, Neeraja J. Yadwadkar, Raluca Ada Popa, Joseph E.
[3] Marcin Copik, Konstantin Taranov, Alexandru Calotoiu, and Gonzalez, Ion Stoica, and David A. Patterson. What serverless
Torsten Hoefler. RFaaS: RDMA-Enabled FaaS Platform for computing is and should become: The next phase of cloud com-
Serverless High-Performance Computing, 2021. URL https: puting. Commun. ACM, 64(5):76–84, April 2021. ISSN 0001-
//arxiv.org/abs/2106.13859. 0782. doi: 10.1145/3406011. URL https://doi.org/10.
1145/3406011.
[4] Jiawei Jiang, Shaoduo Gan, Yue Liu, Fanlin Wang, Gustavo
Alonso, Ana Klimovic, Ankit Singla, Wentao Wu, and Ce Zhang. [13] L. Feng, P. Kudva, D. Da Silva, and J. Hu. Exploring server-
Towards demystifying serverless machine learning training. In less computing for neural network training. In 2018 IEEE 11th
ACM SIGMOD International Conference on Management of International Conference on Cloud Computing (CLOUD), pages
Data (SIGMOD 2021), June 2021. 334–341, July 2018. doi: 10.1109/CLOUD.2018.00049.

[5] Lixiang Ao, Liz Izhikevich, Geoffrey M. Voelker, and George [14] Hao Wang, Di Niu, and Baochun Li. Distributed machine learn-
Porter. Sprocket: A serverless video processing framework. ing with a serverless architecture. In IEEE INFOCOM 2019-
In Proceedings of the ACM Symposium on Cloud Computing, IEEE Conference on Computer Communications, pages 1288–
SoCC ’18, pages 263–274, New York, NY, USA, 2018. As- 1296. IEEE, 2019.
sociation for Computing Machinery. ISBN 9781450360111.
doi: 10.1145/3267809.3267815. URL https://doi.org/10. [15] John Thorpe, Yifan Qiao, Jonathan Eyolfson, Shen Teng,
1145/3267809.3267815. Guanzhou Hu, Zhihao Jia, Jinliang Wei, Keval Vora, Ravi Ne-
travali, Miryung Kim, et al. Dorylus: Affordable, scalable, and
[6] Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, accurate {GNN} training with distributed {CPU} servers and
Karthikeyan Vasuki Balasubramaniam, William Zeng, Rahul serverless threads. In 15th USENIX Symposium on Operating
Bhalerao, Anirudh Sivaraman, George Porter, and Keith Systems Design and Implementation (OSDI 21), pages 495–514,
Winstein. Encoding, fast and slow: Low-latency video pro- 2021.
cessing using thousands of tiny threads. In 14th USENIX
Symposium on Networked Systems Design and Implemen- [16] Joao Carreira, Pedro Fonseca, Alexey Tumanov, Andrew Zhang,
tation (NSDI 17), pages 363–376, Boston, MA, March and Randy Katz. Cirrus: A serverless framework for end-to-end
2017. USENIX Association. ISBN 978-1-931971-37-9. ml workflows. In Proceedings of the ACM Symposium on Cloud
URL https://www.usenix.org/conference/nsdi17/ Computing, pages 13–24, 2019.
technical-sessions/presentation/fouladi.
[17] Ana Klimovic, Yawen Wang, Patrick Stuedi, Animesh Trivedi,
[7] Matthew Perron, Raul Castro Fernandez, David DeWitt, and Jonas Pfefferle, and Christos Kozyrakis. Pocket: Elastic
Samuel Madden. Starling: A scalable query engine on cloud ephemeral storage for serverless analytics. In Proceedings of the
functions. In Proceedings of the 2020 ACM SIGMOD Inter- 12th USENIX Conference on Operating Systems Design and Im-
national Conference on Management of Data, SIGMOD ’20, plementation, OSDI’18, pages 427–444, USA, 2018. USENIX
page 131–141, New York, NY, USA, 2020. Association for Association. ISBN 9781931971478.
Computing Machinery. ISBN 9781450367356. doi: 10.
1145/3318464.3380609. URL https://doi.org/10.1145/ [18] Vikram Sreekanti, Chenggang Wu, Xiayue Charles Lin, Johann
3318464.3380609. Schleier-Smith, Joseph E. Gonzalez, Joseph M. Hellerstein, and
Alexey Tumanov. Cloudburst: Stateful functions-as-a-service.
[8] Ingo Müller, Renato Marroquin, and Gustavo Alonso. Lambada: Proc. VLDB Endow., 13(12):2438–2452, July 2020. ISSN 2150-
Interactive data analytics on cold data using serverless cloud in- 8097. doi: 10.14778/3407790.3407836. URL https://doi.
frastructure. ArXiv, abs/1912.00937, 2019. org/10.14778/3407790.3407836.

[9] Qifan Pu, Shivaram Venkataraman, and Ion Stoica. Shuf- [19] Haoran Zhang, Adney Cardoza, Peter Baile Chen, Sebastian An-
fling, fast and slow: Scalable analytics on serverless infrastruc- gel, and Vincent Liu. Fault-tolerant and transactional stateful
ture. In 16th USENIX Symposium on Networked Systems Design serverless workflows. In 14th USENIX Symposium on Operat-
and Implementation (NSDI 19), pages 193–206, Boston, MA, ing Systems Design and Implementation (OSDI 20), pages 1187–
February 2019. USENIX Association. ISBN 978-1-931971-49- 1204. USENIX Association, November 2020. ISBN 978-1-
2. URL https://www.usenix.org/conference/nsdi19/ 939133-19-9. URL https://www.usenix.org/conference/
presentation/pu. osdi20/presentation/zhang-haoran.

13
[20] Zhipeng Jia and Emmett Witchel. Boki: Stateful serverless computing with shared logs. In Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles, SOSP ’21, page 691–707, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450387095. doi: 10.1145/3477132.3483541. URL https://doi.org/10.1145/3477132.3483541.

[21] Mohammad Shahrad, Rodrigo Fonseca, Íñigo Goiri, Gohar Chaudhry, Paul Batum, Jason Cooke, Eduardo Laureano, Colby Tresness, Mark Russinovich, and Ricardo Bianchini. Serverless in the wild: Characterizing and optimizing the serverless workload at a large cloud provider. In 2020 USENIX Annual Technical Conference (USENIX ATC 20), pages 205–218. USENIX Association, July 2020. ISBN 978-1-939133-14-4. URL https://www.usenix.org/conference/atc20/presentation/shahrad.

[22] Simon Peter, Jialin Li, Irene Zhang, Dan R. K. Ports, Doug Woos, Arvind Krishnamurthy, Thomas Anderson, and Timothy Roscoe. Arrakis: The operating system is the control plane. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pages 1–16, Broomfield, CO, October 2014. USENIX Association. ISBN 978-1-931971-16-4. URL https://www.usenix.org/conference/osdi14/technical-sessions/presentation/peter.

[23] Collin Lee and John Ousterhout. Granular computing. In Proceedings of the Workshop on Hot Topics in Operating Systems, HotOS ’19, page 149–154, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450367271. doi: 10.1145/3317550.3321447. URL https://doi.org/10.1145/3317550.3321447.

[24] Pedro Garcia Lopez, Aleksander Slominski, Michael Behrendt, and Bernard Metzler. Serverless predictions: 2021-2030, 2021.

[25] AWS Lambda. https://aws.amazon.com/lambda/, 2014. Accessed: 2020-01-20.

[26] Azure Functions. https://azure.microsoft.com/en-us/services/functions/, 2016. Accessed: 2020-01-20.

[27] Google Cloud Functions. https://cloud.google.com/functions/, 2017. Accessed: 2020-01-20.

[28] IBM Cloud Functions. https://cloud.ibm.com/functions/, 2016. Accessed: 2020-01-20.

[29] Alexandru Agache, Marc Brooker, Alexandra Iordache, Anthony Liguori, Rolf Neugebauer, Phil Piwonka, and Diana-Maria Popa. Firecracker: Lightweight virtualization for serverless applications. In 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20), pages 419–434, Santa Clara, CA, February 2020. USENIX Association. ISBN 978-1-939133-13-7. URL https://www.usenix.org/conference/nsdi20/presentation/agache.

[30] Johannes Manner, Martin Endreß, Tobias Heckel, and Guido Wirtz. Cold start influencing factors in function as a service. In 2018 IEEE/ACM International Conference on Utility and Cloud Computing Companion (UCC Companion), pages 181–188, 2018.

[31] Anup Mohan, Harshad Sane, Kshitij Doshi, Saikrishna Edupuganti, Naren Nayak, and Vadim Sukhomlinov. Agile cold starts for scalable serverless. In 11th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 19), Renton, WA, July 2019. USENIX Association. URL https://www.usenix.org/conference/hotcloud19/presentation/mohan.

[32] Nilanjan Daw, Umesh Bellur, and Purushottam Kulkarni. Xanadu: Mitigating cascading cold starts in serverless function chain deployments. In Proceedings of the 21st International Middleware Conference, Middleware ’20, page 356–370, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450381536. doi: 10.1145/3423211.3425690. URL https://doi.org/10.1145/3423211.3425690.

[33] Paulo Silva, Daniel Fireman, and Thiago Emmanuel Pereira. Prebaking functions to warm the serverless cold start. In Proceedings of the 21st International Middleware Conference, Middleware ’20, page 1–13, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450381536. doi: 10.1145/3423211.3425682. URL https://doi.org/10.1145/3423211.3425682.

[34] Kun Suo, Junggab Son, Dazhao Cheng, Wei Chen, and Sabur Baidya. Tackling cold start of serverless applications by efficient and adaptive container runtime reusing. In 2021 IEEE International Conference on Cluster Computing (CLUSTER), pages 433–443, 2021. doi: 10.1109/Cluster48925.2021.00018.

[35] Liang Wang, Mengyuan Li, Yinqian Zhang, Thomas Ristenpart, and Michael Swift. Peeking behind the curtains of serverless platforms. In Proceedings of the 2018 USENIX Conference on Usenix Annual Technical Conference, USENIX ATC ’18, page 133–145, USA, 2018. USENIX Association. ISBN 9781931971447.

[36] Adam Eivy and Joe Weinman. Be wary of the economics of “serverless” cloud computing. IEEE Cloud Computing, 4(2):6–12, 2017. doi: 10.1109/MCC.2017.32.

[37] Apache OpenWhisk. https://openwhisk.apache.org/, 2016. Accessed: 2020-01-20.

[38] M. Sciabarrà. Learning Apache OpenWhisk: Developing Open Serverless Solutions. O’Reilly Media, Incorporated, 2019. ISBN 9781492046165.

[39] Benjamin Carver, Jingyuan Zhang, Ao Wang, Ali Anwar, Panruo Wu, and Yue Cheng. Wukong: A scalable and locality-enhanced framework for serverless parallel computing. In Proceedings of the 11th ACM Symposium on Cloud Computing, SoCC ’20, page 1–15, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450381376. doi: 10.1145/3419111.3421286. URL https://doi.org/10.1145/3419111.3421286.

[40] Arjun Singhvi, Kevin Houck, Arjun Balasubramanian, Mohammed Danish Shaikh, Shivaram Venkataraman, and Aditya Akella. Archipelago: A scalable low-latency serverless platform, 2019.

[41] Istemi Ekin Akkus, Ruichuan Chen, Ivica Rimac, Manuel Stein, Klaus Satzke, Andre Beck, Paarijaat Aditya, and Volker Hilt. SAND: Towards high-performance serverless computing. In Proceedings of the 2018 USENIX Conference on Usenix Annual Technical Conference, USENIX ATC ’18, pages 923–935, USA, 2018. USENIX Association. ISBN 9781931971447.

[42] Minchen Yu, Tingjia Cao, Wei Wang, and Ruichuan Chen. Restructuring serverless computing with data-centric function orchestration. arXiv preprint arXiv:2109.13492, 2021.

[43] Bryan Ford, Pyda Srisuresh, and Dan Kegel. Peer-to-peer communication across network address translators. In Proceedings of the Annual Conference on USENIX Annual Technical Conference, ATEC ’05, page 13, USA, April 2005. USENIX Association.

[44] J. L. Eppinger. TCP Connections for P2P Apps: A Software Approach to Solving the NAT Problem. Carnegie Mellon University, Technical Report, ISRI-05-104, January 2005.

[45] Jashwant Raj Gunasekaran, Prashanth Thinakaran, Mahmut Taylan Kandemir, Bhuvan Urgaonkar, George Kesidis, and Chita Das. Spock: Exploiting serverless functions for SLO and cost aware resource procurement in public cloud. In 2019 IEEE 12th International Conference on Cloud Computing (CLOUD), pages 199–208, 2019. doi: 10.1109/CLOUD.2019.00043.

[46] V. Giménez-Alventosa, Germán Moltó, and Miguel Caballer. A framework and a performance assessment for serverless MapReduce on AWS Lambda. Future Generation Computer Systems, 97:259–274, 2019. ISSN 0167-739X. doi: 10.1016/j.future.2019.02.057. URL https://www.sciencedirect.com/science/article/pii/S0167739X18325172.

[47] Ahsan Ali, Riccardo Pinciroli, Feng Yan, and Evgenia Smirni. Batch: Machine learning inference serving on serverless platforms with adaptive batching. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–15, 2020. doi: 10.1109/SC41405.2020.00073.

[48] Minchen Yu, Zhifeng Jiang, Hok Chun Ng, Wei Wang, Ruichuan Chen, and Bo Li. Gillis: Serving large neural networks in serverless functions with automatic model partitioning. In 2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS), pages 138–148, 2021. doi: 10.1109/ICDCS51616.2021.00022.

[49] Ashraf Mahgoub, Karthick Shankar, Subrata Mitra, Ana Klimovic, Somali Chaterji, and Saurabh Bagchi. SONIC: Application-aware data passing for chained serverless applications. In 2021 USENIX Annual Technical Conference (USENIX ATC 21), pages 285–301, 2021.

[50] Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, Karthikeyan Vasuki Balasubramaniam, William Zeng, Rahul Bhalerao, Anirudh Sivaraman, George Porter, and Keith Winstein. Encoding, fast and slow: Low-latency video processing using thousands of tiny threads. In Proceedings of the 14th USENIX Conference on Networked Systems Design and Implementation, NSDI’17, pages 363–376, USA, 2017. USENIX Association. ISBN 9781931971379.

[51] Francisco Romero, Gohar Irfan Chaudhry, Íñigo Goiri, Pragna Gopa, Paul Batum, Neeraja J. Yadwadkar, Rodrigo Fonseca, Christos Kozyrakis, and Ricardo Bianchini. Faa$t: A transparent auto-scaling cache for serverless applications, 2021.

[52] Torsten Hoefler, J. Dinan, Rajeev Thakur, Brian Barrett, P. Balaji, William Gropp, and K. Underwood. Remote Memory Access Programming in MPI-3. ACM Transactions on Parallel Computing (TOPC), Jan. 2015. Accepted for publication on Dec. 4th.

[53] Eric Jonas, Shivaram Venkataraman, Ion Stoica, and Benjamin Recht. Occupy the cloud: Distributed computing for the 99%. CoRR, abs/1702.04024, 2017. URL http://arxiv.org/abs/1702.04024.

[54] Ana Klimovic, Yawen Wang, Christos Kozyrakis, Patrick Stuedi, Jonas Pfefferle, and Animesh Trivedi. Understanding ephemeral storage for serverless analytics. In 2018 USENIX Annual Technical Conference (USENIX ATC 18), pages 789–794, Boston, MA, July 2018. USENIX Association. ISBN 978-1-939133-01-4. URL https://www.usenix.org/conference/atc18/presentation/klimovic-serverless.

[55] Knative. https://knative.dev/, 2021. Accessed: 2021-11-21.

[56] Kubernetes. https://kubernetes.io/, 2021. Accessed: 2021-11-29.

[57] AWS S3. https://aws.amazon.com/s3/, 2006. Accessed: 2021-06-07.

[58] Firecracker. https://github.com/firecracker-microvm/firecracker, 2018. Accessed: 2020-01-20.

[59] Ethan G. Young, Pengfei Zhu, Tyler Caraza-Harter, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. The true cost of containing: A gVisor case study. In 11th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 19), Renton, WA, July 2019. USENIX Association. URL https://www.usenix.org/conference/hotcloud19/presentation/young.

[60] LambdaML. https://github.com/DS3Lab/LambdaML/blob/master/examples/lambda/s3/linear_s3_ga_ma.py, 2020. Accessed: 2020-01-20.

[61] Daniel Barcelona-Pons, Marc Sánchez-Artigas, Gerard París, Pierre Sutra, and Pedro García-López. On the FaaS track: Building stateful distributed applications with serverless architectures. In Proceedings of the 20th International Middleware Conference, Middleware ’19, page 41–54, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450370097. doi: 10.1145/3361525.3361535. URL https://doi.org/10.1145/3361525.3361535.

[62] Ana Klimovic, Yawen Wang, Patrick Stuedi, Animesh Trivedi, Jonas Pfefferle, and Christos Kozyrakis. Pocket: Elastic ephemeral storage for serverless analytics. In Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation, OSDI’18, page 427–444, USA, 2018. USENIX Association. ISBN 9781931971478.

[63] Qifan Pu, Shivaram Venkataraman, and Ion Stoica. Shuffling, fast and slow: Scalable analytics on serverless infrastructure. In Proceedings of the 16th USENIX Conference on Networked Systems Design and Implementation, NSDI’19, page 193–206, USA, 2019. USENIX Association. ISBN 9781931971492.

[64] Nilanjan Daw, Umesh Bellur, and Purushottam Kulkarni. Speedo: Fast Dispatch and Orchestration of Serverless Workflows, page 585–599. Association for Computing Machinery, New York, NY, USA, 2021. ISBN 9781450386388. URL https://doi.org/10.1145/3472883.3486982.

[65] Zhipeng Jia and Emmett Witchel. Nightcore: Efficient and scalable serverless computing for latency-sensitive, interactive microservices. In Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’21, New York, NY, USA, 2021. Association for Computing Machinery. doi: 10.1145/3445814.3446701. URL https://doi.org/10.1145/3445814.3446701.

[66] Ashraf Mahgoub, Karthick Shankar, Subrata Mitra, Ana Klimovic, Somali Chaterji, and Saurabh Bagchi. SONIC: Application-aware data passing for chained serverless applications. In 2021 USENIX Annual Technical Conference (USENIX ATC 21), pages 285–301. USENIX Association, July 2021. ISBN 978-1-939133-23-6. URL https://www.usenix.org/conference/atc21/presentation/mahgoub.

[67] Swaroop Kotni, Ajay Nayak, Vinod Ganapathy, and Arkaprava Basu. Faastlane: Accelerating Function-as-a-Service workflows. In 2021 USENIX Annual Technical Conference (USENIX ATC 21), pages 805–820. USENIX Association, July 2021. ISBN 978-1-939133-23-6. URL https://www.usenix.org/conference/atc21/presentation/kotni.

[68] Mike Wawrzoniak, Ingo Müller, Rodrigo Fraga Barcelos Paulus Bruno, and Gustavo Alonso. Boxer: Data analytics on network-enabled serverless platforms. In 11th Annual Conference on Innovative Data Systems Research (CIDR’21), 2021.

[69] Dong Du, Tianyi Yu, Yubin Xia, Binyu Zang, Guanglu Yan, Chenggang Qin, Qixuan Wu, and Haibo Chen. Catalyzer: Sub-millisecond startup for serverless computing with initialization-less booting. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’20, page 467–481, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450371025. doi: 10.1145/3373376.3378512. URL https://doi.org/10.1145/3373376.3378512.

[70] Vojislav Dukic, Rodrigo Bruno, Ankit Singla, and Gustavo Alonso. Photons: Lambdas on a diet. In Proceedings of the 11th ACM Symposium on Cloud Computing, SoCC ’20, page 45–59, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450381376. doi: 10.1145/3419111.3421297. URL https://doi.org/10.1145/3419111.3421297.

[71] Simon Shillaker and Peter Pietzuch. Faasm: Lightweight isolation for efficient stateful serverless computing. In 2020 USENIX Annual Technical Conference (USENIX ATC 20), pages 419–433, 2020.

[72] Sol Boucher, Anuj Kalia, David G. Andersen, and Michael Kaminsky. Putting the “micro” back in microservice. In 2018 USENIX Annual Technical Conference (USENIX ATC 18), pages 645–650, Boston, MA, July 2018. USENIX Association. ISBN 978-1-939133-01-4. URL https://www.usenix.org/conference/atc18/presentation/boucher.

[73] Istemi Ekin Akkus, Ruichuan Chen, Ivica Rimac, Manuel Stein, Klaus Satzke, Andre Beck, Paarijaat Aditya, and Volker Hilt. SAND: Towards high-performance serverless computing. In 2018 USENIX Annual Technical Conference (USENIX ATC 18), pages 923–935, Boston, MA, July 2018. USENIX Association. ISBN 978-1-939133-01-4. URL https://www.usenix.org/conference/atc18/presentation/akkus.

[74] James Cadden, Thomas Unger, Yara Awad, Han Dong, Orran Krieger, and Jonathan Appavoo. SEUSS: Skip redundant paths to make serverless fast. In Proceedings of the Fifteenth European Conference on Computer Systems, EuroSys ’20, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450368827. doi: 10.1145/3342195.3392698. URL https://doi.org/10.1145/3342195.3392698.

[75] Edward Oakes, Leon Yang, Dennis Zhou, Kevin Houck, Tyler Harter, Andrea Arpaci-Dusseau, and Remzi Arpaci-Dusseau. SOCK: Rapid task provisioning with serverless-optimized containers. In 2018 USENIX Annual Technical Conference (USENIX ATC 18), pages 57–70, Boston, MA, July 2018. USENIX Association. ISBN 978-1-931971-44-7. URL https://www.usenix.org/conference/atc18/presentation/oakes.

[76] Yiming Zhang, Jon Crowcroft, Dongsheng Li, Chengfei Zhang, Huiba Li, Yaozheng Wang, Kai Yu, Yongqiang Xiong, and Guihai Chen. KylinX: A dynamic library operating system for simplified and efficient cloud virtualization. USENIX ATC ’18, page 173–185, USA, 2018. USENIX Association. ISBN 9781931971447.

[77] Filipe Manco, Costin Lupu, Florian Schmidt, Jose Mendes, Simon Kuenzer, Sumit Sati, Kenichi Yasukata, Costin Raiciu, and Felipe Huici. My VM is lighter (and safer) than your container. In Proceedings of the 26th Symposium on Operating Systems Principles, SOSP ’17, page 218–233, New York, NY, USA, 2017. Association for Computing Machinery. ISBN 9781450350853. doi: 10.1145/3132747.3132763. URL https://doi.org/10.1145/3132747.3132763.

[78] Ricardo Koller and Dan Williams. Will serverless end the dominance of Linux in the cloud? In Proceedings of the 16th Workshop on Hot Topics in Operating Systems, HotOS ’17, page 169–173, New York, NY, USA, 2017. Association for Computing Machinery. ISBN 9781450350686. doi: 10.1145/3102980.3103008. URL https://doi.org/10.1145/3102980.3103008.
