High Performance Stream Processing

By Stephane Maldini, Glenn Renfro, David Turanski
@smaldini, @cppwfs, @dturanski

Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/

Performance as it pertains to:

Message flow

Message Flow: The Myth?

1 million events a second?

Message Flow: The Myth?

It Depends
Message Flow: The Myth?

Check Network speed

Check Disk Read Write speed
Processor Speed (specs)
Hardware: Network
1 Gb Ethernet
Msg Size (bytes)    Msg/sec

10 Gb Ethernet
Msg Size (bytes)    Msg/sec
100                 12,500,000
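
The surviving row (100-byte messages, 12,500,000 per second) matches what you get by dividing raw link bandwidth by message size. A minimal sketch of that arithmetic follows. The class and method names are hypothetical, and the calculation ignores Ethernet, IP, and TCP framing overhead, so real rates will be lower.

```java
final class LinkBudget {

    /**
     * Upper bound on messages per second for a link of the given speed.
     * Assumes the whole link carries payload (no protocol overhead).
     */
    static long maxMessagesPerSecond(long linkBitsPerSecond, int messageSizeBytes) {
        long bytesPerSecond = linkBitsPerSecond / 8;
        return bytesPerSecond / messageSizeBytes;
    }

    public static void main(String[] args) {
        long oneGb = 1_000_000_000L;
        long tenGb = 10_000_000_000L;
        for (int size : new int[] {100, 1_000, 10_000}) {
            System.out.printf("%,d B: 1 Gb -> %,d msg/s, 10 Gb -> %,d msg/s%n",
                    size,
                    maxMessagesPerSecond(oneGb, size),
                    maxMessagesPerSecond(tenGb, size));
        }
    }
}
```

If the measured rate is far below this ceiling, the network is probably not the bottleneck.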

Hardware: Disk

dd bs=1M count=256 if=/dev/zero of=test

Without a sync option, dd will just commit your 256 MB of data into a RAM buffer
Initially fast, but the server is still writing to disk after the test

dd bs=1M count=256 if=/dev/zero of=/tmp/testfile conv=fdatasync

conv=fdatasync tells dd to require a complete sync once, right before it exits
Ensures all data is on the disk before calculating the result
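
The same distinction (stop the clock before or after the data is really on disk) can be reproduced from the JVM. This is a rough sketch only, using the standard FileChannel API. The class name and parameters are hypothetical, and it is not a replacement for dd.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class DiskWriteProbe {

    /**
     * Writes `megabytes` MiB of zeros and returns the elapsed nanoseconds.
     * With syncBeforeStop=true the clock stops only after the OS has been
     * asked to flush file content to the device.
     */
    static long timedWriteNanos(Path file, int megabytes, boolean syncBeforeStop)
            throws IOException {
        ByteBuffer block = ByteBuffer.allocate(1 << 20); // 1 MiB, zero-filled
        long start = System.nanoTime();
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (int i = 0; i < megabytes; i++) {
                block.clear(); // rewind so the same block is written again
                while (block.hasRemaining()) {
                    ch.write(block);
                }
            }
            if (syncBeforeStop) {
                ch.force(false); // flush content, not necessarily metadata
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("disk-probe", ".bin");
        try {
            System.out.println("buffered ns: " + timedWriteNanos(file, 64, false));
            System.out.println("synced ns:   " + timedWriteNanos(file, 64, true));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

A large gap between the two numbers suggests the unsynced figure is mostly measuring the page cache.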
Message Size

Message Size

o Default is 16384
o . vs .
o Default for XD is 1
Testing Tools
Spring XD

Spring AMQP

Per Sec


10 158,564
100 453,926
Network Hops

Adds a cost per hop

Direct Binding
Composed Modules
Custom Module

One last little thing

JMX is disabled by default
When enabled, it incurred a performance hit
because of how SI was capturing stats via its

Object Serialization

byte[] required for transporting data between remote processes
XD uses Kryo except when the payload type is byte[] or String
XD supports optimizing Kryo for known payload

Serialization Benchmarks
An excellent comparative JVM serializers benchmark:

Best case: ~1500 ns

Serialization Benchmarks
Domain Object (as JSON):
{"uri": "", "title":"Javaone Keynote", "width":640,
262144,"persons":["Bill Gates", "Steven Jobs"],
"player":"JAVA","copyright":"" },
{"uri": "","title":"Javaone Keynote","width":
{ "uri": "", "title":"Javaone
Keynote","width": 320,"height":240, "size":"SMALL"}]
Object Serialization

Size matters: YMMV

Manually optimized Kryo ser/deser ~ 1500 ns = 1.5 µs = 0.0015 ms
Kafka XD; 1000 B messages ~ 500,000 msg/sec
2000 ns per message
With serialization overhead (2000 ns + 1500 ns = 3500 ns per message)
~ 285,714 msg/sec (source|sink)
At 50,000 msg/sec, the overhead may still be significant
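
The figures above follow from adding the per-message serialization cost to the per-message baseline cost and inverting. A small sketch of that calculation, with hypothetical class and method names, and assuming the overhead is purely additive and serial:

```java
final class SerializationBudget {

    private static final double NANOS_PER_SECOND = 1_000_000_000.0;

    /** Throughput once a fixed per-message overhead is added to the baseline. */
    static long throughputWithOverhead(long baselineMsgPerSec, long overheadNanosPerMsg) {
        double baselineNanosPerMsg = NANOS_PER_SECOND / baselineMsgPerSec;
        return (long) (NANOS_PER_SECOND / (baselineNanosPerMsg + overheadNanosPerMsg));
    }

    /** Fraction of total per-message time spent on the overhead. */
    static double overheadShare(long baselineMsgPerSec, long overheadNanosPerMsg) {
        double baselineNanosPerMsg = NANOS_PER_SECOND / baselineMsgPerSec;
        return overheadNanosPerMsg / (baselineNanosPerMsg + overheadNanosPerMsg);
    }

    public static void main(String[] args) {
        for (long baseline : new long[] {500_000, 50_000}) {
            System.out.printf("%,d msg/s baseline -> %,d msg/s, overhead share %.1f%%%n",
                    baseline,
                    throughputWithOverhead(baseline, 1500),
                    100 * overheadShare(baseline, 1500));
        }
    }
}
```

The share shrinks as the baseline rate drops, but it does not disappear, which is the point of the last bullet.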

Optimizing Kryo in XD

Disable references if you know payload types do not contain cyclic references:
xd.codec.kryo.references=false (in servers.yml)

This is a global setting for all streams

Register a custom serializer for a known payload type

Install a jar containing the required beans in

XD will auto-configure these
Custom Serializers in XD

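package spring.xd.bus.ext; // XD scans this package for beans of type KryoRegistrar

import java.util.ArrayList;
import java.util.List;
import com.esotericsoftware.kryo.Registration;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class CustomKryoRegistrarConfig {

    // Each Registration associates a type to a serializer and a unique ID
    @Bean
    public KryoRegistrar myCustomRegistration() {
        List<Registration> registrations = new ArrayList<>();
        registrations.add(new Registration(MyObject.class, new MySerializer(), 62));
        return new KryoRegistrationRegistrar(registrations);
    }
}

PojoCodec constructor:
public PojoCodec(java.util.List<KryoRegistrar> kryoRegistrars, boolean useReferences)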

Custom Serializers in XD
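public class AddressSerializer extends Serializer<Address> {
    public void write(Kryo kryo, Output output, Address address) {
        output.writeString(address.getStreet()); // getter names are assumed; order must match read()
        output.writeString(address.getCity());
        output.writeString(address.getState());
    }
    public Address read(Kryo kryo, Input input, Class<Address> type) {
        return new Address(input.readString(), input.readString(), input.readString());
    }
}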

Serializable Domain Object

(+) Simple: This works out of the box with no
(-) Requires access to source or wrapping
(-) Internal benchmarks indicate
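public class Address implements KryoSerializable {

    private String street;
    private String city;  // assumed field name
    private String state; // assumed field name

    public void write(Kryo kryo, Output output) {
        output.writeString(this.street);
        output.writeString(this.city);
        output.writeString(this.state);
    }

    public void read(Kryo kryo, Input input) {
        // Read in the same order as write()
        this.street = input.readString();
        this.city = input.readString();
        this.state = input.readString();
    }
}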

Benchmarking Your Custom Serializers

The spring-xd-samples repo includes a serializationbenchmarks project
Let's look at some code
Sample Results

                       Ser (ns)    Deser (ns)
Serializable Domain
Custom Serializers

What about Processing ?

Source Msg/s > Sink Msg/s ?

Rate limited by Sink
Blocking transformation (http, file) ?
Rate limited by blocking Processor
Polling Sources Pausing ?

Rate limited by small Prefetch properties

Mitigating Cost of IO
Negative impact ?
Scale Out ?
o Works up to a point
o Network cost
Scale Up ?
o Message passing Overhead
o More In-Flight Data
Blocking IO
request A


request B

request C


Request Latency
Network Latency

Rate Degradation = -(A + B + C) ms

Asynchronous Boundary



Rate degradation = -(Async Hand-Off) ms

Request Latency
Network Latency



Asynchronous IO
Mitigate temporarily slow processors/sink
Back to degraded mode when queue full
Async Hand-Off generates Garbage
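
A bounded queue is one simple way to picture the asynchronous boundary described above. The sketch below uses java.util.concurrent.ArrayBlockingQueue as a stand-in (it is not how XD or Reactor implement the hand-off), and the class and method names are hypothetical. While the queue has room the producer is decoupled from the consumer. Once it is full, the producer either gets a refusal or has to wait, which is the degraded mode.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class AsyncBoundary<T> {

    private final BlockingQueue<T> queue;

    AsyncBoundary(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Non-blocking hand-off. Returns false when the queue is full. */
    boolean tryHandOff(T message) {
        return queue.offer(message);
    }

    /** Blocking hand-off: the producer is rate limited by the consumer when full. */
    void handOff(T message) throws InterruptedException {
        queue.put(message);
    }

    /** Consumer side: next message, or null if nothing is queued. */
    T poll() {
        return queue.poll();
    }

    /** Number of messages currently in flight between producer and consumer. */
    int backlog() {
        return queue.size();
    }
}
```

A larger capacity absorbs longer slowdowns at the price of more in-flight data.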

Reactor Core: Efficient Asynchronous

Trade-off Memory vs Garbage generation

Pre-Allocated Ring Buffer
Concurrent consuming without duplicating
buffer content
Ring Buffer Consumer Sequences
Ring What?

schedule Message<?> execution

get and publish
next available

Event Loop
read published slot

execute Message<?>
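
To make the ring buffer steps above concrete, here is a deliberately small single-producer, single-consumer sketch. It is not Reactor's implementation, and every name in it is hypothetical. It only illustrates pre-allocated slots, a publish sequence, and a consumer sequence.

```java
import java.util.concurrent.atomic.AtomicLong;

final class SpscRing {

    /** Pre-allocated, reused holder: no allocation per hand-off. */
    private static final class Slot {
        Object value;
    }

    private final Slot[] slots;
    private final int mask;
    private final AtomicLong published = new AtomicLong(-1); // last sequence written
    private final AtomicLong consumed = new AtomicLong(-1);  // last sequence read

    SpscRing(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new Slot[capacity];
        for (int i = 0; i < capacity; i++) {
            slots[i] = new Slot();
        }
        mask = capacity - 1;
    }

    /** Producer: claim the next slot, fill it, then publish its sequence. */
    boolean tryPublish(Object message) {
        long next = published.get() + 1;
        if (next - consumed.get() > slots.length) {
            return false; // ring is full: the consumer has not freed this slot yet
        }
        slots[(int) (next & mask)].value = message;
        published.set(next); // volatile write makes the slot visible to the consumer
        return true;
    }

    /** Consumer: read the next published slot, or return null if none. */
    Object tryConsume() {
        long next = consumed.get() + 1;
        if (next > published.get()) {
            return null;
        }
        Slot slot = slots[(int) (next & mask)];
        Object message = slot.value;
        slot.value = null; // release the reference, keep the slot
        consumed.set(next);
        return message;
    }
}
```

The memory for the slots is paid once up front, which is the memory versus garbage trade-off the slide refers to.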

Reactor Core: Efficient Asynchronous

The Spring XD Module
public interface Processor<I, O> {
    Publisher<O> process(Stream<I> inputStream);
}

public class PongMessageProcessor implements Processor<Message, Message> {

    @Override
    public Stream<Message> process(Stream<Message> inputStream) {
        // Append a suffix to each payload and wrap it in a new message
        return inputStream.map(message ->
            new GenericMessage<String>(message.getPayload() + "-pojopong"));
    }
}

Parallel Scatter Gather !

Request Latency
Network Latency



Reactor Stream and RxJava

Compose asynchronous results

Without blocking (unlike future.get())
Reduce the processor/sink backlog !
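
The idea of composing asynchronous results without blocking can also be shown with the JDK alone. The sketch below uses CompletableFuture rather than Reactor Stream or RxJava, and the fetch method is a hypothetical stand-in for a remote call that simply sleeps. Both calls start immediately, and thenCombine merges the results when both are done, so the latency is roughly that of the slower call rather than the sum.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

final class ScatterGather {

    /** Hypothetical remote call: returns `result` after `latencyMillis`. */
    static CompletableFuture<String> fetch(String result, long latencyMillis) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(latencyMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return result;
        });
    }

    /** Scatter two requests, gather both responses as one CSV line. */
    static CompletableFuture<String> profileAndLocation(String userId) {
        CompletableFuture<String> profile = fetch("profile:" + userId, 100);
        CompletableFuture<String> location = fetch("location:" + userId, 100);
        return profile.thenCombine(location, (a, b) -> a + "," + b);
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // join() blocks, and is used here only to print the demo result
        String csv = profileAndLocation("42").join();
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(csv + " in about " + elapsedMs + " ms");
    }
}
```

In a real pipeline the combined future would be chained further instead of joined.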

Scatter Gather
public class AsyncNetworkProcessor implements Processor<Message, String> {

    public Observable<String> process(Observable<Message> inputStream) {
        // Issue both calls for each message and combine the two responses
        return inputStream.flatMap(message -> Observable.zip(
            postHttp("/userProfile/" + message.getHeaders().get("user_id")),
            postHttp("/userLocation/" + message.getHeaders().get("user_id")),
            (respA, respB) -> respA + "," + respB));
    }

    public Observable<String> postHttp(String endpoint) {
        // An asynchronous HTTP call to forward response as CSV
    }
}
Reactor Stream and RxJava

Some operators help tune the right packet size to send over the network
MicroBatching !

public class AsyncNetworkProcessor implements Processor<String, String> {

    public Stream<String> process(Stream<String> inputStream) {
        // Window of up to 1000 messages or 1 second, reduced to one CSV string
        return inputStream.window(1000, 1, TimeUnit.SECONDS)
            .flatMap(messages ->
                messages.reduce("", (prev, next) -> prev + "," + next));
    }
}
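
For readers without Reactor on the classpath, the effect of micro-batching can be sketched with plain collections. This version batches by count only (no time window), and the class and method names are hypothetical. Each batch becomes one CSV string, so one network send carries many messages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MicroBatcher {

    /** Groups messages into batches of at most maxBatchSize, one CSV string each. */
    static List<String> batchAsCsv(List<String> messages, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<String> batches = new ArrayList<>();
        for (int start = 0; start < messages.size(); start += maxBatchSize) {
            int end = Math.min(start + maxBatchSize, messages.size());
            batches.add(String.join(",", messages.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList("a", "b", "c", "d", "e");
        // Five messages with a batch size of two become three sends
        System.out.println(batchAsCsv(messages, 2));
    }
}
```

Choosing the batch size is the tuning knob: larger batches amortize the per-send cost but add latency for the first message in each batch.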

Learn More. Stay Connected.


Microservices to Fast Data

John T. Davies

