
Configuring Kafka for High Throughput

Apache Kafka is one of the most popular open source messaging systems. This whitepaper explores how to tune Kafka to achieve high throughput.

www.impetus.com
Introduction
In the last several years, Hadoop has evolved into an excellent and mature batch processing framework that handles the volume, variety, and veracity of Big Data.

However, many use cases across various domains must also handle the velocity of Big Data, which Hadoop is simply not suited for. These use cases require real-time responses for faster decision making, including but not limited to:

• Credit card fraud analytics
• Network fault prediction using sensor data
• Social media sentiment analysis

Real-time analytics are also needed by large organizations that generate significant log activity. These real-time systems must be able to correlate and predict events from streaming data as it happens.

Apache Kafka is a distributed, robust, high-throughput publish-subscribe messaging system originally developed at LinkedIn to process its activity stream. Its purpose is to handle all activity stream and log data at lightning speed, and its unique design lets it process data in ways that traditional messaging systems cannot.

Let’s explore the issues with traditional messaging systems.

Limitations of Existing Systems


Real-time log data creates new challenges for data systems because its volume is
orders of magnitude larger than the “real” data.

For example, China Mobile collects 5-8TB of phone call records per day and
Facebook gathers almost 6TB of various user activity events every day.

Traditional enterprise messaging systems have existed for a long time, but they cannot cope with this velocity and volume of data. Here is why they are no longer a good fit. Traditional messaging systems:

• Keep track of consumer state. Brokers store metadata about what each consumer has consumed. Storing this metadata for billions of messages creates a large overhead that degrades broker performance.
• Do not partition and store messages on multiple machines easily.
• Do not offer an easy way to fail over automatically, so when a server in the cluster fails, messages become unavailable.
• Do not support any replay capability for consumers.
• Can cause brokers to hang, leak connections, or run out of memory.

A number of specialized log aggregators, such as Scribe and Flume, have been built over the last few years. However, most of these systems are designed for consuming log data offline. Additionally, most of them use a "push" model in which the broker forwards data to consumers, so consumers can be flooded by messages pushed faster than they can handle. It is also difficult for them to provide a rewind capability to the consumer.


An Overview of Kafka
First, some basic messaging terminology:

• Kafka maintains feeds of messages in categories called topics.
• Processes that publish messages to a Kafka topic are called producers.
• Processes that subscribe to topics and process the feed of published messages are called consumers.
• Kafka is run as a cluster of one or more servers, each of which is called a broker.

At a high level, producers send messages over the network to the Kafka cluster
which in turn serves them up to consumers like this:

[Diagram: multiple producers publish to the Kafka cluster, which serves the messages to multiple consumers]

All messaging systems consist of these three major components:

1. Message producer
2. Message consumer
3. Message broker

Message producers and message consumers use message queues for asynchronous inter-process communication. Messaging systems support both point-to-point and publish/subscribe communication.

Kafka is distributed in nature: producers, consumers, and brokers can all run on a cluster of machines that acts as one logical group, and Kafka follows the publish/subscribe model. With that in mind, here are the key definitions for the components that make up Kafka:

• A stream of messages of a particular type is called a topic.
• A topic is further divided into partitions, each storing messages of the same kind.
• A producer can publish messages to a topic; a minimal publish sketch follows this list.
• The published messages are then stored on a set of servers called brokers.
• A consumer can subscribe to one or more topics and consume the published messages by pulling data from the brokers.
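To make these roles concrete, here is a minimal publish sketch using the Kafka 0.8 Java producer API. The broker addresses, topic name, and serializer choice are illustrative assumptions, not part of the deployment described in this paper.

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

// Minimal sketch: publishing one message to a topic with the Kafka 0.8 producer API.
// Broker addresses and topic name are illustrative assumptions.
public class SimplePublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // bootstrap broker list (assumed)
        props.put("serializer.class", "kafka.serializer.StringEncoder"); // encode message values as strings
        props.put("request.required.acks", "1");                         // wait for the leader's acknowledgement

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));

        // Publish one message to the "activity-log" topic (hypothetical name).
        producer.send(new KeyedMessage<String, String>("activity-log", "hello kafka"));
        producer.close();
    }
}

A consumer on the other side would subscribe to the same topic and pull messages from the brokers; a fuller consumer sketch appears in the consumer performance section.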

Kafka Design and Architecture
As Kafka is distributed in nature, a Kafka cluster typically consists of multiple
brokers. To balance load, a topic is divided into multiple partitions and each broker
stores one or more of those partitions. Multiple producers and consumers can
publish and retrieve messages at the same time.

Kafka follows the traditional design, shared by most messaging systems, where
data is pushed to the broker from the producer and pulled from the broker by the
consumer.

However, several important aspects of Kafka's design make it different from traditional messaging systems:

• Kafka storage strategy
• Kafka broker design
• Distributed coordination
• Delivery semantics and guarantees
• Replication strategy

[Figure 1: Kafka Architecture — producers A and B publish to the partitions of a topic ("My Topic", Partitions 1-3), and consumers read each partition's messages by offset]

Kafka Storage Strategy: The Kafka layout consists of topics, and each topic is further divided into partitions. A partition is the smallest unit of parallelism. Each partition of a topic corresponds to a logical log. Physically, a log is implemented as a set of segment files of equal size. When a producer publishes a message to a partition, the broker simply appends the message to the last segment file. The segment file is flushed to disk after a predefined interval.

Here are some interesting attributes of the storage strategy:

• A message stored in Kafka does not have an explicit message ID. Instead, each message is addressed by its logical offset in the log. This avoids the overhead of maintaining auxiliary, random-access (B-tree) index structures that map message IDs to actual message locations. Offsets are not consecutive: the offset of the next message is obtained by adding the length of the current message to its logical offset, as in the sketch after this list.

• The consumer always consumes messages from a particular partition
sequentially, and if the consumer acknowledges a particular message
offset, it means that the consumer has consumed all prior messages. The
consumer issues an asynchronous pull request to the broker to have a
buffer of bytes ready to consume. Each asynchronous pull request contains
the offset of the message to consume.
• Kafka uses a zero-copy technique where data is copied into page cache
exactly once and reused on each consumption instead of being stored in
memory and copied out to kernel space every time it is read.
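To make the offset addressing concrete, here is a small worked sketch of the scheme described above; the numbers are made up, not taken from a real log.

// Offset arithmetic sketch (illustrative numbers): a message is addressed by its
// logical offset in the partition log, and the next offset is obtained by adding
// the current message's length to its offset.
public class OffsetArithmetic {
    public static void main(String[] args) {
        long currentOffset = 4096;  // logical offset of the current message (assumed)
        int messageLength = 1024;   // length of the current message in bytes (assumed)

        long nextOffset = currentOffset + messageLength;
        System.out.println("next message offset = " + nextOffset); // prints 5120
    }
}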

Kafka Broker: The Kafka broker is stateless. By stateless, we mean that it is each consumer's responsibility to keep track of how much it has consumed.

• The Kafka broker uses a simple time-based SLA as its retention policy: a message is automatically deleted after a configured period (see the retention sketch below).
• With this time-based retention policy, a consumer can re-consume older data.
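A minimal sketch of the retention settings referred to above. These keys normally live in the broker's server.properties file; they are shown here as a Java Properties object for readability, and the segment size value is an assumption rather than a recommendation.

import java.util.Properties;

// Time-based retention sketch. The 7-day window matches the deployment described
// later in this paper; the segment size is illustrative.
public class RetentionConfigSketch {
    public static Properties brokerRetentionProps() {
        Properties props = new Properties();
        props.put("log.retention.hours", "168");       // delete log segments older than 7 days
        props.put("log.segment.bytes", "1073741824");  // roll a new segment file at ~1 GB (assumed)
        return props;
    }
}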

Distributed Coordination: To facilitate coordination, Kafka uses ZooKeeper, a highly available coordination service. Through the ZooKeeper API, one can create, set, read, and delete paths. Kafka uses ZooKeeper to:

• Detect the addition and removal of brokers and consumers.
• Trigger a rebalance in each consumer when the above events happen.
• Maintain the consumption relationship and keep track of the consumed offset of each partition, using paths like those sketched below.
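For orientation, the sketch below lists the approximate ZooKeeper paths Kafka 0.8 uses for this bookkeeping. The exact layout varies between versions, so treat the paths as assumptions to verify against your own deployment.

// Approximate ZooKeeper layout used by Kafka 0.8 for coordination (illustrative;
// verify against your Kafka version).
public final class KafkaZkPaths {
    public static final String BROKER_IDS       = "/brokers/ids";                // ephemeral znode per live broker
    public static final String BROKER_TOPICS    = "/brokers/topics";             // topic and partition metadata
    public static final String CONSUMER_IDS     = "/consumers/<group>/ids";      // ephemeral znode per live consumer
    public static final String CONSUMER_OWNERS  = "/consumers/<group>/owners";   // which consumer owns which partition
    public static final String CONSUMER_OFFSETS = "/consumers/<group>/offsets";  // last committed offset per partition

    private KafkaZkPaths() {}
}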

Delivery Semantics and Guarantees: Kafka guarantees at-least-once delivery.

• Most of the time, a message is delivered exactly once to each consumer group.
• When a consumer process crashes without a clean shutdown, the consumer process that takes over its partitions may receive some duplicate messages: those after the last offset successfully committed to ZooKeeper.
• Kafka guarantees that messages from a single partition are delivered to a consumer in order. However, there is no ordering guarantee across messages coming from different partitions.

Replication Strategy: Kafka replicates the log for each topic partition across brokers. This allows automatic failover to the replicas when a server in the cluster fails, so messages remain available in the presence of failures.

• The unit of replication is the topic partition. The total number of replicas, including the leader, constitutes the replication factor.
• All reads and writes go to the leader of the partition, and leaders are distributed evenly among brokers. The logs on the followers are identical to the leader's log: they have the same offsets and the same messages in the same order.
• The leader keeps track of its "in-sync" replicas. If a follower dies or falls behind, the leader removes it from the in-sync replica list. Broker configuration settings determine when a replica is considered stuck or fallen behind, as sketched below.
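The "stuck or fallen behind" thresholds mentioned above are broker-side settings. A minimal sketch using the Kafka 0.8 key names, shown as a Java Properties object for readability (the values are the commonly cited defaults, not tuning recommendations):

import java.util.Properties;

// Replica lag thresholds (broker server.properties keys, Kafka 0.8 names).
public class ReplicaLagConfigSketch {
    public static Properties replicaLagProps() {
        Properties props = new Properties();
        props.put("replica.lag.time.max.ms", "10000");  // follower considered stuck if it sends no fetch request for 10 s
        props.put("replica.lag.max.messages", "4000");  // follower considered behind if it lags by more than 4000 messages
        return props;
    }
}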

Kafka Cluster Configuration
In this section, we describe how we use Kafka in our use cases. Figure 2 shows a simplified version of our deployment. In the main data center, the Kafka cluster is composed of producer, broker, and consumer components. An external system generates various kinds of log data over TCP/IP sockets; the producer component parses the stream into messages and publishes them to the local Kafka brokers in batches. We rely on a custom partitioner to distribute the published requests evenly across the set of Kafka brokers; a sketch of such a partitioner follows Figure 2.

[Figure 2: Kafka Deployment — an external system feeds producers A through N over 1 Gbps TCP/IP links; the producer layer publishes to a broker layer of four Kafka brokers coordinated by a three-node ZooKeeper ensemble, which in turn feeds the consumer layer. Each box is a 32-core machine with 125 GB of memory and six mounted drives of 11 TB each, for a total of 24 drives in the cluster.]
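Our production partitioner is not reproduced in this paper; the sketch below only shows the general shape of a custom partitioner for the Kafka 0.8 producer (a simple hash-based spread, wired in through the producer property partitioner.class). Note that the Partitioner interface signature differs slightly across 0.8.x point releases.

import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;

// Sketch of a custom partitioner for the Kafka 0.8 producer. This is not our
// production logic, only an illustration of the interface.
public class EvenSpreadPartitioner implements Partitioner {

    // Kafka instantiates the partitioner reflectively and passes the producer properties.
    public EvenSpreadPartitioner(VerifiableProperties props) {
    }

    public int partition(Object key, int numPartitions) {
        // Map the key's hash into the partition range; null keys fall back to partition 0 here,
        // although in practice the 0.8 producer picks a partition itself when no key is given.
        return key == null ? 0 : Math.abs(key.hashCode()) % numPartitions;
    }
}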

Message data is maintained for 7 days. Log segments are garbage collected after
this time period whether or not they have been consumed. A Zookeeper Ensemble
of 3 nodes on each Kafka and Storm cluster is deployed to handle distribution,
coordination and client state management.

Engineering for High Throughput


The overall goal is to maximize end-to-end throughput while keeping latency low. We applied several performance tuning parameters to the Kafka cluster deployment described in the previous section to achieve high throughput. Below are a performance summary and benchmarks focusing on the tuning parameters of the producer, broker, and consumer components. We used Kafka version 0.8 for the performance benchmarking.

Performance Verticals
Kafka has three verticals to tune the overall performance of the Kafka cluster.

1. Producer Performance
2. Broker Performance
3. Consumer Performance

Producer Performance Benchmarking
Kafka uses three primary techniques to improve the producer’s effective
throughput:

1. The first is to batch messages together and send larger chunks at once.
2. The second is compression, shrinking the data so that the total amount sent is smaller.
3. The third is partitioning the data so that production, brokering, and consumption are all handled by clusters of machines that can be scaled incrementally as load increases.

We will discuss each of these and how it helped us achieve high throughput.

1. Batching
The Kafka producer can be configured to send messages in either a synchronous or an asynchronous fashion. Async mode allows the client to batch small random messages into larger data chunks before sending them over the network. The broker does no further batching of file system writes, because the cost of a buffered write is extremely low, especially after client-side batching has taken place. Instead, Kafka simply relies on the file system page cache to buffer writes and delay the flush to disk, batching the expensive physical disk writes. When we implemented batching, we saw an overall improvement of around 9x over sync mode; a sketch of the relevant producer properties follows.
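A minimal sketch of the batching knobs discussed here, using Kafka 0.8 producer property names; the values mirror the benchmark configuration in this section and are not universal recommendations.

import java.util.Properties;

// Async producer batching sketch (Kafka 0.8 producer property names). The values
// mirror the benchmark configuration described in this section.
public class AsyncProducerProps {
    public static Properties asyncBatchingProps() {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // assumed broker list
        props.put("producer.type", "async");          // batch messages in a background send thread
        props.put("batch.num.messages", "8000");      // send a batch once 8000 messages are queued...
        props.put("queue.buffering.max.ms", "1200");  // ...or after 1200 ms, whichever comes first
        props.put("request.required.acks", "1");      // leader acknowledgement only
        return props;
    }
}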

Sync producer: 40 producer threads, replication 1, ack 1
Async producer: 7 producer threads, batch size 8000, replication 1, ack 1

Producer Type    Rate (messages/sec)    Num of Messages Published
sync             12,000                 9,940,874
async            104,000                9,940,874

Producer Type    Rate (MB/sec)          Num of Messages Published
sync             14.3                   9,940,874
async            96.4                   9,940,874

With batching, we had to keep precise control over the timeout and the message count threshold that trigger a send, guaranteeing that we never batch more than a given number of messages and that no message is held for more than a configurable number of milliseconds. This precise control matters because messages sent asynchronously are at risk of loss if the application crashes. The number of producer threads also plays a critical role in the rate at which messages are sent to the broker: there must be enough threads to send the batches, and the thread count has to be balanced against the batch size. Keeping a proper balance between them resulted in a 1.6x increase in performance (a rate of 104K messages/sec) compared to an improper balance. The figure below highlights this:

[Chart: producer rate (messages/sec) versus number of producer threads (0 to 40) for batch sizes 2000, 3500, 6000, and 8000; rates range from roughly 50K to 110K messages/sec]

2. Compression
Compressing data is one of the most important performance techniques because bandwidth between data center facilities is a fundamental bottleneck; it is more expensive per byte to scale than disk I/O, CPU, or network capacity within a facility. Kafka allows batch compression of multiple messages into a single composite message set for effective compression.

Compression is done as part of batching in the producer API. Compressed message sets containing a few hundred messages are sent to Kafka, where they are retained and served in compressed form. Kafka supports compression codecs such as Snappy and Gzip. In our tests, Snappy gave the best message rate, while both codecs sharply reduced network bandwidth utilization, as the table below shows; the compression setting itself is a one-line change, sketched after the table.

Compression Codec    Producer Rate (messages/sec)    N/W Bandwidth Utilization    Num of Messages Published
none                 104K                            50-60%                       9,940,874
Gzip                 103K                            20%                          9,940,874
Snappy               119K                            15%                          9,940,874
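Enabling compression is a one-line producer setting in Kafka 0.8; a minimal sketch, where the codec choice and topic name are assumptions:

import java.util.Properties;

// Producer-side compression sketch (Kafka 0.8 property names).
public class CompressionProps {
    public static Properties snappyCompressionProps() {
        Properties props = new Properties();
        props.put("compression.codec", "snappy");        // "none", "gzip", or "snappy"
        props.put("compressed.topics", "activity-log");  // optional: compress only these topics (assumed topic name)
        return props;
    }
}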

3. Partitioning
• A topic can be divided into multiple partitions.
• Each broker stores one or more of those partitions.
• Multiple producers and consumers can publish and retrieve messages at the same time.
• Partitions are a way to parallelize the consumption of messages.
• For partitioning to pay off, the total number of partitions in the broker cluster needs to be at least the number of consumers in a consumer group.
• Consumers in a consumer group split the burden of processing the topic between themselves according to the partitioning, so each consumer only concerns itself with messages in the partitions it is assigned.
• Partitioning can either be set explicitly with a partition key on the producer side or, if no key is provided, a random partition is selected for each message, as shown in the sketch after this list.
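A minimal sketch of the two options, keyed versus unkeyed sends, using the Kafka 0.8 producer API; the topic name and key are illustrative.

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;

// Keyed vs. unkeyed sends (Kafka 0.8 producer API).
public class PartitionRoutingSketch {
    static void send(Producer<String, String> producer) {
        // Keyed: the partitioner maps "user-42" to a fixed partition, so all messages
        // for that key stay ordered within one partition.
        producer.send(new KeyedMessage<String, String>("activity-log", "user-42", "page-view"));

        // Unkeyed: no key is supplied, so the producer chooses a partition for the message.
        producer.send(new KeyedMessage<String, String>("activity-log", "page-view"));
    }
}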

We ran a couple of tests using the producer performance script that ships with Kafka to evaluate the producer's behavior as the number of partitions increases. The results are below:

Number of Broker Partitions    Producer Rate (messages/sec)    Num of Messages Published
8                              120K                            9,940,874
16                             170K                            9,940,874
24                             250K                            9,940,874

4. Socket Buffer Size
The socket buffer size plays a critical role in Kafka performance tuning. It sets the buffer size of the TCP socket. The default producer send buffer (send.buffer.size) is 100 KB. In our environment, the RTT between the producer and the broker is greater than about 10 ms. Increasing this parameter from 100 KB to 4 MB gave a significant performance boost, as sketched below. This value also depends on the broker's receive and send buffer sizes, which must be greater than the producer's send buffer; in our case, we set the broker's receive and send buffers to 20 MB.
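A minimal sketch of the producer-side buffer change described above. In Kafka 0.8.x the producer property is named send.buffer.bytes; the 4 MB value is the one from our tuning, not a general recommendation.

import java.util.Properties;

// Producer TCP send buffer sketch: raising the ~100 KB default to 4 MB.
public class SocketBufferProps {
    public static Properties largeSendBufferProps() {
        Properties props = new Properties();
        props.put("send.buffer.bytes", String.valueOf(4 * 1024 * 1024)); // 4 MB producer socket send buffer
        return props;
    }
}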

Broker Performance Benchmarking


Kafka uses three primary techniques to improve the broker's effective throughput.

• The first is to raise the socket buffer sizes to enable high-performance data transfer between data centers. The specific broker settings are:

  1. socket.send.buffer.bytes: the SO_SNDBUF buffer the server prefers for socket connections.
  2. socket.receive.buffer.bytes: the SO_RCVBUF buffer the server prefers for socket connections.

  The default value of both attributes is 100 KB. In our case, we set the broker's receive and send buffers to 20 MB.

• The second is the disk and filesystem layout. We recommend using multiple drives to get good throughput, and not sharing the drives used for Kafka data with application logs or other OS filesystem activity; this keeps latency low. If you configure multiple data directories, partitions are assigned round-robin to the data directories, with each partition living entirely in one directory. If data is not well balanced among partitions, this can lead to load imbalance between disks.

• The third is the logging level. We kept the log4j logging level at ERROR to keep disk I/O to a minimum. A combined sketch of these broker-side settings follows; the consumer-side knobs (number of streams, fetcher threads, socket receive buffer size, and fetch size) are covered in the next section.
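Pulling the broker-side settings from this section together, here is a sketch of the relevant server.properties keys, shown as a Java Properties object for readability. The 20 MB socket buffers mirror our deployment; the data directories are illustrative.

import java.util.Properties;

// Broker tuning sketch: these keys belong in the broker's server.properties file.
// (The log4j level is set separately in log4j.properties.)
public class BrokerTuningSketch {
    public static Properties brokerProps() {
        Properties props = new Properties();
        props.put("socket.send.buffer.bytes", String.valueOf(20 * 1024 * 1024));    // SO_SNDBUF, 20 MB
        props.put("socket.receive.buffer.bytes", String.valueOf(20 * 1024 * 1024)); // SO_RCVBUF, 20 MB
        // Multiple dedicated drives; partitions are assigned round-robin across these directories.
        props.put("log.dirs", "/data1/kafka-logs,/data2/kafka-logs,/data3/kafka-logs");
        return props;
    }
}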

Consumer Performance Benchmarking


Consumer performance depends largely on the code written on the consumer side; however, Kafka also provides four important parameters for tuning consumer performance, illustrated in the sketch after this list:

1. The first is the number of streams. We recommend using as many streams as there are partitions in the topic the consumer reads from, to achieve the best parallelism.
2. The second is fetch.message.max.bytes. The default value is 1 MB. It is the number of bytes of messages to attempt to fetch for each topic partition in each fetch request. These bytes are read into memory for each partition, so this helps control the memory used by the consumer. The fetch request size must be at least as large as the maximum message size the server allows; otherwise the producer could send messages larger than the consumer can fetch.
3. The third is num.consumer.fetchers. The default value is 1. It specifies the number of fetcher threads used to fetch data.
4. The fourth is socket.receive.buffer.bytes. The default value is 64 KB. It specifies the socket receive buffer for network requests.
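A minimal sketch wiring these four knobs into the Kafka 0.8 high-level consumer. The ZooKeeper addresses, group id, topic name, and the stream count of 8 (matching the 8 partitions in our deployment) are assumptions.

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

// Kafka 0.8 high-level consumer sketch using the four tuning parameters above.
public class ConsumerTuningSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zk1:2181,zk2:2181,zk3:2181");        // assumed ZooKeeper ensemble
        props.put("group.id", "log-consumers");                              // assumed consumer group
        props.put("fetch.message.max.bytes", String.valueOf(1024 * 1024));   // 1 MB per partition per fetch
        props.put("num.consumer.fetchers", "2");                             // fetcher threads (default is 1)
        props.put("socket.receive.buffer.bytes", String.valueOf(64 * 1024)); // 64 KB socket receive buffer

        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // One stream per partition: the topic has 8 partitions in our deployment.
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(Collections.singletonMap("activity-log", 8));
        // Each KafkaStream would normally be handed to its own consumer thread here.
    }
}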

Performance Summary
• Producer performance
  • 104K messages/sec
  • 95.5 MB/sec
• Consumer performance
  • 102K messages/sec
  • 95 MB/sec
• 4 brokers
• 1 topic with 8 partitions (custom partitioner to distribute evenly across brokers)
• Replication factor = 1
• Message size = 980 bytes
• Parallel producer threads = 7
• Producer type = async
• Batch size = 8,000 messages
• Max queue buffering = 1,200 ms
• Socket send buffer size = 64 KB

Monitoring
We monitored the performance of the Kafka cluster as follows:

• Kafka uses Yammer Metrics for metrics reporting in both the server and the client. The easiest way to see the available metrics is to point jconsole at a running Kafka client or server, which allows browsing all metrics over JMX. The same metrics can be read programmatically, as in the sketch after this list.
• Some key metrics are MessagesInPerSec, MessagesOutPerSec, BytesInPerSec, and BytesOutPerSec.
• The nload utility on Linux provides a way to measure the data inflow/outflow rate of the Ethernet link used by Kafka.
• JVisualVM can monitor live thread states (waiting vs. running).
• Kafka-specific tools: (1) GetOffsetShell, to view the latest and earliest message offsets of each partition on the broker side, and (2) ConsumerOffsetChecker, to view the number of messages consumed by each consumer group.
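For scripted monitoring, the same Yammer metrics can be read over JMX. A minimal sketch, assuming JMX is enabled on port 9999; the MBean and attribute names vary between Kafka versions, so verify them with jconsole against your broker.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Reading a broker metric over JMX. The JMX port, MBean name, and attribute name
// are assumptions: check jconsole for the names your Kafka version exposes.
public class JmxMetricReader {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url =
                new JMXServiceURL("service:jmx:rmi:///jndi/rmi://broker1:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Example MBean for the messages-in rate; the exact name pattern differs across 0.8.x releases.
            ObjectName messagesIn =
                    new ObjectName("kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec");
            Object oneMinuteRate = conn.getAttribute(messagesIn, "OneMinuteRate");
            System.out.println("MessagesInPerSec (1-minute rate): " + oneMinuteRate);
        } finally {
            connector.close();
        }
    }
}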

About Impetus

Impetus is a Software Solutions and Services Company with deep technical maturity that brings you thought leadership, proactive innovation, and a track record of success. Our Services and Solutions portfolio includes Carrier grade large systems, Big Data, Cloud, Enterprise Mobility, and Test and Performance Engineering.

Visit www.impetus.com or write to us at inquiry@impetus.com

© 2015 Impetus Technologies, Inc. All rights reserved. Product and company names mentioned herein may be trademarks of their respective companies.
Aug 2015
