Preface
Introduction
Installing and Configuring Kafka
    Downloading Kafka
    Configuring Kafka
    Listing Topics
    Describing a Topic
    Consuming Messages
    Consumer Groups
    Consumer Configuration
Kafka Connect
    Source Connectors
    Sink Connectors
    Configuration
Kafka Streams
    Key Concepts
    Windowing Operations
    Interactive Queries
Kafka Security
    Authentication and Authorization
    Encryption
    Secure Replication
Replication Factor
    How Replication Factor Works
Partitions
    How Partitions Work
    Benefits of Partitions
    Partition Key
    Modifying Partitions
Batch Size
    How Batch Size Works
Compression
    How Compression Works
Retention Policy
    How Retention Policy Works
    Size-based Retention
    Log Compaction
PREFACE

This cheatsheet is designed to be your quick and handy reference to the essential concepts, commands, and best practices associated with Apache Kafka. Whether you are a seasoned Kafka expert looking for a convenient memory aid or a newcomer exploring the world of distributed systems, this cheatsheet will serve as your reliable companion.

INTRODUCTION

Apache Kafka is a distributed event streaming platform designed to handle large-scale real-time data streams. It was originally developed by LinkedIn and later open-sourced as an Apache project. Kafka is known for its high-throughput, fault-tolerance, scalability, and low-latency characteristics, making it an excellent choice for various use cases, such as real-time data pipelines, stream processing, log aggregation, and more.

CONFIGURING KAFKA

Navigate to the config directory and modify the following configuration files as needed:

server.properties: Main Kafka broker configuration.

zookeeper.properties: ZooKeeper configuration for Kafka.

STARTING KAFKA AND ZOOKEEPER

To run Kafka, you need to start ZooKeeper first, as Kafka depends on ZooKeeper for maintaining its cluster state. Here's how to do it:

STARTING ZOOKEEPER

bin/zookeeper-server-start.sh config/zookeeper.properties
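With ZooKeeper running, the broker itself is started the same way; in the standard Kafka distribution the script and config file below ship alongside the ZooKeeper ones:

```shell
# Start the Kafka broker (run after ZooKeeper is up)
bin/kafka-server-start.sh config/server.properties
```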
LISTING TOPICS

To list all the topics in the Kafka cluster, use the following command:

bin/kafka-topics.sh --list --bootstrap-server localhost:9092

DESCRIBING A TOPIC

To get detailed information about a specific topic, use the following command:

bin/kafka-topics.sh --describe --topic my_topic --bootstrap-server localhost:9092

PRODUCING AND CONSUMING MESSAGES

Now that we have a topic, let's explore how to produce and consume messages in Kafka.

PRODUCING MESSAGES

bin/kafka-console-producer.sh --topic my_topic --bootstrap-server localhost:9092

After running this command, you can start typing your messages. Press Enter to send each message.

CONSUMER GROUPS

Consumer groups allow multiple consumers to work together to read from a topic. Each consumer in a group will get a subset of the messages. To use consumer groups, provide a group id when consuming messages:

bin/kafka-console-consumer.sh --topic my_topic --bootstrap-server localhost:9092 --group my_consumer_group

CONFIGURING KAFKA PRODUCERS AND CONSUMERS

Kafka provides various configurations for producers and consumers to optimize their behavior. Here are some essential configurations:

PRODUCER CONFIGURATION

To configure a Kafka producer, create a producer.properties file and set properties like bootstrap.servers, key.serializer, and value.serializer.

# producer.properties
bootstrap.servers=localhost:9092
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer

Use the following command to run the producer with the specified configuration:
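The console producer accepts a --producer.config flag for exactly this purpose. Assuming the properties file above was saved as config/producer.properties (the path is an assumption), the invocation looks like:

```shell
bin/kafka-console-producer.sh --topic my_topic \
  --bootstrap-server localhost:9092 \
  --producer.config config/producer.properties
```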
CONSUMER CONFIGURATION

# consumer.properties
bootstrap.servers=localhost:9092
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
group.id=my_consumer_group

Run the consumer using the configuration file:

bin/kafka-console-consumer.sh --topic my_topic --bootstrap-server localhost:9092 --consumer.config config/consumer.properties

KAFKA CONNECT

Kafka Connect consists of two main components: Source Connectors and Sink Connectors.

SOURCE CONNECTORS

Source Connectors allow you to import data from various external systems into Kafka. They act as producers, capturing data from the source and writing it to Kafka topics. Some popular source connectors include:

• JDBC Source Connector: Captures data from relational databases using JDBC.

CONFIGURATION

name=my-file-source-connector
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
file=/path/to/inputfile.txt
topic=my_topic

SINK CONNECTORS

Sink Connectors allow you to export data from Kafka to external systems. They act as consumers, reading data from Kafka topics and writing it to the target systems. Some popular sink connectors include:

• JDBC Sink Connector: Writes data from Kafka topics to relational databases using JDBC.

• HDFS Sink Connector: Stores data from Kafka topics in Hadoop Distributed File System (HDFS).

• Elasticsearch Sink Connector: Indexes data from Kafka topics into Elasticsearch for search and analysis.

RUNNING KAFKA CONNECT

To run Kafka Connect, you can use the connect-standalone.sh or connect-distributed.sh scripts that come with Kafka. Use the connect-standalone.sh script to run connectors in standalone mode.

Kafka Connect exposes several metrics that can be monitored for understanding the performance and health of your connectors. You can use tools like JConsole, JVisualVM, or integrate Kafka Connect with monitoring systems like Prometheus and Grafana to monitor the cluster.

KAFKA STREAMS

Kafka Streams is a client library in Apache Kafka that enables real-time stream processing of data. It allows you to build applications that consume data from Kafka topics, process the data, and produce the results back to Kafka or other external systems. Kafka Streams provides a simple and lightweight approach to stream processing, making it an attractive choice for building real-time data processing pipelines.

KEY CONCEPTS

Before diving into the details of Kafka Streams, let's explore some key concepts:

• Stream: A continuous flow of data records in Kafka is represented as a stream. Each record in the stream consists of a key, a value, and a timestamp.

• Processor: A processor is a fundamental building block in Kafka Streams that processes incoming data records and produces new output records.

Create a Properties Object

Start by creating a Properties object to configure your Kafka Streams application. This includes properties like the Kafka broker address, application ID, default serializers, and deserializers.

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

Define the Topology

Next, define the topology of your Kafka Streams application. This involves creating processing steps and connecting them together.

StreamsBuilder builder = new StreamsBuilder();
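The snippet above stops after creating the StreamsBuilder. A minimal sketch of a complete application, assuming hypothetical topic names input-topic and output-topic and a simple value transformation as the processing step, could continue like this (requires the org.apache.kafka:kafka-streams dependency on the classpath):

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class MyStreamsApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read from an input topic, upper-case each value, write to an output topic.
        KStream<String, String> source = builder.stream("input-topic");
        source.mapValues(value -> value.toUpperCase()).to("output-topic");

        // Build and start the topology; close the application cleanly on shutdown.
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Running this requires a reachable broker at localhost:9092 with both topics created.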
By having multiple replicas of each partition across the brokers in the cluster, Kafka ensures that even if some brokers or machines fail, the data remains accessible and the cluster remains operational.

Kafka assigns each message to a partition based on the message's key, or using a round-robin mechanism if no key is provided.

MODIFYING PARTITIONS

To increase the number of partitions, you can create a new topic with the desired partition count and use Kafka tools like kafka-topics.sh to reassign messages from the old topic to the new one.

Decreasing Partitions

Kafka does not support decreasing the number of partitions of an existing topic; to reduce the partition count, create a new topic with fewer partitions and migrate the data to it.

Monitoring the batch size is crucial for optimizing producer performance. You can use Kafka's built-in metrics and monitoring tools to track batch size-related metrics, such as average batch size, maximum batch size, and batch send time.
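The key-to-partition mapping described above can be sketched in plain Java. This is a simplified illustration, not Kafka's actual default partitioner, which hashes the serialized key bytes with murmur2, so real partition numbers will differ; the shape of the computation is the same:

```java
// Simplified sketch of key-based partition assignment (NOT Kafka's real murmur2 hash).
public class PartitionSketch {
    // Map a key to a partition: non-negative hash modulo the partition count.
    public static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int p = partitionFor("user-42", 6);
        // The same key always lands on the same partition.
        System.out.println(p == partitionFor("user-42", 6)); // prints "true"
        System.out.println(p >= 0 && p < 6);                 // prints "true"
    }
}
```

This determinism is why keyed messages preserve per-key ordering: all records with the same key go to the same partition.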
COMPRESSION

Compression in Apache Kafka is a feature that allows data to be compressed before it is stored on brokers or transmitted between producers and consumers. Kafka supports various compression algorithms to reduce data size, improve network utilization, and enhance overall system performance. Understanding compression options in Kafka is essential for optimizing storage and data transfer efficiency.

HOW COMPRESSION WORKS

When a producer sends messages to Kafka, it can choose to compress the messages before transmitting them to the brokers. Similarly, when messages are stored on the brokers, Kafka can apply compression to reduce the storage footprint. On the consumer side, messages can be decompressed before being delivered to consumers.

COMPRESSION ALGORITHMS IN KAFKA

Kafka supports the following compression algorithms:

LZ4: LZ4 is another fast compression algorithm that provides even lower compression ratios than Snappy but with even lower processing overhead. Like Snappy, it is well-suited for low-latency use cases.

Zstandard (Zstd): Zstd is a more recent addition to Kafka's compression options. It provides a good balance between compression ratios and processing speed, making it a versatile choice for various use cases.

CONFIGURING COMPRESSION IN KAFKA

To enable compression in Kafka, you need to configure the producer and broker properties.

Producer Configuration
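On the producer side, compression is controlled by a single property; the value snappy below is illustrative, and compression.type also accepts gzip, lz4, zstd, and none:

```properties
# producer.properties: compress record batches before sending
compression.type=snappy
```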
CONSIDERATIONS FOR COMPRESSION

Compression can also reduce storage requirements for stateful data in the Kafka Streams application.

RETENTION POLICY

In Kafka, the retention policy determines when Kafka will automatically delete old data from topics, helping to manage storage usage and prevent unbounded data growth.

SIZE-BASED RETENTION

In addition to time-based retention, Kafka also supports size-based retention. With size-based retention, you can set a maximum size for the partition log. Once the log size exceeds the specified value, the oldest messages in the log are deleted to make space for new messages.

To enable size-based retention, you can use the log.retention.bytes property. For example:

log.retention.bytes=1073741824

LOG COMPACTION

Log compaction retains only the latest message for each unique key in a topic, ensuring that the most recent value for each key is always available. This feature is useful for maintaining the latest state of an entity or for storing changelog-like data.

To enable log compaction for a topic, you can use the cleanup.policy property. For example:

cleanup.policy=compact

CONSIDERATIONS FOR RETENTION POLICY

When configuring the retention policy, consider the following factors:

Storage Capacity: Ensure that your Kafka cluster has sufficient storage capacity to retain data for the desired retention period, especially if you are using size-based retention or log compaction.

Consumer Lag: If consumers may fall behind producers, use a long enough retention period to allow consumers to catch up.

Message Importance: For some topics, older messages might become less important over time. In such cases, you can use a shorter retention period to reduce storage usage.

KAFKA MONITORING AND MANAGEMENT

Monitoring Kafka is essential to ensure its smooth operation. Here are some tools and techniques for effective Kafka monitoring:
CONCLUSION

JCG delivers over 1 million pages each month to more than 700K software developers, architects and decision makers. JCG offers something for everyone, including news, tutorials, cheat sheets, research guides, feature articles, source code and more.

CHEATSHEET FEEDBACK WELCOME
support@javacodegeeks.com

SPONSORSHIP OPPORTUNITIES
sales@javacodegeeks.com

Copyright © 2014 Exelixis Media P.C. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.