Kafka Streams | Part I. Kafka Streams is a Powerful Library for… | by Kamini Kamal | Medium

Kafka Streams | Part I


Kamini Kamal
5 min read · Jul 30, 2023

Kafka Streams is a powerful library for building stream-processing applications
using Apache Kafka. It provides a high-level DSL (Domain-Specific Language) and
APIs for processing, transforming, and analyzing continuous streams of records.

Here are some key features of Kafka Streams:


1. Stream Processing: Kafka Streams enables real-time processing of streams of
records. It allows you to consume input data from Kafka topics, apply operations
and transformations on the data, and produce results back to Kafka topics.
Stream processing can be performed on individual records or in windowed
aggregations, enabling near real-time analytics, monitoring, and data
enrichment.

2. Event-Time Processing: Kafka Streams provides support for event-time
processing, allowing you to handle out-of-order records by using timestamps
associated with the records. It offers windowing operations based on event-time
semantics, enabling time-based aggregations, sessionization, and window joins.

3. Stateful Processing: Kafka Streams allows you to maintain and update the state
during stream processing. It provides built-in support for state stores, which are
key-value stores that can be queried and updated within a processing topology.
Stateful operations enable advanced stream processing tasks such as joins,
aggregations, and anomaly detection.

4. Exactly-Once Processing: Kafka Streams can provide end-to-end exactly-once
processing semantics for pipelines that read from and write to Kafka, ensuring
that each record's effect on state and output topics is applied exactly once, even
in the presence of failures. This is opt-in through the processing.guarantee
configuration (the default guarantee is at-least-once) and is achieved by
leveraging the strong durability guarantees and transactional capabilities of
Apache Kafka.

5. Windowing and Time-Based Operations: Kafka Streams offers a range of
windowing operations, allowing you to perform computations over fixed-size
tumbling windows, overlapping hopping windows, sliding windows, or session
windows. Windowed operations enable time-based aggregations, time-sensitive
analysis, and event-based triggers.

6. Join Operations: Kafka Streams provides join operations to combine data from
multiple streams or tables based on key matching. It supports inner joins, left
joins, and outer joins, allowing you to perform powerful data integrations and
enrichments in real time.

7. Interactive Queries: Kafka Streams enables interactive querying of the state
stores within a stream processing application. This allows applications to
respond to ad-hoc queries by serving the latest state or aggregated results,
making it suitable for building interactive real-time dashboards and
applications.

8. Integration with Kafka Ecosystem: Kafka Streams seamlessly integrates with
Apache Kafka, leveraging Kafka’s distributed storage, scalability, and fault
tolerance. It also integrates with other components of the Kafka ecosystem, such
as Kafka Connect for easy data integration and Kafka’s built-in security and
authentication mechanisms.

9. Developer-Friendly APIs: Kafka Streams provides a high-level DSL and APIs that
are designed to be developer-friendly and easy to use. It offers a functional
programming model with operators and fluent API syntax, making it accessible
for developers to express complex stream processing logic in a concise and
readable manner.
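
To make the event-time and windowing features above more concrete, here is a minimal sketch in plain Python. It is not the Kafka Streams API: the function names and the sample records are hypothetical, and it ignores details such as grace periods for late records. It only shows the idea that a record is assigned to a tumbling window by its own timestamp, so a record that arrives out of order still lands in the correct window.

```python
from collections import defaultdict


def window_start(timestamp_ms: int, size_ms: int) -> int:
    """Return the start of the tumbling window that contains timestamp_ms."""
    return timestamp_ms - (timestamp_ms % size_ms)


def count_per_window(records, size_ms):
    """Count records per (key, window start) using each record's event time."""
    counts = defaultdict(int)
    for key, timestamp_ms in records:
        counts[(key, window_start(timestamp_ms, size_ms))] += 1
    return dict(counts)


# Hypothetical input: (key, event-time in ms). The third record arrives after
# a record from a later window, i.e. out of order.
records = [
    ("user-1", 61_000),
    ("user-1", 125_000),
    ("user-1", 65_000),
    ("user-2", 62_000),
]

result = count_per_window(records, size_ms=60_000)
```

With one-minute windows, the late record with timestamp 65 000 is counted in the window starting at 60 000 together with the first record, rather than in the window that was current when it arrived.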

Kafka Streams is a versatile and robust stream processing library that allows you to
build scalable, fault-tolerant, and real-time applications for processing continuous
streams of data. Its features empower developers to implement sophisticated stream
processing tasks, enabling real-time analytics, data transformations, event-driven
architectures, and more.
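
The key-based joins listed among the features can be pictured with a small plain-Python model. This is a sketch under simplifying assumptions, not Kafka Streams code: the "table" is just a dictionary holding the latest value per key, the function name and the data are hypothetical, and only the inner and left variants of a stream-to-table join are shown.

```python
def join_stream_with_table(stream, table, how="inner"):
    """Enrich each stream record with the table value stored under the same key.

    stream: list of (key, value) records, processed in order.
    table:  dict mapping key -> latest value for that key.
    how:    "inner" drops records without a match, "left" keeps them with None.
    """
    joined = []
    for key, value in stream:
        if key in table:
            joined.append((key, (value, table[key])))
        elif how == "left":
            joined.append((key, (value, None)))
    return joined


# Hypothetical data: orders keyed by customer, and a customer-to-city table.
orders = [("alice", "order-1"), ("bob", "order-2"), ("alice", "order-3")]
customers = {"alice": "Berlin"}

inner = join_stream_with_table(orders, customers)
left = join_stream_with_table(orders, customers, how="left")
```

The inner join keeps only the two orders from "alice", while the left join also keeps the order from "bob" with an empty right-hand side.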

Where are KStream values stored?

In Apache Kafka’s Streams library, the values of a KStream are not materialized
in a separate store: a KStream is a view over the records in its source topic(s).
The records are read from Kafka and then processed and transformed in-memory
within the stream processing application.

When you define a KStream in your Kafka Streams application, it represents an
abstraction over the input topic(s) from which the stream is consumed. The stream
processing operations defined on the KStream, such as filtering, mapping,
aggregating, or joining, are applied to the records as they are consumed.

Records in flight are held in memory within the stream processing application,
while the state needed by stateful operations is kept in local state stores. By
default these stores are persistent and backed by RocksDB on the local disk of each
instance, and purely in-memory stores are also available. The Kafka Streams library
provides mechanisms to manage and maintain this state across multiple instances or
threads of the application. The state can include windowed state, key-value stores,
or any other stateful data structures used by the application for stream processing.

This local state allows the application to maintain context and store the
information required for processing, such as aggregations, join results, or
windowed computations. The state is constantly updated as new records are
processed, and it is used to generate the output records or further process the
stream.
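
As a rough illustration of state being updated record by record, the following plain-Python sketch keeps a running count per key in a small key-value store. The class and function names are hypothetical stand-ins and do not correspond to the Kafka Streams state store interfaces.

```python
class KeyValueStore:
    """A tiny local key-value store (a stand-in for a state store)."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value


def count_by_key(records, store):
    """Update the running count for each record's key and emit the new count."""
    emitted = []
    for key, _value in records:
        new_count = store.get(key, 0) + 1
        store.put(key, new_count)          # state is updated per record
        emitted.append((key, new_count))   # and used to produce the output
    return emitted


# Hypothetical input: page views keyed by page name.
store = KeyValueStore()
clicks = [("home", "c1"), ("cart", "c2"), ("home", "c3")]
emitted = count_by_key(clicks, store)
```

Each incoming record both reads and updates the store, so the second "home" record produces a count of 2, and the store can be queried afterwards for the latest count per key.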

It’s important to note that state held purely in memory exists only within the
lifetime of the stream processing application. If the application is restarted or
stops processing, that in-memory copy is lost, and the application needs to rebuild
the state upon restart, typically by replaying the changelog topics that back its
state stores, or from the input topics or any external data sources.

However, the input records of a KStream always come from Kafka topics, and its
output records can be stored in Kafka topics as well. The processed records can be
written to new topics through the Kafka producer that Kafka Streams manages
internally, and the results can be consumed from these output topics by other
applications or downstream processes.

So, while Kafka is the underlying storage and messaging system for the input and
output topics, the values of a KStream are processed in-memory within the stream
processing application as they flow through it.

Then what makes Kafka Streams fault-tolerant?

While Kafka Streams processes data in-memory, it still provides fault
tolerance through several mechanisms to ensure data integrity and resiliency in the
event of failures. Here are some key aspects that make Kafka Streams fault-tolerant:

1. Input Topic Replication: Kafka itself provides fault tolerance through topic
replication. Input topics can be configured with a replication factor greater than
one, meaning that multiple replicas of each partition are maintained across
different brokers. If a broker becomes unavailable, Kafka elects a new leader for
the affected partitions from the remaining in-sync replicas and consumers fail
over to it, ensuring continuous data ingestion.

2. Stateful Processing and Changelog Topics: Kafka Streams maintains the
necessary state information for stream processing. Intermediate results,
aggregations, and stateful operations are stored in internal state stores. These
state stores are also backed by Kafka topics called changelog topics, which
record all updates to the state stores. This allows the state to be reconstructed in
case of failures or application restarts.

3. Offset Management: Kafka Streams tracks the offsets of consumed records and
periodically commits them to Kafka. This enables the library to resume
processing from the last committed offset in case of failures or restarts. It
ensures that the application can pick up from where it left off without
reprocessing records from before the last commit. Under the default
at-least-once guarantee, records processed after the last commit may be
processed again.

4. Application Rebalancing: Kafka Streams leverages Kafka’s consumer group
mechanism for fault tolerance. If an instance of the stream processing
application fails or new instances are added, Kafka Streams automatically
triggers a rebalancing process. During rebalancing, partitions and tasks are
redistributed among the active instances to ensure even workload distribution
and maintain fault tolerance.

5. Exactly-Once Semantics: Kafka Streams provides support for exactly-once
processing semantics, which guarantees that each record will be processed
exactly once, even in the face of failures or retries. This is achieved through the
combination of transactional producer and consumer operations, along with the
use of internal Kafka topics for storing offsets and maintaining state.
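
The changelog mechanism from item 2 can be sketched in a few lines of plain Python. This is a simplified model, not the library's implementation: a Python list stands in for the changelog topic, and the class name and its methods are hypothetical. The point is only that every update is also appended to a durable log, so a fresh instance can rebuild the same state by replaying that log.

```python
class ChangelogBackedStore:
    """A key-value store whose updates are also appended to a changelog."""

    def __init__(self, changelog):
        self.changelog = changelog  # a list standing in for a changelog topic
        self.data = {}

    def put(self, key, value):
        self.data[key] = value
        self.changelog.append((key, value))  # record every update durably

    @classmethod
    def restore(cls, changelog):
        """Rebuild a store by replaying the changelog from the beginning."""
        store = cls(changelog)
        for key, value in list(changelog):
            store.data[key] = value  # later entries overwrite earlier ones
        return store


changelog = []  # survives the "crash" below, like a topic stored in Kafka
store = ChangelogBackedStore(changelog)
store.put("a", 1)
store.put("b", 1)
store.put("a", 2)

del store  # simulate losing the instance and its local state

restored = ChangelogBackedStore.restore(changelog)
```

After the replay, the restored store holds the latest value for every key, and the changelog itself is unchanged because restoring does not write new entries.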

These fault-tolerance mechanisms in Kafka Streams ensure that data integrity is
maintained, stateful processing is resilient to failures, and the application can
continue processing from the last known state in case of disruptions. This allows
stream processing applications built with Kafka Streams to handle failures
gracefully and provide reliable and fault-tolerant processing of data streams.
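
Resuming from the last committed offset can also be modelled in plain Python. The sketch below is a simplified illustration with hypothetical names: a list stands in for a topic partition, a commit happens after every `commit_every` records, and a crash is simulated at a chosen offset. It deliberately models at-least-once behaviour, where a record processed after the last commit is seen again on restart.

```python
def process(log, committed_offset, commit_every, crash_at=None):
    """Process records starting at committed_offset.

    Returns (processed_values, committed_offset_after_run).
    If crash_at is reached, the run stops without a further commit.
    """
    processed = []
    for offset in range(committed_offset, len(log)):
        if offset == crash_at:
            return processed, committed_offset  # crash: keep last commit only
        processed.append(log[offset])
        if (offset + 1) % commit_every == 0:
            committed_offset = offset + 1  # periodic commit
    return processed, len(log)  # clean shutdown commits the final position


log = ["a", "b", "c", "d", "e"]

# First run crashes at offset 3; the last commit happened after offset 1.
first_run, committed = process(log, committed_offset=0, commit_every=2, crash_at=3)

# Second run resumes from the committed offset and finishes the log.
second_run, committed_after = process(log, committed_offset=committed, commit_every=2)
```

The first run processes "a", "b", and "c" but only commits offset 2, so the second run starts again at "c". Nothing is skipped, and "c" is processed twice, which is the kind of duplicate that exactly-once processing is meant to rule out.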

https://medium.com/@kamini.velvet/kafka-streams-part-i-373a5e09a539