Course - Kafka
IPPON 2019
2018
● What is Kafka
● Kafka architecture
● Produce and consume messages
● Message bus
○ Written in Scala
○ Heavily inspired by transaction logs
● Initially created at LinkedIn in 2010
○ Open sourced in 2011
○ Became an Apache top-level project in 2012
● Designed to support batch and real time analytics
● Performs very well, especially at very large scale
What is Confluent?
The architecture
● Partitioning
● Producing messages
● Consuming messages
● Zookeeper
Fundamentals
● Apache project
● It is a centralised configuration service
● It is used by Kafka’s internals
Global architecture
● HDFS & RDBMS
            C   A   P
Kafka       X   X
MongoDB     X       X
Cassandra       X   X
HDFS        X       X
● Partitions
Offset
● For each consumer group and each partition, Kafka keeps an offset (an integer)
● It is the position of the last element read by a given consumer group in a
given partition
● When a consumer asks for a message, Kafka looks up the offset it has for
this consumer group (in any partition of the requested topic) and sends
the corresponding message
● When a consumer gets a message, it commits it
● When a consumer commits, Kafka increments the offset for the given
partition
● We can also ask Kafka to read from a specific offset, so the consumer can
consume from wherever it wants
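The offset bookkeeping described above can be sketched as a toy model. Everything here (`OffsetStore`, `poll`, `commit`, `seek`) is invented for illustration; it is not Kafka's API, just the per-(group, partition) accounting the slides describe:

```scala
import scala.collection.mutable

// Toy model of Kafka's offset tracking, illustration only.
class OffsetStore {
  // one offset per (consumer group, partition), starting at 0
  private val offsets = mutable.Map.empty[(String, Int), Long].withDefaultValue(0L)

  // a consumer group asks for the next message in a partition:
  // look up the stored offset and serve the message at that position
  def poll(group: String, partition: Int, log: Vector[String]): Option[String] = {
    val pos = offsets((group, partition))
    if (pos < log.length) Some(log(pos.toInt)) else None
  }

  // committing advances the offset for that (group, partition) pair
  def commit(group: String, partition: Int): Unit =
    offsets((group, partition)) += 1

  // a consumer may also seek to an arbitrary offset
  def seek(group: String, partition: Int, offset: Long): Unit =
    offsets((group, partition)) = offset
}
```

Note that each consumer group has its own offset, so two groups reading the same partition each start from their own position.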
Replicas
[Diagram: Broker 1 hosts Topic-Partition-1 and Topic-Replica-2; Broker 2 hosts Topic-Partition-2 and Topic-Replica-1]
Replicas
● If a broker goes down, the replica becomes the leader for the partition,
so we can still consume / produce messages
[Diagram: after Broker 1 fails, Topic-Replica-1 on Broker 2 is promoted to Topic-Partition-1, so Broker 2 now serves both partitions]
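The failover step can be illustrated with a toy model. The `PartitionState` case class and `failover` function are invented for illustration; real leader election goes through Zookeeper, but the outcome is the same reassignment:

```scala
// Toy model of partition leadership, illustration only.
// Each partition has a leader broker and a replica broker.
case class PartitionState(leader: String, replica: String)

// When a broker goes down, every partition it was leading is taken over
// by the broker holding its replica (a simplified view of leader election).
def failover(assignments: Map[String, PartitionState], downBroker: String): Map[String, PartitionState] =
  assignments.map {
    case (tp, st) if st.leader == downBroker =>
      tp -> PartitionState(leader = st.replica, replica = downBroker)
    case other => other
  }
```

With the two-broker layout from the diagram, failing broker-1 leaves broker-2 leading both partitions.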
Produce and consume
● Start Kafka
● Produce
● Consume
Start Kafka
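The commands for this slide did not survive extraction; with the scripts shipped in the Kafka distribution they would plausibly be the following (paths assume you run from the Kafka install directory):

```shell
# Start Zookeeper first (Kafka depends on it), then the broker
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
```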
● To produce a message
○ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
● To consume a topic
○ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test
--from-beginning
Scala dependencies
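The dependency snippet for this slide is missing; with sbt it would plausibly look like this (the coordinates are real, the version is only an example — pick the one matching your broker):

```scala
// build.sbt (fragment) -- version shown is an example
libraryDependencies += "org.apache.kafka" % "kafka-clients" % "2.0.0"
```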
Produce
● To send a message
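The code for this slide is missing; a minimal sketch of sending a message, using the Java client from Scala and assuming a broker on localhost:9092 with the kafka-clients dependency on the classpath:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Minimal producer configuration: where the broker is and how to
// serialize keys and values (plain strings here)
val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

val producer = new KafkaProducer[String, String](props)
// send() is asynchronous and returns a Future[RecordMetadata]
producer.send(new ProducerRecord[String, String]("test", "key", "hello"))
producer.close()
```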
● Schema Registry
○ Makes it possible to enforce schemas on messages
● Kafka Streams
○ High-level library (offering a DSL) to transform data between topics
○ Plays the role of T in ETL
● Kafka Connect
○ Offers connectors to feed Kafka with data from other systems, or to push data from Kafka
to other systems
■ There are connectors for HDFS, the file system, Cassandra, etc.
○ Plays the role of E in ETL if the connector is a source, and of L if it is a sink
● etc.
Kafka Streams
● Simple example
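The example itself is missing from the extracted slides; a plausible "simple example" in the spirit of the deck — copying a topic while upper-casing values — would look like this in Scala with the kafka-streams dependency, assuming topics "input" and "output" exist:

```scala
import java.util.Properties
import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.{KafkaStreams, StreamsBuilder, StreamsConfig}
import org.apache.kafka.streams.kstream.KStream

val props = new Properties()
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "simple-example") // also the consumer group id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass)
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass)

val builder = new StreamsBuilder()
val stream: KStream[String, String] = builder.stream[String, String]("input")
// the "T" of ETL: transform each value, then write to another topic
stream.mapValues(value => value.toUpperCase).to("output")

val streams = new KafkaStreams(builder.build(), props)
streams.start()
```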