Kafka: Apache Kafka Is a Distributed Publish-Subscribe Messaging System


Kafka

Apache Kafka is a distributed publish-subscribe messaging system.

Ex: YouTube is a publish-subscribe system for video.
Broker/Server
Broker cluster:
If one broker/server fails, another broker takes over its work.

Broker:
1. A Kafka broker saves messages in files on the hard drive.
2. Producers append messages to those files.
3. Consumers read from the same files.

What if a broker fails?

- With a single broker, nobody is left to serve producers and consumers; that is why we run a cluster of brokers.
ZooKeeper
If there are multiple brokers, how do the brokers synchronize/communicate with each other? Through ZooKeeper, which:

1. Maintains the list of active brokers.
2. Elects the controller (there is a single controller in every cluster).
3. Manages the configuration of topics and partitions.

What if ZooKeeper fails?

- We run a cluster of ZooKeeper servers, known as an ensemble.
Ensemble and quorum
The number of servers in a ZooKeeper ensemble should usually be odd.
The quorum is the minimum number of servers that must be up for the ensemble to keep working; it is a majority of the servers.
For 3 servers:
1. If 1 of 3 fails, the ensemble will still be up.
2. If 2 of 3 fail, the ensemble will be down.

How to decide the quorum?

If 9 servers = quorum 5
If 15 servers = quorum 8

Formula:
Quorum = (n+1)/2
n = number of servers (odd).
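
The quorum formula above can be checked with a few lines of Python. This is only a sketch: the function names `quorum` and `tolerated_failures` are made up for illustration. It uses integer division, `n // 2 + 1`, which gives the same result as (n+1)/2 for odd n and also covers even n.

```python
def quorum(n: int) -> int:
    """Smallest majority of n servers; equals (n + 1) / 2 when n is odd."""
    if n < 1:
        raise ValueError("an ensemble needs at least one server")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many servers may fail while a quorum is still reachable."""
    return n - quorum(n)


for servers in (3, 4, 9, 15):
    print(servers, quorum(servers), tolerated_failures(servers))
```

Note that 4 servers tolerate only 1 failure, the same as 3 servers, which is one reason an odd count is preferred.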

Similarly, a Kafka cluster can be made up of many brokers, as above.


Ports
• ZooKeeper: localhost:2181
- For multiple ZooKeeper instances on one machine, give each its own port
- For ex. :2181, :2182, :2183

• Kafka server (broker): localhost:9092

- For multiple brokers on one machine, give each its own port
- For ex. :9091, :9092, :9093
Topics
1. Every Kafka topic has a unique name (e.g. Topic A).
2. Every message in it has its own unique number (its offset).
3. New messages can only be appended to existing messages; we cannot insert messages in between.
4. All messages are deleted automatically once the retention period has expired, i.e. 7 days (168 hrs) by default; however, we can extend it.

Every message and its offset no. are immutable, and messages are small in size.

Message structure:
1. Timestamp (can be assigned by the broker or the producer)
2. Offset no. (unique within a partition)
3. Key (optional)
4. Value (sequence of bytes)
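
The message structure and the append-only rule can be pictured with a small in-memory model. This is a toy sketch, not Kafka's storage code: the names `Message`, `PartitionLog` and `expire` are hypothetical, and real brokers delete whole log files rather than individual messages.

```python
from dataclasses import dataclass
from typing import List, Optional

RETENTION_MS = 7 * 24 * 60 * 60 * 1000  # 7 days (168 hrs) in milliseconds


@dataclass(frozen=True)  # frozen: a message cannot be changed once created
class Message:
    offset: int
    timestamp_ms: int
    key: Optional[bytes]
    value: bytes


class PartitionLog:
    """Append-only list of messages with increasing offsets."""

    def __init__(self) -> None:
        self._messages: List[Message] = []
        self._next_offset = 0

    def append(self, value: bytes, timestamp_ms: int,
               key: Optional[bytes] = None) -> int:
        # New messages always go to the end; there is no insert-in-between.
        message = Message(self._next_offset, timestamp_ms, key, value)
        self._messages.append(message)
        self._next_offset += 1
        return message.offset

    def expire(self, now_ms: int, retention_ms: int = RETENTION_MS) -> None:
        # Drop messages older than the retention period (simplified).
        self._messages = [m for m in self._messages
                          if now_ms - m.timestamp_ms <= retention_ms]

    def read_all(self) -> List[Message]:
        return list(self._messages)
```

Offsets are never reused in this sketch: after old messages expire, the next append still gets the next higher number.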
Partition
• Topics can be further divided into partitions.
• Partitions spread a topic's data across multiple servers.
• A single topic can have multiple partitions on different servers, or on a single server as well, like topic D.
• If partition 0 fails, topic A will still work on partition 2.
• Avoid creating only a single partition on 1 server.
Broker and partition

• The producer decides which partition to write a message to.

• Every message in a partition has a unique number (offset); numbering starts at zero.
• With partitions we achieve parallel write and read operations.
• If broker 1 fails, partition 1 will lose all its messages, and we cannot consume them again.
• To prevent this, use replication.
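
One way to picture how a producer picks a partition is the sketch below. It is an illustration under stated assumptions, not Kafka's implementation: the class name `SimplePartitioner` is hypothetical, `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner applies to keys, and round-robin for messages without a key is assumed here for simplicity.

```python
import zlib
from itertools import count
from typing import Optional


class SimplePartitioner:
    """Chooses a partition number for each message on the producer side."""

    def __init__(self, num_partitions: int) -> None:
        if num_partitions < 1:
            raise ValueError("a topic needs at least one partition")
        self.num_partitions = num_partitions
        self._counter = count()  # used only for messages without a key

    def partition(self, key: Optional[bytes]) -> int:
        if key is None:
            # No key: spread messages evenly over the partitions.
            return next(self._counter) % self.num_partitions
        # Same key -> same hash -> same partition.
        return zlib.crc32(key) % self.num_partitions
```

Because equal keys always land in the same partition, messages that share a key keep their relative order.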
Leader and follower
• The partition leader handles reads and writes for the partition.
• We can create multiple replicas of a partition.
• Followers do not serve consumers or append data from producers; only the leader does that.
• Producers and consumers communicate only with the leader.
• Now, if broker 1 fails, leadership of its partition is transferred to another broker that holds a replica, and that broker acts as leader.
• For production, a replication factor of 3 is recommended.
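
The leader/follower idea can be sketched as a toy simulation. The assumptions are loud ones: `ReplicatedPartition` and its methods are hypothetical names, followers copy the leader's log immediately on every write (real followers fetch from the leader asynchronously), and the new leader is simply the first surviving replica.

```python
from typing import Dict, List, Set


class ReplicatedPartition:
    """One partition replicated on several brokers, with a single leader."""

    def __init__(self, replicas: List[int]) -> None:
        if not replicas:
            raise ValueError("at least one replica is required")
        self.replicas = list(replicas)
        self.live: Set[int] = set(replicas)
        self.leader = self.replicas[0]
        self.logs: Dict[int, List[bytes]] = {b: [] for b in self.replicas}

    def produce(self, value: bytes) -> int:
        # Only the leader accepts writes from producers.
        leader_log = self.logs[self.leader]
        leader_log.append(value)
        # Live followers copy the leader's log (simplified: instantly).
        for broker in self.live:
            if broker != self.leader:
                self.logs[broker] = list(leader_log)
        return len(leader_log) - 1  # offset of the new message

    def consume(self, offset: int) -> bytes:
        # Only the leader serves reads to consumers.
        return self.logs[self.leader][offset]

    def fail_broker(self, broker_id: int) -> None:
        self.live.discard(broker_id)
        if broker_id == self.leader:
            survivors = [b for b in self.replicas if b in self.live]
            if not survivors:
                raise RuntimeError("no live replica left; partition is offline")
            self.leader = survivors[0]  # a follower takes over as leader
```

With three replicas, for example `ReplicatedPartition([1, 2, 3])`, messages written before broker 1 fails can still be consumed from the new leader afterwards.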
Controller

ZooKeeper only elects the controller.


If broker 1 fails, partition 1 also fails; the controller then assigns a new leader for it from the remaining replicas.
Producers

1. A producer writes messages to partitions.
2. A single producer can also write to multiple partitions.
3. Multiple producers can write to the same partition.
4. Messages are appended to the existing messages.
Consumers
1. A consumer can consume messages from one partition, several partitions, or all partitions.
2. A consumer can also tell the Kafka cluster whether it wants messages from the beginning or only the latest.
3. A consumer can also specify the offset number from which it wants messages.
4. Multiple consumers can read the same messages in parallel.
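
The consumer's choice of starting point can be illustrated with a plain list standing in for a partition. This is a sketch only: `read_from` is a hypothetical helper, not a Kafka client call, and the strings "earliest" and "latest" are used here as labels for "from the beginning" and "only new messages".

```python
from typing import List, Union


def read_from(log: List[bytes], start: Union[str, int]) -> List[bytes]:
    """Return the messages a consumer would see from the chosen start point."""
    if start == "earliest":
        position = 0            # from the beginning of the partition
    elif start == "latest":
        position = len(log)     # skip everything already written
    elif isinstance(start, int) and 0 <= start <= len(log):
        position = start        # a specific offset number
    else:
        raise ValueError(f"invalid start position: {start!r}")
    # Reading does not remove anything, so many consumers can read the same log.
    return log[position:]
```

Reading leaves the log untouched, which is why several consumers can read the same messages independently.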
