Kafka
Apache Kafka
Apache Kafka is an open-source, high-performance, fault-tolerant, and scalable platform
for building real-time streaming data pipelines and applications. It acts as a streaming
data store that decouples the applications that write streaming data into the store
(producers) from the applications that read streaming data out of it (consumers).
Organizations use Apache Kafka as a data source for applications that continuously
analyze and react to streaming data.
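The decoupling described above can be illustrated with a toy model. This is a minimal sketch, not the Kafka client API: `ToyLog`, the consumer names, and the sample events are all hypothetical. It only shows the idea that producers append to a log while each consumer tracks its own read position independently.

```python
from collections import defaultdict


class ToyLog:
    """In-memory stand-in for a single topic partition (not the Kafka API)."""

    def __init__(self):
        self._records = []                 # append-only list of messages
        self._offsets = defaultdict(int)   # next offset to read, per consumer

    def produce(self, message):
        """Append a message and return the offset it was stored at."""
        self._records.append(message)
        return len(self._records) - 1

    def consume(self, consumer_id, max_records=10):
        """Return unread messages for this consumer and advance its offset."""
        start = self._offsets[consumer_id]
        batch = self._records[start:start + max_records]
        self._offsets[consumer_id] = start + len(batch)
        return batch


log = ToyLog()
for event in ["click", "view", "purchase"]:
    log.produce(event)

# Two independent consumers read the same data at their own pace.
analytics_batch = log.consume("analytics", max_records=2)
audit_batch = log.consume("audit")
analytics_rest = log.consume("analytics")
```

Because reading does not remove data from the log, a slow or newly added consumer does not affect producers or other consumers.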
Amazon MSK
Amazon MSK is a new AWS streaming data service that manages Apache Kafka infrastructure
and operations, making it easy for developers and DevOps managers to run Apache Kafka
applications on AWS without the need to become experts in operating Apache Kafka clusters.
Amazon MSK is an ideal place to run existing or new Apache Kafka applications in AWS. Amazon
MSK operates and maintains Apache Kafka clusters, provides enterprise-grade security features
out of the box, and has built-in AWS integrations that accelerate development of streaming
data applications. To get started, migrate existing Apache Kafka workloads into Amazon MSK or,
with a few clicks, build new ones from scratch in minutes. There are no data transfer charges
for in-cluster traffic, and no commitments or upfront payments are required. You pay only for
the resources that you use.
You can provision an Amazon MSK cluster with a few clicks in the console. From there, Amazon MSK
replaces unhealthy brokers, automatically replicates data for high availability, manages Apache
ZooKeeper nodes, automatically deploys hardware patches as needed, manages the
integrations with AWS services, makes important metrics visible through the console, and
supports Apache Kafka version upgrades so you can take advantage of improvements to the
open-source version of Apache Kafka.
Amazon MSK Cluster Creation:
Customizable Configurations:
auto.create.topics.enable=false
default.replication.factor=3
min.insync.replicas=2
num.io.threads=8
num.network.threads=5
num.partitions=1
num.replica.fetchers=2
replica.lag.time.max.ms=30000
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
unclean.leader.election.enable=true
zookeeper.session.timeout.ms=18000
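As a rough illustration of how some of these settings interact, the sketch below parses a subset of the properties above and reports how many replicas can be lost before producers using acks=all are blocked. The helper names `parse_properties` and `review` are hypothetical, and the warning texts are one reading of the settings rather than AWS guidance.

```python
MSK_CONFIG = """\
auto.create.topics.enable=false
default.replication.factor=3
min.insync.replicas=2
unclean.leader.election.enable=true
"""


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def review(props):
    """Return (replica losses tolerated for acks=all writes, warnings)."""
    rf = int(props["default.replication.factor"])
    isr = int(props["min.insync.replicas"])
    warnings = []
    if isr >= rf:
        warnings.append("min.insync.replicas >= replication factor: "
                        "losing any replica blocks acks=all writes")
    if props.get("unclean.leader.election.enable") == "true":
        warnings.append("unclean leader election is enabled: "
                        "availability is favored over durability")
    return rf - isr, warnings


props = parse_properties(MSK_CONFIG)
tolerated, warnings = review(props)
print(f"acks=all writes tolerate {tolerated} unavailable replica(s)")
for warning in warnings:
    print("warning:", warning)
```

With a replication factor of 3 and min.insync.replicas of 2, one replica of a partition can be offline while acks=all writes still succeed.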
AWS
Ask | Recommended | Available Options
Apache Kafka version to be installed | 2.6.1 | 2.2.1 to 2.8.0
Number of HA zones | 3 | 2
Zone name for HA | |
Broker type (vCPU and memory) | |
Number of brokers per zone | |
EBS volume per zone | |
Access control methods | | 1. IAM access control; 2. SASL/SCRAM authentication; 3. TLS client authentication through AWS Certificate Manager (ACM)
Broker log delivery | | 1. Deliver to Amazon CloudWatch Logs; 2. Deliver to Amazon S3; 3. Deliver to Amazon Kinesis Data Firehose; 4. Ingest logs into third-party tools such as Splunk or ELK
Can we enable encryption within clusters? (Customer choice) | |
Encryption of data at rest | | 1. AWS managed CMK; 2. Customer managed CMK
Monitoring | | 1. CloudWatch; 2. Prometheus
MSK CLUSTER SIZING AND COSTS
Instance Type | Recommended Brokers | Hourly Broker | Hourly Storage | Hourly Data AZ Transfer* | Hourly Total | Monthly Broker+Storage | Monthly Transfer | Monthly Total
t3.small | 3 | 0.14 | 0.06 | 0.09 | 0.29 | 142.05 | 68.44 | 210.49
m5.large | 3 | 0.63 | 0.06 | 0.09 | 0.78 | 502.09 | 68.44 | 570.53
m5.xlarge | 3 | 1.26 | 0.06 | 0.09 | 1.41 | 961.99 | 68.44 | 1030.43
m5.2xlarge | 3 | 2.52 | 0.06 | 0.09 | 2.67 | 1881.79 | 68.44 | 1950.23
m5.4xlarge | 3 | 5.04 | 0.06 | 0.09 | 5.19 | 3721.39 | 68.44 | 3789.83
m5.8xlarge | 3 | 10.08 | 0.06 | 0.09 | 10.23 | 7400.59 | 68.44 | 7469.03
m5.12xlarge | 3 | 15.12 | 0.06 | 0.09 | 15.27 | 11079.79 | 68.44 | 11148.23
m5.16xlarge | 3 | 20.16 | 0.06 | 0.09 | 20.31 | 14758.99 | 68.44 | 14827.43
m5.24xlarge | 3 | 30.24 | 0.06 | 0.09 | 30.39 | 22117.39 | 68.44 | 22185.83
us-east-1 pricing
*Cross AZ Costs
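The totals in the sizing table are simple sums of their components. The sketch below reproduces that arithmetic for three of the rows, using the figures exactly as listed; the `ROWS` layout and the `totals` helper are hypothetical names chosen for illustration. `Decimal` is used so the sums compare exactly at two decimal places.

```python
from decimal import Decimal

# Per row: instance type, hourly broker, hourly storage, hourly cross-AZ
# transfer, monthly broker+storage, monthly transfer (values from the table).
ROWS = [
    ("t3.small", "0.14", "0.06", "0.09", "142.05", "68.44"),
    ("m5.large", "0.63", "0.06", "0.09", "502.09", "68.44"),
    ("m5.24xlarge", "30.24", "0.06", "0.09", "22117.39", "68.44"),
]


def totals(row):
    """Return (instance type, hourly total, monthly total) for one table row."""
    name, broker, storage, transfer, monthly_base, monthly_transfer = row
    hourly = Decimal(broker) + Decimal(storage) + Decimal(transfer)
    monthly = Decimal(monthly_base) + Decimal(monthly_transfer)
    return name, hourly, monthly


for name, hourly, monthly in map(totals, ROWS):
    print(f"{name}: {hourly} per hour, {monthly} per month")
```

The same check can be extended to the remaining rows by adding them to `ROWS`.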
Refer: https://docs.aws.amazon.com/msk/latest/developerguide/version-upgrades.html
Deployment considerations and patterns for Kafka in EC2 Instances:
One Kafka cluster is deployed in each AZ along with Apache ZooKeeper and Kafka producer and
consumer instances.
Pros:
Highly available
Can sustain the failure of two AZs
No message loss during failover
Simple deployment
Cons:
Very high operational overhead:
  All changes need to be deployed three times, one for each Kafka cluster
  Maintaining and monitoring three Kafka clusters
  Maintaining and monitoring three consumer clusters
Another typical deployment pattern (active-standby) uses a single AWS Region with a single
Kafka cluster whose Kafka brokers and ZooKeeper nodes are distributed across three AZs. A second,
similar Kafka cluster acts as a standby, as shown in the following illustration. You can use Kafka
mirroring with MirrorMaker to replicate messages between any two clusters.
Kafka producers are deployed on all three AZs.
Only one Kafka cluster is deployed across three AZs (active).
ZooKeeper instances are deployed on each AZ.
Brokers are spread evenly across all three AZs.
Kafka consumers can be deployed across all three AZs.
Standby Kafka producers and a Multi-AZ Kafka cluster are part of the deployment.
Pros:
Less operational overhead when compared to the first option
Only one Kafka cluster to manage and consume data from
Can handle single AZ failures without activating a standby Kafka cluster
Cons:
Added latency due to cross-AZ data transfer among Kafka brokers
For Kafka versions before 0.10, replicas for topic partitions have to be assigned so they're distributed to the brokers on different AZs (rack-awareness)
The cluster can become unavailable in case of a network glitch, where ZooKeeper does not see Kafka brokers
Possibility of in-transit message loss during failover
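The manual rack-awareness mentioned for Kafka versions before 0.10 can be sketched as follows. This is an illustrative assumption of how such an assignment might be generated, not a method taken from this document: the broker IDs, AZ names, topic name, and the `rack_aware_assignment` helper are hypothetical. The output follows the JSON layout accepted by Kafka's partition reassignment tool. From Kafka 0.10 onward, the `broker.rack` setting lets Kafka do this placement itself.

```python
import json
from itertools import cycle


def rack_aware_assignment(topic, num_partitions, brokers_by_az, replication_factor):
    """Place each partition's replicas on brokers in distinct AZs."""
    azs = sorted(brokers_by_az)
    if replication_factor > len(azs):
        raise ValueError("replication factor exceeds the number of AZs")
    # One round-robin iterator per AZ so load is spread over that AZ's brokers.
    next_broker = {az: cycle(brokers_by_az[az]) for az in azs}
    partitions = []
    for p in range(num_partitions):
        # Rotate the starting AZ so the first replica (preferred leader) moves
        # from AZ to AZ as the partition number increases.
        chosen_azs = [azs[(p + i) % len(azs)] for i in range(replication_factor)]
        replicas = [next(next_broker[az]) for az in chosen_azs]
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": partitions}


# Hypothetical cluster: two brokers in each of three AZs.
brokers_by_az = {
    "us-east-1a": [1, 4],
    "us-east-1b": [2, 5],
    "us-east-1c": [3, 6],
}
plan = rack_aware_assignment("orders", 6, brokers_by_az, replication_factor=3)
print(json.dumps(plan, indent=2))
```

Each partition ends up with one replica per AZ, so the loss of a single AZ leaves two replicas of every partition available.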
Upgrades:
Rolling or in-place upgrade
In a rolling or in-place upgrade scenario, upgrade one Kafka broker at a time. Take into
consideration the recommendations for doing rolling restarts to avoid downtime for end users.
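A rolling upgrade loop might look like the sketch below. It assumes two hypothetical hooks supplied by the operator: `upgrade_broker`, which stops, upgrades, and restarts one broker, and `under_replicated_partitions`, which returns the cluster's current count of under-replicated partitions. Waiting for that count to return to zero before moving to the next broker is a common precaution and is an assumption here, not a step stated in this document.

```python
import time


def rolling_upgrade(broker_ids, upgrade_broker, under_replicated_partitions,
                    poll_seconds=5.0, max_polls=60, sleep=time.sleep):
    """Upgrade brokers one at a time, waiting for replicas to catch up."""
    upgraded = []
    for broker_id in broker_ids:
        upgrade_broker(broker_id)  # hypothetical hook: upgrade a single broker
        for _ in range(max_polls):
            if under_replicated_partitions() == 0:
                break              # cluster is fully replicated again
            sleep(poll_seconds)
        else:
            # Never caught up within max_polls checks: stop before touching
            # another broker so availability is not reduced further.
            raise RuntimeError(f"broker {broker_id} did not catch up; stopping")
        upgraded.append(broker_id)
    return upgraded


# Simulated cluster: after each upgrade, two polls are needed to catch up.
pending = {"polls": 0}
order = []


def fake_upgrade(broker_id):
    order.append(broker_id)
    pending["polls"] = 2


def fake_under_replicated():
    remaining = pending["polls"]
    pending["polls"] = max(0, remaining - 1)
    return remaining


result = rolling_upgrade([1, 2, 3], fake_upgrade, fake_under_replicated,
                         sleep=lambda seconds: None)
```

Passing the hooks in as parameters keeps the loop independent of any particular deployment tooling and makes it easy to dry-run against a simulation, as done here.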
Downtime upgrade
Take the entire cluster down, upgrade each Kafka broker, and then restart the cluster.
Blue/green upgrade