
Kafka

Apache Kafka
Apache Kafka is an open-source, high-performance, fault-tolerant, and scalable platform
for building real-time streaming data pipelines and applications. Apache Kafka acts as a
streaming data store that decouples the applications that produce streaming data into it
(producers) from the applications that consume streaming data from it (consumers).
Organizations use Apache Kafka as a data source for applications that continuously
analyze and react to streaming data.
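
As a quick illustration of this decoupling, here is a minimal producer/consumer sketch using the kafka-python client; the broker address, topic name, and consumer group are placeholders, not values taken from this document.

from kafka import KafkaProducer, KafkaConsumer  # pip install kafka-python
import json

BOOTSTRAP = "broker1:9092"   # placeholder bootstrap server
TOPIC = "clickstream"        # hypothetical topic name

# Producer: writes events into Kafka's data store.
producer = KafkaProducer(
    bootstrap_servers=BOOTSTRAP,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"user": "u1", "action": "click"})
producer.flush()

# Consumer: reads the same events independently of the producer.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BOOTSTRAP,
    group_id="analytics-app",         # hypothetical consumer group
    auto_offset_reset="earliest",
)
for record in consumer:
    print(record.value)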

Amazon MSK
Amazon MSK (Managed Streaming for Apache Kafka) is an AWS streaming data service that manages
Apache Kafka infrastructure and operations, making it easy for developers and DevOps managers
to run Apache Kafka applications on AWS without needing to become experts in operating Apache
Kafka clusters. Amazon MSK is an ideal place to run existing or new Apache Kafka applications
in AWS. Amazon MSK operates and maintains Apache Kafka clusters, provides enterprise-grade
security features out of the box, and has built-in AWS integrations that accelerate development
of streaming data applications. To get started, migrate existing Apache Kafka workloads into
Amazon MSK, or build new ones from scratch in minutes with a few clicks. There are no data
transfer charges for in-cluster traffic, and no commitments or upfront payments are required.
You pay only for the resources that you use.
With a few clicks in the console, provision an Amazon MSK cluster. From there, Amazon MSK
replaces unhealthy brokers, automatically replicates data for high availability, manages Apache
ZooKeeper nodes, automatically deploys hardware patches as needed, manages the
integrations with AWS services, makes important metrics visible through the console, and
supports Apache Kafka version upgrades so you can take advantage of improvements to the
open-source version of Apache Kafka.
Amazon MSK Cluster Creation:

Customizable Configurations:
# Topic defaults
auto.create.topics.enable=false
num.partitions=1
# Replication and durability
default.replication.factor=3
min.insync.replicas=2
num.replica.fetchers=2
replica.lag.time.max.ms=30000
unclean.leader.election.enable=true
# Broker threading
num.io.threads=8
num.network.threads=5
# Socket buffers and request size limits
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
# ZooKeeper session
zookeeper.session.timeout.ms=18000
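
One way to apply properties like these in Amazon MSK is to register them as a custom cluster configuration. The following is a minimal boto3 sketch, assuming the properties above are saved in a file named msk-server.properties; the configuration name, file name, and region are placeholders.

import boto3

kafka = boto3.client("kafka", region_name="us-east-1")   # assumed region

# Read the broker properties shown above from a local file (placeholder name).
with open("msk-server.properties", "rb") as f:
    server_properties = f.read()

response = kafka.create_configuration(
    Name="custom-kafka-config",       # placeholder configuration name
    KafkaVersions=["2.6.1"],          # example version; see the options listed below
    ServerProperties=server_properties,
)
print(response["Arn"])                # reference this ARN when creating the cluster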

Ask | Recommended | Available Options
Apache Kafka version to be installed | 2.6.1 | 2.2.1 to 2.8.0
Number of HA zones | 3 | 2
Zone name for HA | |
Broker type (vCPU & memory) | |
Number of brokers per zone | |
EBS volume per zone | |
Access control methods | | 1. IAM access control; 2. SASL/SCRAM authentication; 3. TLS client authentication through AWS Certificate Manager (ACM)
Broker log delivery | | 1. Deliver to Amazon CloudWatch Logs; 2. Deliver to Amazon S3; 3. Deliver to Amazon Kinesis Data Firehose; 4. Ingest logs to third-party tools like Splunk, ELK, etc.
Can we enable encryption within clusters? | Customer choice |
Encryption of data at rest | | 1. AWS managed CMK; 2. Customer managed CMK
Monitoring | | 1. CloudWatch; 2. Prometheus
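
The inputs above map fairly directly onto the MSK CreateCluster API. The following is a minimal boto3 sketch, not a definitive setup: the cluster name, subnet and security group IDs, volume size, and configuration ARN are placeholders.

import boto3

kafka = boto3.client("kafka", region_name="us-east-1")   # assumed region

response = kafka.create_cluster(
    ClusterName="example-msk-cluster",                    # placeholder name
    KafkaVersion="2.6.1",                                 # recommended version from the table
    NumberOfBrokerNodes=3,                                # one broker per AZ across 3 AZs
    BrokerNodeGroupInfo={
        "InstanceType": "kafka.m5.large",                 # broker type (vCPU & memory)
        "ClientSubnets": ["subnet-aaa", "subnet-bbb", "subnet-ccc"],   # one per AZ (placeholders)
        "SecurityGroups": ["sg-0123456789abcdef0"],                    # placeholder
        "StorageInfo": {"EbsStorageInfo": {"VolumeSize": 1000}},       # example EBS GiB per broker
    },
    EncryptionInfo={
        "EncryptionInTransit": {"ClientBroker": "TLS", "InCluster": True},  # in-cluster encryption
    },
    OpenMonitoring={                                       # Prometheus monitoring option
        "Prometheus": {
            "JmxExporter": {"EnabledInBroker": True},
            "NodeExporter": {"EnabledInBroker": True},
        }
    },
    ConfigurationInfo={                                    # custom configuration registered earlier
        "Arn": "arn:aws:kafka:us-east-1:111122223333:configuration/custom-kafka-config/abc",  # placeholder
        "Revision": 1,
    },
)
print(response["ClusterArn"])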
MSK CLUSTER SIZING AND COSTS
Instance Type | Recommended Brokers | Broker ($/hr) | Storage ($/hr) | Data Transfer* ($/hr) | Hourly Total | Broker + Storage ($/mo) | AZ Transfer ($/mo) | Monthly Total
t3.small | 3 | 0.14 | 0.06 | 0.09 | 0.29 | 142.05 | 68.44 | 210.49
m5.large | 3 | 0.63 | 0.06 | 0.09 | 0.78 | 502.09 | 68.44 | 570.53
m5.xlarge | 3 | 1.26 | 0.06 | 0.09 | 1.41 | 961.99 | 68.44 | 1030.43
m5.2xlarge | 3 | 2.52 | 0.06 | 0.09 | 2.67 | 1881.79 | 68.44 | 1950.23
m5.4xlarge | 3 | 5.04 | 0.06 | 0.09 | 5.19 | 3721.39 | 68.44 | 3789.83
m5.8xlarge | 3 | 10.08 | 0.06 | 0.09 | 10.23 | 7400.59 | 68.44 | 7469.03
m5.12xlarge | 3 | 15.12 | 0.06 | 0.09 | 15.27 | 11079.79 | 68.44 | 11148.23
m5.16xlarge | 3 | 20.16 | 0.06 | 0.09 | 20.31 | 14758.99 | 68.44 | 14827.43
m5.24xlarge | 3 | 30.24 | 0.06 | 0.09 | 30.39 | 22117.39 | 68.44 | 22185.83

us-east-1 pricing

*Cross AZ Costs

EC2 SELF MANAGED KAFKA COSTS


Instance Type | Recommended Brokers | Broker ($/hr) | Storage ($/hr) | Data Transfer* ($/hr) | Replication ($/hr) | Zookeeper ($/hr) | Engineers ($/hr) | Hourly Total | Hourly Total + Devs | Monthly Total | Monthly Total + Devs
t3.small | 3 | 0.04 | 0.06 | 0.09 | 0.14 | 0.37 | 34.25 | 0.71 | 34.95 | 515.50 | 25515.50
m5.large | 3 | 0.18 | 0.06 | 0.09 | 0.14 | 0.37 | 34.25 | 0.85 | 35.09 | 618.43 | 25618.43
m5.xlarge | 3 | 0.36 | 0.06 | 0.09 | 0.14 | 0.37 | 34.25 | 1.03 | 35.28 | 752.02 | 25752.02
m5.2xlarge | 3 | 0.73 | 0.06 | 0.09 | 0.14 | 0.37 | 34.25 | 1.39 | 35.64 | 1017.01 | 26017.01
m5.4xlarge | 3 | 1.45 | 0.06 | 0.09 | 0.14 | 0.37 | 34.25 | 2.12 | 36.37 | 1546.99 | 26546.99
m5.8xlarge | 3 | 2.90 | 0.06 | 0.09 | 0.14 | 0.37 | 34.25 | 3.57 | 37.82 | 2606.95 | 27606.95
m5.12xlarge | 3 | 4.36 | 0.06 | 0.09 | 0.14 | 0.37 | 34.25 | 5.02 | 39.27 | 3666.91 | 28666.91
m5.16xlarge | 3 | 5.81 | 0.06 | 0.09 | 0.14 | 0.37 | 34.25 | 6.47 | 40.72 | 4724.68 | 29724.68
m5.24xlarge | 3 | 8.71 | 0.06 | 0.09 | 0.14 | 0.37 | 34.25 | 9.38 | 43.62 | 6844.60 | 31844.60
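
The monthly columns in both tables are roughly the hourly totals multiplied by the hours in a month. A small sketch of that conversion, assuming about 730 hours per month (the published figures also include rounding of the underlying rates):

# Rough hourly-to-monthly conversion behind the tables above.
HOURS_PER_MONTH = 730   # assumption; actual billing uses the hours in the given month

def monthly(hourly_cost: float) -> float:
    return round(hourly_cost * HOURS_PER_MONTH, 2)

# Example: MSK m5.large row, broker + storage = 0.63 + 0.06 per hour
print(monthly(0.63 + 0.06))   # ~503.7, close to the 502.09 shown above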

MSK Version Upgrade:


Amazon MSK supports fully managed in-place Apache Kafka version upgrades.

Refer: https://docs.aws.amazon.com/msk/latest/developerguide/version-upgrades.html
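
A minimal boto3 sketch of triggering an in-place upgrade is shown below; the cluster ARN and target version are placeholders. Note that CurrentVersion is the cluster's metadata version string (returned by DescribeCluster), not the Apache Kafka version.

import boto3

kafka = boto3.client("kafka", region_name="us-east-1")    # assumed region

cluster_arn = "arn:aws:kafka:us-east-1:111122223333:cluster/example/abc"   # placeholder

# Fetch the cluster's current metadata version, required to guard against concurrent updates.
current = kafka.describe_cluster(ClusterArn=cluster_arn)["ClusterInfo"]["CurrentVersion"]

response = kafka.update_cluster_kafka_version(
    ClusterArn=cluster_arn,
    CurrentVersion=current,
    TargetKafkaVersion="2.8.0",          # example target within the supported range
)
print(response["ClusterOperationArn"])   # track the upgrade with this operation ARN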
Deployment considerations and patterns for Kafka in EC2 Instances:

Single AWS Region, Three Availability Zones, All Active:

• One Kafka cluster is deployed in each AZ, along with Apache ZooKeeper and Kafka producer and consumer instances.
• Kafka producers and Kafka clusters are deployed in each AZ.
• Data is distributed evenly across the three Kafka clusters by using an Elastic Load Balancer.
• Kafka consumers aggregate data from all three Kafka clusters.
Kafka cluster failover occurs this way:

• Mark down all Kafka producers.
• Stop consumers.
• Debug and restack Kafka.
• Restart consumers.
• Restart Kafka producers.

Pros:
• Highly available
• Can sustain the failure of two AZs
• No message loss during failover
• Simple deployment

Cons:
• Very high operational overhead:
  - All changes need to be deployed three times, once for each Kafka cluster
  - Maintaining and monitoring three Kafka clusters
  - Maintaining and monitoring three consumer clusters

Single Region, Three Availability Zones, Active-Standby

Another typical deployment pattern (active-standby) is a single AWS Region with a single Kafka
cluster whose Kafka brokers and ZooKeeper nodes are distributed across three AZs. A second,
similar Kafka cluster acts as a standby. You can use Kafka mirroring with MirrorMaker to
replicate messages between any two clusters; a minimal MirrorMaker 2 configuration sketch
follows the list below.
• Kafka producers are deployed in all three AZs.
• Only one Kafka cluster is deployed across the three AZs (active).
• ZooKeeper instances are deployed in each AZ.
• Brokers are spread evenly across all three AZs.
• Kafka consumers can be deployed across all three AZs.
• Standby Kafka producers and a Multi-AZ Kafka cluster are part of the deployment.
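
A minimal MirrorMaker 2 configuration sketch for the active-to-standby replication described above; the cluster aliases, bootstrap servers, and topic pattern are placeholders, not values from this document.

# mm2.properties (sketch): replicate topics from the active cluster to the standby cluster
# The cluster aliases and bootstrap servers below are placeholders.
clusters = active, standby
active.bootstrap.servers = active-broker1:9092,active-broker2:9092
standby.bootstrap.servers = standby-broker1:9092,standby-broker2:9092

# One-way replication from active to standby; narrow the topic pattern as needed
active->standby.enabled = true
active->standby.topics = .*

# Match the three-AZ replication factor used elsewhere in this document
replication.factor = 3
checkpoints.topic.replication.factor = 3
heartbeats.topic.replication.factor = 3
offset-syncs.topic.replication.factor = 3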

Kafka cluster failover occurs this way:

• Switch traffic to standby Kafka producers cluster and Kafka cluster.
• Restart consumers to consume from standby Kafka cluster.

Pros:
• Less operational overhead when compared to the first option
• Only one Kafka cluster to manage and consume data from
• Can handle single-AZ failures without activating a standby Kafka cluster

Cons:
• Added latency due to cross-AZ data transfer among Kafka brokers
• For Kafka versions before 0.10, replicas for topic partitions have to be assigned so they're distributed to the brokers on different AZs (rack awareness)
• The cluster can become unavailable in case of a network glitch where ZooKeeper does not see the Kafka brokers
• Possibility of in-transit message loss during failover

Upgrades:
Rolling or in-place upgrade

In a rolling or in-place upgrade scenario, upgrade one Kafka broker at a time. Take into
consideration the recommendations for doing rolling restarts to avoid downtime for end users.

Downtime upgrade

Take the entire cluster down, upgrade each Kafka broker, and then restart the cluster.

Blue/green upgrade

• Create a new Kafka cluster on AWS.
• Create a new Kafka producer stack to point to the new Kafka cluster.
• Create topics on the new Kafka cluster.
• Test the green deployment end to end (sanity check).
• Using Amazon Route 53, change the new Kafka producer stack on AWS to point to the new green Kafka environment that you have created (see the Route 53 sketch after this list).
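
A minimal boto3 sketch of the Route 53 cutover in the last step, assuming producers resolve the cluster through a CNAME record; the hosted zone ID, record name, and green-cluster hostname are placeholders.

import boto3

route53 = boto3.client("route53")

# Point the DNS name used by the Kafka producer stack at the new (green) environment.
route53.change_resource_record_sets(
    HostedZoneId="Z0123456789ABCDEFGHIJ",            # placeholder hosted zone
    ChangeBatch={
        "Comment": "Cut over Kafka producers to the green cluster",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "kafka.example.internal.",               # placeholder record name
                    "Type": "CNAME",
                    "TTL": 60,
                    "ResourceRecords": [
                        {"Value": "green-broker1.example.internal"}  # placeholder green endpoint
                    ],
                },
            }
        ],
    },
)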
