
Amazon Managed Streaming for Apache Kafka

Amazon MSK is a fully managed service that lets you build and run applications that use
Apache Kafka to process streaming data.
Apache Kafka is an open-source platform for building real-time streaming data pipelines and
applications.

Amazon MSK lets you use native Apache Kafka APIs to populate data lakes, stream
changes to and from databases, and power machine learning and analytics applications.

Below are the challenges of running Apache Kafka on your own:


1. Need to provision servers
2. Configure Apache Kafka manually
3. Replace servers when they fail
4. Orchestrate server patches and upgrades
5. Architect the cluster for high availability
6. Ensure data is durably stored and secured
7. Set up monitoring and alarms
8. Plan scaling events to support load changes

Amazon MSK eliminates all these challenges and makes it easy to build and run production
applications on Apache Kafka without needing Apache Kafka infrastructure management
expertise. That means you spend less time managing infrastructure and more time building
applications.

Amazon MSK automatically provisions and runs your Apache Kafka clusters.
Amazon MSK continuously monitors cluster health and automatically replaces unhealthy
nodes with no downtime to your application.

High level, how Apache Kafka operates:

Producer: Any data source, such as web servers, IoT devices, EC2 instances, etc.
Consumer: An application reading data from Kafka
Kafka: The cluster framework in which Kafka is running
Broker: A storage node that holds the data written by producers so it can be consumed by consumers
ZooKeeper: Maintains the state of resources involved in the Kafka cluster

Key concepts of Apache Kafka:


Apache Kafka stores records in topics. Data producers write records to topics and consumers
read records from topics.
Each record in Apache Kafka consists of a key, a value, and a timestamp.
Apache Kafka partitions topics and replicates these partitions across multiple nodes called
brokers.
Apache Kafka runs as a cluster on one or more brokers, and brokers can be located in
multiple AWS availability zones to create a highly available cluster.
Apache Kafka relies on Apache ZooKeeper to coordinate cluster tasks and to maintain state
for resources interacting with an Apache Kafka cluster.

Kafka automatically replicates data between brokers.
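The replication described above can be observed with the kafka-topics.sh tool that ships with Apache Kafka. A minimal sketch, assuming a topic named AWSKafkaTutorialTopic already exists and that ZookeeperConnectString is a placeholder for your cluster's ZooKeeper connection string:

```shell
# Describe the topic to see how its partitions are replicated across brokers.
# ZookeeperConnectString is a placeholder, not a real value.
bin/kafka-topics.sh --describe \
  --zookeeper ZookeeperConnectString \
  --topic AWSKafkaTutorialTopic

# The output lists, per partition, the leader broker, the replica brokers,
# and the in-sync replica set (Isr), for example:
#   Topic: AWSKafkaTutorialTopic  Partition: 0  Leader: 2  Replicas: 2,1,3  Isr: 2,1,3
```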

How Amazon MSK works:


Apache Kafka is a streaming data store. It decouples the applications that produce streaming
data (producers) into its data store from the applications that consume streaming data
(consumers) from its data store.

An overview of how Amazon MSK works:


High level steps for Setting Up Amazon MSK:
Step 1: Create a VPC for MSK Cluster
Step 2: Enable High Availability and Fault Tolerance
Step 3: Create an Amazon MSK Cluster
Step 4: Create a Client Machine
The client machine is used to create a topic and to produce and consume data.
Step 5: Create a Topic
Step 6: Produce and Consume Data
Step 7: Use Amazon CloudWatch to View Amazon MSK Metrics
Step 8: Delete the Amazon MSK Cluster
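Step 3 can also be done from the AWS CLI with `aws kafka create-cluster`. A minimal sketch, assuming hypothetical values throughout: the file name brokernodegroupinfo.json, the subnet IDs, and the security group ID are all placeholders you replace with your own:

```shell
# brokernodegroupinfo.json (hypothetical file) lists the three subnets
# created in Steps 1-2 and the security group for the cluster, e.g.:
# {
#   "InstanceType": "kafka.m5.large",
#   "ClientSubnets": ["subnet-0aaa", "subnet-0bbb", "subnet-0ccc"],
#   "SecurityGroups": ["sg-0ddd"]
# }

aws kafka create-cluster \
  --cluster-name "AWSKafkaTutorialCluster" \
  --broker-node-group-info file://brokernodegroupinfo.json \
  --kafka-version "2.2.1" \
  --number-of-broker-nodes 3

# The command returns a ClusterArn, which later commands need.
```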

Hands-on:
Step 1: Create a VPC for MSK Cluster:

Step 2: Enable High Availability and Fault Tolerance


Create 3 subnets (one in each Availability Zone)
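Steps 1 and 2 can be sketched with the AWS CLI as follows. The CIDR blocks, Availability Zones, and the vpc-xxxxxxxx ID are example values to replace with your own:

```shell
# Step 1: create a VPC (CIDR block is an example value).
aws ec2 create-vpc --cidr-block 10.0.0.0/16

# Step 2: create three subnets, one per Availability Zone, using the
# VpcId returned above (vpc-xxxxxxxx is a placeholder).
aws ec2 create-subnet --vpc-id vpc-xxxxxxxx --cidr-block 10.0.0.0/24 --availability-zone us-east-1a
aws ec2 create-subnet --vpc-id vpc-xxxxxxxx --cidr-block 10.0.1.0/24 --availability-zone us-east-1b
aws ec2 create-subnet --vpc-id vpc-xxxxxxxx --cidr-block 10.0.2.0/24 --availability-zone us-east-1c
```

Spreading the subnets across three Availability Zones is what lets the MSK brokers form a highly available cluster.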
Step 3: Create an Amazon MSK Cluster
Step 4: Create a Client Machine
Launch an EC2 instance: this client machine is used to create a topic and to produce and
consume data.

An issue here can be connecting to the instance:

Reason: The VPC was created as private, so an internet gateway (IGW) needs to be attached.
Resolution: Choose Create Internet Gateway to create an internet gateway. Select the
internet gateway, and then choose Attach to VPC.

In the navigation pane, choose Subnets, and then select the subnet.


On the Route Table tab, verify that there is a route with 0.0.0.0/0 as the destination and the
internet gateway for the VPC as the target. If there is not:
a. Choose the ID of the route table (rtb-xxxxxxxx) to navigate to the route table.
b. On the Routes tab, choose Edit routes, choose Add route, and use 0.0.0.0/0 as the
destination and the internet gateway as the target.
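The resolution above can be sketched with the AWS CLI; all the IDs (igw-, vpc-, rtb-) are placeholders for the values in your account:

```shell
# Create an internet gateway and attach it to the VPC.
aws ec2 create-internet-gateway
aws ec2 attach-internet-gateway --internet-gateway-id igw-xxxxxxxx --vpc-id vpc-xxxxxxxx

# Add a default route (0.0.0.0/0) to the subnet's route table so the
# client machine is reachable from the internet.
aws ec2 create-route --route-table-id rtb-xxxxxxxx \
  --destination-cidr-block 0.0.0.0/0 \
  --gateway-id igw-xxxxxxxx
```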

Step 5: Create a Topic


Connect to the EC2 instance.

Install Java on the client machine by running the following command:


sudo yum install java-1.8.0

Run the following commands to download and extract Apache Kafka:


wget https://archive.apache.org/dist/kafka/2.2.1/kafka_2.12-2.2.1.tgz
tar -xzf kafka_2.12-2.2.1.tgz
Go to the kafka_2.12-2.2.1 directory

Run the following command, replacing ZookeeperConnectString with your cluster's
ZooKeeper connection string:

bin/kafka-topics.sh --create --zookeeper ZookeeperConnectString --replication-factor 3 --partitions 1 --topic AWSKafkaTutorialTopic
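The ZookeeperConnectString value can be looked up with the AWS CLI. A sketch, where the cluster ARN is a hypothetical example you replace with the ClusterArn of your own MSK cluster:

```shell
# Fetch the ZooKeeper connection string for the cluster.
# The ARN below is a made-up example value.
aws kafka describe-cluster \
  --cluster-arn "arn:aws:kafka:us-east-1:123456789012:cluster/AWSKafkaTutorialCluster/abcd1234" \
  --query 'ClusterInfo.ZookeeperConnectString' \
  --output text
```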

Note: If you run into a network connectivity issue, check that the client machine's security
group ID is added to the inbound rules of the MSK cluster's security group.
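That inbound rule can be added from the AWS CLI as well. A sketch, assuming sg-cluster-xxxx and sg-client-xxxx are placeholders for the cluster's and client machine's security group IDs:

```shell
# Allow the client machine's security group to reach ZooKeeper (2181)
# and the TLS Kafka listener (9094) on the cluster's security group.
aws ec2 authorize-security-group-ingress \
  --group-id sg-cluster-xxxx \
  --protocol tcp --port 2181 \
  --source-group sg-client-xxxx

aws ec2 authorize-security-group-ingress \
  --group-id sg-cluster-xxxx \
  --protocol tcp --port 9094 \
  --source-group sg-client-xxxx
```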

Step 6: Produce and Consume Data


To produce and consume messages, the JVM truststore is used here to talk to the MSK
cluster over TLS.

1. Go to the bin directory of the Apache Kafka installation and run the following command:
cp /usr/lib/jvm/JDKFolder/jre/lib/security/cacerts /tmp/kafka.client.truststore.jks

2. Create a file named client.properties in the bin folder of the Apache Kafka installation on
the client machine with the following contents:
security.protocol=SSL
ssl.truststore.location=/tmp/kafka.client.truststore.jks
3. Run the following command in the bin folder, replacing BootstrapBrokerStringTls with the
bootstrap servers URL:
./kafka-console-producer.sh --broker-list BootstrapBrokerStringTls --producer.config
client.properties --topic AWSKafkaTutorialTopic

4. Keep the connection to the client machine open, and then open a second, separate
connection to that machine in a new window.
Run the following command, again replacing BootstrapBrokerStringTls with the bootstrap
servers URL:

./kafka-console-consumer.sh --bootstrap-server BootstrapBrokerStringTls --consumer.config client.properties --topic AWSKafkaTutorialTopic --from-beginning
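With both consoles running, anything typed into the producer window appears in the consumer window. The same round trip can be sketched non-interactively; BootstrapBrokerStringTls remains a placeholder for your bootstrap servers URL:

```shell
# Produce a single test message by piping it into the console producer.
echo "hello from MSK" | ./kafka-console-producer.sh \
  --broker-list BootstrapBrokerStringTls \
  --producer.config client.properties \
  --topic AWSKafkaTutorialTopic

# Read the topic from the beginning and exit after one message,
# which should print "hello from MSK".
./kafka-console-consumer.sh \
  --bootstrap-server BootstrapBrokerStringTls \
  --consumer.config client.properties \
  --topic AWSKafkaTutorialTopic \
  --from-beginning --max-messages 1
```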

Producer:

Consumer:
