
Introduction to Confluent Components
Customer Success Engineering
June 2022
Agenda

1. Confluent Platform
What components make up the Confluent Platform?

2. Kafka Concepts
Events, Distributed Commit Log, Event Streaming/Processing

3. Confluent Platform Components


Brokers, ZooKeeper, Clients, REST Proxy, Schema Registry, Connect, Kafka Streams, ksqlDB and
Control Center

4. Additional Features
Multi-Region Clusters, Tiered Storage, Cluster Linking and Self-Balancing Clusters

5. Deployment
How can I deploy the Confluent Platform?

1. Confluent Platform
What is the Confluent Platform?

An Enterprise Event Streaming Platform built around Apache Kafka.
Confluent Platform Components

[Reference architecture diagram: Kafka brokers with rebalancers, ZooKeeper nodes, Schema Registry (leader/follower), Kafka Connect workers with connectors, REST Proxy instances behind a sticky load balancer, ksqlDB servers, application clients, Kafka Streams applications, and Confluent Control Center.]

https://www.confluent.io/whitepaper/confluent-enterprise-reference-architecture/
Confluent completes Apache Kafka

DEVELOPER: Unrestricted Developer Productivity
• Multi-language Development: Non-Java Clients | REST Proxy | Admin REST APIs
• Rich Pre-built Ecosystem: Connectors | Hub | Schema Registry
• Streaming Database: ksqlDB

OPERATOR: Efficient Operations at Scale
• Management & Monitoring: Cloud Data Flow | Metrics API | Control Center | Health+
• Flexible DevOps Automation: Confluent for K8s | Ansible Playbooks
• Dynamic Performance & Elasticity: Elastic Scaling | Infinite Storage | Self-Balancing Clusters | Tiered Storage

ARCHITECT: Production-stage Prerequisites
• Enterprise-grade Security: RBAC | BYOK | Private Networking | Encryption | Audit Logs
• Data Compatibility: Schema Registry | Schema Validation
• Global Resilience: Multi AZ Clusters | 99.95% SLA | Replicator | Multi-Region Clusters | Cluster Linking

EXECUTIVE: Partnership for Business Success
• Complete Engagement Model
• Revenue / Cost / Risk Impact
• Marketplace Availability
• TCO / ROI

All of this builds on Apache Kafka, delivered as a fully managed cloud service (available everywhere) or as self-managed software, backed by Enterprise and Professional Support Services, committer-driven expertise, Training, and Partners.
Confluent Platform: Features and Licensing

Open Source features (Apache 2.0 License)
• Apache Kafka® (with Connect & Streams), Apache ZooKeeper™, Non-Java Clients, Ansible Playbooks
• Free. Unlimited Kafka brokers. Community support.

Community features (Confluent Community License)
• Pre-built Connectors, REST Proxy, ksqlDB, Schema Registry
• Free. Unlimited Kafka brokers. Community support.

Commercial features (Enterprise License, paid)
• Pre-built Connectors, Control Center, Health+, Confluent for Kubernetes, Replicator, Secret Protection, Auto Data Balancer, MQTT Proxy, Role-Based Access Control, Structured Audit Logs, Schema Validation, Confluent Server, Self-Balancing Clusters, Tiered Storage, Multi-Region Clusters, Cluster Linking (preview)
• Annual subscription with 24x7 Confluent support.
• Also available under a Developer License (free, limited to 1 Kafka broker, community support) or an Evaluation License (free 30-day trial, unlimited Kafka brokers, best-effort Confluent support).
2. Kafka Concepts
Apache Kafka is a Distributed Commit Log

• Publish and subscribe to streams of events, similar to a message queue
• Store streams of events, in a fault-tolerant way
• Process streams of events and produce new ones, in real time as they occur
Anatomy of a Kafka Topic

[Diagram: a topic with three partitions (Partition 0 through Partition 2); each partition is an ordered, append-only sequence of records indexed by offset, with old records at low offsets and new writes appended at the end. Producers write to the end of each partition; Consumer A reads at offset 4 while Consumer B reads at offset 7, each advancing independently.]
3. Confluent Platform Components

Brokers & ZooKeeper
Apache Kafka: Scale Out vs. Failover

[Diagram: Topic1 has four partitions (partition1 through partition4), each replicated across Brokers 1 through 4. Spreading partitions across brokers scales throughput out; replicating each partition across brokers provides failover.]
Apache ZooKeeper: Cluster Coordination

[Diagram: four brokers hosting partitions, with Broker 1 acting as the controller, coordinated by a three-node ZooKeeper ensemble; one ZooKeeper node is the leader, and all writes go to the leader.]

ZooKeeper stores cluster metadata: heartbeats, watches, controller elections, cluster/topic configs, and permissions.
Java Clients & more
Producer

A Kafka producer sends data to multiple partitions based on its partitioning strategy (the default uses a hash of the record key).

[Diagram: producer P distributing records across partitions 1 through 4.]

Data is sent in batches per partition, and the batches are bundled into requests to the broker. Tune this behavior with compression.type, batch.size, linger.ms, and acks, as in the sketch below.
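As an illustration of these settings, a minimal Java producer might look like the following; the broker address, topic name, and record contents are placeholder assumptions, and the kafka-clients library is assumed to be on the classpath.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // The batching and durability settings discussed on this slide:
        props.put("acks", "all");             // wait for all in-sync replicas
        props.put("compression.type", "lz4"); // compress each batch
        props.put("batch.size", "32768");     // max bytes per partition batch
        props.put("linger.ms", "20");         // wait up to 20 ms to fill a batch

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records with the same key hash to the same partition (default partitioner).
            producer.send(new ProducerRecord<>("clickstream", "user-42", "page_view"));
        }
    }
}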
Producer

[Diagram: producer P writes to Replica 1 (the leader) on Broker 1, which replicates to Replica 2 on Broker 2 and Replica 3 on Broker 3.]

With replication.factor=3, acks=all, and min.insync.replicas=2, a write is acknowledged only after at least two in-sync replicas have persisted it.
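The replication factor and min.insync.replicas are set when the topic is created (or altered). A small sketch using the Java AdminClient, with placeholder broker address and topic name:

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // 4 partitions, replication.factor=3, min.insync.replicas=2
            NewTopic topic = new NewTopic("clickstream", 4, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(List.of(topic)).all().get(); // blocks until created
        }
    }
}

Producers then opt into the durability guarantee per client with acks=all.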
Consumer

[Diagram: consumer C polls records from partitions 1 through 4, commits offsets, and sends heartbeats to the group coordinator.]
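A minimal Java consumer sketch showing the poll/commit cycle from the diagram; the group id, topic, and broker address are placeholders. Heartbeats are sent automatically by a background thread as long as the loop keeps calling poll().

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "clickstream-readers");     // placeholder consumer group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("enable.auto.commit", "false");         // commit offsets explicitly

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("clickstream"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                consumer.commitSync(); // commit offsets after processing the batch
            }
        }
    }
}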
Consumers - Consumer Group Members

[Diagram: four consumers in one group sharing a topic's partitions.]

Within the same application (consumer group), different partitions can be assigned to different consumers to increase parallel consumption as well as to support failover.
Consumers - Consumer Groups

[Diagram: two consumer groups, C1 and C2, each reading all partitions of the same topic.]

Different applications can independently read from the same topic partitions at their own pace.
Make Kafka Widely Accessible to Developers

Enable all developers to leverage Kafka throughout the organization with a wide variety of Confluent clients.

Confluent Clients: battle-tested and high-performing producer and consumer APIs (plus an admin client).
REST Proxy

Connect Any Application to Kafka

REST Proxy provides a RESTful interface to a Kafka cluster:
• Communicate via HTTP-connected devices
• Allows third-party apps to produce and consume messages

[Diagram: non-Java applications reach the cluster over REST/HTTP through REST Proxy, which integrates with Schema Registry; native Kafka Java applications connect directly.]
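For a sense of what this looks like, here is a sketch that produces a JSON record through REST Proxy's v2 produce endpoint using Java's built-in HTTP client; the host, port (8082 is the REST Proxy default), and topic are assumptions.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestProxyProduceExample {
    public static void main(String[] args) throws Exception {
        // One JSON record, wrapped in the envelope the v2 API expects.
        String body = "{\"records\":[{\"value\":{\"page\":\"home\",\"user\":\"42\"}}]}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8082/topics/clickstream"))
                .header("Content-Type", "application/vnd.kafka.json.v2+json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // The response reports the partition and offset of each written record.
        System.out.println(response.statusCode() + " " + response.body());
    }
}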
Schema Registry

Enforce Producer/Consumer Compatibility

Enable application development:
• Develop using standard schemas
• Store and share a versioned history of all standard schemas
• Validate data compatibility at the client level

Reduce operational complexity:
• Avoid time-consuming coordination among developers to standardize on schemas

[Diagram: producing applications fetch and register schemas through their serializers; a message that violates the registered schema is rejected before it reaches the Kafka topic.]
Schema Registry: Key Features

• Manage schemas and enforce schema policies
  Define, per Kafka topic, a set of compatible schemas that are "allowed"
  Schemas can be defined by an admin or by clients at runtime
  Avro, Protobuf, and JSON schemas are all supported
• Automatic validation when data is written to a topic
  If the data doesn't match the schema, the producer gets an error
• Works transparently
  When used with Confluent Kafka clients, Kafka REST Proxy, and Kafka Streams
• Integrates with Kafka Connect
• Integrates with Kafka Streams
• Supports high availability (within a datacenter)
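To make the client-side behavior concrete, here is a sketch of a producer using Confluent's Avro serializer, which registers and validates schemas against Schema Registry on the fly; the broker and registry addresses, topic, and schema are placeholder assumptions, and the kafka-avro-serializer dependency is assumed to be on the classpath.

import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // placeholder broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081"); // placeholder registry

        // The serializer registers this schema under the subject "pageviews-value"
        // and fails the send if it is incompatible with the registered versions.
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Pageview\",\"fields\":"
                + "[{\"name\":\"page\",\"type\":\"string\"}]}");
        GenericRecord value = new GenericData.Record(schema);
        value.put("page", "home");

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("pageviews", "user-42", value));
        }
    }
}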
Schema Registry: Key Features (Schema Validation)

Scale schemas reliably:
• Automated broker-side schema validation and enforcement
• Direct interface from the broker to Confluent Schema Registry

Granular control:
• Enable validation at the topic level with confluent.value.schema.validation=true

[Diagram: 1. a producer sends a record with an invalid schema id; 2. the broker, checking against Schema Registry, rejects it with an error message.]
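The topic-level switch can be flipped with any config tool; below is a sketch using the Java AdminClient. The topic name and broker address are placeholders, and the config itself only takes effect on Confluent Server brokers configured with a Schema Registry URL.

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class EnableSchemaValidation {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic =
                    new ConfigResource(ConfigResource.Type.TOPIC, "pageviews");
            AlterConfigOp enable = new AlterConfigOp(
                    new ConfigEntry("confluent.value.schema.validation", "true"),
                    AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(topic, List.of(enable))).all().get();
        }
    }
}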
Kafka Connect

No-code connectivity to many systems

Kafka Connect is a no-code way of connecting known systems (databases, object storage, queues, etc.) to Apache Kafka.

Some code can be written for custom transforms and data conversions, though out-of-the-box Single Message Transforms and Converters cover many cases.

[Diagram: data sources flow into Kafka through Kafka Connect, and from Kafka out to data sinks through Kafka Connect.]
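"No code" here means connectors are configured declaratively and submitted to a Connect worker's REST API (port 8083 by default). A sketch using Java's HTTP client and the FileStreamSource connector that ships with Kafka; the file path, topic, and worker address are placeholder assumptions.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateConnectorExample {
    public static void main(String[] args) throws Exception {
        // A connector is a name plus key/value config: declarative, no code.
        String body = "{"
                + "\"name\": \"demo-file-source\","
                + "\"config\": {"
                + "\"connector.class\": \"org.apache.kafka.connect.file.FileStreamSourceConnector\","
                + "\"tasks.max\": \"1\","
                + "\"file\": \"/tmp/demo.txt\","
                + "\"topic\": \"demo-lines\""
                + "}}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors")) // placeholder worker
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}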
Instantly Connect Popular Data Sources & Sinks

190+ pre-built connectors:
• 80+ Confluent Supported
• 50+ Partner Supported, Confluent Verified

[Logo collage of available connectors, e.g. Data Diode.]
Kafka Connect: Durable Data Pipelines

Integrate upstream and downstream systems with Apache Kafka®
• Capture schema from sources, use schema to inform data sinks
• Highly available workers ensure data pipelines aren't interrupted
• Extensible framework API for building custom connectors

[Diagram: Connect worker clusters on both sides of a Kafka cluster, coordinating schemas through Schema Registry.]
Instantly Connect Popular Data Sources & Sinks

Confluent Hub: easily browse connectors by:
• Source vs. Sink
• Confluent vs. Partner supported
• Commercial vs. Free
• Available in Confluent Cloud

confluent.io/hub
Kafka Streams
Build apps with stream processing inside
Stream Processing by Analogy

$ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt

The Connect API plays the part of the input and output redirection, stream processing plays the part of grep and tr, and the Kafka cluster sits underneath.
Stream Processing in Kafka

From flexibility to simplicity:

• Producer/Consumer: subscribe(), poll(), send(), flush(), commit()
• Kafka Streams API: filter(), map(), join(), aggregate()
Where does the processing code run?

Same app, many instances: each instance embeds the Streams API, so processing runs wherever the application runs and no separate processing cluster is required (see the sketch below).
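A sketch of a Kafka Streams application that implements the earlier cat | grep | tr analogy; the topic names and application id are placeholder assumptions, and the kafka-streams library is assumed to be on the classpath.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class StreamsExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app"); // placeholder app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Equivalent of: cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> lines = builder.stream("in-topic");
        lines.filter((key, value) -> value.contains("ksql"))   // grep "ksql"
             .mapValues(value -> value.toUpperCase())          // tr a-z A-Z
             .to("out-topic");                                 // > out.txt

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

Run several copies of this same program and the instances form a group, splitting the input partitions among themselves.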
ksqlDB
Stream processing using SQL and much more
Stream Processing in Kafka

From flexibility to simplicity:

• Producer/Consumer: subscribe(), poll(), send(), flush(), commit()
• Kafka Streams API: filter(), map(), join(), aggregate()
• ksqlDB: SELECT … FROM …, JOIN … WHERE …, GROUP BY …
Can I do stream processing with SQL?

My data is in a topic. Can I explore it?

SELECT status, bytes
FROM clickstream
WHERE user_agent = 'Mozilla/5.0 (compatible; MSIE 6.0)';
Can I do stream processing with SQL?

Can I pipe filtered and joined data to a new topic automatically? Why, yes. You can!

CREATE STREAM vip_actions AS
  SELECT userid, page, action
  FROM clickstream c
  LEFT JOIN users u ON c.userid = u.user_id
  WHERE u.level = 'Platinum';
Can I do stream processing with SQL?

What if I could describe an anomaly detector in SQL and have it write the results to a topic? You can do that too!

CREATE TABLE possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 SECONDS)
  GROUP BY card_number
  HAVING count(*) > 3;
Confluent Control Center

The simplest way to operate and build applications with Apache Kafka

For Operators: centrally manage and monitor multi-cluster environments and security.

For Developers: view messages, topics, and schemas; manage connectors; and build ksqlDB queries.
Adhere to Established Event Streaming SLAs

Monitor and optimize system health from the broker and cluster overviews:
• Broker and ZooKeeper uptime
• Under-replicated partitions
• Out-of-sync replicas
• Disk usage and distribution
• Alerting
Accelerate Application Development and Integration

• Messages: browse messages, and search offsets or timestamps by partition
• Topics: create, edit, delete, and view all topics in one place
• Schemas: create, edit, and view topic schemas, and compare schema versions
4. Additional Features
Additional Features

In addition to the behaviour and components we have discussed so far, the following features are also available through Confluent Platform and merit further investigation:

• Self-Balancing Clusters: automate partition rebalances to improve Kafka's performance, elasticity, and ease of operations
• Tiered Storage: enable infinite data retention and elastic scalability by allowing Kafka to recognize two tiers of storage, local and object store
• Multi-Region Clusters: take advantage of clusters being spread across zones by implementing new functionality: Follower-Fetching, Observers, Replica Placement
• Cluster Linking: enables the direct connection of clusters to mirror topics between them
Self-Balancing Clusters

Rebalances are required regularly to optimize cluster performance: uneven load, cluster expansion, and cluster shrinkage all call for moving partitions between brokers.

Self-Balancing Clusters automate partition rebalances to improve Kafka's performance, elasticity, and ease of operations.
Manual rebalance process:

$ cat partitions-to-move.json
{
  "partitions": [{
    "topic": "foo",
    "partition": 1,
    "replicas": [1, 2, 4]
  }, ...],
  "version": 1
}

$ kafka-reassign-partitions ...

Confluent Platform with Self-Balancing: no complex math, no risk of human error.
Tiered Storage

Event streaming is storage-intensive:

[Diagram: data stores (mainframe, Hadoop, object storage), device logs, 3rd-party apps (Splunk, SFDC), and custom apps/microservices all feeding data into Kafka.]

Tiered Storage enables infinite data retention and elastic scalability by decoupling the compute and storage layers in Kafka.

Tiered Storage allows Kafka to recognize two layers of storage: brokers keep recent data on local disks and offload old data to cost-effective object storage.

Tiered Storage delivers three primary benefits that revolutionize the way our customers experience Kafka:
• Infinite data retention: reimagine what event streaming apps can do
• Reduced infrastructure costs: offload data to cost-effective object storage
• Platform elasticity: scale compute and storage independently
Multi-Region Clusters

Change the game for disaster recovery for Kafka.

[Diagram: a single Kafka cluster stretched across us-west-1 and us-east-1 (brokers w-1 through w-6 and e-1 through e-6), with ZooKeeper nodes in each region and a ZooKeeper "tie-breaker" in a us-central-1 datacenter. When one site fails, clients fail over automatically to the surviving region.]

Minimal downtime:
• Automated client failover

Streamlined DR operations:
• Leverages Kafka's internal replication
• No separate Connect clusters

Single multi-region cluster with high write throughput:
• Asynchronous replication using "Observer" replicas

Low bandwidth costs and high read throughput:
• Remote consumers read data locally, directly from Observers
Cluster Linking

Cluster Linking simplifies hybrid cloud and multi-cloud deployments for Kafka.

Sharing data between independent clusters or migrating clusters presents two challenges:
1. It requires deploying a separate Connect cluster.
2. Offsets are not preserved, so messages are at risk of being skipped or reread:
   Topic 1, DC 1: 0 1 2 3 4 ...
   Topic 1, DC 2: 4 5 6 7 8 ...
Cluster Linking requires no additional infrastructure and preserves offsets.

[Diagram: topics mirrored directly between clusters, broker to broker; a common use case is migrating clusters to Confluent Cloud.]
5. Deployment
Confluent Platform Deployment Options

• Confluent Cloud / Terraform: low DevOps effort
• Kubernetes / Ansible: medium DevOps effort
• Package Managers / Tarballs: high DevOps effort
Deploy to Confluent Cloud: Terraform

Infrastructure as code done right. Benefit from:
• Industry standard tooling
• Human-readable configuration
• Management of critical Confluent Cloud resources
• Consistent deployability
• Multi-cloud with ease
• Quick scaling
Accelerate Deployment to Production: Ansible Playbooks

Production-ready playbooks for non-containerized environments. Deploy a complete event streaming platform at scale:
• Kafka Brokers & ZooKeeper
• Kafka Connect
• REST Proxy
• ksqlDB
• Control Center
• Schema Registry
• Schema Validation
• RBAC
Simplify Day-to-Day Operations: Confluent for Kubernetes

Deploy a production-ready event streaming platform in minutes. Perform automated rolling upgrades after a Confluent Platform version, configuration, or resource update, without impacting Kafka availability.
Questions?
