
Introduction to Confluent Components
Customer Success Engineering
June 2022
Agenda

1. Confluent Platform
What components make up the Confluent Platform?

2. Kafka Concepts
Events, Distributed Commit Log, Event Streaming/Processing

3. Confluent Platform Components


Brokers, ZooKeeper, Clients, REST Proxy, Schema Registry, Connect, Kafka Streams, ksqlDB and
Control Center

4. Additional Features
Multi-Region Clusters, Tiered Storage, Cluster Linking and Self-Balancing Clusters

5. Deployment
How can I deploy the Confluent Platform?

1. Confluent Platform
What is the Confluent Platform?

An Enterprise Event Streaming Platform built around Apache Kafka.
Confluent Platform Components

[Reference architecture diagram: Kafka brokers with rebalancers, ZooKeeper nodes, Schema Registry (leader/follower), Kafka Connect workers with connectors, REST Proxy instances behind a sticky load balancer, ksqlDB servers, application clients, Kafka Streams applications, and Confluent Control Center.]

https://www.confluent.io/whitepaper/confluent-enterprise-reference-architecture/
Confluent completes Apache Kafka

DEVELOPER: Unrestricted Developer Productivity
• Multi-language Development: Non-Java Clients | REST Proxy | Admin REST APIs
• Rich Pre-built Ecosystem: Connectors | Hub | Schema Registry
• Streaming Database: ksqlDB

OPERATOR: Efficient Operations at Scale
• Management & Monitoring: Cloud Data Flow | Metrics API | Control Center | Health+
• Flexible DevOps Automation: Confluent for K8s | Ansible Playbooks
• Dynamic Performance & Elasticity: Elastic Scaling | Infinite Storage | Self-Balancing Clusters | Tiered Storage

ARCHITECT: Production-stage Prerequisites
• Enterprise-grade Security: RBAC | BYOK | Private Networking | Encryption | Audit Logs
• Data Compatibility: Schema Registry | Schema Validation
• Global Resilience: Multi AZ Clusters | 99.95% SLA | Replicator | Multi-Region Clusters | Cluster Linking

EXECUTIVE: Partnership for Business Success
• Complete Engagement Model
• Revenue / Cost / Risk Impact
• Marketplace Availability
• TCO / ROI

All of this builds on Apache Kafka, delivered as a fully managed cloud service (available everywhere) or as self-managed software, backed by Enterprise and Professional Support Services, committer-driven expertise, Training, and Partners.
Confluent Platform: Features and Licensing

Open Source features (Apache 2.0 License)
• Apache Kafka® (with Connect & Streams), Apache ZooKeeper™, Non-Java Clients, Ansible Playbooks
• Free. Unlimited Kafka brokers. Community support.

Community features (Confluent Community License)
• Pre-built Connectors, REST Proxy, ksqlDB, Schema Registry
• Free. Unlimited Kafka brokers. Community support.

Commercial features (Enterprise License, paid)
• Pre-built Connectors, Control Center, Health+, Confluent for Kubernetes, Replicator, Secret Protection, Auto Data Balancer, MQTT Proxy, Role-Based Access Control, Structured Audit Logs, Schema Validation, Confluent Server, Self-Balancing Clusters, Tiered Storage, Multi-Region Clusters, Cluster Linking (preview)
• Annual subscription with 24x7 Confluent support.
• Also available under a Developer License (free, limited to 1 Kafka broker, community support) or an Evaluation License (free 30-day trial, unlimited Kafka brokers, best-effort Confluent support).
2. Kafka Concepts
Apache Kafka is a Distributed Commit Log

• Publish and subscribe to streams of events, similar to a message queue
• Store streams of events, in a fault-tolerant way
• Process streams of events and produce new ones, in real time as they occur
Anatomy of a Kafka Topic

[Diagram: a topic with three partitions (Partition 0 through Partition 2); each partition is an ordered, append-only sequence of records indexed by offset, with old records at low offsets and new writes appended at the end. Producers write to the end of each partition; Consumer A reads at offset 4 while Consumer B reads at offset 7, each advancing independently.]
3. Confluent Platform Components

Brokers & ZooKeeper
Apache Kafka: Scale Out vs. Failover

[Diagram: Topic1 has four partitions (partition1 through partition4), each replicated across Brokers 1 through 4. Spreading partitions across brokers scales throughput out; replicating each partition across brokers provides failover.]
Apache ZooKeeper: Cluster Coordination

[Diagram: four brokers hosting partitions, with Broker 1 acting as the controller, coordinated by a three-node ZooKeeper ensemble; one ZooKeeper node is the leader, and all writes go to the leader.]

ZooKeeper stores cluster metadata: heartbeats, watches, controller elections, cluster/topic configs, and permissions.
Java Clients & more
Producer

A Kafka producer sends data to multiple partitions based on its partitioning strategy (the default uses a hash of the record key).

[Diagram: producer P distributing records across partitions 1 through 4.]

Data is sent in batches per partition, and the batches are bundled into requests to the broker. Tune this behavior with compression.type, batch.size, linger.ms, and acks, as in the sketch below.
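As an illustration of these settings, a minimal Java producer might look like the following; the broker address, topic name, and record contents are placeholder assumptions, and the kafka-clients library is assumed to be on the classpath.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // The batching and durability settings discussed on this slide:
        props.put("acks", "all");             // wait for all in-sync replicas
        props.put("compression.type", "lz4"); // compress each batch
        props.put("batch.size", "32768");     // max bytes per partition batch
        props.put("linger.ms", "20");         // wait up to 20 ms to fill a batch

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records with the same key hash to the same partition (default partitioner).
            producer.send(new ProducerRecord<>("clickstream", "user-42", "page_view"));
        }
    }
}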
Producer

[Diagram: producer P writes to Replica 1 (the leader) on Broker 1, which replicates to Replica 2 on Broker 2 and Replica 3 on Broker 3.]

With replication.factor=3, acks=all, and min.insync.replicas=2, a write is acknowledged only after at least two in-sync replicas have persisted it.
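The replication factor and min.insync.replicas are set when the topic is created (or altered). A small sketch using the Java AdminClient, with placeholder broker address and topic name:

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // 4 partitions, replication.factor=3, min.insync.replicas=2
            NewTopic topic = new NewTopic("clickstream", 4, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(List.of(topic)).all().get(); // blocks until created
        }
    }
}

Producers then opt into the durability guarantee per client with acks=all.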
Consumer

[Diagram: consumer C polls records from partitions 1 through 4, commits offsets, and sends heartbeats to the group coordinator.]
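A minimal Java consumer sketch showing the poll/commit cycle from the diagram; the group id, topic, and broker address are placeholders. Heartbeats are sent automatically by a background thread as long as the loop keeps calling poll().

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "clickstream-readers");     // placeholder consumer group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("enable.auto.commit", "false");         // commit offsets explicitly

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("clickstream"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                consumer.commitSync(); // commit offsets after processing the batch
            }
        }
    }
}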
Consumers - Consumer Group Members

[Diagram: four consumers in one group sharing a topic's partitions.]

Within the same application (consumer group), different partitions can be assigned to different consumers to increase parallel consumption as well as to support failover.
Consumers - Consumer Groups

[Diagram: two consumer groups, C1 and C2, each reading all partitions of the same topic.]

Different applications can independently read from the same topic partitions at their own pace.
Make Kafka Widely Accessible to Developers

Enable all developers to leverage Kafka throughout the organization with a wide variety of Confluent clients.

Confluent Clients: battle-tested and high-performing producer and consumer APIs (plus an admin client).
REST Proxy

Connect Any Application to Kafka

REST Proxy provides a RESTful interface to a Kafka cluster:
• Communicate via HTTP-connected devices
• Allows third-party apps to produce and consume messages

[Diagram: non-Java applications reach the cluster over REST/HTTP through REST Proxy, which integrates with Schema Registry; native Kafka Java applications connect directly.]
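For a sense of what this looks like, here is a sketch that produces a JSON record through REST Proxy's v2 produce endpoint using Java's built-in HTTP client; the host, port (8082 is the REST Proxy default), and topic are assumptions.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestProxyProduceExample {
    public static void main(String[] args) throws Exception {
        // One JSON record, wrapped in the envelope the v2 API expects.
        String body = "{\"records\":[{\"value\":{\"page\":\"home\",\"user\":\"42\"}}]}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8082/topics/clickstream"))
                .header("Content-Type", "application/vnd.kafka.json.v2+json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // The response reports the partition and offset of each written record.
        System.out.println(response.statusCode() + " " + response.body());
    }
}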
Schema Registry

Enforce Producer/Consumer Compatibility

Enable application development:
• Develop using standard schemas
• Store and share a versioned history of all standard schemas
• Validate data compatibility at the client level

Reduce operational complexity:
• Avoid time-consuming coordination among developers to standardize on schemas

[Diagram: producing applications fetch and register schemas through their serializers; a message that violates the registered schema is rejected before it reaches the Kafka topic.]
Schema Registry: Key Features

• Manage schemas and enforce schema policies
  Define, per Kafka topic, a set of compatible schemas that are "allowed"
  Schemas can be defined by an admin or by clients at runtime
  Avro, Protobuf, and JSON schemas are all supported
• Automatic validation when data is written to a topic
  If the data doesn't match the schema, the producer gets an error
• Works transparently
  When used with Confluent Kafka clients, Kafka REST Proxy, and Kafka Streams
• Integrates with Kafka Connect
• Integrates with Kafka Streams
• Supports high availability (within a datacenter)
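To make the client-side behavior concrete, here is a sketch of a producer using Confluent's Avro serializer, which registers and validates schemas against Schema Registry on the fly; the broker and registry addresses, topic, and schema are placeholder assumptions, and the kafka-avro-serializer dependency is assumed to be on the classpath.

import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // placeholder broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081"); // placeholder registry

        // The serializer registers this schema under the subject "pageviews-value"
        // and fails the send if it is incompatible with the registered versions.
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Pageview\",\"fields\":"
                + "[{\"name\":\"page\",\"type\":\"string\"}]}");
        GenericRecord value = new GenericData.Record(schema);
        value.put("page", "home");

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("pageviews", "user-42", value));
        }
    }
}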
Schema Registry: Key Features (Schema Validation)

Scale schemas reliably:
• Automated broker-side schema validation and enforcement
• Direct interface from the broker to Confluent Schema Registry

Granular control:
• Enable validation at the topic level with confluent.value.schema.validation=true

[Diagram: 1. a producer sends a record with an invalid schema id; 2. the broker, checking against Schema Registry, rejects it with an error message.]
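The topic-level switch can be flipped with any config tool; below is a sketch using the Java AdminClient. The topic name and broker address are placeholders, and the config itself only takes effect on Confluent Server brokers configured with a Schema Registry URL.

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class EnableSchemaValidation {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic =
                    new ConfigResource(ConfigResource.Type.TOPIC, "pageviews");
            AlterConfigOp enable = new AlterConfigOp(
                    new ConfigEntry("confluent.value.schema.validation", "true"),
                    AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(topic, List.of(enable))).all().get();
        }
    }
}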
Kafka Connect

No-code connectivity to many systems

Kafka Connect is a no-code way of connecting known systems (databases, object storage, queues, etc.) to Apache Kafka.

Some code can be written for custom transforms and data conversions, though out-of-the-box Single Message Transforms and Converters cover many cases.

[Diagram: data sources flow into Kafka through Kafka Connect, and from Kafka out to data sinks through Kafka Connect.]
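"No code" here means connectors are configured declaratively and submitted to a Connect worker's REST API (port 8083 by default). A sketch using Java's HTTP client and the FileStreamSource connector that ships with Kafka; the file path, topic, and worker address are placeholder assumptions.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateConnectorExample {
    public static void main(String[] args) throws Exception {
        // A connector is a name plus key/value config: declarative, no code.
        String body = "{"
                + "\"name\": \"demo-file-source\","
                + "\"config\": {"
                + "\"connector.class\": \"org.apache.kafka.connect.file.FileStreamSourceConnector\","
                + "\"tasks.max\": \"1\","
                + "\"file\": \"/tmp/demo.txt\","
                + "\"topic\": \"demo-lines\""
                + "}}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors")) // placeholder worker
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}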
Instantly Connect Popular Data Sources & Sinks

190+ pre-built connectors:
• 80+ Confluent Supported
• 50+ Partner Supported, Confluent Verified

[Logo collage of available connectors, e.g. Data Diode.]
Kafka Connect: Durable Data Pipelines

Integrate upstream and downstream systems with Apache Kafka®
• Capture schema from sources, use schema to inform data sinks
• Highly available workers ensure data pipelines aren't interrupted
• Extensible framework API for building custom connectors

[Diagram: Connect worker clusters on both sides of a Kafka cluster, coordinating schemas through Schema Registry.]
Instantly Connect Popular Data Sources & Sinks

Confluent Hub: easily browse connectors by:
• Source vs. Sink
• Confluent vs. Partner supported
• Commercial vs. Free
• Available in Confluent Cloud

confluent.io/hub
Kafka Streams
Build apps with stream processing inside
Stream Processing by Analogy

$ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt

The Connect API plays the part of the input and output redirection, stream processing plays the part of grep and tr, and the Kafka cluster sits underneath.
Stream Processing in Kafka

From flexibility to simplicity:

• Producer/Consumer: subscribe(), poll(), send(), flush(), commit()
• Kafka Streams API: filter(), map(), join(), aggregate()
Where does the processing code run?

Same app, many instances: each instance embeds the Streams API, so processing runs wherever the application runs and no separate processing cluster is required (see the sketch below).
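A sketch of a Kafka Streams application that implements the earlier cat | grep | tr analogy; the topic names and application id are placeholder assumptions, and the kafka-streams library is assumed to be on the classpath.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class StreamsExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app"); // placeholder app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Equivalent of: cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> lines = builder.stream("in-topic");
        lines.filter((key, value) -> value.contains("ksql"))   // grep "ksql"
             .mapValues(value -> value.toUpperCase())          // tr a-z A-Z
             .to("out-topic");                                 // > out.txt

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

Run several copies of this same program and the instances form a group, splitting the input partitions among themselves.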
ksqlDB
Stream processing using SQL and much more
Stream Processing in Kafka

From flexibility to simplicity:

• Producer/Consumer: subscribe(), poll(), send(), flush(), commit()
• Kafka Streams API: filter(), map(), join(), aggregate()
• ksqlDB: SELECT … FROM …, JOIN … WHERE …, GROUP BY …
Can I do stream processing with SQL?

My data is in a topic. Can I explore it?

SELECT status, bytes
FROM clickstream
WHERE user_agent = 'Mozilla/5.0 (compatible; MSIE 6.0)';
Can I do stream processing with SQL?

Can I pipe filtered and joined data to a new topic automatically? Why, yes. You can!

CREATE STREAM vip_actions AS
  SELECT userid, page, action
  FROM clickstream c
  LEFT JOIN users u ON c.userid = u.user_id
  WHERE u.level = 'Platinum';
Can I do stream processing with SQL?

What if I could describe an anomaly detector in SQL and have it write the results to a topic? You can do that too!

CREATE TABLE possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 SECONDS)
  GROUP BY card_number
  HAVING count(*) > 3;
Confluent Control Center

The simplest way to operate and build applications with Apache Kafka

For Operators: centrally manage and monitor multi-cluster environments and security.

For Developers: view messages, topics, and schemas; manage connectors; and build ksqlDB queries.
Adhere to Established Event Streaming SLAs

Monitor and optimize system health from the broker and cluster overviews:
• Broker and ZooKeeper uptime
• Under-replicated partitions
• Out-of-sync replicas
• Disk usage and distribution
• Alerting
Accelerate Application Development and Integration

• Messages: browse messages, and search offsets or timestamps by partition
• Topics: create, edit, delete, and view all topics in one place
• Schemas: create, edit, and view topic schemas, and compare schema versions
4. Additional Features
Additional Features

In addition to the behaviour and components we have discussed so far, the following features are also available through Confluent Platform and merit further investigation:

• Self-Balancing Clusters: automate partition rebalances to improve Kafka's performance, elasticity, and ease of operations
• Tiered Storage: enable infinite data retention and elastic scalability by allowing Kafka to recognize two tiers of storage, local and object store
• Multi-Region Clusters: take advantage of clusters being spread across zones by implementing new functionality: Follower-Fetching, Observers, Replica Placement
• Cluster Linking: enables the direct connection of clusters to mirror topics between them
Self-Balancing Clusters

Rebalances are required regularly to optimize cluster performance: uneven load, cluster expansion, and cluster shrinkage all call for moving partitions between brokers.

Self-Balancing Clusters automate partition rebalances to improve Kafka's performance, elasticity, and ease of operations.
Manual rebalance process:

$ cat partitions-to-move.json
{
  "partitions": [{
    "topic": "foo",
    "partition": 1,
    "replicas": [1, 2, 4]
  }, ...],
  "version": 1
}

$ kafka-reassign-partitions ...

Confluent Platform with Self-Balancing: no complex math, no risk of human error.
Tiered Storage

Event streaming is storage-intensive:

[Diagram: data stores (mainframe, Hadoop, object storage), device logs, 3rd-party apps (Splunk, SFDC), and custom apps/microservices all feeding data into Kafka.]

Tiered Storage enables infinite data retention and elastic scalability by decoupling the compute and storage layers in Kafka.

Tiered Storage allows Kafka to recognize two layers of storage: brokers keep recent data on local disks and offload old data to cost-effective object storage.

Tiered Storage delivers three primary benefits that revolutionize the way our customers experience Kafka:
• Infinite data retention: reimagine what event streaming apps can do
• Reduced infrastructure costs: offload data to cost-effective object storage
• Platform elasticity: scale compute and storage independently
Multi-Region Clusters

Change the game for disaster recovery for Kafka.

[Diagram: a single Kafka cluster stretched across us-west-1 and us-east-1 (brokers w-1 through w-6 and e-1 through e-6), with ZooKeeper nodes in each region and a ZooKeeper "tie-breaker" in a us-central-1 datacenter. When one site fails, clients fail over automatically to the surviving region.]

Minimal downtime:
• Automated client failover

Streamlined DR operations:
• Leverages Kafka's internal replication
• No separate Connect clusters

Single multi-region cluster with high write throughput:
• Asynchronous replication using "Observer" replicas

Low bandwidth costs and high read throughput:
• Remote consumers read data locally, directly from Observers
Cluster Linking

Cluster Linking simplifies hybrid cloud and multi-cloud deployments for Kafka.

Sharing data between independent clusters or migrating clusters presents two challenges:
1. It requires deploying a separate Connect cluster.
2. Offsets are not preserved, so messages are at risk of being skipped or reread:
   Topic 1, DC 1: 0 1 2 3 4 ...
   Topic 1, DC 2: 4 5 6 7 8 ...
Cluster Linking requires no additional infrastructure and preserves offsets.

[Diagram: topics mirrored directly between clusters, broker to broker; a common use case is migrating clusters to Confluent Cloud.]
5. Deployment
Confluent Platform Deployment Options

• Confluent Cloud / Terraform: low DevOps effort
• Kubernetes / Ansible: medium DevOps effort
• Package Managers / Tarballs: high DevOps effort
Deploy to Confluent Cloud: Terraform

Infrastructure as code done right. Benefit from:
• Industry standard tooling
• Human-readable configuration
• Management of critical Confluent Cloud resources
• Consistent deployability
• Multi-cloud with ease
• Quick scaling
Accelerate Deployment to Production: Ansible Playbooks

Production-ready playbooks for non-containerized environments. Deploy a complete event streaming platform at scale:
• Kafka Brokers & ZooKeeper
• Kafka Connect
• REST Proxy
• ksqlDB
• Control Center
• Schema Registry
• Schema Validation
• RBAC
Simplify Day-to-Day Operations: Confluent for Kubernetes

Deploy a production-ready event streaming platform in minutes. Perform automated rolling upgrades after a Confluent Platform version, configuration, or resource update, without impacting Kafka availability.
Questions?
