Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 20

Streaming

Jamie Grier | @jgrier

1
Agenda

• Goals of Lyft’s Streaming Platform

• Streaming Platform Overview

• Why Flink

• Why Kafka

• Open problems

2
Goals of Lyft’s Streaming Platform

• Make it easy to build real-time, event-driven, stateful,


microservices

• Solve the hard parts of stream processing ONCE for the entire
company

• Be a force multiplier for other teams within Lyft

3
Streaming Platform Overview
Stream Compute
Pub/Sub Streaming Pub/Sub
Service One

Streaming
Service Two

Streaming
Service Three

Stream / Schema Deployment Metrics &


Alerts Logging
Registry Tooling Dashboards

Amazon Salt
Amazon S3 Wavefront Docker
EC2 (Conifg / Orca) 4
Lyft Streaming Platform - Streaming Compute Criteria

API Considerations: Operational Considerations


● Stateful Computation and Exactly-
● Functional / Fluent API
once Processing Semantics
● Flexible Windowing API ● Robust State Management
● Event Time Support ● Data Reprocessing (backfill)
● Asynchronous Checkpoints
● Apache Beam Support
● Back-pressure
● Stream SQL
● High throughput and low-latency
● Powerful Direct API ● Deployment Architecture
● Late Data Handling

The contenders: Apache Flink, Apache Spark Streaming, Apache Kafka Streams
5
Why Flink? API Considerations
• Functional / Fluent API
• Flexible Windowing API
• Event Time Support
• Apache Beam Support
• Stream SQL
• Powerful Direct API
• Late Data Handling

6
Why Flink? Operational Considerations

• Stateful Computation and Exactly-once Processing Semantics


• Robust State Management
• Stateful Data Reprocessing (backfill)
• Asynchronous Checkpoints
• Back-pressure
• High throughput and low-latency
• Deployment Architecture

7
Lyft Streaming Platform - Pub/Sub Criteria

Semantics / Features Operational Considerations


● Write Latency
● Durability
● Read Latency
● Consumer Fanout
● Project Maturity
● Transactions / Idempotent Writes ● Vendor Support
● Per-Key Ordering Guarantees
● Long-Term Data Storage
● Auto-Scaling

The contenders: Apache Kafka, Amazon Kinesis, Pravega


8
Why Kafka?
Pros
• Durability & Write Latency
• Read Latency & Consumer Fanout
• Transactions & Idempotent Writes
• Operational Concerns & Vendor Support
Cons
• No ordering by key, only partition
• Long term data storage still an issue
• Auto-Scaling still an issue

9
Open Problems

• Rescaling Kafka while preserving per-key ordering

• Efficient Dynamic Computations over streams

• Long term storage for events: real-time and historical reads

• Zero Downtime deployments for streaming services

10
Rescaling Kafka

• Rescaling Kafka while preserving per-key ordering

• Kafka only provides partition ordering guarantees!

• We want per-key ordering guarantees

• Guarantees should hold across re-partitioning events

• Basic approach: Read old partitions completely before reading


new

• Achieve this using something akin to Flink’s checkpoint 11


Rescaling Kafka while preserving per-key ordering

12
Rescaling Kafka while preserving per-key ordering

13
Efficient Dynamic Computation Over Streams

• Enable many users to dynamically submit small streaming


computations

• Share bandwidth amongst multiple computations

• Share computed sub-results amongst multiple computations

• Correctly handle bootstrapping of computations which


depend on historical data

• Basic approach: Map any computation into a fixed/general


14
Efficient Dynamic Computations over streams

15
Efficient Dynamic Computations over streams

16
Long term storage for events: Real-time and historical reads

17
Zero Downtime deployments for streaming services

18
Summary

• Lyft is building a next generation streaming platform based


on Apache Flink and Apache Kafka

• Stateful stream processing is not a “solved problem”

• There are many hard / open problems left to solve

• If these sort of problems interest you please come join us!

We’re Hiring!
19
Thank you!
Jamie Grier

20

You might also like