Stream Processing at Lyft

Streaming
Jamie Grier | @jgrier
1
Agenda
• Goals of Lyft’s Streaming Platform
• Streaming Platform Overview
• Why Flink
• Why Kafka
• Open problems
2
Goals of Lyft’s Streaming Platform
• Make it easy to build real-time, event-driven, stateful,

microservices
• Solve the hard parts of stream processing ONCE for the entire
company
• Be a force multiplier for other teams within Lyft
3
Streaming Platform Overview
Stream Compute
Pub/Sub Streaming Pub/Sub
Service One
Streaming
Service Two
Streaming
Service Three
Stream / Schema Deployment Metrics &

Alerts Logging
Registry Tooling Dashboards
Amazon Salt
Amazon S3 Wavefront Docker
EC2 (Conifg / Orca) 4
Lyft Streaming Platform - Streaming Compute Criteria
API Considerations: Operational Considerations

● Stateful Computation and Exactly-
● Functional / Fluent API
once Processing Semantics
● Flexible Windowing API ● Robust State Management
● Event Time Support ● Data Reprocessing (backfill)
● Asynchronous Checkpoints
● Apache Beam Support
● Back-pressure
● Stream SQL
● High throughput and low-latency
● Powerful Direct API ● Deployment Architecture
● Late Data Handling
The contenders: Apache Flink, Apache Spark Streaming, Apache Kafka Streams
5
Why Flink? API Considerations
• Functional / Fluent API
• Flexible Windowing API
• Event Time Support
• Apache Beam Support
• Stream SQL
• Powerful Direct API
• Late Data Handling
6
Why Flink? Operational Considerations
• Stateful Computation and Exactly-once Processing Semantics

• Robust State Management
• Stateful Data Reprocessing (backfill)
• Asynchronous Checkpoints
• Back-pressure
• High throughput and low-latency
• Deployment Architecture
7
Lyft Streaming Platform - Pub/Sub Criteria
Semantics / Features Operational Considerations

● Write Latency
● Durability
● Read Latency
● Consumer Fanout
● Project Maturity
● Transactions / Idempotent Writes ● Vendor Support
● Per-Key Ordering Guarantees
● Long-Term Data Storage
● Auto-Scaling
The contenders: Apache Kafka, Amazon Kinesis, Pravega

8
Why Kafka?
Pros
• Durability & Write Latency
• Read Latency & Consumer Fanout
• Transactions & Idempotent Writes
• Operational Concerns & Vendor Support
Cons
• No ordering by key, only partition
• Long term data storage still an issue
• Auto-Scaling still an issue
9
Open Problems
• Rescaling Kafka while preserving per-key ordering
• Efficient Dynamic Computations over streams
• Long term storage for events: real-time and historical reads
• Zero Downtime deployments for streaming services
10
Rescaling Kafka
• Rescaling Kafka while preserving per-key ordering
• Kafka only provides partition ordering guarantees!
• We want per-key ordering guarantees
• Guarantees should hold across re-partitioning events
• Basic approach: Read old partitions completely before reading

new
• Achieve this using something akin to Flink’s checkpoint 11

Rescaling Kafka while preserving per-key ordering
12
Rescaling Kafka while preserving per-key ordering
13
Efficient Dynamic Computation Over Streams
• Enable many users to dynamically submit small streaming

computations
• Share bandwidth amongst multiple computations
• Share computed sub-results amongst multiple computations
• Correctly handle bootstrapping of computations which

depend on historical data
• Basic approach: Map any computation into a fixed/general

14
Efficient Dynamic Computations over streams
15
Efficient Dynamic Computations over streams
16
Long term storage for events: Real-time and historical reads
17
Zero Downtime deployments for streaming services
18
Summary
• Lyft is building a next generation streaming platform based

on Apache Flink and Apache Kafka
• Stateful stream processing is not a “solved problem”
• There are many hard / open problems left to solve
• If these sort of problems interest you please come join us!
We’re Hiring!
19
Thank you!
Jamie Grier
20

Stream Processing at Lyft

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Stream Processing at Lyft

Uploaded by

Copyright:

Available Formats

Streaming

Jamie Grier | @jgrier

• Goals of Lyft’s Streaming Platform

• Streaming Platform Overview

• Make it easy to build real-time, event-driven, stateful,

• Be a force multiplier for other teams within Lyft

Stream / Schema Deployment Metrics &

API Considerations: Operational Considerations

• Stateful Computation and Exactly-once Processing Semantics

Semantics / Features Operational Considerations

The contenders: Apache Kafka, Amazon Kinesis, Pravega

• Rescaling Kafka while preserving per-key ordering

• Efficient Dynamic Computations over streams

• Long term storage for events: real-time and historical reads

• Zero Downtime deployments for streaming services

• Rescaling Kafka while preserving per-key ordering

• Kafka only provides partition ordering guarantees!

• We want per-key ordering guarantees

• Guarantees should hold across re-partitioning events

• Basic approach: Read old partitions completely before reading

• Achieve this using something akin to Flink’s checkpoint 11

• Enable many users to dynamically submit small streaming

• Share bandwidth amongst multiple computations

• Share computed sub-results amongst multiple computations

• Correctly handle bootstrapping of computations which

• Basic approach: Map any computation into a fixed/general

• Lyft is building a next generation streaming platform based

• Stateful stream processing is not a “solved problem”

• There are many hard / open problems left to solve

• If these sort of problems interest you please come join us!

You might also like