Professional Documents
Culture Documents
4 Spark Cassandra
4 Spark Cassandra
4 Spark Cassandra
Ahmad Alkilani
DATA ARCHITECT
@akizl
Advanced Streaming Operations
Checkpointing
Window Operations
Stateful Transformations
Streaming Cardinality Estimation
Checkpointing
Advanced Window Operations
Streaming
Stateful Transformations
Operations
Streaming Cardinality Estimation
Checkpointing
7 1 6 3 2 4
RDD RDD
7 1 6 3 2 4
• Window Size
• Slide Interval
2 Seconds
7 1 6 3 2 4 Original DStream
2 Second (Slide)
y b
x
6 Second (Window Size)
12
t
a
14 10 11 Windowed DStream
Stateful Transformations
DStream
/ 5
4
Key State
F E D C B A
Current State
RDD RDD
Key State
C ( 3 , 8 , 6 )
myStream.mapWithState(…).print()
D ( 12 , 4 , 7 )
myStream.mapWithState(…).stateSnapshots().print()
Cardinality Estimation using HyperLogLog
HLL - Cardinality estimation
Unique observations
Timeframe expanded
Cardinality Estimation using HyperLogLog
HLL - Cardinality estimation
Unique observations
Naïve solution 3 Billion events > 60 GB of memory
HLL solves this in < 100s of KB
Based on bit pattern observables
Associative Data Structure
HLL HLL HLL
Summary
§ Window Operations
§ Stateful Transformations
o updateStateByKey
Streaming Ingest with
Apache Kafka
o mapWithState