Adi Krishnan, Sr. Product Manager Amazon Kinesis: November 13, 2014 - Las Vegas, NV
Data Types: IT infrastructure, application logs, social media, financial market data, web clickstreams, sensors, geo/location data
• Digital Ad Tech. / Marketing
– Advertising data aggregation
– Advertising metrics like coverage, yield, conversion
– Analytics on user engagement with ads, optimized bid/buy engines
• Software / Technology
– IT server, app logs ingestion
– IT operational metrics dashboards
– Devices / sensor operational intelligence
• Financial Services
– Market / financial transaction order data collection
– Financial market data metrics
– Fraud monitoring and Value-at-Risk assessment, auditing of market order data
• Consumer Online / E-Commerce
– Online customer engagement data aggregation
– Consumer engagement metrics like page views, CTR
– Customer clickstream analytics, recommendation engines
Amazon Kinesis
Managed Service for streaming data ingestion and processing
[Architecture diagram]
• Millions of sources producing 100s of terabytes per hour
• Front end: authentication, authorization
• Durable, highly consistent storage replicates data across three data centers (availability zones)
• Ordered stream of events supports multiple readers
– Aggregate and archive to S3
– Real-time dashboards and alarms
– Machine learning algorithms or sliding window analytics
– Aggregate analysis in Hadoop or a data warehouse
• Inexpensive: $0.028 per million puts
Real-time Ingest
• Highly Scalable
• Durable
• Elastic
• Replay-able Reads
Continuous Processing FX
• Elastic
• Load-balancing incoming streams
• Fault-tolerance, Checkpoint / Replay
• Enable multiple processing apps in parallel
Managed Service
[Diagram labels: HTTP POST, Get* APIs, Fluentd]
Building Kinesis Applications: Kinesis Client Library
Open Source library for fault-tolerant, continuous processing apps
• Java client library, also available for Python Developers
Amazon Kinesis Connectors
• S3 Connector
– Batch writes files for archive into S3
– Uses sequence-based file naming scheme
• Redshift Connector
– Once written to S3, loads to Redshift
– Provides manifest support
– Supports user defined transformers
• DynamoDB Connector
– BatchPut appends to a table
– Supports user defined transformers
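The S3 Connector's sequence-based file naming can be illustrated with a small sketch. The exact scheme is not given on the slide, so the "first sequence number, hyphen, last sequence number" format below is an assumption, and `s3_file_name` is a hypothetical helper rather than part of the connector library.

```python
def s3_file_name(records):
    """Build an object name from the sequence numbers in a batch.

    `records` is a list of (sequence_number, data) tuples in stream order.
    Assumption: the name is "<first sequence number>-<last sequence number>".
    """
    if not records:
        raise ValueError("cannot name an empty batch")
    first_seq = records[0][0]
    last_seq = records[-1][0]
    return f"{first_seq}-{last_seq}"
```

Because the name depends only on the sequence numbers in the batch, replaying the same batch after a failure overwrites the same object rather than creating a second copy.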
Best Practices: Processing Data From Kinesis
Build applications as part of an Auto Scaling group
• Simply helps with application availability
• Scales in response to incoming spikes in data volume,
assuming shards have been provisioned
• Select scaling metrics based on nature of Kinesis
application
– Instance metrics: CPU, Memory, and others
– Kinesis Metrics: PutRecord.Bytes, GetRecords.Bytes
Metric                    Units
PutRecord.Bytes           Bytes
PutRecord.Latency         Milliseconds
PutRecord.Success         Count
GetRecords.Bytes          Bytes
GetRecords.IteratorAge    Milliseconds
GetRecords.Latency        Milliseconds
GetRecords.Success        Count
Best Practices: Processing Data From Kinesis
Build a flush-to-S3 consumer app
• App can specify three conditions that can trigger a buffer
flush:
– Number of records
– Total byte count
– Time since last flush
• The buffer is flushed and the data is emitted to the
destination when any of these thresholds is crossed.
# Flush when buffer exceeds 8 Kinesis records, 1 KB size limit, or
# when time since last emit exceeds 10 minutes
bufferSizeByteLimit = 1024
bufferRecordCountLimit = 8
bufferMillisecondsLimit = 600000
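The three flush conditions can be sketched as a small in-memory buffer. This is a minimal illustration, not the connector library's implementation: the `FlushBuffer` class and its method names are hypothetical, and treating each limit as inclusive (`>=`) is an assumption. The defaults mirror the settings above.

```python
import time


class FlushBuffer:
    """Buffers records; reports when any of three flush thresholds is reached."""

    def __init__(self, byte_limit=1024, record_limit=8,
                 millis_limit=600_000, clock=time.monotonic):
        self.byte_limit = byte_limit        # bufferSizeByteLimit
        self.record_limit = record_limit    # bufferRecordCountLimit
        self.millis_limit = millis_limit    # bufferMillisecondsLimit
        self._clock = clock                 # injectable so tests can control time
        self._records = []
        self._bytes = 0
        self._last_flush = clock()

    def add(self, record: bytes) -> None:
        self._records.append(record)
        self._bytes += len(record)

    def should_flush(self) -> bool:
        # An empty buffer never flushes, even if the time limit has passed.
        if not self._records:
            return False
        elapsed_ms = (self._clock() - self._last_flush) * 1000
        return (len(self._records) >= self.record_limit
                or self._bytes >= self.byte_limit
                or elapsed_ms >= self.millis_limit)

    def flush(self):
        # Hand back the batch and reset all three counters.
        batch = self._records
        self._records, self._bytes = [], 0
        self._last_flush = self._clock()
        return batch
```

A consumer would call `add` for each record, check `should_flush` after each batch, and emit the result of `flush` to S3 when it returns true.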
Best Practices: Processing Data From Kinesis
• In a KCL app, ensure the data being processed is persisted to a durable store such as
DynamoDB or S3 before checkpointing.
• Duplicates: Make the authoritative data repository (usually at the end of the
data flow) resilient to duplicates. That way the rest of the system has a simple
policy – keep retrying until you succeed.
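Both practices can be sketched together: persist first, checkpoint second, and key the authoritative store so that a replayed record overwrites itself instead of creating a duplicate. This is a simplified illustration and not the KCL API. `IdempotentStore` and `process_batch` are hypothetical names, and the `checkpoint` callable stands in for whatever checkpointing mechanism the application uses.

```python
class IdempotentStore:
    """Hypothetical stand-in for the authoritative repository.

    Rows are keyed by (shard_id, sequence_number), so writing the same
    record twice leaves exactly one row.
    """

    def __init__(self):
        self.rows = {}

    def put(self, shard_id, sequence_number, data):
        self.rows[(shard_id, sequence_number)] = data


def process_batch(shard_id, records, store, checkpoint):
    """Persist every record, then checkpoint the last sequence number.

    `records` is a list of (sequence_number, data) tuples in stream order.
    If anything fails before the checkpoint, the whole batch is replayed
    later, which is safe because the store tolerates duplicates.
    """
    last_seq = None
    for seq, data in records:
        store.put(shard_id, seq, data)   # durable write comes first
        last_seq = seq
    if last_seq is not None:
        checkpoint(last_seq)             # only after all writes succeeded
```

If the process dies between the last write and the checkpoint, the next worker replays the batch from the previous checkpoint. The keyed writes absorb the replay, so the rest of the pipeline can simply retry until it succeeds.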