DAT341 - Working With Amazon ElastiCache For Redis

AWS re:Invent
Working with Amazon ElastiCache for Redis
Michael Labib, Specialist SA, AWS
November 28, 2017
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What to expect from this session
1) Amazon ElastiCache overview
2) Amazon ElastiCache usage patterns
3) Best practices
4) Caching strategies
5) Hands-on workshop
• Lab 1: Performance testing
• Lab 2: Working with Amazon ElastiCache for Redis
Prerequisites
• Have your own laptop
• Have an AWS account set up
• Have installed the latest AWS Command Line Interface (AWS CLI) tool
Amazon ElastiCache overview
Amazon ElastiCache
• In-memory key-value store supporting
  • Redis 3.2.10
  • Memcached 1.4.34
• High-performance
• Fully managed; zero admin
• Highly available and reliable
• Hardened by Amazon
[Figure: AWS data stores (Amazon ElastiCache, Amazon DynamoDB and DynamoDB Accelerator (DAX), Amazon CloudSearch and Amazon Elasticsearch Service, Amazon RDS, HDFS, Amazon S3, Amazon Glacier) compared by structure, request rate, latency, and data volume]
Redis overview
• In-memory data structure server
• Ridiculously fast! <1 ms latency for most commands
• Open source
• Simple
• Powerful: ~200 commands + Lua scripting
• Utility data structures: strings, lists, hashes, sets, sorted sets, bitmaps, and HyperLogLogs
• Persistence
• Highly available: replication
• Atomic operations: supports transactions
But wait, there's more!
• Run Lua scripts
• Geospatial queries
• Pub/sub
Redis topologies
Vertically scaled: cluster mode disabled
• Primary endpoint; 1 primary + 0–5 replicas
• Entire keyspace (slots 0–16383) in a single shard
• Max storage 407 GiB
Horizontally scaled: cluster mode enabled
• Configuration endpoint; 1–15 primaries/shards, each with 0–5 replicas
• Keyspace split across shards (for example, slots 0–5461, 5462–10922, 10923–16383)
• Max storage 6+ TiB
Redis cluster-mode enabled vs. disabled
Feature | Cluster mode enabled | Cluster mode disabled
Failover | 15–30 s (non-DNS) | ~1.5 min (DNS-based)
Failover risk | Writes affected on a partial dataset (less risk with more partitions); reads available | Writes affected on the entire dataset; reads available
Performance | Scales with cluster size (up to 90 nodes: 15 primaries + 0–5 replicas per shard) | 6 nodes (1 primary + 0–5 replicas)
Max connections | Primaries: 65,000 × 15 = 975,000; replicas: 65,000 × 75 = 4,875,000 | Primary: 65,000; replicas: 65,000 × 5 = 325,000
Storage | 6+ TiB | 407 GiB
Cost | Smaller nodes, but more $$ | Larger nodes, less $
Example: workload needs 175 GB | 9 × cache.r3.xlarge ($0.455/hr) = $4.095/hr, 255.6 GB | 1 × cache.r3.8xlarge = $3.640/hr, 237 GB
Closer look at cluster-mode enabled
Redis cluster: automatic client-side sharding
• 16384 hash slots per cluster
• Slot for a key is CRC16(key) mod 16384
• Slots are distributed across the cluster into shards
• Developers must use a Redis cluster-aware client
• Clients are redirected to the correct shard
• Smart clients store a map of slots to shards
Example with five shards (S1–S5) and a client:
• Shard S1 = slots 0–3276
• Shard S2 = slots 3277–6553
• Shard S3 = slots 6554–9829
• Shard S4 = slots 9830–13106
• Shard S5 = slots 13107–16383
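A minimal Java sketch of the key-to-slot mapping described above, assuming the CRC16-CCITT (XMODEM) variant that the Redis Cluster specification uses. It also applies Redis's {hash tag} rule, which the slide does not mention. The class and method names are hypothetical, and the slot-to-shard table simply encodes the five-shard example.

```java
import java.nio.charset.StandardCharsets;

// Sketch of Redis Cluster key-to-slot mapping: CRC16 (XMODEM variant) of the key, mod 16384.
final class HashSlot {
    static final int SLOTS = 16384;

    // Upper slot bound (inclusive) of shards S1..S5 in the five-shard example.
    static final int[] SHARD_UPPER = {3276, 6553, 9829, 13106, 16383};

    // CRC16-CCITT, XMODEM variant: polynomial 0x1021, initial value 0, no bit reflection.
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    // Redis hashes only the part inside the first {...} when it is non-empty (a "hash tag").
    static int slot(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOTS;
    }

    // The "map" a smart client keeps: slot -> shard, using the example ranges.
    static String shardFor(int slot) {
        for (int i = 0; i < SHARD_UPPER.length; i++) {
            if (slot >= 0 && slot <= SHARD_UPPER[i]) {
                return "S" + (i + 1);
            }
        }
        throw new IllegalArgumentException("slot out of range: " + slot);
    }

    public static void main(String[] args) {
        int s = slot("foo");
        System.out.println("foo -> slot " + s + " -> shard " + shardFor(s));
    }
}
```

For example, `slot("foo")` evaluates to 12182, which lands on shard S4 (slots 9830–13106). Keys that share a hash tag, such as `{user1000}.following` and `{user1000}.followers`, always map to the same slot, which is what allows multi-key commands on them in cluster mode.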
Redis cluster—architecture
Example: 3-shard cluster, 2 read replicas
Redis cluster—multi-AZ
• A cluster consists of 1 to 15 shards
• Each shard has a primary node and up to five replica nodes
[Figure: three shards (slots 0–5454, 5455–10909, and 10910–16383), each with one primary and two replicas spread across Availability Zones A, B, and C, with each shard's primary in a different Availability Zone]
Scenario 1: Single primary failure
Mitigation:
1. Automatic failure detection and replica promotion (~15–30 s)
2. Repair failed node
Scenario 2: Majority of primaries fail
Mitigation: Redis enhancements on ElastiCache
• Automatic failure detection and replica promotion
• Repair failed nodes
Resizing via backup and restore
[Figure: an .rdb snapshot of a 3-shard cluster is restored into a new 5-shard cluster; the switch involves downtime, and new writes made after the snapshot are not in it]
Step 1: aws elasticache create-snapshot --replication-group-id redisclusterID --snapshot-name sname
Step 2: aws elasticache copy-snapshot --source-snapshot-name sname --target-snapshot-name sname --target-bucket s3bucketname
Step 3: aws elasticache create-replication-group --replication-group-id NewRedisClusterID … --snapshot-arns arn:aws:s3:::bucketname/redisbackup-0001.rdb, etc.
Step 4: Once the new cluster is up, update your app with the new Amazon ElastiCache endpoint, then terminate the old cluster.
Pro tip: DR strategy. Enable cross-region replication (CRR) on the Amazon S3 bucket, triggering an AWS Lambda function to hydrate the destination cluster.
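Zero-downtime online re-sharding
Online re-sharding—zero downtime
Shard 1: slots 0–5461 | Shard 2: slots 5462–10922 | Shard 3: slots 10923–16383
Simple API (scale in || out):
aws elasticache modify-replication-group-shard-configuration --replication-group-id rep-group-id --apply-immediately --node-group-count 5
When scaling in (lowering --node-group-count), the call must also name the shards to drop or keep with --node-groups-to-remove or --node-groups-to-retain.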
Online re-sharding—zero downtime: scale out (3 → 5 shards)
• Shard 1: slots 0–2909, 5095–5461
• Shard 2: slots 5462–5783, 6876–9830
• Shard 3: slots 10923–14199
• Shard 4: slots 2910–5094, 5784–6875
• Shard 5: slots 9831–10922, 14200–16383
• Uniform slot distribution across shards
• No application interruption: reads/writes continue during re-sharding
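The scale-out layout can be sanity-checked with a small sketch (hypothetical class) that marks each of the 16384 slots with its owning shard, fails if a slot is unassigned or assigned twice, and counts slots per shard. The ranges are copied from the scale-out slide.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Verifies a slot layout covers all 16384 slots exactly once and counts slots per shard (hypothetical class).
final class SlotLayoutCheck {
    static final int SLOTS = 16384;

    // Layout after scaling out from 3 to 5 shards, as shown on the slide; each pair is {first, last} inclusive.
    static Map<String, int[][]> scaleOutLayout() {
        Map<String, int[][]> layout = new LinkedHashMap<>();
        layout.put("Shard 1", new int[][] {{0, 2909}, {5095, 5461}});
        layout.put("Shard 2", new int[][] {{5462, 5783}, {6876, 9830}});
        layout.put("Shard 3", new int[][] {{10923, 14199}});
        layout.put("Shard 4", new int[][] {{2910, 5094}, {5784, 6875}});
        layout.put("Shard 5", new int[][] {{9831, 10922}, {14200, 16383}});
        return layout;
    }

    // Returns slots per shard; throws if any slot is unassigned or owned by two shards.
    static Map<String, Integer> slotCounts(Map<String, int[][]> layout) {
        boolean[] owned = new boolean[SLOTS];
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, int[][]> shard : layout.entrySet()) {
            int n = 0;
            for (int[] range : shard.getValue()) {
                for (int slot = range[0]; slot <= range[1]; slot++) {
                    if (owned[slot]) {
                        throw new IllegalStateException("slot " + slot + " assigned twice");
                    }
                    owned[slot] = true;
                    n++;
                }
            }
            counts.put(shard.getKey(), n);
        }
        for (int slot = 0; slot < SLOTS; slot++) {
            if (!owned[slot]) {
                throw new IllegalStateException("slot " + slot + " unassigned");
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(slotCounts(scaleOutLayout()));
    }
}
```

With these ranges, shards 1 through 4 own 3277 slots each and shard 5 owns 3276, so the 16384 slots are spread as evenly as five shards allow.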
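Online re-sharding—zero downtime: scale in (5 → 3 shards)
• Slots from Shards 4 and 5 move back to Shard 1 (0–5461), Shard 2 (5462–10922), and Shard 3 (10923–16383)
• Uniform slot distribution across shards
• No application interruption: reads/writes continue during re-sharding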
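Online re-sharding—CloudWatch alarm triggered
Amazon CloudWatch alarm ("MEMORY HIGH!") → Amazon SNS → AWS Lambda function:
// …
var AWS = require('aws-sdk'); // assumed setup: AWS SDK for JavaScript v2
var elasticache = new AWS.ElastiCache();
var params = {
  ApplyImmediately: true,
  NodeGroupCount: 5,
  ReplicationGroupId: 'rep-group-id',
  // …
};
elasticache.modifyReplicationGroupShardConfiguration(params, function(err, data) {
  if (err) console.log(err, err.stack); // the request failed
  else console.log(data);               // the re-sharding request was accepted
});
// …
Result: cluster resized from 3 to 5 shards.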
[Figure: clients send reads/writes through a cache cluster spanning AZ1 and AZ2 in front of relational data and search; three panels show the cache cluster healthy, under heavy read pressure, and healthy again after auto scaling out]
Common usage patterns
Usage patterns
• Database caching
• Session management
• APIs (HTTP responses)
• Streaming data (filtering/aggregation)
• IoT analytics
• Pub/sub
• Standalone database (metadata store)
• Social media (sentiment analysis)
• Leaderboards
Caching
[Figure: clients reach Amazon EC2 through Elastic Load Balancing, and Amazon ElastiCache for Redis caches relational data from Amazon RDS (mysql.lambda_async), object data from Amazon S3 (write-through), and unstructured data from Amazon DynamoDB (DynamoDB Streams)]
Caching NoSQL
• Smaller NoSQL DB clusters needed = lower costs
• Faster data retrieval = better performance
[Figure: clients and Amazon EC2 reading from and writing to Elasticsearch, Cassandra, and MongoDB clusters]
Caching NoSQL databases with Amazon ElastiCache
[Figure: applications on Amazon EC2 read through Amazon ElastiCache for Redis in front of a MongoDB cluster and a Cassandra cluster]
MongoDB:
DBObject doc = collection.findOne();
• Cache the serialized DBObject in Redis (good)
• Cache rows in a Redis hash (faster/more efficient)
Cassandra:
ResultSet rs = session.execute(stmt);
• Cache the serialized ResultSet in Redis (good)
• Cache rows in a Redis hash (faster/more efficient)
Streaming data enrichment/processing
[Figure: data sources → Amazon Kinesis Streams (raw stream) → AWS Lambda function 1 (continual data filtering/enrichment using Amazon ElastiCache for Redis) → Amazon Kinesis Streams (cleansed stream) → Amazon Kinesis Analytics; AWS Lambda function 2 and Redis provide real-time pub/sub to subscribers]
Big data architectures using Redis
[Figure: Collect → Process → Store → Analyze pipeline with Amazon ElastiCache as a store; components include data sources, Amazon Kinesis, Apache Kafka, AWS IoT, Spark Streaming on Amazon EMR, Apache Storm on EMR, AWS Lambda, an Amazon Kinesis app, a custom app on Amazon EC2, Amazon S3, and Spark on Amazon EMR]
IoT powered by ElastiCache
[Figure: IoT devices → AWS IoT Rules Engine (direct integration with Amazon S3, DynamoDB, Kinesis, SNS, Lambda, and SQS) → AWS Lambda → Amazon ElastiCache for Redis as the sensor store]
Mobile apps powered by ElastiCache
[Figure: Amazon API Gateway, AWS Lambda, and Amazon EC2 in front of Amazon ElastiCache for Redis; points of interest are updated with GEOADD (fed from Amazon DynamoDB through DynamoDB Streams) and searched with GEORADIUS]
Reference: https://aws.amazon.com/blogs/database/amazon-elasticache-utilizing-redis-geospatial-capabilities/
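To illustrate what GEORADIUS computes, here is an in-memory sketch (hypothetical class, not a Redis client) that stores points of interest the way GEOADD would and returns the members within a radius, nearest first. It uses the haversine formula with the Earth radius constant Redis uses (6372797.560856 m); real Redis also encodes positions as 52-bit geohashes, so its distances can differ slightly.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory illustration of GEOADD / GEORADIUS semantics (hypothetical class, not a Redis client).
final class GeoIndexSketch {
    // Earth radius in meters that Redis uses for its geo distance calculations.
    static final double EARTH_RADIUS_M = 6372797.560856;

    private final Map<String, double[]> points = new LinkedHashMap<>(); // member -> {longitude, latitude}

    // Like GEOADD key longitude latitude member: add the member or update its position.
    void geoAdd(double lon, double lat, String member) {
        points.put(member, new double[] {lon, lat});
    }

    // Haversine great-circle distance in meters between two (longitude, latitude) points.
    static double distance(double lon1, double lat1, double lon2, double lat2) {
        double lat1r = Math.toRadians(lat1);
        double lat2r = Math.toRadians(lat2);
        double u = Math.sin((lat2r - lat1r) / 2);
        double v = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        return 2.0 * EARTH_RADIUS_M * Math.asin(Math.sqrt(u * u + Math.cos(lat1r) * Math.cos(lat2r) * v * v));
    }

    // Like GEORADIUS key longitude latitude radius m ASC: members within the radius, nearest first.
    List<String> geoRadius(double lon, double lat, double radiusMeters) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, double[]> e : points.entrySet()) {
            double[] p = e.getValue();
            if (distance(lon, lat, p[0], p[1]) <= radiusMeters) {
                hits.add(e.getKey());
            }
        }
        hits.sort(Comparator.comparingDouble((String m) -> distance(lon, lat, points.get(m)[0], points.get(m)[1])));
        return hits;
    }
}
```

With the Palermo (13.361389, 38.115556) and Catania (15.087269, 37.502669) coordinates from the Redis GEOADD documentation, `distance` comes out at roughly 166 km, in line with what GEODIST reports for that pair.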
Ad tech powered by ElastiCache
[Figure: clients (websites/apps), a publisher, an ad network, bidders, and advertisers, with Amazon ElastiCache for Redis holding clickstream (shopping events) and consumer data]
1. A user visits a publisher's ad placement page (websites/apps)
2. The publisher places an ad slot for auction
3. The ad network calls for bids
4. Bidders respond with bids
5. The winning bid's ad is displayed (<40 ms)
Reference: https://aws.amazon.com/caching/database-caching/
Chat apps powered by ElastiCache
[Figure: chat clients hold persistent WebSocket connections through an Application Load Balancer to servers on AWS Elastic Beanstalk, which use Amazon ElastiCache for Redis pub/sub]
SUBSCRIBE chat_channel:114
PUBLISH chat_channel:114 "Hello all"
>> ["message", "chat_channel:114", "Hello all"]
UNSUBSCRIBE chat_channel:114
Reference: https://aws.amazon.com/blogs/database/amazon-elasticache-utilizing-redis-geospatial-capabilities/
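To show the semantics behind SUBSCRIBE, PUBLISH, and UNSUBSCRIBE, here is an in-memory stand-in (hypothetical class, not a Redis client). As with Redis pub/sub, messages go only to handlers subscribed at publish time and are not stored, and `publish` returns the number of receivers, mirroring the integer reply of PUBLISH.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// In-memory stand-in for Redis pub/sub channels (hypothetical class, not a Redis client).
final class PubSubSketch {
    private final Map<String, List<Consumer<String>>> channels = new HashMap<>();

    // Like SUBSCRIBE channel: register a handler for messages published on the channel.
    void subscribe(String channel, Consumer<String> handler) {
        channels.computeIfAbsent(channel, c -> new ArrayList<>()).add(handler);
    }

    // Like UNSUBSCRIBE channel: remove a previously registered handler.
    void unsubscribe(String channel, Consumer<String> handler) {
        List<Consumer<String>> subs = channels.get(channel);
        if (subs != null) {
            subs.remove(handler);
        }
    }

    // Like PUBLISH channel message: deliver to current subscribers only; nothing is stored.
    int publish(String channel, String message) {
        List<Consumer<String>> subs = new ArrayList<>(channels.getOrDefault(channel, new ArrayList<>()));
        for (Consumer<String> s : subs) {
            s.accept(message);
        }
        return subs.size(); // number of receivers, like the PUBLISH reply
    }
}
```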
Gaming—real-time leaderboards
• Very popular for gaming apps that need uniqueness and ordering
• Easy with Redis sorted sets
ZADD "leaderboard" 1201 "Gollum"
ZADD "leaderboard" 963 "Sauron"
ZADD "leaderboard" 1092 "Bilbo"
ZADD "leaderboard" 1383 "Frodo"
ZREVRANGE "leaderboard" 0 -1
1) "Frodo"
2) "Gollum"
3) "Bilbo"
4) "Sauron"
ZREVRANK "leaderboard" "Sauron"
(integer) 3
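The sorted-set behavior the leaderboard relies on can be sketched in plain Java (hypothetical class, not a Redis client): members are unique, ZADD on an existing member updates its score, and ZREVRANGE/ZREVRANK order members from highest to lowest score. Redis keeps the set ordered (typically in a skip list) so these commands run in logarithmic time; this sketch simply re-sorts on every read.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a Redis sorted set used as a leaderboard (hypothetical class).
final class LeaderboardSketch {
    private final Map<String, Double> scores = new HashMap<>(); // member -> score; members are unique

    // Like ZADD key score member: add the member, or update its score if it already exists.
    void zadd(double score, String member) {
        scores.put(member, score);
    }

    // Like ZREVRANGE key 0 -1: highest score first; equal scores in reverse lexicographic order.
    List<String> zrevrangeAll() {
        List<String> members = new ArrayList<>(scores.keySet());
        members.sort(Comparator.comparing((String m) -> scores.get(m))
                .thenComparing(Comparator.<String>naturalOrder())
                .reversed());
        return members;
    }

    // Like ZREVRANK key member: 0-based rank with the highest score at rank 0, or null if absent.
    Integer zrevrank(String member) {
        if (!scores.containsKey(member)) {
            return null;
        }
        return zrevrangeAll().indexOf(member);
    }
}
```

Loading the four ZADD calls from the slide gives the same order (Frodo, Gollum, Bilbo, Sauron) and a ZREVRANK of 3 for Sauron.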
Rate limiting
Ex: throttling requests to an externally facing API (behind ELB); uses Redis counters
FUNCTION LIMIT_API_CALL(APIaccesskey)
  limit = HGET(APIaccesskey, "limit")
  time = CURRENT_UNIX_TIME()
  keyname = APIaccesskey + ":" + time
  count = GET(keyname)
  IF count != NULL && count >= limit THEN
    ERROR "API request limit exceeded"
  ELSE
    MULTI
      INCR(keyname)
      EXPIRE(keyname, 10)
    EXEC
    PERFORM_API_CALL()
  END
Reference: http://redis.io/commands/INCR
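A minimal in-memory sketch of LIMIT_API_CALL (hypothetical class): HashMaps stand in for Redis, and the per-key limit that the pseudocode reads with HGET is supplied through `setLimit`. It keeps one counter per API key per second, like the APIaccesskey:time key name, and rejects a call once that second's count reaches the limit. Because GET and the MULTI/EXEC block are separate round trips, concurrent clients can slightly overshoot the limit; the INCR documentation also shows a variant that increments first and checks the returned value.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory illustration of LIMIT_API_CALL (hypothetical class); HashMaps stand in for Redis.
final class FixedWindowRateLimiter {
    private final Map<String, Integer> limits = new HashMap<>();   // stands in for HGET(APIaccesskey, "limit")
    private final Map<String, Integer> counters = new HashMap<>(); // stands in for the per-second counter keys

    void setLimit(String apiKey, int limit) {
        limits.put(apiKey, limit);
    }

    // Returns true if the call may proceed at the given Unix time in seconds, false if it is throttled.
    boolean tryCall(String apiKey, long unixSeconds) {
        int limit = limits.getOrDefault(apiKey, 0);
        String keyname = apiKey + ":" + unixSeconds;  // one counter per API key per second
        Integer count = counters.get(keyname);        // GET(keyname)
        if (count != null && count >= limit) {
            return false;                             // ERROR "API request limit exceeded"
        }
        counters.merge(keyname, 1, Integer::sum);     // INCR(keyname)
        // In Redis, EXPIRE(keyname, 10) lets old counters disappear; this sketch never removes them.
        return true;                                  // PERFORM_API_CALL()
    }
}
```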
Amazon ElastiCache: Database caching strategies using Redis
What data should I cache?
What data to cache
• Reference data: product categories (use longer TTLs)
• Reference data: product images (use longer TTLs)
• Reference data: product details (use longer TTLs)
• Dynamic: database result sets (use shorter TTLs)
• Dynamic: API responses (use shorter TTLs)
• Anything that is cacheable!
How do I cache my data?
Cache-aside—lazy loading
1) Check the cache (Amazon ElastiCache); on a cache HIT, return the data
2) On a cache MISS, read the data from the database (Amazon RDS)
3) Update the cache
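Lazy loading can be sketched with plain maps (hypothetical class): a HashMap stands in for ElastiCache and a lookup function stands in for the Amazon RDS query, so only the control flow of steps 1 to 3 is shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Cache-aside (lazy loading) sketch: a HashMap stands in for ElastiCache, a Function for the database query.
final class LazyLoadingCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> database; // hypothetical DB lookup, e.g. a SELECT by key
    int databaseReads = 0;                           // how often the database was actually queried

    LazyLoadingCache(Function<String, String> database) {
        this.database = database;
    }

    String get(String key) {
        String cached = cache.get(key);        // 1) check the cache
        if (cached != null) {
            return cached;                     //    cache HIT: return it
        }
        String value = database.apply(key);    // 2) cache MISS: read from the database
        databaseReads++;
        if (value != null) {
            cache.put(key, value);             // 3) update the cache
        }
        return value;
    }
}
```

Against Redis, step 3 would typically be a SET with an expiry (for example, `SET key value EX 300`) so that stale entries age out; the 300-second TTL is only illustrative.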
Cache-aside—write-through
1) Update the primary DB (Amazon RDS)
2) Update the cache (Amazon ElastiCache)
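A matching sketch of write-through (hypothetical class, with maps standing in for Amazon RDS and ElastiCache): each write updates the primary database first and then the cache, so later reads find fresh data in the cache. Data that is written but never read still occupies cache memory, which is one reason to keep TTLs even with write-through.

```java
import java.util.HashMap;
import java.util.Map;

// Write-through sketch: two HashMaps stand in for the primary database (Amazon RDS) and the cache (ElastiCache).
final class WriteThroughCache {
    final Map<String, String> database = new HashMap<>();
    final Map<String, String> cache = new HashMap<>();

    void put(String key, String value) {
        database.put(key, value); // 1) update the primary DB, the source of truth, first
        cache.put(key, value);    // 2) then update the cache so readers see the new value
    }

    String get(String key) {
        String v = cache.get(key);
        return v != null ? v : database.get(key); // fall back to the DB, e.g. after an eviction
    }
}
```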
Cache strategies
DB result set
ID | First_Name | Last_Name | City
123 | Michael | Labib | Chicago
SELECT * FROM x WHERE y → ResultSet object (ROW) → Key: query, Value: CRS (CachedRowSet) as byte array
Pro
When data retrieval logic is abstracted from the code consuming the ResultSet, caching the ROW can be extremely effective and can be implemented against any RDBMS.
Con
Data retrieval still requires extracting values from the ROW and does not further simplify data access; it only reduces data-retrieval latency.
Cache strategies
JSON
SELECT * FROM x WHERE y → String firstName = rs.getString("First_Name"); → Key: 123, Value: '{ "firstname": "Michael", "lastname": "Labib", "city": "Chicago" }'
Pro
Very easy to implement. Cache any desired database fields and values in a Redis string; for example, store the retrieved data as a JSON object in a Redis string.
Con
Individual JSON properties cannot be retrieved on their own; the whole string must be fetched and parsed.
Cache strategies
Application objects
SELECT * FROM x WHERE y
String firstName = rs.getString("First_Name");
customer.setFirstName(firstName);
String lastName = rs.getString("Last_Name");
customer.setLastName(lastName);
Key: CUSTOMER_ID:123, Value: Customer object as byte array
Pro
Application objects are cached in their native structure and data state when serialized, so they can be used directly after deserialization.
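Caching an application object means serializing it to bytes before storing it under a key such as CUSTOMER_ID:123. This sketch uses Java's built-in serialization only because it needs no extra libraries; the `Customer` class and `CustomerCodec` helper are hypothetical, and formats such as JSON or Protocol Buffers avoid tying the cached bytes to one class version.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical Customer class matching the setters used on the slide.
class Customer implements Serializable {
    private static final long serialVersionUID = 1L;
    private String firstName;
    private String lastName;

    void setFirstName(String firstName) { this.firstName = firstName; }
    void setLastName(String lastName) { this.lastName = lastName; }
    String getFirstName() { return firstName; }
    String getLastName() { return lastName; }
}

final class CustomerCodec {
    // Serialize to the byte array that would be stored in Redis under CUSTOMER_ID:123.
    static byte[] toBytes(Customer c) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // read after the stream is closed and fully flushed
    }

    // Deserialize bytes read back from the cache into a ready-to-use Customer.
    static Customer fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Customer) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("cannot deserialize cached Customer", e);
        }
    }
}
```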
Cache strategies
Using Redis data structures
SELECT * FROM x WHERE y
Map<String, String> rsHash = new HashMap<>();
String firstName = rs.getString("First_Name");
rsHash.put("firstName", firstName);
String lastName = rs.getString("Last_Name");
rsHash.put("lastName", lastName);
jedis.hmset("CUSTOMER_ID:123", rsHash);
Key: CUSTOMER_ID:123, Value: rsHash
Pro
In addition to reducing data-retrieval latency, caching data in a specific data structure simplifies the data access pattern; for example, HGET CUSTOMER_ID:123 firstName reads a single field.
Caching tips
• Understand the frequency of change of underlying data

• Set appropriate TTLs on keys that match that frequency
• Choose appropriate eviction policies that are aligned with application requirements
• Isolate your cluster by purpose (for example, cache cluster, queue, standalone database, and so on)
• Maintain cache freshness with write-throughs
• Performance test and size your cluster appropriately
• Monitor Cache HIT/MISS ratio and alarm on poor performance
• Use failover API to test application resiliency
Amazon ElastiCache: Best practices
Cluster sizing best practices
• Storage—clusters should have adequate memory
• Recommended: memory needed + 25% reserved memory (for Redis) + some room for growth
(optional 10%)
• Optimize using eviction policies and TTLs
• Scale up or out before reaching max-memory using CloudWatch alarms
• Use memory optimized nodes for cost effectiveness (R4 support)
• Performance—performance should not be compromised
• Benchmark operations using the redis-benchmark tool
• For more read IOPS—add replicas
• For more write IOPS—add shards (scale out)
• For more network IO—use network-optimized instances and scale out
• Use pipelining for bulk reads/writes
• Consider the Big O time complexity of data structure commands
• Cluster isolation (apps sharing key space)—choose a strategy that works for your workload
• Identify what kind of isolation is needed based on the workload and environment
• Isolation: No Isolation $ | Isolation by Purpose $$ | Full Isolation $$$
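A back-of-the-envelope sizing sketch (hypothetical helper, not an AWS API) that applies the memory rule above: set aside reserved memory on each node, add optional growth headroom, and round up. Node sizes (28.4 GiB for cache.r3.xlarge, 237 GiB for cache.r3.8xlarge) and hourly prices are the 2017 figures from the cluster-mode comparison; treating reserved memory as 25% of each node's memory is an assumption.

```java
// Back-of-the-envelope node count (hypothetical helper, not an AWS API).
final class ClusterSizing {
    // requiredGb: data to hold; nodeGb: memory per node; reservedFraction: share of each node kept
    // free for Redis overhead (assumed 0.25); growthFraction: optional headroom (e.g. 0.10, or 0).
    static int nodesNeeded(double requiredGb, double nodeGb, double reservedFraction, double growthFraction) {
        double usablePerNode = nodeGb * (1.0 - reservedFraction);
        double target = requiredGb * (1.0 + growthFraction);
        return (int) Math.ceil(target / usablePerNode);
    }

    static double hourlyCost(int nodes, double pricePerNodeHour) {
        return nodes * pricePerNodeHour;
    }

    public static void main(String[] args) {
        // 175 GB workload from the cluster-mode comparison, no growth headroom.
        int small = nodesNeeded(175, 28.4, 0.25, 0.0); // cache.r3.xlarge, 28.4 GiB each
        int large = nodesNeeded(175, 237, 0.25, 0.0);  // cache.r3.8xlarge, 237 GiB
        System.out.printf("%d x cache.r3.xlarge = $%.3f/hr%n", small, hourlyCost(small, 0.455));
        System.out.printf("%d x cache.r3.8xlarge = $%.3f/hr%n", large, hourlyCost(large, 3.640));
    }
}
```

With these inputs the helper happens to match the 9 × cache.r3.xlarge versus 1 × cache.r3.8xlarge comparison for a 175 GB workload; adding the optional 10% growth headroom raises the counts to 10 and 2.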
Redis benchmark tool
Open source utility to benchmark performance
Example: src/redis-benchmark -h r3-xlarge-perf.foio87.0001.use1.cache.amazonaws.com -p 6379 -n 150000 -d 100
Syntax:
redis-benchmark -h <host> -p <port> -c 50 -n 1000 -d 500 -q
-c <clients>—Specifies the number of parallel connections (default 50).
-n <requests>—Specifies the total number of requests (default 100000).
-d <size>—Specifies the data size of GET and SET values in bytes.
-t <test1,test2>—Comma-separated list of tests to perform.
-q—Quiet operation; displays only the results.
Redis max-memory policies
Select a max-memory policy based on your workload needs
• noeviction: return errors when the memory limit has been reached and the client tries to execute commands that would use more memory
• allkeys-lru: evict keys, trying to remove the least recently used (LRU) keys first
• volatile-lru: evict keys, trying to remove the least recently used (LRU) keys first, but only among keys that have an expire set
• allkeys-random: evict random keys to make space for the new data added
• volatile-random: evict random keys to make space for the new data added, but only evict keys with an expire set
• volatile-ttl: evict only keys with an expire set, and try to evict keys with a shorter time to live (TTL) first
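The allkeys-lru idea can be pictured with a LinkedHashMap in access order, which evicts exactly the least recently used entry. This is only an illustration: Redis approximates LRU by sampling a small number of keys (controlled by maxmemory-samples) and evicts based on memory used rather than an entry count, which the hypothetical `maxEntries` limit stands in for here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Exact-LRU illustration of allkeys-lru (hypothetical class); entry count stands in for maxmemory.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: iteration runs from least to most recently accessed
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // after an insert exceeds capacity, drop the least recently used entry
    }
}
```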
Key ElastiCache CloudWatch metrics
• CPUUtilization
  • Memcached—up to 90% is OK
  • Redis—single-threaded, so divide the threshold by the number of cores (ex: 90% / 4 cores = 22.5%)
• SwapUsage: low
• CacheMisses/CacheHits ratio: low/stable
• Evictions: near zero
  • Exception: Russian-doll caching
• CurrConnections: stable
• Set up alarms with CloudWatch metrics
ElastiCache modifiable parameters
• Maxclients: 65000 (unchangeable)
  • Use connection pooling
  • timeout—closes a connection after it has been idle for a given interval
  • tcp-keepalive—detects dead peers given an interval
• Databases: 16 (default) for non-clustered mode
  • Logical partition
• Reserved-memory: 25% (default)
  • Recommended:
    • 50% of maxmemory to use before 2.8.22
    • 25% after 2.8.22—ElastiCache
• Maxmemory-policy:
  • The eviction policy for keys when maximum memory usage is reached
  • Possible values: volatile-lru, allkeys-lru, volatile-random, allkeys-random, volatile-ttl, noeviction
Hands-on workshop overview
Lab 1: Performance testing: http://chilp.it/f0dd089
What we're building
Performance testing lab—infrastructure topology
[Figure: VPC REINVENT (10.0.0.0/16) in Availability Zone #1. Public subnet Public1 (10.0.0.0/24) hosts the test instance (security group REINVENT-APP-SG, allowing SSH and HTTP) with Apache HTTP Server, Redis 3.2 client, JMeter 3.2, a JUnit 4 test app, and Java 1.8.0. Private subnet Private1 (10.0.2.0/24) uses security groups REINVENT-RDS-SG and REINVENT-EC-SG.]
Workshop: JMeter and JUnit tests overview
Data loading
RDSLoad:testRDSLoad() (executes only once)
1. Drops database CustomerDB if it exists, referencing SQL stored in DropCustomerDB
2. Creates database CustomerDB, referencing SQL stored in CreateCustomerDB
3. Creates table customer, referencing SQL stored in CreateCustomerTbl
4. Loads 1000 customer records stored in CUSTOMER.sql
RedisLoad:testRedisLoad() (executes only once)
1. Flushes all the data in the Redis cluster to prepare it
2. Queries the customer database table for the total count of customer records, referencing SQL stored in CountCustomerTbl
3. Iterates over the total count and queries for each row, referencing SQL stored in SelectCustomerTbl
4. Stores each SQL ResultSet row in a Redis hash, with the key name being the SQL statement for that individual customer query
Stress testing: 20k requests/second
RDSBenchmark:testRDS()
1. Randomly selects a customer row to retrieve from customer IDs 1–1000
2. Retrieves the SELECT SQL statement and appends the random customer ID integer, referencing SQL stored in SelectCustomerTbl
3. Executes the SQL statement and iterates/displays the fetched row
RedisBenchmark:testRedis()
1. Randomly selects a customer hash to retrieve from customer IDs 1–1000
2. Retrieves the SELECT SQL statement and appends the random customer ID integer, referencing SQL stored in SelectCustomerTbl
3. Executes the Redis HGETALL command to retrieve/display the fetched hash
RDSBenchmark (JDBC):
// Generating a random number 1–1000 representing customer IDs (nextInt(1000) alone yields 0–999)
randomKey = Integer.toString(rand.nextInt(1000) + 1);
// Retrieving the customer SELECT SQL statement and appending the customer ID
sql = query + randomKey;
stmt = connection.createStatement();
// Executing the query
rs = stmt.executeQuery(sql);
// Iterating and displaying results
while (rs.next()) { …
RedisBenchmark (Jedis):
// Generating a random number 1–1000 representing customer IDs
randomKey = Integer.toString(rand.nextInt(1000) + 1);
// Retrieving the customer SELECT SQL statement and appending the customer ID; this string is the Redis key
key = query + randomKey;
// Executing the HGETALL command
map = jedis.hgetAll(key);
// Iterating and displaying results
for (String name : map.keySet()) { …
Hands-on workshop overview
Lab 2: Working with Amazon ElastiCache for Redis: http://chilp.it/546ec4d
Thank you!
