No Silver Bullet: The Slightly Less Painful Way To Sharding
2009/2010 Pythian
Agenda
Why Avoid Sharding
How to Avoid Sharding
Living with Shards
Tools
More Customers
More Monitors
More Data per Minute
More Stored Data
Version I
Version II
Old Data
Very Old Data
Version III
[Diagram: Metadata above three shards, each split into Today's Data, Last Week Data, Old Data, and Very Old Data]
Version VI
[Diagram: MetaMetadata above two Metadata instances, which sit above shards split into Today's Data, Last Week Data, Old Data, and Very Old Data]
This has given us the luxury of building against a NOSQL database, which means we can put the horrors of MySQL sharding and expensive scalability behind us.
http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/
Why Shard?
Big Data
Lots of writes
Some applications shard naturally
Easier to test and plan performance
How else would I scale?
Incidental Pain
Inherent Pain
Queries are limited to shards
More databases to manage
Cross-shard DML means 2PC
Performance can still suck
Application changes
More overall downtime
Get rid of old data
Aggregate data
Offload data
Partition tables
Split Databases
Separate OLTP from DW
Other logical separations:
  Separate DB for separate functionality
  NoSQL for key-value data
  Document DB
  Lucene
Capacity Planning
When to add new shards
When to rebalance
Talk to marketing
Monitor
Model the data
Plan the resharding
The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program.
Amdahl's Law
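As a rough illustration of the quoted law, the sketch below computes the usual bound, speedup = 1 / ((1 - p) + p / n), where p is the fraction of work that can run in parallel and n is the number of workers (read here as shards). The 95% figure is an assumed example value, not a number from the talk.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of the work runs in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Assumed example: 95% of the workload can be spread across shards.
    for shards in (1, 2, 10, 100, 1000):
        print(shards, round(amdahl_speedup(0.95, shards), 2))
```

With a 5% sequential part, the bound never reaches 20x no matter how many shards are added, which is why cross-shard and centralized work matters so much.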
Extrapolation
Application Changes
Choose sharding key
Find the shard
Survive missing shards
Multi-versioning
Remove distributed transactions
App servers per shard?
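One way "find the shard" and "survive missing shards" might look in application code is a directory lookup keyed by the sharding key. This is a minimal sketch under assumptions: `ShardDirectory`, `ShardUnavailable`, `load_dashboard`, and the use of a customer ID as the sharding key are all hypothetical names chosen for illustration, not part of the talk.

```python
class ShardUnavailable(Exception):
    """Raised when the shard that owns a key is currently marked down."""


class ShardDirectory:
    """Hypothetical lookup table mapping a sharding key to a shard name."""

    def __init__(self):
        self._shard_of = {}  # customer_id -> shard name
        self._down = set()   # shards currently unreachable

    def assign(self, customer_id, shard):
        self._shard_of[customer_id] = shard

    def mark_down(self, shard):
        self._down.add(shard)

    def mark_up(self, shard):
        self._down.discard(shard)

    def find_shard(self, customer_id):
        shard = self._shard_of[customer_id]  # KeyError if never assigned
        if shard in self._down:
            raise ShardUnavailable(shard)
        return shard


def load_dashboard(directory, customer_ids):
    """Serve customers whose shards are up; report the rest as degraded."""
    served, degraded = [], []
    for cid in customer_ids:
        try:
            served.append((cid, directory.find_shard(cid)))
        except ShardUnavailable:
            degraded.append(cid)
    return served, degraded
```

The point of the sketch is that a down shard degrades only the customers it owns instead of failing the whole request.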
More Downtime
If each server crashes once a year, a fleet of 100 servers sees a crash roughly every 3 to 4 days (365 / 100 ≈ 3.65)
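The arithmetic behind this slide can be sketched as follows, assuming crashes are independent and every server fails at the same average rate. The function name and default rate are illustrative choices.

```python
def days_between_crashes(servers: int, crashes_per_server_per_year: float = 1.0) -> float:
    """Average days between crashes across the fleet, assuming independent failures."""
    fleet_crashes_per_year = servers * crashes_per_server_per_year
    return 365.0 / fleet_crashes_per_year


if __name__ == "__main__":
    for fleet_size in (1, 10, 100):
        print(fleet_size, "servers:", days_between_crashes(fleet_size), "days between crashes")
```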
Tools
Flexviews (AKA Materialized views)
Broadcast queries
Map/Reduce your queries
Data warehouse
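A broadcast (scatter-gather) query could look roughly like the sketch below: run the same aggregate on every shard, then combine the partial results. In-memory SQLite databases stand in for real shards, and the `events` table, its columns, and the helper names are assumptions made for the example.

```python
import sqlite3


def make_shard(rows):
    """Create an in-memory SQLite database standing in for one shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (customer_id INTEGER, bytes INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    return conn


def broadcast_total(shards, sql):
    """Map: run the same aggregate on every shard. Reduce: sum the partials."""
    # SUM over an empty table yields NULL (None), so treat that as 0.
    partials = [conn.execute(sql).fetchone()[0] or 0 for conn in shards]
    return sum(partials)


if __name__ == "__main__":
    shards = [
        make_shard([(1, 100), (2, 250)]),
        make_shard([(3, 50)]),
        make_shard([]),
    ]
    print(broadcast_total(shards, "SELECT SUM(bytes) FROM events"))
```

Summing partials only works for aggregates that decompose this way, such as SUM and COUNT. An average, for example, needs the per-shard sum and count returned separately and combined at the end.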
Standardize configurations
Central config
Manage monitoring
Build your own tools
Shard Management
Central reports
Central management
Purging tool
Move customers
Add / remove shards
Versioning
Upgrades
Summary:
Don't shard
Prepare for pain
Consistent Hashing
Map each node to several virtual nodes
Sort virtual nodes by ID
Each virtual node is responsible for data with keys smaller than its ID
Each node is responsible for data in its virtual nodes
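The four steps above can be sketched as a small ring class. This is one possible reading, not the speaker's implementation: MD5 as the hash, the `node#i` naming of virtual nodes, and the `HashRing` class itself are assumptions. A key is served by the first virtual node whose ID is at or above the key's hash, wrapping around to the start of the sorted list.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Hash a string to a position on the ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, vnodes_per_node: int = 8):
        self.vnodes_per_node = vnodes_per_node
        self._ids = []    # sorted virtual node IDs
        self._owner = {}  # virtual node ID -> physical node name

    def add_node(self, node: str) -> None:
        """Map one physical node to several virtual nodes on the ring."""
        for i in range(self.vnodes_per_node):
            vid = ring_position(f"{node}#{i}")
            self._owner[vid] = node
            bisect.insort(self._ids, vid)

    def remove_node(self, node: str) -> None:
        """Delete all of a physical node's virtual nodes from the list."""
        self._ids = [v for v in self._ids if self._owner[v] != node]
        self._owner = {v: n for v, n in self._owner.items() if n != node}

    def node_for(self, key: str) -> str:
        """Find the physical node owning the virtual node responsible for key."""
        if not self._ids:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._ids, ring_position(key))
        vid = self._ids[idx % len(self._ids)]  # wrap past the largest ID
        return self._owner[vid]
```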
[Diagram: consistent-hashing ring; legible labels: 7, F, 2, 4, E, D, 3]
Removing a node:
  Virtual nodes are deleted from list
Adding a node:
  Add new virtual nodes to list
  Data is remapped to nearest virtual node
  Which moves it to a new physical node
Only relevant data is moved
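To make "only relevant data is moved" concrete, the sketch below builds a ring with three nodes and again with four, then lists the keys whose owner changed. The node names `db1` to `db4`, the `customer-N` keys, MD5, and 64 virtual nodes per machine are assumed values for illustration.

```python
import bisect
import hashlib


def ring_position(value):
    """Hash a string to a position on the ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def build_ring(nodes, vnodes_per_node=64):
    """Return parallel lists: sorted virtual node IDs and their physical owners."""
    pairs = sorted(
        (ring_position(f"{node}#{i}"), node)
        for node in nodes
        for i in range(vnodes_per_node)
    )
    return [vid for vid, _ in pairs], [node for _, node in pairs]


def owner(ring, key):
    """Physical node behind the first virtual node at or above the key's hash."""
    ids, owners = ring
    idx = bisect.bisect_left(ids, ring_position(key)) % len(ids)
    return owners[idx]


def moved_keys(before, after, keys):
    """Keys whose physical owner differs between two ring layouts."""
    return [k for k in keys if owner(before, k) != owner(after, k)]


if __name__ == "__main__":
    keys = [f"customer-{n}" for n in range(2000)]
    three = build_ring(["db1", "db2", "db3"])
    four = build_ring(["db1", "db2", "db3", "db4"])
    moved = moved_keys(three, four, keys)
    print(len(moved), "of", len(keys), "keys moved")
    print("all moved keys now on db4:", all(owner(four, k) == "db4" for k in moved))
```

Because adding a node only inserts new virtual nodes, a key either keeps its old owner or lands on the new machine. Removing the node is the same comparison in reverse.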
Load Balancing:
  More virtual nodes for strong machines
  Remove virtual nodes to reduce load
Replication:
  Map data to N virtual nodes
  Write to W nodes
  Read from R nodes
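Both ideas on this slide can be sketched on the same ring: a per-machine virtual node count acts as a weight, and replicas are found by walking the ring from the key's position. Two things here are assumptions beyond the slide: the walk skips virtual nodes belonging to a machine already chosen, so the copies land on distinct physical nodes, and `quorums_overlap` encodes the commonly used rule that the write count plus the read count must exceed the replica count for every read to see the latest write. All names are illustrative.

```python
import bisect
import hashlib


def ring_position(value):
    """Hash a string to a position on the ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def build_ring(weights):
    """weights: physical node -> number of virtual nodes (more for stronger machines)."""
    pairs = sorted(
        (ring_position(f"{node}#{i}"), node)
        for node, count in weights.items()
        for i in range(count)
    )
    return [vid for vid, _ in pairs], [node for _, node in pairs]


def replicas_for(ring, key, replica_count):
    """Walk the ring from the key's position, collecting distinct physical nodes."""
    ids, owners = ring
    start = bisect.bisect_left(ids, ring_position(key))
    chosen = []
    for step in range(len(ids)):
        node = owners[(start + step) % len(ids)]
        if node not in chosen:
            chosen.append(node)
            if len(chosen) == replica_count:
                break
    return chosen


def quorums_overlap(replica_count, write_nodes, read_nodes):
    """True when any read set must share at least one node with any write set."""
    return write_nodes + read_nodes > replica_count


if __name__ == "__main__":
    # Assumed fleet: one strong machine gets four times the virtual nodes.
    ring = build_ring({"big": 32, "small1": 8, "small2": 8})
    print(replicas_for(ring, "customer-42", 2))
    print(quorums_overlap(3, 2, 2), quorums_overlap(3, 1, 1))
```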