No Silver Bullet: The Slightly Less Painful Way To Sharding

Kick ass on bikes


Heterogeneous DBA
Pretty good with Oracle too
Member of the Oak Table Network
11 years with a pager and counting

2009/2010 Pythian

Pythian

Global leader in database and applications infrastructure services

Unmatched expertise in Oracle, Oracle Apps, MySQL, SQL Server


Founded in 1997, employ 90+ DBAs in offices worldwide who support over 140 clients across the globe 24/7

Recent Oracle Exadata implementation at LinkShare Corporation in NYC


Silver Bullet vs. Bag of Tools

Agenda

Why Avoid Sharding
How to Avoid Sharding
Living with Shards
Tools

Sad and true story: My First Sharded Architecture

SaaS Monitoring Service


More Customers
More Monitors
More Data per Minute
More Stored Data

Version I

Version II

[Diagram: Metadata; Customer Data + Metadata; Today's Data; Last Week Data; Old Data; Very Old Data]

Version III
[Diagram: Metadata, plus three sets of Today's Data, Last Week Data, Old Data, Very Old Data]

Version VI
[Diagram: MetaMetadata, two Metadata, and three sets of Today's Data, Last Week Data, Old Data, Very Old Data]

This has given us the luxury of building against a NOSQL database, which means we can put the horrors of MySQL sharding and expensive scalability behind us.
http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/

Why Shard?

Big Data
Lots of writes
Some applications shard naturally
Easier to test and plan performance
How else would I scale?

Sharding Pain Index


PayPal, banks
Reddit, eBay, LinkedIn
SaaS, Salesforce, Ning


Incidental Pain

Inherent Pain

Queries are limited to shards
More databases to manage
Cross-shard DML means 2PC
Performance can still suck
Application changes
More overall downtime

How to Avoid Sharding:

Quick Tips for Better Performance

How to avoid sharding?


Less Data
Split Databases
Protect the Database
Bigger Hardware

Query Less Data


Get rid of old data
Aggregate data
Offload data
Partition tables

Split Databases

Separate OLTP from DW
Other logical separations:
  Separate DB for separate functionality
  NoSQL for key-value data
  Document DB
  Lucene

Protect the Database


The DB is the worst place to control concurrency

Read Slaves
Cache
Small connection pools
Queues
Fail Whales
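
One way to picture "small connection pools" and "fail whales" together is a gate in the application tier that caps how many requests may touch the database at once and turns the rest away after a short wait. The sketch below is only an illustration under assumptions: `DbGate`, `fake_query`, and the limits are hypothetical names and numbers, and a real setup would more likely rely on the connection pool of its driver or framework.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class DbGate:
    """Hypothetical gate: caps how many callers reach the database at once."""

    def __init__(self, max_concurrent, wait_seconds):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._wait_seconds = wait_seconds
        self._lock = threading.Lock()
        self.active = 0  # callers currently inside the database
        self.peak = 0    # highest concurrency observed so far

    def run(self, query_fn):
        # Wait briefly for a free slot. If none appears, reject the request
        # so the caller can show a cached page or an error page instead.
        if not self._slots.acquire(timeout=self._wait_seconds):
            raise TimeoutError("database busy")
        try:
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            return query_fn()
        finally:
            with self._lock:
                self.active -= 1
            self._slots.release()


def fake_query():
    time.sleep(0.01)  # stand-in for a real database call
    return 1


gate = DbGate(max_concurrent=3, wait_seconds=5.0)
with ThreadPoolExecutor(max_workers=20) as pool:
    # 20 application threads compete, but at most 3 run a query at a time.
    results = list(pool.map(lambda _: gate.run(fake_query), range(20)))

print(len(results), "queries served, peak concurrency", gate.peak)
```

The concurrency decision is made before the request reaches the database, so overload shows up as a fast rejection in the application rather than as lock contention inside the DB.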

Living with Shards

Avoid Distributed Transactions


They will drive you to NoSQL

But teacher, I sharded and performance still sucks

Capacity Planning

When to add new shards
When to rebalance
Talk to marketing
Monitor
Model the data
Plan the resharding

Adding manpower to a late software project makes it later.
Brooks's Law

The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program.
Amdahl's Law
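
The law above can be written as speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that parallelizes and n is the number of processors. The sketch below applies it to shards, which is an assumption made for illustration: it treats each shard as one "processor" and picks p = 0.9 arbitrarily.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Speedup predicted by Amdahl's Law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Assumed workload: 90% of the work spreads across shards, 10% does not.
for shards in (1, 2, 10, 100):
    print(shards, round(amdahl_speedup(0.9, shards), 2))
# 1 1.0
# 2 1.82
# 10 5.26
# 100 9.17
```

With a 10% sequential part, the speedup stays below 10 no matter how many shards are added.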

Performance Testing and the Dangers of Extrapolation

Application Changes

Choose sharding key
Find the shard
Survive missing shards
Multi-versioning
Remove distributed transactions
App servers per shard?

More Downtime
If a server crashes once a year, 100 servers crash about every 3 to 4 days
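
The arithmetic behind that figure is a plain division. The sketch below assumes that servers fail independently and at the same average rate; `days_between_crashes` is a hypothetical helper name.

```python
def days_between_crashes(servers, crashes_per_server_per_year=1.0):
    """Average number of days between crashes somewhere in the fleet."""
    fleet_crashes_per_year = servers * crashes_per_server_per_year
    return 365.0 / fleet_crashes_per_year


print(days_between_crashes(1))    # 365.0
print(days_between_crashes(100))  # 3.65
```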

Tools

Queries are limited to shards


Flexviews (AKA Materialized views)
Broadcast queries
Map/Reduce your queries
Data warehouse
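
A broadcast query with a map/reduce flavor can be sketched as follows: send the same aggregate to every shard, then combine the partial results in the application. This is a minimal illustration under assumptions. The shards are in-memory SQLite databases, and the `checks` table, its columns, and the customer names are invented for the example.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_shard(rows):
    # check_same_thread=False lets a worker thread use a connection that
    # was created in the main thread; each connection is only ever used
    # by one thread at a time in this sketch.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE checks (customer TEXT, failed INTEGER)")
    conn.executemany("INSERT INTO checks VALUES (?, ?)", rows)
    conn.commit()
    return conn


# Three hypothetical shards, each holding different customers.
shards = [
    make_shard([("acme", 1), ("acme", 0)]),
    make_shard([("globex", 1), ("globex", 1), ("globex", 0)]),
    make_shard([("initech", 0)]),
]


def map_shard(conn):
    # "Map": each shard computes its own partial aggregate.
    sql = "SELECT COUNT(*), COALESCE(SUM(failed), 0) FROM checks"
    return conn.execute(sql).fetchone()


def broadcast_failure_rate(all_shards):
    # Broadcast the same query to every shard in parallel.
    with ThreadPoolExecutor(max_workers=len(all_shards)) as pool:
        partials = list(pool.map(map_shard, all_shards))
    # "Reduce": combine counts, not per-shard averages, so that shards
    # of different sizes are weighted correctly.
    total = sum(count for count, _ in partials)
    failed = sum(fails for _, fails in partials)
    return failed / total if total else 0.0


print(broadcast_failure_rate(shards))  # 0.5
```

Each shard returns a count and a sum rather than its own rate, because averaging per-shard rates would give the one-row shard the same weight as the three-row shard.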

Manage More Databases


Standardize configurations
Central config
Manage monitoring
Build your own tools

Shard Management

Central reports
Central management
Purging tool
Move customers
Add / remove shards
Versioning
Upgrades

Summary:
Don't Shard
Prepare for pain
Use the right tools

Consistent Hashing

Map each node to several virtual nodes
Sort virtual nodes by ID
Each virtual node is responsible for data with keys smaller than its ID (and larger than the previous virtual node's ID)
Each node is responsible for the data in its virtual nodes

[Diagram: hash ring with virtual nodes labeled 2, 3, 4, 7, D, E, F]

Removing a node:
  Virtual nodes are deleted from list
Adding a node:
  Add new virtual nodes to list
  Data is remapped to nearest virtual node
  Which moves it to a new physical node
  Only relevant data is moved
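
The virtual-node scheme and the add/remove behavior described above can be sketched in a few lines. This is one possible illustration, not a reference implementation: `HashRing`, `ring_hash`, the use of MD5 for placement, the choice of 100 virtual nodes per node, and the `db1`..`db4` names are all assumptions.

```python
import bisect
import hashlib


def ring_hash(value):
    # A stable hash, so placement does not change between runs.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, vnodes_per_node=100):
        self.vnodes_per_node = vnodes_per_node
        self._ids = []    # sorted virtual node IDs
        self._owner = {}  # virtual node ID -> physical node name

    def add_node(self, node):
        # Map the physical node to several virtual nodes on the ring.
        for i in range(self.vnodes_per_node):
            vid = ring_hash(f"{node}#{i}")
            self._owner[vid] = node
            bisect.insort(self._ids, vid)

    def remove_node(self, node):
        # Delete the node's virtual nodes; its keys fall to the next ones.
        self._ids = [v for v in self._ids if self._owner[v] != node]
        self._owner = {v: n for v, n in self._owner.items() if n != node}

    def lookup(self, key):
        if not self._ids:
            raise LookupError("ring is empty")
        # First virtual node whose ID is >= the key's hash, wrapping
        # around to the start of the sorted list at the end of the ring.
        idx = bisect.bisect_left(self._ids, ring_hash(key))
        return self._owner[self._ids[idx % len(self._ids)]]


ring = HashRing()
for name in ("db1", "db2", "db3"):
    ring.add_node(name)

keys = [f"customer-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.add_node("db4")
after = {k: ring.lookup(k) for k in keys}

# Only keys that now belong to the new node have moved.
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved, all to db4:",
      all(after[k] == "db4" for k in moved))
```

Removing `db4` again restores the earlier placement, because the remaining virtual nodes never changed position.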

Load Balancing:
  More virtual nodes for strong machines
  Remove virtual nodes to reduce load
Replication:
  Map data to N virtual nodes
  Write to W nodes
  Read from R nodes
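
Both ideas can be sketched on the same kind of ring: weights decide how many virtual nodes each machine gets, and replicas are found by walking the ring from the key's position. The code below is a hedged illustration. `build_ring`, `replicas`, the weights, and the rule of skipping virtual nodes that belong to an already chosen machine are assumptions rather than details from the slides.

```python
import bisect
import hashlib
from collections import Counter


def ring_hash(value):
    # A stable hash, so placement does not change between runs.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def build_ring(weights):
    """weights maps a physical node name to its number of virtual nodes."""
    ring = [(ring_hash(f"{node}#{i}"), node)
            for node, count in weights.items()
            for i in range(count)]
    ring.sort()
    return ring


def replicas(ring, key, n):
    """Walk the ring from the key and collect n distinct physical nodes."""
    start = bisect.bisect_left(ring, (ring_hash(key), ""))
    chosen = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        # Assumption: skip virtual nodes of machines already chosen, so
        # the copies land on different physical machines.
        if node not in chosen:
            chosen.append(node)
            if len(chosen) == n:
                break
    return chosen


# Load balancing: the stronger machine gets three times the virtual nodes.
ring = build_ring({"big": 300, "small1": 100, "small2": 100})

keys = [f"customer-{i}" for i in range(5000)]
primary = Counter(replicas(ring, k, 1)[0] for k in keys)
print(primary.most_common())

# Replication: each key maps to N = 2 distinct machines.
print(replicas(ring, "customer-42", 2))
```

Reducing a machine's entry in `weights` and rebuilding the ring is the "remove virtual nodes to reduce load" case: that machine ends up owning a smaller share of the key space.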
