Federation at Flickr: Doing Billions of Queries Per Day
Dathan Vance Pattishall presents Flickr's transition to a federated database architecture, built to address its scalability problems. The previous master-slave topology suffered from slave lag and multiple single points of failure. The new design uses database shards arranged in active master-master replication rings, giving both high availability and many write points. A global ring maps user and photo IDs to shards, and PHP logic distributes queries across them. This lets Flickr handle billions of queries per day over 12 TB of user data served from inexpensive commodity hardware.
Dathan Vance Pattishall

Who Am I?
• Name: Dathan Vance Pattishall
• Job: I'm the Flickr database guy
• Things I do: a little bit of everything

Contents
• What Flickr was having problems with
• The design to fix those problems
• History
• Solution and design decisions
• Stats
• Wrap-up

What Problems Was Flickr Having?
• Master-slave topology
• Slave lag
• Multiple SPOFs
• Unable to keep up with demand
• Unable to serve search traffic
• Multiple-second page load times

Design to Attain the Goal
• The workload is write-intensive, so more than one master is needed: many write points.
• To get rid of SPOFs, be redundant.
• To allow maintenance in real time, traffic needs to stick to servers, and a single server needs to be able to handle all of its shard's traffic.
• To serve pages fast despite many queries, the working data must be small enough to fit in memory.

History
• 1999 AuctionWatch – ACP in a BOX
• 2003 Friendster – Project S00K
• 2005 Flickr – Federation

Federation: Key Components
• Shards
• The Global Ring
• PHP logic to connect to the shards and keep the data consistent

Shards
• A shard is a slice of the main database
• Shards are set up in active master-master ring replication
– Done by sticking a user to one server in a shard
– Shard assignments come from a random number for new accounts
– Migration is done from time to time
– Shards can run on any grade of hardware

Global Ring
• a.k.a. the Lookup Ring
– Holds what can't be federated, i.e. where things live:
– Owner_id → SHARD-ID
– Photo_id → Owner_id
– Group_id → SHARD-ID
• Lookups are then cached in Memcache (entries last ½ hour)
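To make the lookup path concrete, here is a minimal PHP sketch of resolving an owner's shard through the Global Ring, with the half-hour Memcache layer in front. The function name, cache-key format, and owner_shard table are illustrative assumptions, not Flickr's actual code:

    <?php
    // Sketch: resolve which shard holds an owner's data. The Global
    // (lookup) Ring stores the Owner_id -> SHARD-ID mapping; results
    // are cached in Memcache for half an hour, as on the slide.
    function shard_for_owner($owner_id, $memcache, $global_db) {
        $key = 'shard:owner:' . $owner_id;

        // Fast path: the mapping is usually already in Memcache.
        $shard_id = $memcache->get($key);
        if ($shard_id !== false) {
            return (int) $shard_id;
        }

        // Slow path: ask the Global Ring, then cache for 1800 s.
        $stmt = $global_db->prepare(
            'SELECT shard_id FROM owner_shard WHERE owner_id = ?');
        $stmt->execute(array($owner_id));
        $shard_id = (int) $stmt->fetchColumn();

        $memcache->set($key, $shard_id, 0, 1800); // flags, then TTL
        return $shard_id;
    }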
Clicking a Favorite
• Pulls the photo owner's account from cache to get the owner's shard location
– e.g. SHARD-5
• Pulls my information from cache to get my shard location
– e.g. SHARD-13
• Starts a "distributed transaction" to answer two questions (sketched in code after the deck):
– Who favorited Rebekka's photo?
– What are Dathan's favorites?

Getting Rid of Replication Lag
• On every page load the user is assigned to a bucket:

    $id = intval(substr($user_id, -10));
    $host_index = $id % $count_of_hosts_in_shard;

• If that host is down, go to the next host in the list (the full selection logic is sketched after the deck)
• If all hosts are down, display an error page

Allow for Maintenance
• Each server in a shard runs at 50% load
– i.e. one server in a shard can take the full load if another server of that shard is down or in maintenance mode
• Shut down half the servers in each shard
• Do the DBA thing
• Bring them back up, then do the other half

Stats
• Running within the capacity threshold
– over 36K queries per second
• Allows for bursts of traffic
– over double 36K queries per second
• Adding shards grows capacity
– each shard added brings 2–6K queries per second
• Each shard holds 400K+ users' data
• One person (me) handles it all

What About Search?
• Two search back-ends:
– the shards (35K qps on a few shards)
– Yahoo's proprietary web search
• Owner single-tag search goes to the shards due to real-time requirements
• All other search goes to the Yahoo search back-end

Hardware
• EM64T boxes running RHEL-4, with 16 GB of RAM and six 15K RPM disks in RAID-10
• Any class of server can be used; the number of users per shard just has to be adjusted
• Cost per query: so little it should be considered zero
• Data size: 12 TB of user data
• Being able to serve the traffic: priceless

If There's Time, Some Quick Tips
• Set swappiness to 0
• On RHEL-4 (i.e. the 2.6 Linux kernel), run the DEADLINE I/O scheduler
• Use RAID-10
• Use a battery-backed cache
• Use 64-bit
• Use lots of memory, but leave enough for the OS to spawn many threads
• Tune queries

Things That Would Help Flickr
• Multi-master replication
• A fix for the InnoDB thread bug already reported against the 4.1 release
• Optimization for OR queries, bitwise operations, sets, etc.

Questions?
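A sketch of the "Clicking a Favorite" flow described above. The deck only names the steps; shard_connect(), the table names, and the double write are assumptions layered on the shard_for_owner() sketch from earlier:

    <?php
    // Sketch: favoriting writes to both shards involved, so that each
    // question on the slide can be answered locally on a single shard.
    function favorite_photo($my_user_id, $photo_id, $owner_id,
                            $memcache, $global_db) {
        // Resolve both shard locations via cache (e.g. SHARD-5, SHARD-13).
        $owner_shard = shard_for_owner($owner_id, $memcache, $global_db);
        $my_shard    = shard_for_owner($my_user_id, $memcache, $global_db);

        $owner_db = shard_connect($owner_shard); // assumed helper
        $my_db    = shard_connect($my_shard);

        // "Who favorited Rebekka's photo?" lives on the owner's shard.
        $stmt = $owner_db->prepare(
            'INSERT INTO photo_favorites (photo_id, user_id) VALUES (?, ?)');
        $stmt->execute(array($photo_id, $my_user_id));

        // "What are Dathan's favorites?" lives on my shard.
        $stmt = $my_db->prepare(
            'INSERT INTO user_favorites (user_id, photo_id) VALUES (?, ?)');
        $stmt->execute(array($my_user_id, $photo_id));
    }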
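And the bucket/failover logic from "Getting Rid of Replication Lag". The two-line bucket computation is taken from the slide; the host walk, is_host_up(), and display_error_page() are assumed helpers:

    <?php
    // Sketch: stick a user to one host in their shard, falling through
    // to the next host in the list if the chosen one is down.
    function pick_host_for_user($user_id, $hosts) {
        // Bucket: last 10 digits of the user id, modulo the host count.
        $id    = intval(substr($user_id, -10));
        $count = count($hosts);
        $start = $id % $count;

        // If the host is down, go to the next host in the list.
        for ($i = 0; $i < $count; $i++) {
            $host = $hosts[($start + $i) % $count];
            if (is_host_up($host)) { // assumed health check
                return $host;
            }
        }

        // If all hosts are down, display the error page.
        display_error_page();        // assumed helper
        return null;
    }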