Download as pdf or txt
Download as pdf or txt
You are on page 1of 25

How Python, TurboGears, and MongoDB

are Transforming SourceForge.net

Rick Copeland
http://blog.pythonisito.com
@rick446
rick@geek.net
Geeknet, page 1 SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat
SourceForge, GeekNet,
wha...?

“Mature”
Old Stuff

Geeknet, page 2 SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat


How do we make things
exciting again?

Geeknet, page 3 SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat


Python!

● Get it done well


● Get it done fast

Geeknet, page 4 SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat


First Project: FossFor.us

User Editable!

Not Ugly!
Web 2.0!
(ish)

Geeknet, page 5 SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat


FossFor.us Technology

● Get it done well


● Get it done fast

Geeknet, page 6 SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat


Mark Ramm Hired
Halfway Through FossFor.us

Geeknet, page 7 SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat


But isn't he the TurboGears
guy?

He does
prefer
TurboGears

Geeknet, page 8 SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat


FossFor.us Went Well

● Small Team (2-3 programmers)


● Great framework/tools
● Flexibility to implement fast
● Could you guys give the same “spit-
n-polish” to SourceForge.net itself?

Geeknet, page 9 SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat


BUT

● SourceForge.net is written
in PHP... why not just
upgrade the PHP?
“SourceForge was written in the
best technology 1998 had to offer”

Geeknet, page 10 SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat


So Another
Django Site,
Then?
TurboGears 2.0!

Geeknet, page 11 SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat


But Seriously, Why TG?
Easy to rip out what we don't need;
most SF.net requests need:
● No authentication/authorization

● No widgets

● No admin

● No relational DB

● Lightweight session (stored in cookie)

Plays Nicely with others via WSGI


middleware
Geeknet, page 12 SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat
Database: NoSQL
● Fast, fast, fast!
● Handle large data
sets
● Simple replication
● No need for
disconnected
replication /
synchronization

Geeknet, page 13 SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat


Templates
● We liked Django
template syntax but
we wanted....
● More speed
● A bit more logic in the
template
● To not import all of
Django for the sake
of the template
engine

Geeknet, page 14 SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat


BUT
● We're only doing the “consumer” side of the site
● Need to interface with legacy PHP code
● SQL-based queues are too slow

Geeknet, page 15 SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat


Everything's Going Just Fine
For Most Projects...

Geeknet, page 16 SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat


...Unless They Have a Lot of
Releases
● MongoDB is document-oriented
● So we saved all the project info in one document
● When a project has lots of releases, we quickly
overflow the max 4MB document size
● Also, large amounts of project data being fetched
continuously
● FIX 1: Split Projects from ProjectReleases
● FIX 2: Cache Project records in memory

Geeknet, page 17 SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat


The Day the Servers Died
● Big release – Project File System
replaces File Release System
● Most SF.net engineers attending
OSCON
● Break away from sessions for the
deployment....

CRASH!

Geeknet, page 18 SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat


But.... we tested it!
● Almost. Every. Part.
● Missed the download redirector –
click to download took you back to
the browse screen
● Click → redirect → click again →
and again → and again
● At one point, 5700 open
connections
Geeknet, page 19 SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat
Good News: We Didn't Fall
Over

● Well, we didn't completely fall over


● Some 503s, but some people got through

● Still not what we'd like, but it let us fix the

redirector online and get people their


downloads sooner rather than later.

Geeknet, page 20 SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat


Scaling Lessons Learned
● Hardware load balancers are nice (but you can probably
get by just fine with LigHTTPd or nginx)
● Keep a local MongoDB slave on each web server
● In-process, thread-safe memory caches for frequently hit
objects. (Tough to get cache invalidation right!)
● mod_wsgi – multiprocess + multithreaded, auto restart
every 2500-5000 requests to keep mem usage low
● 4 servers, 8 cores each, 8 GB each
● Run 6 TG processes per server, reserve 2 CPUs for
whatever
● We were memory-bound

Geeknet, page 21 SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat


Dead Ends? Memcached
● Great idea – shared memory cache for
multiple machines
● Overhead: Network
● Overhead: Serialization / Deserialization
● MongoDB is really fast anyway (almost
as fast as Memcached)
● In-process Python dict is the fastest
possible Python cache anyway

Geeknet, page 22 SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat


Where to Now?
● SourceForge releases Ming, a Python ORM-like
library for MongoDB, under the MIT license
● (http://sourceforge.net/projects/merciless)

● Continuing to rewrite more and more of our sites in


Python, TG2, and MongoDB (they are our strategic
tools)
● Be on the lookout for new announcements, new

features, and new open source software from


SourceForge.net
● “I'm not dead yet – I feel happy!”

Geeknet, page 23 SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat


Questions?
Contact

Rick Copeland
http://blog.pythonisito.com
@rick446
rick@geek.net

Geeknet, page 25 SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat

You might also like