Debugging Ruby With MongoDB

Aman Gupta
@tmm1
Ruby developers know...
Ruby is
fatboyke (flickr)

Ruby loves eating RAM
37prime (flickr)
ruby allocates memory from the OS
memory is broken up into slots
each slot holds one ruby object
• a linked list called the ‘freelist’ points to all the empty slots on the ruby heap
• when you need an object, it’s pulled off the freelist
• if the freelist is empty, GC is run
• GC finds non-reachable objects and adds them to the freelist
• if the freelist is still empty (all slots were in use), another heap is allocated
• all the slots on the new heap are added to the freelist
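The allocation steps above can be sketched as a toy model in Ruby. This is only an illustration of the idea, not how MRI implements it (MRI does this in C): `ToyHeap`, `SLOTS_PER_HEAP`, and the block that reports which slots are still reachable are all hypothetical names invented for the sketch.

```ruby
# Toy model of the slot/freelist scheme described above.
# ToyHeap, SLOTS_PER_HEAP and the reachable-slots block are
# illustrative assumptions, not MRI internals.
class ToyHeap
  SLOTS_PER_HEAP = 4

  attr_reader :heaps, :gc_runs

  def initialize(&reachable)
    @reachable = reachable   # block returning the slot ids still in use
    @freelist  = []          # ids of empty slots (MRI uses a linked list)
    @next_slot = 0
    @heaps     = 0
    @gc_runs   = 0
    add_heap
  end

  # Pull a slot off the freelist, running GC or growing the heap if needed.
  def allocate
    run_gc   if @freelist.empty?
    add_heap if @freelist.empty?   # all slots were in use
    @freelist.shift
  end

  def free_slots
    @freelist.size
  end

  private

  def add_heap
    @heaps += 1
    SLOTS_PER_HEAP.times do
      @freelist << @next_slot
      @next_slot += 1
    end
  end

  def run_gc
    @gc_runs += 1
    live = @reachable.call
    # Non-reachable slots go back on the freelist.
    (0...@next_slot).each { |slot| @freelist << slot unless live.include?(slot) }
  end
end

live = []
heap = ToyHeap.new { live }

4.times { live << heap.allocate }   # fills the first heap
live << heap.allocate               # GC frees nothing, so a second heap is added
puts "heaps=#{heap.heaps} gc_runs=#{heap.gc_runs} free=#{heap.free_slots}"

live.clear                          # drop every reference
3.times { heap.allocate }           # uses up the remaining free slots
heap.allocate                       # freelist empty: GC reclaims slots, no new heap
puts "heaps=#{heap.heaps} gc_runs=#{heap.gc_runs} free=#{heap.free_slots}"
```

The second half shows why fewer live objects matter in this model: once references are dropped, a GC run refills the freelist and no further heap has to be requested.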
turns out, Ruby’s GC is also one of the reasons it can be so slow
antphotos (flickr)
Matz’ Ruby Interpreter (MRI 1.8) has a...
Conservative, Stop the World, Mark and Sweep Garbage Collector
john_lam, lifeisaprayer, benimoto, michaelgoodin, kiksbalayon (flickr)
• conservative: the VM hands out raw pointers to ruby objects
• stop the world: no ruby code can execute during GC
• mark and sweep: mark all objects in use, sweep away unmarked objects
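A minimal sketch of the mark and sweep phases, using a hand-built object graph rather than real Ruby internals. The graph, the `roots` list, and the `mark`/`sweep` helper names are made up for illustration.

```ruby
# Toy mark-and-sweep over a hand-built object graph.
# Objects are symbols; `refs` maps each object to the objects it references.
# The graph and the helper names are assumptions made for this sketch.
require 'set'

# Mark phase: follow references from the roots and record everything reached.
def mark(roots, refs)
  marked = Set.new
  stack  = roots.dup
  until stack.empty?
    obj = stack.pop
    next if marked.include?(obj)
    marked << obj
    stack.concat(refs.fetch(obj, []))
  end
  marked
end

# Sweep phase: everything that was not marked can be reclaimed.
def sweep(all_objects, marked)
  all_objects.reject { |obj| marked.include?(obj) }
end

refs = {
  main:   [:config, :cache],
  cache:  [:user1, :user2],
  user1:  [:name1],
  orphan: [:orphan_child]   # nothing reachable from the roots points here
}
all_objects = [:main, :config, :cache, :user1, :user2, :name1, :orphan, :orphan_child]

marked = mark([:main], refs)
freed  = sweep(all_objects, marked)
p freed
```

Both phases visit objects one by one, which is why the amount of work grows with the number of objects on the heap.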
more objects = longer GC
mckaysavage (flickr)
longer GC = less time to run your ruby code
kgrocki (flickr)
fewer objects = better performance
januskohl (flickr)
improve performance
1. remove unnecessary object allocations
   object allocations are not free
2. avoid leaked references
   not really memory ‘leaks’
   you’re holding a reference to an object you no longer need. GC sees the reference, so it keeps the object around
the GC follows references recursively, so a reference to classA will ‘leak’ all these objects
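A small sketch of a leaked reference in plain Ruby. `Report`, `REPORT_CACHE`, and `handle_request` are hypothetical names; the point is only that a long-lived collection keeps every object it references (and everything those objects reference) alive.

```ruby
# A long-lived cache that keeps every report it has ever seen.
# Report, REPORT_CACHE and handle_request are hypothetical names.
class Report
  def initialize(rows)
    @rows = rows   # each Report also keeps its rows array (and its strings) alive
  end
end

REPORT_CACHE = []

def handle_request(id)
  report = Report.new(Array.new(100) { |i| "row #{id}-#{i}" })
  REPORT_CACHE << report   # the 'leak': this reference is never removed
  nil
end

10.times { |id| handle_request(id) }
GC.start

# Every report is still reachable through REPORT_CACHE, so GC keeps them all.
live_reports = ObjectSpace.each_object(Report).count
puts "live Report objects: #{live_reports}"
```

Each cached `Report` drags along an array and 100 strings, so one forgotten reference per request adds up to far more than one object.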
let’s build a debugger
• step 1: collect data
  • list of all ruby objects in memory
• step 2: analyze data
  • group by type
  • group by file/line
version 1: collect data
• simple patch to ruby VM (300 lines of C)
• http://gist.github.com/73674
• simple text based output format
0x154750 @ -e:1 is OBJECT of type: T
0x15476c @ -e:1 is HASH which has data
0x154788 @ -e:1 is ARRAY of len: 0
0x1547c0 @ -e:1 is STRING (SHARED) len: 2 and val: hi
0x1547dc @ -e:1 is STRING len: 1 and val: T
0x154814 @ -e:1 is CLASS named: T inherits from Object
0x154a98 @ -e:1 is STRING len: 2 and val: hi
0x154b40 @ -e:1 is OBJECT of type: Range
version 1: analyze data
$ wc -l /tmp/ruby.heap
 1571529 /tmp/ruby.heap

$ cat /tmp/ruby.heap | awk '{ print $3 }' | sort | uniq -c | sort -g | tail -1
 236840 memcached/memcached.rb:316

$ grep "memcached.rb:316" /tmp/ruby.heap | awk '{ print $5 }' | sort | uniq -c | sort -g | tail -5
  10948 ARRAY
  20355 OBJECT
  30744 DATA
  64952 HASH
 123290 STRING
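The same analysis can be sketched in Ruby instead of awk. The sample lines below are made up in the version 1 text format, and `count_field` is a helper written for this sketch; to run it on a real dump, `lines` could be read with `File.foreach("/tmp/ruby.heap")` instead.

```ruby
# Ruby equivalent of the awk/sort/uniq pipeline, run on a tiny inline sample
# in the version 1 text format. The sample lines are made up for this sketch.
sample = <<~HEAP
  0x154750 @ app.rb:10 is OBJECT of type: T
  0x15476c @ app.rb:10 is HASH which has data
  0x154788 @ lib.rb:3 is ARRAY of len: 0
  0x1547c0 @ app.rb:10 is STRING len: 2 and val: hi
  0x1547dc @ app.rb:10 is STRING len: 1 and val: T
HEAP
lines = sample.lines

# Count occurrences of one whitespace-separated field (0-based index).
def count_field(lines, index)
  lines.each_with_object(Hash.new(0)) do |line, counts|
    counts[line.split[index]] += 1
  end
end

# awk '{ print $3 }' | sort | uniq -c | sort -g | tail -1
hot_spot, hot_count = count_field(lines, 2).max_by { |_, n| n }

# grep <hot_spot> | awk '{ print $5 }' | sort | uniq -c | sort -g
at_hot_spot       = lines.select { |line| line.split[2] == hot_spot }
types_at_hot_spot = count_field(at_hot_spot, 4).sort_by { |_, n| n }

puts "#{hot_count} #{hot_spot}"
types_at_hot_spot.each { |type, n| puts "#{n} #{type}" }
```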
version 1
• it works!
• but...
  • must patch and rebuild ruby binary
  • no information about references between objects
  • limited analysis via shell scripting
version 2 goals
• better data format
  • simple: one line of text per object
  • expressive: include all details about object contents and references
  • easy to use: easy to generate from C code & easy to consume from various scripting languages
equanimity (flickr)
version 2 is memprof
• no patches to ruby necessary
  • gem install memprof
  • require 'memprof'
  • Memprof.dump_all("/tmp/app.json")
• C extension for MRI ruby VM
  http://github.com/ice799/memprof
• uses libyajl to dump out all ruby objects as json
strings
Memprof.dump{
  "hello" + "world"
}

{
  "_id": "0x19c610",        <-- memory address of object
  "file": "file.rb",        <-- file and line where string was created
  "line": 2,
  "type": "string",
  "class": "0x1ba7f0",      <-- address of the class “String”
  "class_name": "String",
  "length": 10,             <-- length and contents of this string instance
  "data": "helloworld"
}
arrays
Memprof.dump{
  [
    1,
    :b,
    2.2,
    "d"
  ]
}

{
  "_id": "0x19c5c0",
  "class": "0x1b0d18",
  "class_name": "Array",
  "length": 4,
  "data": [
    1,                      <-- integers and symbols are stored in the array itself
    ":b",
    "0x19c750",             <-- floats and strings are separate ruby objects
    "0x19c598"
  ]
}
hashes
Memprof.dump{
  {
    :a => 1,
    "b" => 2.2
  }
}

{
  "_id": "0x19c598",
  "type": "hash",
  "class": "0x1af170",
  "class_name": "Hash",
  "default": null,          <-- no default proc
  "length": 2,
  "data": [                 <-- hash entries as key/value pairs
    [ ":a", 1 ],
    [ "0xc728", "0xc750" ]
  ]
}
classes
Memprof.dump{
  class Hello
    @@var=1
    Const=2
    def world() end
  end
}

{
  "_id": "0x19c408",
  "type": "class",
  "name": "Hello",
  "super": "0x1bfa48",      <-- superclass object reference
  "super_name": "Object",
  "ivars": {                <-- class variables and constants are stored in the instance variable table
    "@@var": 1,
    "Const": 2
  },
  "methods": {
    "world": "0x19c318"     <-- references to method objects
  }
}
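Because references are stored as addresses, following them means looking up other records by `_id`. A minimal sketch in Ruby, assuming a dump with one JSON document per line; the two records are hand-written in the shape shown above (only the fields the sketch needs), and `resolve` is a helper invented here.

```ruby
require 'json'

# Two records in the shape shown above, one JSON document per line.
# The records are hand-written for this sketch, and the one-document-per-line
# layout is an assumption about the dump file.
dump = <<~DUMP
  {"_id":"0x19c5c0","class_name":"Array","length":2,"data":[1,"0x19c598"]}
  {"_id":"0x19c598","type":"string","class_name":"String","length":1,"data":"d"}
DUMP

# Index every object by its address so references can be followed.
objects = {}
dump.each_line do |line|
  obj = JSON.parse(line)
  objects[obj["_id"]] = obj
end

# Replace addresses in an array's data with the objects they point to;
# immediates such as integers are stored inline and are left as they are.
def resolve(array_record, objects)
  array_record["data"].map do |element|
    element.is_a?(String) && objects.key?(element) ? objects[element] : element
  end
end

resolved = resolve(objects["0x19c5c0"], objects)
resolved.each do |element|
  if element.is_a?(Hash)
    puts "#{element['class_name']} #{element['data'].inspect}"
  else
    puts element.inspect
  end
end
```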
version 2: memprof.com
a web-based heap visualizer and leak analyzer
built on...

$ mongoimport -d memprof -c rails --file /tmp/app.json
$ mongo memprof

let’s run some queries.

how many objects?

thaths (flickr)
> db.rails.count()
809816

• ruby scripts create a lot of objects
• usually not a problem, but...
• MRI has a naïve stop-the-world mark/sweep GC
• fewer objects = faster GC = better performance
what types of objects?

brettlider (flickr)
> db.rails.distinct('type')

['array',
 'bignum',
 'class',
 'float',
 'hash',
 'module',
 'node',
 'object',
 'regexp',
 'string',
 ...]
mongodb: distinct
• distinct('type')
  list of types of objects
• distinct('file')
  list of source files
• distinct('class_name')
  list of instance class names
• optionally filter first
  • distinct('name', {type:"class"})
    names of all defined classes
improve performance with indexes

> db.rails.ensureIndex({'type':1})

> db.rails.ensureIndex(
    {'file':1},
    {background:true}
  )
mongodb: ensureIndex
• add an index on a field (if it doesn’t exist yet)
• improve performance of queries against common fields: type, class_name, super, file
• can index embedded field names
  • ensureIndex('methods.add')
  • find({'methods.add':{$exists:true}})
    find classes that define the method add
how many objs per type?

darrenhester (flickr)
> db.rails.group({
    initial: {count:0},
    key: {type:true},            // group on type
    cond: {},
    reduce: function(obj, out) {
      out.count++                // increment count for each obj
    }
  }).sort(function(a,b) {
    return a.count - b.count     // sort results
  })

[
  ...,
  {type: 'array', count: 7621},
  {type: 'string', count: 69139},
  {type: 'node', count: 365285}
]
lots of nodes
• nodes represent ruby code
• stored like any other ruby object
• makes ruby completely dynamic
mongodb: group
• cond: query to filter objects before grouping
• key: field(s) to group on
• initial: initial values for each group’s results
• reduce: aggregation function
mongodb: group
• by type or class
  • key: {type:1}
  • key: {class_name:1}
• by file & line
  • key: {file:1, line:1}
• by type in a specific file
  • cond: {file:"app.rb"}, key: {file:1, line:1}
• by length of strings in a specific file
  • cond: {file:"app.rb", type:'string'}, key: {length:1}
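To make the roles of cond, key, initial and reduce concrete, here is a plain-Ruby imitation that runs on an in-memory array. `group` below is a local helper written for this sketch, not a MongoDB driver call, and the records are made up.

```ruby
# Plain-Ruby imitation of group's cond/key/initial/reduce options.
# `group` is a helper written for this sketch, not a MongoDB driver call.
def group(records, key:, initial:, cond: {}, &reduce)
  groups = {}
  records.each do |obj|
    # cond: filter objects before grouping
    next unless cond.all? { |field, value| obj[field] == value }
    # key: the field(s) that identify a group
    group_key = key.map { |field| [field, obj[field]] }.to_h
    # initial: starting values for each new group's result
    out = (groups[group_key] ||= group_key.merge(initial))
    # reduce: aggregate this object into the group's result
    reduce.call(obj, out)
  end
  groups.values
end

records = [
  { file: "app.rb", line: 3, type: "string" },
  { file: "app.rb", line: 3, type: "string" },
  { file: "app.rb", line: 9, type: "array"  },
  { file: "lib.rb", line: 1, type: "string" }
]

groups = group(records, cond: { file: "app.rb" }, key: [:file, :line],
               initial: { count: 0 }) { |_obj, out| out[:count] += 1 }
by_line = groups.sort_by { |g| g[:count] }
p by_line
```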
what subclasses String?

davestfu (flickr)
> db.rails.find(
    {super_name:"String"},
    {name:1}                     // select only name field
  )

{name: "ActiveSupport::SafeBuffer"}
{name: "ActiveSupport::StringInquirer"}
{name: "SQLite3::Blob"}
{name: "ActiveModel::Name"}
{name: "Arel::Attribute::Expressions"}
{name: "ActiveSupport::JSON::Variable"}
mongodb: find
• find({type:'string'})
  all strings
• find({type:{$ne:'string'}})
  everything except strings
• find({type:'string'}, {data:1})
  only select string’s data field
the largest objects?

http://body.builder.hu/imagebank/pictures/1088273777.jpg
> db.rails.find(
    {type:
      {$in:['string','array','hash']}
    },
    {type:1,length:1}
  ).sort({length:-1}).limit(3)

{type: "string", length: 2308}
{type: "string", length: 1454}
{type: "string", length: 1238}
mongodb: sort, limit/skip
• sort({length:-1,file:1})
sort by length desc, file asc
• limit(10)
first 10 results
• skip(10).limit(10)
second 10 results
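For comparison, the same sort/skip/limit semantics can be written with Ruby's Enumerable on an in-memory array. The records are made up, and a page size of 2 is used instead of 10 to keep the sketch short.

```ruby
# sort / skip / limit expressed with Enumerable; the records are made up.
records = [
  { file: "b.rb", length: 10 },
  { file: "a.rb", length: 10 },
  { file: "c.rb", length: 99 },
  { file: "d.rb", length: 1 }
]

# sort({length:-1, file:1}): length descending, then file ascending
sorted = records.sort_by { |r| [-r[:length], r[:file]] }

first_page  = sorted.first(2)           # limit(2)
second_page = sorted.drop(2).first(2)   # skip(2).limit(2)

p first_page.map { |r| r[:file] }
p second_page.map { |r| r[:file] }
```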
when were objs created?

zoutedrop (flickr)
• useful to look at objects over time
• each obj has a timestamp of when it was created
• find minimum time, call it start_time
• create buckets for every minute of execution since start
• place objects into buckets
when were objs created?
> db.rails.mapReduce(function(){
    var secs = this.time - start_time;
    var mins_since_start = Math.floor(secs / 60);
    emit(mins_since_start, 1);
  }, function(key, vals){
    for(var i=0,sum=0; i<vals.length; sum += vals[i++]);
    return sum;
  }, {
    scope: {
      // start_time = min(time)
      start_time: db.rails.find().sort({time:1}).limit(1)[0].time
    }
  })
{result:"tmp.mr_1272615772_3"}
mongodb: mapReduce
• arguments
  • map: function that emits one or more key/value pairs for each object (available as this)
  • reduce: function to return aggregate result, given key and list of values
  • scope: global variables to set for funcs
• results
  • stored in a temporary collection (tmp.mr_1272615772_3)
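The map and reduce steps can be imitated in plain Ruby on an in-memory array, with no MongoDB involved. The records are made up, and the sketch assumes `time` is a timestamp in seconds, so integer division by 60 gives the minute bucket.

```ruby
# Map/reduce in plain Ruby over made-up records: bucket objects by the
# minute in which they were created. Assumes `time` is in seconds.
records = [
  { time: 1000 }, { time: 1030 },                  # minute 0
  { time: 1065 },                                  # minute 1
  { time: 1185 }, { time: 1190 }, { time: 1199 }   # minute 3
]

start_time = records.map { |r| r[:time] }.min   # plays the role of the scope variable

# map: emit a [key, value] pair for each object
emitted = records.map { |r| [(r[:time] - start_time) / 60, 1] }

# reduce: combine all values emitted under the same key
buckets = emitted.group_by(&:first)
                 .transform_values { |pairs| pairs.sum { |_, value| value } }

busiest_minute, count = buckets.max_by { |_, n| n }
p buckets
puts "#{count} objects created #{busiest_minute} minutes after start"
```

Minutes in which nothing was created (minute 2 here) get no bucket at all, so the number of buckets only equals the run time in minutes when every minute saw at least one allocation.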
when were objs created?
> db.tmp.mr_1272615772_3.count()
12
script was running for 12 minutes

> db.tmp.mr_1272615772_3.find().sort({value:-1}).limit(1)
{_id: 8, value: 41231}
41k objects created 8 minutes after start
references to this object?

jeffsmallwood (flickr)
ary = ["a","b","c"]
ary references "a"
"b" referenced by ary

• ruby makes it easy to “leak” references
• an object will stay around until all references to it are gone
• more objects = longer GC = bad performance
• must find references to fix leaks
references to this object?
• db.rails_refs.insert({ _id:"0xary", refs:["0xa","0xb","0xc"] })
  create references lookup table
• db.rails_refs.ensureIndex({refs:1})
  add ‘multikey’ index to refs array
• db.rails_refs.find({refs:"0xa"})
  efficiently look up all objs holding a ref to 0xa
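The lookup above amounts to a reverse index from each referenced address to the objects that hold it. A rough Ruby equivalent on in-memory data; `0xary` and its refs come from the slide, while the second record (`0xhash`) is added here only to show more than one holder.

```ruby
# A reverse index from referenced address to the objects holding it,
# roughly what an index over the elements of `refs` provides.
# The 0xhash record is an addition made for this sketch.
rails_refs = [
  { _id: "0xary",  refs: ["0xa", "0xb", "0xc"] },
  { _id: "0xhash", refs: ["0xa"] }
]

holders = Hash.new { |hash, addr| hash[addr] = [] }
rails_refs.each do |obj|
  # one index entry per element of the refs array
  obj[:refs].each { |addr| holders[addr] << obj[:_id] }
end

p holders["0xa"]
p holders["0xb"]
```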
mongodb: multikeys
• indexes on array values create a ‘multikey’ index
• classic example: nested array of tags
  • find({tags: "ruby"})
    find objs where obj.tags includes "ruby"
version 2: memprof.com
a web-based heap visualizer and leak analyzer
plugging a leak in rails3
• in dev mode, rails3 is leaking 10mb per request

let’s use memprof to find it!

# in environment.rb
require `gem which memprof/signal`.strip
plugging a leak in rails3
send the app some requests so it leaks
$ ab -c 1 -n 30 http://localhost:3000/

tell memprof to dump out the entire heap to json
$ memprof --pid <pid> --name <dump name> --key <api key>
2519 classes
30 copies of TestController

mongo query for all TestController classes

details for one copy of TestController
find references to object

“leak” is on line 178

holding references to all controllers
• In development mode, Rails reloads all your application code on every request
• ActionView::Partials::PartialRenderer is caching partials used by each controller as an optimization
• But... it ends up holding a reference to every single reloaded version of those controllers
Questions?
Aman Gupta
@tmm1
