How I Learned To Stop Worrying and Write My Own ORM
Sam Saffron
Programming, Technology and the Art of Hacking
A few weeks ago we started investigating some performance issues at Stack Overflow.
Our web tier was running hot, often hitting 100% CPU. This was caused by a combination of factors; we made a few mistakes here and there. We allowed
a few expensive operations to happen a bit too concurrently and … there were some framework issues.
https://samsaffron.com/archive/2011/03/30/How+I+learned+to+stop+worrying+and+write+my+own+ORM 1/24
8/30/2019 How I learned to stop worrying and write my own ORM
Google loves Stack Overflow, it loves it so much that it will crawl 10 pages a second sometimes. This is totally by design, we told Google: “crawl us as fast as
you can” and Google did its best to comply.
The trouble was that crawling our question pages was expensive business.
We maintain a list of related questions on every question page; this list is rebuilt once a month in an on-demand background process. This process is usually
pretty fast, however when it's happening 100,000 times a day in huge bursts … it can get a bit costly.
So, we fixed that up. We changed some stuff so Google triggers less background activity. However, performance problems persisted.
In frameworks we trust
Our question show page does a fair bit of database work. It needs to do a bunch of primary key lookups to pick up all the questions, answers, comments and
participating users. This all adds up. I spent a fair bit of time profiling both production and dev; something kept showing up in my CPU analyzer traces:
System.Reflection.Emit.DynamicMethod.CreateDelegate
System.Data.Linq.SqlClient.ObjectReaderCompiler.Compile
System.Data.Linq.SqlClient.SqlProvider.GetReaderFactory
System.Data.Linq.SqlClient.SqlProvider.System.Data.Linq.Provider.IProvider.Compile
System.Data.Linq.CommonDataServices+DeferredSourceFactory`1.ExecuteKeyQuery
System.Data.Linq.CommonDataServices+DeferredSourceFactory`1.Execute
System.Linq.Enumerable.SingleOrDefault
System.Data.Linq.EntityRef`1.get_Entity
Much of our work at Stack Overflow depended on the assumption that LINQ-2-SQL is fast enough. Turns out it was not fast enough for us.
In the trace above you can see that EntityRef<T> is baking a method, which is not a problem, unless it is happening hundreds of times a second.
There are tons of Object-Relational-Mappers for the .Net framework. There is an antique question on SO discussing which is best as well as probably tens if not
hundreds of similar questions.
When we started off Stack Overflow we chose LINQ-2-SQL. Both LINQ-2-SQL and the more recent magic unicorn Entity Framework 4.1 are built using a
similar pattern.
You give the ORM a query, it builds SQL out of it, then it constructs a DynamicMethod to materialize data coming back from SQL into your business objects. I
am not privy to the exact implementation; it may include more or less work in these dynamic methods.
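I do not know exactly what LINQ-2-SQL bakes into its dynamic methods, but the general shape of a baked materializer can be sketched with expression trees. Everything below is illustrative, not the framework's actual code: a toy `Post` type and naive by-position column mapping.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq.Expressions;

public class Post
{
    public int Id { get; set; }
    public string Text { get; set; }
}

public static class Materializer
{
    // Compile a Func<IDataRecord, T> that copies columns into a new T
    // by position. Building this delegate is the expensive part the
    // profiler trace shows; invoking it per row is cheap.
    public static Func<IDataRecord, T> Build<T>() where T : new()
    {
        var record = Expression.Parameter(typeof(IDataRecord), "record");
        var item = Expression.Variable(typeof(T), "item");
        var body = new List<Expression>
        {
            Expression.Assign(item, Expression.New(typeof(T)))
        };
        // record[i] returns object; cast/unbox to each property type
        var getItem = typeof(IDataRecord).GetMethod("get_Item", new[] { typeof(int) });
        int column = 0;
        foreach (var prop in typeof(T).GetProperties())
        {
            var cell = Expression.Call(record, getItem, Expression.Constant(column++));
            body.Add(Expression.Assign(
                Expression.Property(item, prop),
                Expression.Convert(cell, prop.PropertyType)));
        }
        body.Add(item); // the block's value: the populated object
        return Expression.Lambda<Func<IDataRecord, T>>(
            Expression.Block(new[] { item }, body), record).Compile();
    }
}
```

If a mapper rebuilds this delegate on every query instead of caching it, you get exactly the `CreateDelegate`/`Compile` frames that dominated our traces.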
This implementation is an abstraction, and it leaks. It leaks performance; you give up some control over the queries you can run and have to deal with a LEFT
JOIN syntax that is so crazy it can make grown men cry.
The trouble is that the syntax is kind of clunky and there are certain places you can not use compiled queries, in particular EntityRef and EntitySet.
My solution
Due to various limitations much of our SQL is hand coded in LINQ ExecuteQuery blocks. This means we have lots of inline SQL similar to the following:
var stuff = Current.DB.ExecuteQuery<Foo>("select a,b,c from Foo where a = {0}", x);
This is the code we would always fall back to when performance was critical or we needed better control. We are all comfortable writing SQL and this feels
pretty natural.
The trouble is that LINQ-2-SQL is still a bit slow when it comes to ExecuteQuery. So we started converting stuff to compiled queries.
I decided to experiment and see if I could out-perform ExecuteQuery by introducing a smarter caching mechanism. Once we had a replacement we could easily
convert our existing code base to use it.
So, I wrote a tiny proof of concept ORM (gist of rev1). IL weaving does get tricky.
It can act as a fast replacement for both compiled queries and ExecuteQuery methods. In the future, I hope to release a more complete mapper so people can reuse
it.
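The caching idea boils down to: key the expensive materializer build on the SQL text and target type, build once, and hand the cached delegate back on every later call. A minimal sketch, not the gist's actual code (the `QueryCache` name and `build` factory are made up here):

```csharp
using System;
using System.Collections.Concurrent;
using System.Data;

public static class QueryCache
{
    // One materializer per (sql, target type) pair. The expensive
    // delegate build happens on the first call only.
    static readonly ConcurrentDictionary<(string, Type), Delegate> Cache =
        new ConcurrentDictionary<(string, Type), Delegate>();

    public static int Builds; // exposed so the sketch can show the build count

    public static Func<IDataRecord, T> GetMapper<T>(
        string sql, Func<Func<IDataRecord, T>> build)
    {
        return (Func<IDataRecord, T>)Cache.GetOrAdd((sql, typeof(T)), _ =>
        {
            Builds++; // GetOrAdd can race; a real cache has to tolerate that
            return build();
        });
    }
}
```

With this in place the per-request cost collapses to a dictionary lookup plus a delegate call, which is why a cached mapper can get close to hand-coded reader loops.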
Our workload at Stack Overflow is heavily skewed toward very, very cheap queries that do a simple clustered index lookup. The DB returns the data instantly.
On my machine the cost of pulling out 100 random posts and turning them into Post objects is:
LINQ 2 SQL: 335 ms
LINQ 2 SQL compiled: 207 ms
LINQ 2 SQL ExecuteQuery: 242 ms
Sam's ORM: 174 ms
Entity Framework 4.1: 550 ms
Hand coded: 164 ms
So, LINQ-2-SQL can take double the amount of time to pull out our poor posts, but that is not all. The trouble is that the extra 160ms is CPU time on the web
server. The web server could simply be idle waiting for SQL to generate data, but instead it is busy rebuilding the same methods over and over.
You can make stuff a fair bit faster with compiled queries, however they are still, as expected, slower than hand coding. In fact there is quite a gap between hand
coding and compiled queries.
Hand coding, which is fastest, is full of pain, bugs and general sadness.
post.Id = reader.GetInt32(0);
post.Text = reader.GetString(1);
post.CreationDate = reader.GetDateTime(2);
post.Counter1 = reader.IsDBNull(4) ? (int?)null : reader.GetInt32(4);
// this makes me want to cry
Another interesting fact is that Entity Framework is slowest and also the most awkward to use. For one, its context can not reuse an open connection.
Other benefits
You are probably thinking, oh-no, yet another ORM. I thought the world had more ORMs than blogging engines. Our use case is very narrow and I think it
makes sense.
We are using our new ORM for a specific problem: mapping parameterized SQL to business objects. We are not using it as a full blown ORM. It does not do
relationships and other bells and whistles. This allows us to continue using LINQ-2-SQL where performance does not matter and port all our inline SQL to use
our mapper, since it is faster and more flexible.
This control means we can continue tweaking performance and adding features we have long waited for.
For example:
var awesome = connection.ExecuteQuery("select * from Posts where Id in (select * from @Ids)", new {Ids = new int[] {1,2,3}});
SQL Server 2008 has awesome new Table-Valued Parameters, which would be great to use. Many ORMs shy away from supporting this because you need to define
a table type in SQL Server. However, we are in control here, so we can make it happen.
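An alternative that avoids defining a table type at all is to expand the list parameter into one numbered parameter per value before the command is sent. A rough sketch of the rewriting step; the helper name is made up, and a real implementation also has to worry about quoting, the empty list, and parameter name collisions:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InListExpander
{
    // Turn "Id in @Ids" plus {1, 2, 3} into "Id in (@Ids0, @Ids1, @Ids2)"
    // and a bag of per-value parameters to add to the command.
    public static (string Sql, Dictionary<string, object> Parameters) Expand(
        string sql, string name, IReadOnlyList<int> values)
    {
        var names = Enumerable.Range(0, values.Count)
            .Select(i => "@" + name + i)
            .ToList();
        var parameters = new Dictionary<string, object>();
        for (int i = 0; i < values.Count; i++)
            parameters[names[i]] = values[i];
        var expanded = sql.Replace("@" + name, "(" + string.Join(", ", names) + ")");
        return (expanded, parameters);
    }
}
```

The upside is that it works on any SQL Server version with zero schema changes; the downside is one query plan per list length, which is where the bucketing tricks discussed in the comments below come in.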
Recap
The end result of this round of optimization, which involved a massive amount of rewriting, hardware tuning and such, was that the question show page now only
takes 50ms of server time to render. At its worst it was up at 200ms of server time. This performance boost was totally worth it. Optimizing LINQ-2-SQL
played a key part in achieving this.
However, I would not recommend people go crazy and dump LINQ-2-SQL just because of this. My recommendation would be: measure, find bottlenecks and
fix the ones that matter.
Comments
Mike_Murray over 8 years ago
I'd be interested to know how Rob Conery's Massive performs in comparison (https://github.com/robconery/massive). If you haven't heard of it, it's an ironic
name for an ORM that is one file and well below 500 lines if you remove comments. No overhead; it gets you to the SQL fast.
Sam,
Nice article. I wrote a similar data-access framework a couple of years ago. To solve the database null value problem I used Steve Michelotti's Nullable data reader
(http://www.codeproject.com/KB/database/NullableReaders.aspx). It rocks!
Good luck,
Chuck
You could also probably do this using extension methods, which would mean you could just whack some additional methods onto SqlDataReader.
I second Mike's comment. If you could try Rob's “Massive” ORM implementation for your performance tests I think it would be extremely interesting.
I may try it out, but it's an unfair comparison. Massive deals with Expandos and dynamic types; L2S and our ORM deal with strong types and bake serializers.
Totally different strategies.
Nope … it would not solve the basic problem that stuff is just waiting around for 1.5ms more than 30 times a page, doing nothing useful. We could possibly
redesign stuff so a question page is a single stored proc call returning multiple result sets, but that would be incredibly complex and unmaintainable.
Sam, will you be releasing your ORM into the community, putting it onto CodePlex or GitHub?
The prototype is linked in the article; as soon as it is complete I will look at releasing something proper.
Nice post. I had to do something similar because we were finding SubSonic's LINQ too slow. In the end I wrote this http://www.toptensoftware.com/petapoco which
is very similar to Rob's Massive but for strongly typed POCOs. We're seeing CPU drop from 80% to 5%.
You might want to take a look at this article which shows how you can get LINQ Expressions to do the IL stuff for you.
Just curious — did you have a chance to look at NHibernate? I was wondering how, performance-wise, it stacked up to your other ORMs (LINQ to SQL,
Entity Framework, etc).
Very interesting post, thanks for the great level of detail. You mentioned “I am not privy to the exact implementation”, and I think that is the actual
root of the problem. As SO continues to grow and become more popular, understanding the implementation details of your chosen frameworks and tools will
become extremely important. If you can't even see the source code, let alone contribute to improving it, or at least subclass/extend the pieces that you want to
mod, then you may be in for a world of hurt. Also, you'll find that most popular web apps end up ditching general purpose ORMs once they start hitting
performance and scalability walls.
+1 on the vote to use System.Linq.Expressions to create the IL rather than emitting your own IL. Think of it as a managed wrapper around your DynamicMethod.
It's what I use.
I did basically this a couple of years ago but have since moved to NHibernate for ease of use. There are a couple of things I learned along the way that may be
helpful to you, that I see as issues with what you have right now.
What you are missing is more information on the parameters. Sure it's nice to have dynamic queries and all that, but you need to hint more on the parameters.
This mostly affects things where parameter size is variable, like strings; not sure about things like decimal precision off the top of my head.
Your code is adding the parameter via the command.Parameters.Add method, and then setting the value on the parameter.
If we pass 'abc' then @name is nvarchar(3) when you add the parameter by just using the value; if we pass 'abcd' then @name is nvarchar(4).
You now have 2 query plans for that query. If you can hint that @name should be nvarchar(100) or something that makes sense, then you will be able to reuse
the query plan, which will improve performance of the query.
An easy fix to help this out in your code is to just give smart defaults for strings in increments like 128, 256, 512, 1024, -1 (max).
Here's the code that I wrote a couple of years ago that takes in a data record and emits a hydrated type, along with the method I would call that utilized it:
https://gist.github.com/894695.
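The bucketing idea from this comment is simple enough to sketch: round the declared string length up to a fixed bucket so that values of nearly any length share one plan. The helper name is made up; the increments are the ones suggested above:

```csharp
using System;

public static class ParamSizing
{
    // Pad a string parameter's declared length up to a fixed bucket so
    // 'abc' (3 chars) and 'abcd' (4 chars) both go to the server as
    // nvarchar(128) and hit the same cached query plan.
    // -1 stands for nvarchar(max).
    static readonly int[] Buckets = { 128, 256, 512, 1024 };

    public static int BucketSize(int length)
    {
        foreach (var bucket in Buckets)
            if (length <= bucket) return bucket;
        return -1;
    }
}
```

You would then set `parameter.Size = ParamSizing.BucketSize(value.Length)` before adding the parameter; `SqlParameter.Size` is the real knob, the bucket choices are just the commenter's suggested defaults.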
Very good point about string params, I will make sure I add it.
Regarding expressions, we are pretty lucky to have Marc work for us and he breathes IL, so he can help out a lot. I find IL weaving a bit more straightforward;
the indirection around expressions confuses me.
Have any of you tried FrameworkGen? It generates simple, performant C# code and uses SQL data readers and stored procs.
Sam, nice work. I'm wondering how altering SO's caching strategy might have impacted this? Did you look into that at all? I understand the need for better query
performance, but this high-read scenario shouts cache management to me.
Paul,
Why “select * from Posts where Id in (select * from @Ids)” over “select * from Posts where Id in @Ids”?
SQL Server supports “where Id in (a, b, c)”, so the nested query seems noisy.
Since then I abandoned the idea of table valued params; they are too slow.
Have you looked at BLToolkit? According to http://ormbattle.net/ it's pretty fast and can show similar or faster performance. It would be interesting to see its
timing.
Good work. I'm also interested in a comparison with BLToolkit. And why do you prefer a static strategy instead of a dynamic one (like Massive)?
Serdar
We use static typing extensively in the project; we needed a way to drop this in as a replacement.
I've implemented the same benchmark in C++ for the ODB ORM. The results are available here, in case anyone is interested in how a C++ ORM stacks up against
the C# crowd:
http://codesynthesis.com/~boris/blog/2011/04/06/performance-odb-cxx-orm-vs-cs-orm/
Sam,
In your benchmark, was there any compilation included in your compiled LINQ to SQL test at all? I.e. does the time include the initial compilation? I ask because
your difference between uncompiled and compiled seems small (compared to the difference I've seen in my own L2S usage). But perhaps that's because your
queries are simpler than ours.
I think it's wrong to call this an ORM. It's more like a query helper.
While you may satisfy the definition of ORM, the term has come to mean a lot more than just something that transforms data readers.
Is this simply a way to get data out of the db? How do you handle inserts, updates and deletes?
What about magic strings? Have you considered these? Your queries may become rather cumbersome to maintain. I personally like to avoid strings if at all
possible, even by incorporating some code generation.
I've done a similar thing with BLToolkit: http://erraticdev.blogspot.com/2010/11/t4-template-to-generate-bltoolkit.html because it uses a similar
approach if you provide your queries yourself.
I've written a bunch of optimized stored procedures that use a naming convention (Entity_Action, as in Person_GetLicenses) and then a T4 template that
queries the database for stored procedures and generates strongly typed objects and methods:
Person p = db.Person.GetLicenses(personId).ExecuteObject();
This gives me the flexibility of having even complex queries on the DB that can process as well as return data (because usually when we create something we
want to return it as well) while also eliminating errors with magic strings. Stored procedures can then be easily maintained on the server and all calls can
easily be recreated at the push of a button that re-generates all T4 templates in the solution.
Do you support stored procedures? Do you have means to avoid magic strings?
I'm following your MultiMapping example. Any idea why this code:
string sql = @"select uc.Description UCDesc, uc.UseCaseID, System.SystemID, System.Description SysDesc from UseCase uc left join System on
System.SystemID = uc.SystemID Order by uc.UseCaseID";
I found the answer to my question above… it turns out I had to use the splitOn parameter of the Query method.
First, the performance benchmarks link listed on the Google page results in a 404.
Second, I spent some time comparing usage of Dapper vs using Enterprise Library (not Entity Framework) and building the object list directly.
The performance differences I found were a bit surprising. In all cases, Dapper took 100% more time than doing it straight. I spent 8 hours on this today because
I like the idea you have; however, I can't go forward with something that's going to radically decrease performance for us.
Dapper does execute the query in about the exact same amount of time; however, the problem appears when it comes time to iterate the results.
Is there somewhere I can post real findings so that we can figure out what's going on here?
That sounds very odd to me. Do you have a link somewhere that shows demo code of what you were doing?
Materializing a list of 1000 objects is just a fraction (probably less than 1%) slower than doing so by hand with an IDataReader.
I'm sending you an email now. To sum up, the problem was with the Buffered parameter.
When False, it executes 100% slower than Ent Lib (not Entity Framework) for regular stored proc access and object materialization.
Having reread this page, it's apparent I misunderstood what the Buffered parameter actually did.
I'm going to expand my testing to cover some other internal use cases. If the others come out like this then we'll be making the switch.
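The misunderstanding in this exchange comes down to eager materialization versus a deferred sequence. This is not Dapper's actual code, just a self-contained illustration of the trap: a lazy `IEnumerable` repeats the underlying reads every time it is enumerated, while a buffered list pays once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BufferingDemo
{
    public static int RowsRead; // counts simulated reads off the "reader"

    // Stands in for pulling rows off an open data reader: each yield is work.
    static IEnumerable<int> ReadRows()
    {
        for (int i = 0; i < 3; i++)
        {
            RowsRead++;
            yield return i;
        }
    }

    // buffered: true copies everything into a list up front; false hands
    // back the lazy sequence, which re-reads on every enumeration.
    public static IEnumerable<int> Query(bool buffered)
    {
        var rows = ReadRows();
        return buffered ? rows.ToList() : rows;
    }
}
```

So if calling code iterates the result more than once, or holds it while doing other work, the unbuffered path can easily look much slower in a benchmark even though the query itself is identical.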
Sam, what do you mean by “table valued params, they are too slow”? Too slow? How? And in what use cases?
Is there a better alternative to sending a list of primary keys to the server in order to return those rows?
I noticed that for a simple int list of up to a few hundred values, passing in one param per number is 3 times faster than using TVPs.
Hi,
I have not downloaded your own ORM to take a look, but with the explanation and sample code, I am just curious: what is the difference between iBatis.Net and
your own ORM?
Thanks, Hardy
This seems like a good use-case for using a stored procedure, especially if you are just embedding your own SQL anyways. Am I missing something?
I'm curious if you considered trying different architectures and/or a NOSQL database? If so, what were your observations of the alternatives?
We are quite happy with SQL Server at the moment, though we do offload some stuff to Redis.
Hi Sam, I downloaded the latest Dapper, and I find SqlMapperExtensions doesn't support Oracle perfectly. Its default placeholder is '@', but Oracle's
placeholder is ':', so there are some problems supporting an Oracle database. Would you give us a solution to support all kinds of databases?
My post above should have been more like this (no preview, edit or delete, so the best I can do is repost and hope it works this time)…
“but this high-read scenario shouts cache management to me” “Our caching story is actually quite complete”
So why are the Google requests (essentially anonymous users) ever causing SO to hit the database?
Well, we never really pre-cached pages; there are millions of them and they just change too often.
Sam, I’m wondering too why you guys didn’t pick nhibernate in the first place?
We were looking for something simpler, leaner and faster. Dapper is still playing a key role in Stack Overflow's infrastructure today.
Thanks for the answer. It would be awesome to know how you guys do other things on the SO website, because it's really important to know how things go in a
heavily loaded website that was designed with practical goals instead of academic ones (like many sample websites nowadays).
Like: do you do DI? Do you break your app's “do something” actions into services (like PostService.CreateNewPost(…)), into commands (like
commandProcessor.Execute<CreatePostCommand, Post>(…)) or into DDD entity methods (like PostRepository().Create(…))? Do you encapsulate your
queries in repositories or in extension methods (query objects), or do you use DbContext inside controllers directly? Do you practice “protect the developer”
inside your service/model layer?
I have not been working at SO for almost 2 years now, so I am not sure.
In my day we had some services, stuff was componentized, the DbContext was used directly, and we placed a huge amount of trust in the developers. DI was not
heavily used.
Nice thing. I stumbled upon the same issue, but in a more generic way where performance is a big issue for me, so I did almost the same as you did, with one
difference: instead of using IL I create getters and setters using expression trees, like below:
public static Action<T, object> CreateSetter<T>(PropertyInfo propertyInfo)
{
    // builds (t, v) => t.Property = (PropertyType)v
    var target = Expression.Parameter(typeof(T), "t");
    var value = Expression.Parameter(typeof(object), "v");
    var body = Expression.Assign(Expression.Property(target, propertyInfo),
        Expression.Convert(value, propertyInfo.PropertyType));
    var action = Expression.Lambda<Action<T, object>>(body, target, value).Compile();
    return action;
}
It's a lifetime ago, but I do remember we experimented with that. AFAIK MSIL was more flexible.
Hey Sam, I love Dapper and use it extensively. However, is there any way around the maximum seven entity limit? I find I need to break up my queries when I
get close to seven entities and then later combine the result sets. Are there any plans to extend the number of entities allowed for mapping?
Really unsure about the seven entity limit. I am no longer the maintainer of Dapper; maybe @Marc_Gravell knows, or perhaps you could ask on Stack
Overflow.
Hey @dronestudios; there should be an overload that accepts Type[]. An endless number of generic overloads is not practical, so beyond some n, an array of types
is used instead. I would also worry that by the time you're populating that many types, you're doing too much in one go. Personally, I would advocate
introducing a simpler type that matches the row shape, then mapping separately to your domain types.
Thanks for the help guys! What Marc mentioned works like a charm. I think you should put an example up on the StackExchange site about the Type overload
solution. I was able to use the example from the unit test. Dapper is the bomb!
board.User4 = (User)objects[4];
board.User5 = (User)objects[5];
board.User6 = (User)objects[6];
board.User7 = (User)objects[7];
board.User8 = (User)objects[8];
board.User9 = (User)objects[9];
return board;
};
I love Dapper, but I'm amazed that we go along happily for decades with SQL in string literals (which Dapper does nothing to impose, but nothing to help you
avoid). Combined with the use of dynamics, it means every error is a runtime error. Would we not be happier and more fulfilled if we used something like
QueryFirst? SQL in its own file, your queries tested against your database every time you save them, all your ADO stuff generated for you, and, the cherry on
the cake, compile errors if your schema changes.