The purpose of this series is to introduce some patterns used by Entity Framework which are probably a departure from the way most of us are used to handling persistence. Last time we looked at Identity Map; this time we’re going to look at a closely related pattern, Unit of Work.
What is the Unit of Work Pattern?
This time I’m not even going to try to use the definition. For some reason the definitions for these patterns seem designed to sound impressive while conveying no information in as few words as possible. Instead, here are a couple of descriptive paragraphs from Martin Fowler’s book Patterns of Enterprise Application Architecture.
When you're pulling data in and out of a database, it's important to keep track of what you've changed; otherwise, that data won't be written back into the database. Similarly you have to insert new objects you create and remove any objects you delete.
You can change the database with each change to your object model, but this can lead to lots of very small database calls, which ends up being very slow. Furthermore it requires you to have a transaction open for the whole interaction, which is impractical if you have a business transaction that spans multiple requests. The situation is even worse if you need to keep track of the objects you've read so you can avoid inconsistent reads.
A Unit of Work keeps track of everything you do during a business transaction that can affect the database. When you're done, it figures out everything that needs to be done to alter the database as a result of your work.
So what does this mean? Let’s look at two examples, the traditional save/update scenario and the Unit of Work save/update scenario.
Example: Traditional Save/Update Scenario
OK, we’re going to look at the old way of doing things first. For this example we have an architecture that uses the Active Record pattern. Even if you’re not familiar with the term Active Record, you’ve probably used some variation of this pattern. We have two classes, PostRecord and CommentRecord. PostRecord represents a single blog post and CommentRecord represents a single comment on that blog post. Each class represents a single record of data. They encapsulate data, business logic methods, and persistence (Get and Save) methods which are used to get data from or save data to the database. A typical block of code follows, in which we get a post from the db, make a quick change, and then create some comments for the blog post. While we run this code I’ll be checking SQL Profiler to monitor any database activity. The idea is that we want to keep track of when saves and updates are written to the db.
int postId = 1;
PostRecord post = PostRecord.GetPost(postId);
// PROFILER: The line above hits the db and pulls
// the post with post_id = 1.
post.PostTitle = "Rusty Bedsprings";
post.Save();
// PROFILER: At this point we see our change to
// the post record is saved back to db.
CommentRecord comment1 = new CommentRecord()
{
    CommentId = 100,
    CommentText = "Life in the fast lane.",
    PostId = 1
};
CommentRecord comment2 = new CommentRecord()
{
    CommentId = 101,
    CommentText = "Another one bites the dust",
    PostId = 1
};
comment1.Save();
// PROFILER: We see our new comment record
// saved back to db.
comment2.Save();
// PROFILER: Again, we see our new comment
// record saved back to db.
No real surprises there. We create objects in memory, set their members, then save them. When we call a save method, our DAL opens a connection to the database and runs the query that saves the data. This, or something very similar, is the way that persistence is handled in almost every .NET application I’ve ever worked on. We may hit the db 20 times in a block of code, but each time we open a separate db connection and execute a separate db query.
Example: Unit of Work Save/Update Scenario
Now we’re going to do the same thing using ADO.NET Entity Framework, which implements the Unit of Work pattern. There are some minor changes to our architecture. We still have two classes, Post and PostComment, that represent our post and comment data and the associated business logic. But now our persistence methods are located in a separate ObjectContext class named UowEntities. Both LINQ to SQL and Entity Framework use a context class to encapsulate persistence logic. This context class is where the Unit of Work pattern is implemented. So, let’s go through our Entity Framework code, once again running SQL Profiler to monitor database activity.
UowEntities context = new UowEntities();
var query = from p in context.PostSet where p.post_id == 1 select p;
Post existingPost = query.FirstOrDefault<Post>();
// PROFILER: The line above hits the db and gets
// data for post with post_id=1
existingPost.post_title = "Rusty Bedsprings";
// PROFILER: No db activity
PostComment comment1 = new PostComment()
{
    comment_id = 100,
    comment_text = "Life in the fast lane."
};
PostComment comment2 = new PostComment()
{
    comment_id = 101,
    comment_text = "Another one bites the dust"
};
existingPost.PostComments.Add(comment1);
// PROFILER: No db activity
existingPost.PostComments.Add(comment2);
// PROFILER: No db activity
context.SaveChanges();
// PROFILER: This time db code runs to commit all
// changes. A single connection is opened and SQL
// is executed that updates the post_title, then
// SQL is executed that saves the two new comment
// records to the database.
So the Unit of Work pattern batches all of our db calls together and runs them at one time. We have an ObjectContext class that encapsulates our persistence logic. Part of what that class does is to track the state of every entity that we get from it. It knows what entities have changed. Then when we call SaveChanges(), it creates a batch of SQL statements to persist all of the changes that have taken place, then it runs them with a single db connection.
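To make the deferral concrete, here is a minimal sketch of the idea in plain C#, with no Entity Framework involved. FakeConnection and SimpleUnitOfWork are hypothetical names I made up for illustration, and the "SQL" is just placeholder text, so treat this as a toy model of the pattern rather than a picture of how ObjectContext works internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a db connection. It only counts how often
// it is opened and records the statements it is asked to run.
public class FakeConnection
{
    public int TimesOpened { get; private set; }
    public List<string> Executed { get; } = new List<string>();

    public void Open() { TimesOpened++; }
    public void Execute(string statement) { Executed.Add(statement); }
}

// Hypothetical unit of work: queues changes, then flushes them together.
public class SimpleUnitOfWork
{
    private readonly FakeConnection _connection;
    private readonly List<string> _pending = new List<string>();

    public SimpleUnitOfWork(FakeConnection connection)
    {
        _connection = connection;
    }

    // Registering a change only queues it. Nothing touches the connection.
    public void RegisterChange(string statement)
    {
        _pending.Add(statement);
    }

    // The connection is opened once and every queued statement runs on it.
    // Returns the number of statements that were flushed.
    public int SaveChanges()
    {
        if (_pending.Count == 0) return 0;

        _connection.Open();
        foreach (string statement in _pending)
        {
            _connection.Execute(statement);
        }

        int flushed = _pending.Count;
        _pending.Clear();
        return flushed;
    }
}

public static class BatchingDemo
{
    public static void Main()
    {
        var connection = new FakeConnection();
        var uow = new SimpleUnitOfWork(connection);

        uow.RegisterChange("update post 1: set post_title");
        uow.RegisterChange("insert comment 100");
        uow.RegisterChange("insert comment 101");
        Console.WriteLine(connection.TimesOpened); // prints 0, nothing sent yet

        Console.WriteLine(uow.SaveChanges());      // prints 3
        Console.WriteLine(connection.TimesOpened); // prints 1
    }
}
```

The three registered changes cost nothing until SaveChanges() runs, which is the same shape as the profiler trace above: no db activity, no db activity, then everything at once.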
Saves and Updates are Deferred
The thing that probably caused me the most heartburn about Unit of Work is that data saves and updates are deferred. When I change a property on an entity, my change is not saved back to the database. When I create new comment objects and add them to my context (in the example above I added them to the post.PostComments collection which adds them to the context behind the scenes), my new comments are not saved back to the database. Nothing is saved back to the database until I call SaveChanges() on my ObjectContext. Until that method is called, the ObjectContext just sits there and keeps track of all entities that have changed. I’m still not sure that I like the idea of all my data changes being deferred and then batched together, but it’s growing on me.
How Does Entity Framework Know Which Entities Have Changed?
So, the context class knows which entities have changed and it batches SQL statements to update them. How does it know which entities have changed? If you said that it keeps a copy of the original data for each entity, then give yourself a gold star. From the first post in this series, you may recall that the context stores an Identity Map which is basically just a central cache of all entities that are loaded through the context. In addition to a cached copy of the current state of each entity, Entity Framework also stores a copy of the original state of each entity (all of its original data values). If you want to read more about it, search for Entity Framework Object Services.
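The original-values idea can be sketched in a few lines. TrackedPost and SnapshotTracker are hypothetical names, and this version tracks only one property and finds changes by comparing against the snapshot, which is a simplification I chose for the example and not a claim about how Object Services does its bookkeeping.

```csharp
using System;
using System.Collections.Generic;

public class TrackedPost
{
    public int PostId { get; set; }
    public string PostTitle { get; set; }
}

// Hypothetical tracker: remembers each entity's original title when the
// entity is attached, then compares against it later.
public class SnapshotTracker
{
    // Keyed by object reference, since TrackedPost does not override Equals.
    private readonly Dictionary<TrackedPost, string> _originalTitles =
        new Dictionary<TrackedPost, string>();

    public void Attach(TrackedPost post)
    {
        _originalTitles[post] = post.PostTitle;
    }

    // An entity counts as modified when its current value differs
    // from the value captured at attach time.
    public List<TrackedPost> GetModified()
    {
        var modified = new List<TrackedPost>();
        foreach (KeyValuePair<TrackedPost, string> entry in _originalTitles)
        {
            if (!string.Equals(entry.Key.PostTitle, entry.Value, StringComparison.Ordinal))
            {
                modified.Add(entry.Key);
            }
        }
        return modified;
    }
}

public static class SnapshotDemo
{
    public static void Main()
    {
        var tracker = new SnapshotTracker();
        var first = new TrackedPost { PostId = 1, PostTitle = "Old Title" };
        var second = new TrackedPost { PostId = 2, PostTitle = "Untouched" };
        tracker.Attach(first);
        tracker.Attach(second);

        first.PostTitle = "Rusty Bedsprings";

        List<TrackedPost> modified = tracker.GetModified();
        Console.WriteLine(modified.Count);     // prints 1
        Console.WriteLine(modified[0].PostId); // prints 1
    }
}
```

Only the post whose title changed shows up as modified, so only that post would need an UPDATE when the changes are saved.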
Why Do I Care?
So what’s the advantage of using a framework that implements Unit of Work? I still have a tough time convincing myself that this is a good idea, so if anyone out there is a strong supporter of this pattern, please post your reasons why in the comments. But, even with my ambivalence toward the pattern, I can see 3 advantages.
- Performance. Batching 30 SQL updates and running them on a single connection is more efficient than running 30 updates over 30 separate connections. Also, if your code is designed in a way that it produces duplicate SQL updates for the same entity (which is bad design but I have seen it before), the framework will consolidate those multiple updates into a single update.
- Concurrency. Unit of Work will automatically handle concurrency issues within a single thread by merging any updates to the same entity. Now recall that Entity Framework’s implementation of Unit of Work involves tracking the initial state of each entity. This means that Entity Framework can use an optimistic concurrency model to determine if the database has changed since you initially pulled your entities. Entity data members have an attribute named ConcurrencyMode which is set to “None” by default. If you set this attribute to “Fixed”, Entity Framework will include the original value of that member in the WHERE clause of the UPDATE or DELETE it generates, and will throw an OptimisticConcurrencyException if no row matches because the data has changed.
- Transactions. Unit of Work is transactional by default. When you call SaveChanges(), if there is a problem with any of the data updates, they all get rolled back, or depending on the exception, they may never make it to the db at all. This can provide an easy way to batch data updates in a single transaction, even if the code for those updates exists in multiple, separate code modules. You just create a new context object, pass it around as a parameter to the different modules (a little Dependency Injection), then call SaveChanges() and all of the data updates from the different modules are batched together in a single transaction.
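The transaction point in the list above is easier to see with a small sketch. Everything here is hypothetical: TitleStore is an in-memory stand-in for a table, SharedUnitOfWork plays the role of the context, and applying changes to a copy before swapping it in is just my substitute for a real database transaction.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a table of post titles keyed by post id.
public class TitleStore
{
    public Dictionary<int, string> Titles { get; } = new Dictionary<int, string>();
}

// Hypothetical unit of work that several modules share.
public class SharedUnitOfWork
{
    private readonly TitleStore _store;
    private readonly List<KeyValuePair<int, string>> _pending =
        new List<KeyValuePair<int, string>>();

    public SharedUnitOfWork(TitleStore store)
    {
        _store = store;
    }

    public void RegisterTitleChange(int postId, string newTitle)
    {
        _pending.Add(new KeyValuePair<int, string>(postId, newTitle));
    }

    // All-or-nothing: changes are applied to a working copy, and the copy
    // only replaces the store's contents if every change succeeded.
    public void SaveChanges()
    {
        var working = new Dictionary<int, string>(_store.Titles);
        foreach (KeyValuePair<int, string> change in _pending)
        {
            if (!working.ContainsKey(change.Key))
            {
                throw new InvalidOperationException("No post with id " + change.Key);
            }
            working[change.Key] = change.Value;
        }

        _store.Titles.Clear();
        foreach (KeyValuePair<int, string> row in working)
        {
            _store.Titles[row.Key] = row.Value;
        }
        _pending.Clear();
    }
}

// Two separate modules that know nothing about each other.
public static class RenameModule
{
    public static void Run(SharedUnitOfWork uow)
    {
        uow.RegisterTitleChange(1, "Rusty Bedsprings");
    }
}

public static class ArchiveModule
{
    public static void Run(SharedUnitOfWork uow)
    {
        uow.RegisterTitleChange(99, "Archived"); // post 99 does not exist
    }
}

public static class SharedContextDemo
{
    public static void Main()
    {
        var store = new TitleStore();
        store.Titles[1] = "Old Title";
        var uow = new SharedUnitOfWork(store);

        RenameModule.Run(uow);
        ArchiveModule.Run(uow);

        try
        {
            uow.SaveChanges();
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("save failed");
        }

        Console.WriteLine(store.Titles[1]); // prints Old Title
    }
}
```

The rename from the first module is valid on its own, but because the second module's change fails, neither one is applied. That is the behavior you get from passing one context around and calling SaveChanges() once at the end.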
So, hopefully this will give you a little background on why things are done the way they are in Entity Framework. I still find myself ready to chuck it out the window every once in a while (especially when debugging failed transactions). Then I realize that many of my objections stem from the fact that I’m just not used to using these patterns for data persistence. They do represent a different paradigm that doesn’t always make sense to me, but they are also best practices that have evolved over years thanks to the efforts of some very intelligent programmers.