Wednesday, June 25, 2008

Linq To SQL Caching

I ran into some weird behavior while trying out different usage patterns of Linq To SQL: some queries were not hitting the database! I knew that Linq To SQL object tracking keeps cached copies of the entities it retrieves, but my understanding was that it only used them for identity mapping and would never return stale results. After some Googling and a look at the internals of the System.Data.Linq.Table class with Reflector, I came to the conclusion that it was indeed returning its cached results. This makes sense once you understand the way the data context works; I just hadn't realized the implications of object tracking. Once an object has been retrieved by a data context, later queries on that context will not overwrite its values with what is in the database. This is key to the way optimistic concurrency support works in Linq to SQL, but if you are used to writing simple CRUD applications where you ignore concurrency, it is easy to overlook.
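
To make the "first copy wins" behavior concrete, here is a minimal sketch of an identity map. It is not how Linq To SQL is implemented; the IdentityMap class, its Track method, and this stripped-down Product are hypothetical names invented for the illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity, not the Product class generated by the designer.
class Product
{
    public int Id;
    public string Name;
}

// Hypothetical stand-in for the tracking cache inside a data context.
class IdentityMap
{
    private readonly Dictionary<int, Product> tracked = new Dictionary<int, Product>();

    // Called with each row a query materializes. If the key is already
    // tracked, the existing instance wins and the fresh values are discarded.
    public Product Track(Product fromDatabase)
    {
        Product existing;
        if (tracked.TryGetValue(fromDatabase.Id, out existing))
        {
            return existing;
        }
        tracked.Add(fromDatabase.Id, fromDatabase);
        return fromDatabase;
    }
}

class Program
{
    static void Main()
    {
        var map = new IdentityMap();

        // First read: the row is materialized and tracked.
        var first = map.Track(new Product { Id = 1, Name = "Widget" });

        // The row has since been renamed, and a second read brings back the new name...
        var second = map.Track(new Product { Id = 1, Name = "Gadget" });

        // ...but the caller still gets the original instance with its original values.
        Console.WriteLine(ReferenceEquals(first, second)); // True
        Console.WriteLine(second.Name);                     // Widget
    }
}
```

The point of the sketch is only that a second read of the same key hands back the instance you already have, which is why values changed in the database do not show up on a tracked object.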

One thing still puzzles me, though. If I change my call from

context.Products;

to

context.Products.ToList();

I would always hit the database. It turns out that ToList calls GetEnumerator (which leads to a query being fired) whereas when I databind directly against the Table, it calls IListSource.GetList, which will return the cached table if it can. Why wouldn't you query the database to check for new objects that might have been added to your results, and why couldn't the same query use the cache when I call ToList on it?

Wednesday, June 18, 2008

Deferred Execution in Linq to SQL

Just like the last post, this one is motivated by a comment I got from someone identified as merlin981. Since we seem to have a running dialog, do you have a blog or other online presence? In any case, I wanted to explain my understanding of how Linq to SQL uses deferred execution, because merlin and I seem to have very different ideas.

Let's take a look at a simple query like the one below.

var dbContext = new TestDataContext();
var result = from x in dbContext.Products
         select x;

At this point, the query is just an expression tree; nothing has been sent to the database. When you iterate over the results, the following single query executes against the database:

SELECT [t0].[Id], [t0].[Name], [t0].[Price], [t0].[CategoryId]
FROM [dbo].[Product] AS [t0]

Once that query has run, I can access the Id, Name and CategoryId of all the products that were in the database without any further round trips to the database. On the other hand, if you were to do something like this:

foreach (var product in result)
{
  Response.Write(product.Category.Name);
}

This block of code is going to hit the database once for each product. Obviously we want to avoid that, and there are several ways to do so. One is to return an anonymous type containing just the columns we need:

var result = from x in dbContext.Products
         select new
         {
             x.Name,
             CategoryName = x.Category.Name
         };

foreach (var product in result)
{
    Response.Write(product.CategoryName);
}

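This method will do an inner join and pull back just the columns we asked for. Another way is to specify load options for our original query:

var dbContext = new TestDataContext();
// LoadOptions is null by default, so build a DataLoadOptions (System.Data.Linq)
// and assign it before running any query on this context.
var options = new DataLoadOptions();
options.LoadWith<Product>(p => p.Category);
dbContext.LoadOptions = options;
var result = from x in dbContext.Products
         select x;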

This tells the Linq to SQL execution engine to load the Category entity along with each product, in the same query. The generated SQL is below.

SELECT [t0].[Id], [t0].[Name], [t0].[Price], [t0].[CategoryId], [t1].[Name] AS [Name2]
FROM [dbo].[Product] AS [t0]
INNER JOIN [dbo].[Category] AS [t1] ON [t1].[Id] = [t0].[CategoryId]

I hope this has been a helpful example of how Linq To SQL uses deferred execution.
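
If you want to watch deferred execution without a database, here is a minimal sketch using LINQ to Objects. The DeferredDemo class, its Enumerations counter, and the ProductNames source are hypothetical names made up for this example; the counter stands in for "number of times the query actually ran."

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredDemo
{
    // Incremented each time the source sequence starts being read.
    public static int Enumerations = 0;

    // Hypothetical in-memory source standing in for a table.
    public static IEnumerable<string> ProductNames()
    {
        Enumerations++;
        yield return "Widget";
        yield return "Gadget";
    }

    static void Main()
    {
        // Building the query reads nothing.
        var query = from name in ProductNames()
                    where name.StartsWith("G", StringComparison.Ordinal)
                    select name;
        Console.WriteLine(Enumerations); // 0

        // Every enumeration re-runs the query against the source.
        foreach (var name in query) { }
        foreach (var name in query) { }
        Console.WriteLine(Enumerations); // 2

        // ToList runs it once more and keeps the results in memory.
        var snapshot = query.ToList();
        foreach (var name in snapshot) { }
        foreach (var name in snapshot) { }
        Console.WriteLine(Enumerations); // 3
    }
}
```

With Linq to SQL the source is the database rather than an iterator method, so as I understand it each of those enumerations would mean another SQL query, while looping over the list from ToList would not.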

Monday, June 9, 2008

Stored Procedures, a Best Practice?

I just saw merlin981's comment on my LINQ to SQL post; thanks for taking the time to leave it! That said, I think the term "Best Practice" is something of a misnomer here. Much has been written on both sides of this debate. One thing is for sure, though: a parameterized query is compiled just like a stored procedure on SQL Server version 7.0 and later. Via Frans Bouma's blog, I found this passage in SQL Server Books Online:

SQL Server 2000 and SQL Server version 7.0 incorporate a number of changes to statement processing that extend many of the performance benefits of stored procedures to all SQL statements. SQL Server 2000 and SQL Server 7.0 do not save a partially compiled plan for stored procedures when they are created. A stored procedure is compiled at execution time, like any other Transact-SQL statement. SQL Server 2000 and SQL Server 7.0 retain execution plans for all SQL statements in the procedure cache, not just stored procedure execution plans.

So I think it is clear that sprocs will not be significantly faster than ad hoc SQL for simple cases. This is not to say that you should never use sprocs. On the contrary, there are situations where sprocs will be the only good solution (for instance, complex data manipulation that requires temporary tables). The point is that using an ORM can make development easier by allowing you to ignore the SQL in the majority of cases. If you see that parts of your application are slow, then you can fix those parts.

Merlin also mentioned that running queries directly against tables uses deferred execution, as though that were a bad thing. Deferred execution is what allows LINQ to work at all, and it can improve performance in many scenarios. Of course, like any tool, it can get you into trouble if you don't understand it.