Saturday, June 28, 2008

.NET Linq Deferred Execution

The new Linq (Language Integrated Query) feature that shipped with .NET 3.5 is fantastic. With Linq we can write more concise and meaningful queries. Like the anonymous method pitfall we discussed in the previous post, Linq has a related behavior called deferred execution, which can cause problems if you are not aware of it. Let's look at the following code:
using System;
using System.Linq;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        // Test Data
        string[] names = new string[] { "NameA", "NameB" };
        string _name = "NameA";

        // Query by delegate
        IEnumerable<string> searchName1 = names.Where(
            delegate(string name)
            {
                return name == _name;
            });
        // Query by Lambda expression
        IEnumerable<string> searchName2 = names.Where(name => name == _name);

        // Rename the search keyword
        _name = "NameB";

        // Redo the queries
        IEnumerable<string> searchName3 = names.Where(
            delegate(string name)
            {
                return name == _name;
            });
        IEnumerable<string> searchName4 = names.Where(name => name == _name);

        Console.WriteLine("{0} \t {1} \t {2} \t {3}",
            searchName1.First(), searchName2.First(), searchName3.First(), searchName4.First());

        Console.Read();
    }
}
What's the result? You might expect "NameA NameA NameB NameB", but if you run the console application you will get "NameB NameB NameB NameB" instead.

Why is that? Because Linq's Where method uses deferred execution. For example, the statement
IEnumerable<string> searchName2 = names.Where(name => name == _name);
only attaches the Lambda expression to the query. The expression is not invoked until we actually read data from the result. The Lambda expression also captures the variable _name itself, not the value it held when the query was defined. So searchName1, searchName2, searchName3 and searchName4 all return the same result in our case, because they all compare against the current value of _name ("NameB") when their condition is finally evaluated.
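To make the timing visible, here is a minimal sketch that counts how often the predicate runs. The class name DeferredDemo, the method CountPredicateCalls and the counter are made up for illustration; only Where and First come from Linq.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredDemo
{
    // Hypothetical helper: returns the predicate call count
    // before and after the query is enumerated.
    public static int[] CountPredicateCalls()
    {
        string[] names = { "NameA", "NameB" };
        int calls = 0;

        // Defining the query does not run the predicate.
        IEnumerable<string> query = names.Where(name =>
        {
            calls++;
            return name == "NameB";
        });
        int before = calls;

        // First() enumerates the query, so the predicate runs now:
        // once for "NameA" (rejected) and once for "NameB" (accepted).
        query.First();
        int after = calls;

        return new[] { before, after };
    }

    static void Main()
    {
        int[] counts = CountPredicateCalls();
        // Prints "before: 0, after: 2"
        Console.WriteLine("before: {0}, after: {1}", counts[0], counts[1]);
    }
}
```

The count stays at zero until First() pulls data from the query, which is exactly the window in which a captured variable can change.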

How can we avoid this issue? One way is to read the data immediately after defining the query, for example by looping through the IEnumerable collection before the captured variable changes. The other way is to call ToList() or ToArray() to force immediate execution of the Linq query.
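A minimal sketch of the ToList() approach, reusing the test data from the example above. The names SnapshotDemo, Run and keyword are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SnapshotDemo
{
    // Hypothetical helper: returns the first match of the deferred query
    // and of the ToList() snapshot after the keyword has changed.
    public static string[] Run()
    {
        string[] names = { "NameA", "NameB" };
        string keyword = "NameA";

        // Deferred: the predicate runs later and sees the latest keyword.
        IEnumerable<string> deferred = names.Where(name => name == keyword);

        // Immediate: ToList() enumerates now, while keyword is still "NameA".
        List<string> snapshot = names.Where(name => name == keyword).ToList();

        keyword = "NameB";

        return new[] { deferred.First(), snapshot.First() };
    }

    static void Main()
    {
        string[] results = Run();
        // Prints "deferred: NameB, snapshot: NameA"
        Console.WriteLine("deferred: {0}, snapshot: {1}", results[0], results[1]);
    }
}
```

The trade-off is that the snapshot holds a copy of the results in memory and will not reflect later changes to the source array either.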

The following Linq methods, among others, use deferred execution (Select, OrderBy and most other methods that return a sequence behave the same way):
Except, Take, TakeWhile, Skip, SkipWhile, Where
Methods that return a single value or a concrete collection are executed immediately:
Any, Average, Contains, Count, First, FirstOrDefault, Last, LastOrDefault, Single,
SingleOrDefault, Sum, Max, Min, ToList, ToArray, ToDictionary, ToLookup
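How to remember all this? The tip is to look at the return type of the method. It is generally deferred if the return type is IEnumerable<TSource> or another lazy sequence type such as IOrderedEnumerable<TSource>. Why? Those methods are typically implemented as iterators with yield return, so their body does not run until the result is enumerated. That's the root cause of the deferred execution.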
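The effect of yield return can be reproduced with a hand-written iterator. The sketch below is a simplified stand-in, not the actual implementation of Enumerable.Where; MyWhere, IteratorDemo and the log list are hypothetical names used only to record the order of events.

```csharp
using System;
using System.Collections.Generic;

static class IteratorDemo
{
    // Hypothetical, simplified filter built on yield return.
    // Nothing in this body runs until the caller starts enumerating.
    public static IEnumerable<T> MyWhere<T>(
        IEnumerable<T> source, Func<T, bool> predicate, List<string> log)
    {
        log.Add("iterator body started");
        foreach (T item in source)
        {
            if (predicate(item))
                yield return item;
        }
    }

    // Returns the recorded order of events.
    public static List<string> Run()
    {
        var log = new List<string>();
        string[] names = { "NameA", "NameB" };

        // Calling MyWhere only creates the iterator object.
        IEnumerable<string> query = MyWhere(names, n => n == "NameB", log);
        log.Add("query created");

        // The foreach triggers the iterator body.
        foreach (string n in query)
            log.Add("got " + n);

        return log;
    }

    static void Main()
    {
        // Prints:
        // query created
        // iterator body started
        // got NameB
        foreach (string entry in Run())
            Console.WriteLine(entry);
    }
}
```

"query created" is logged before "iterator body started", even though MyWhere was called first. The compiler turns an iterator method into an object whose code only advances when the caller asks for the next item.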