Understanding Generic Delegates and Lambda Expressions with LINQ Extension Methods

July 28, 2009 — Leonard Lobel

Generic delegates and lambda expressions underpin LINQ. If you’ve been struggling to get your arms around these new concepts, this post should help demystify matters for you.

Of course, there are a a host of other new .NET language features that enable LINQ as well. These include extension methods, type inference, anonymous types, object initializers, and custom iterators, just to name a few! In this post, I’ll address generic delegates and lambda expressions as they pertain to LINQ (that is, as they pertain to the extension methods provided in .NET 3.5 to support LINQ), and I’ll cover some of the other new language features in future posts.

LINQ Query Syntax

There is simply no LINQ without generic delegates and lambda expressions, yet it would seem at first that you can write simple LINQ queries without seeing or touching either a generic delegate or a lambda expression. Here’s a simple example that queries from a collection of customers in custs. In the where clause, we use a boolean allCustomers variable to control whether all customers or only USA customers will be returned.

1 var allCustomers = false;
2    
3 var q =
4     from c in custs
5     where c.Country == "USA" || allCustomers == true
6     orderby c.FirstName
7     select c;

Where’s the generic delegate and the lambda expression in this LINQ query? Oh they’re there all right, but the LINQ query syntax hides those details from the code.

LINQ Method Call Syntax

In reality, the compiler (C# in this case, but VB .NET as well) is supporting a sugar-coated syntax that is strikingly similar to an ordinary SQL query in the RDBMS world, but which really amounts to a chained set of method calls.

What this means is that there is really no such thing as a where clause in LINQ–it’s a Where extension method call.

And guess what? There’s no such thing as an orderby clause either–it’s an OrderBy extension method call, chained to the end of the Where method call.

Meaning that the previous query is treated by the compiler exactly as if you had coded it using LINQ method call syntax as follows:

1 var q = custs
2     .Where(c => c.Country == "USA" || allCustomers == true)
3     .OrderBy(c => c.FirstName);

So this code looks more “conventional,” eh? We’re invoking the Where method on the collection of customers, and then invoking the OrderBy method on the result returned by Where, and obtaining our final filtered and sorted results in q.

LINQ query syntax is obviously cleaner than LINQ method call syntax, but it’s important to understand that only LINQ method call syntax fully implements the complete set of LINQ Standard Query Operators (SQOs), so that you will sometimes be forced to use method call syntax. Also realize that it’s common and normal to use combinations of the two in a single query (I’ll show examples of this using .Concat and other methods in future posts).

Two questions arise at this point:

1) How did the customer collection, which is a List<Customer> in our example, get a Where and OrderBy method?

2) What is the => operator in the method parameters, and what does it do (and how the heck do you pronounce it)?

The answer to the first question is simple: Extension methods. I’ll cover those in more detail in a future post, but for now just understand that the .NET framework 3.5 “extends” any class that implements IEnumerable<T> (virtually any collection, such as our List<Customer>) with the Where and OrderBy methods, as well as many other methods that enable LINQ.

Generic Delegates

The second question is much juicier. First, the => is the lambda operator, and many people verbalize it as ‘goes to’. So ‘c => …’ is read aloud as ‘c goes to…’

To understand lambdas, you need to understand generic delegates and anonymous methods. And to best understand those is to review delegates and generics themselves. If you’ve been programming .NET for any length of time, you’re certain to have encountered delegates. And as of .NET 2.0 (VS 2005), generics have been an important language feature for OOP as well.

A delegate is a data type that defines the “shape” (signature) of a method call. Once you have defined a delegate, you can declare a variable as that delegate. At runtime, the variable can then be pointed to (and then invoke) any available method that matches the signature defined by the delegate. EventHandler is a typical example of a delegate defined by the .NET framework. Here is how .NET defines the EventHandler delegate:

1 public delegate void EventHandler(object sender, EventArgs e);

This delegate is suitable for pointing to any method that returns no value (void), and accepts two parameters (object sender and EventArgs e); in other words, a typical event handler that has no special event arguments to be passed.

A generic delegate is nothing more than an ordinary delegate, but supports generic type declarations for achieving strongly-typed definitions of the delegate at compile time. This is the same way generics are used for classes. For example, just as the framework provides a List<T> class so that you can have a List of anything, it now also provides a small set of generic delegates named Func that can be used to stongly type many different method signatures; essentially, any method that takes from zero to 4 parameters of any type (T1 through T4) and returns an object of any type (TResult):

1 public delegate TResult Func<TResult>();
2 public delegate TResult Func<T, TResult>(T arg);
3 public delegate TResult Func<T1, T2, TResult>(T1 arg1, T2 arg2);
4 public delegate TResult Func<T1, T2, T3, TResult>(T1 arg1, T2 arg2, T3 arg3);
5 public delegate TResult Func<T1, T2, T3, T4, TResult>(T1 arg1, T2 arg2, T3 arg3, T4 arg4);

Delegates as Parameters

With these generic Func delegates baked into the .NET framework, LINQ queries can use them as parameters to extension methods such as Where and OrderBy. Let’s examine one overload of the Where extension method:

1 public static IEnumerable<TSource> Where<TSource>(
2   this IEnumerable<TSource> source,
3   Func<TSource, bool> predicate);

What does this mean? Let’s break it down. And to better suit our particular example and simplify the explanation, let’s substitute Customer for TSource:

1) It’s a method called Where<Customer>.

2) It returns an IEnumerable<Customer> sequence.

3) It’s an extension method available to operate on any IEnumerable<Customer>instance (as denoted by the special this keyword in the signature), such as the List<Customer> in our example. The this keyword used in this context means that an extension method for the type following the this keyword is being defined, rather than definining what looks like the first parameter to the method.

4) It takes a single predicate parameter of type Func<Customer, bool>.

Number 4 is key. Func<Customer, bool> is the second Func generic delegate shown above, Func<T, TResult>, where T is Customer and TResult is bool (Boolean). The Func<Customer, bool> delegate refers to any method that accepts a Customer object and returns a bool result.  This means that the Where<Customer> method takes a delegate (function pointer) that points to another method which actually implements the filtering logic. This other method will receive each Customer object as the LINQ query iterates the source sequence List<Customer>, apply the desired filtering criteria on it, and return a true or false result in the bool return value that controls whether this particular Customer object should be selected by the query.

How does all that get expressed simply with: .Where(c => c.Country == “USA” || allCustomers == true)?

The best way to answer that question is to first demonstrate two other ways of invoking the Where method. The first is to explicitly provide a delegate instance of Func<Customer, bool> that points to another method called IsCustomerCountryUSA:

01 private void MainMethod()
02 {
03     var filterMethod = new Func<Customer, bool>(IsCustomerCountryUSA);
04     var q = custs.Where(filterMethod);
05 }
06   
07 private bool IsCustomerCountryUSA(Customer c)
08 {
09     return (c.Country == "USA");
10 }

The Where method takes a single parameter, which is a new instance of the Func<TSource, T> generic delegate that points to a method named IsCustomerCountryUSA. This works because IsCustomerCountryUSA accepts a Customer object and returns a bool result, which is the signature defined by Func<Customer, bool> expected as a parameter by Where<Customer>.  Inside the IsCustomerCountryUSA, filtering logic is applied. Note that because we have created IsCustomerCountryUSA as a completely separate method, it cannot access the allCustomers variable defined locally in the calling method (the variable that controls whether all or only USA customers should be selected, as shown at the beginning of this post). Furthermore, because the signature of the IsCustomerCountryUSA method is fixed as determined by Func<Customer, bool>, we can’t just pass the allCustomers variable along as another parameter either.  Thus, this approach would require you to define the allCustomers variable as a private member if you wanted to include it in the filtering logic of the IsCustomerCountryUSA method.

Anonymous Methods

Anonymous methods were introduced at the same time as generics in .NET 2.0 (VS 2005). They are useful in cases such as this, where the method we’re creating for the delegate only needs to be called from one place, so the entire method body is simply coded in-line. Since it’s coded in-line at the one and only place it’s invoked, the method doesn’t need to be given a name, hence the term anonymous method.

To refactor the delegate instance version of the code to use an anonymous method, replace the Func parameter passed to the Where method with the actual method signature and body as follows:

1 var q = custs.Where(delegate(Customer c) { return c.Country == "USA" || allCustomers == true; });

The keyword delegate indicates you’re pointing the delegate parameter Func<Customer, bool> expected by the Where method to an anonymous method. The anonymous method isn’t named, as mentioned, but it must match the Func<Customer, bool> signature, meaning it must accept a Customer object parameter and return a bool result. The explicit signature following the delegate keyword defines the expected single Customer object parameter, and the expected bool result is inferred from the fact that the method returns an expression that resolves to a bool value. If the anonymous method used any other signature or returned anything other than a bool result, the compiler would fail to build your code  because the Where<Customer> method is defined to accept only a Func<Customer, bool> delegate.

So now the method body containing the filtering logic that was formerly coded in the separate IsCustomerCountryUSA method appears in braces right after the anonymous method signature. But there’s another huge advantage here besides saving the overhead of declaring a separate method for the filtering logic. Because anonymous methods appear inside of the method that calls them, they magically gain access to the local variables of the calling method. Thus, this implementation can also test the allCustomers variable, even though it’s defined in the calling method which would normally be inaccessible to the method being called. Because of this feature, the allCustomers variable defined as a local variable in the calling method can be tested by the anonymous method as part of the filtering criteria. That means it’s not necessary to define private fields to share local variables defined in a method with an anonymous method that the method calls, as demonstrated here). The result is less code, and cleaner code (though you should be aware that the compiler pulls some fancy tricks to make this possible, and there is a degree of additional runtime overhead involved when accessing local variables of the calling method from the anonymous method being called).

Lambda Expressions

Lambdas were introduced in .NET 3.5 (VS 2008) as a more elegant way of implementing anonymous methods, particularly with LINQ. When a lambda is used to represent an anonymous method containing a single expression which returns a single result, that anonymous method is called a lambda expression. Let’s first refactor our query by converting it from an anonymous method to a lambda:

1 var q = custs.Where((Customer c) => { return c.Country == "USA" || allCustomers == true; });

All we did was drop the delegate keyword and add the => (“goes to”) operator. The rest looks and works the same as before.

This lambda consists of a single expression that returns a single result, which qualifies it to be refactored as a lambda expression like this:

1 var q = custs.Where(c => c.Country == "USA" || allCustomers == true);

This final version again works exactly the same way, but has become very terse. First, the Customer data type in the signature isn’t needed, since it’s inferred as the TSource in Func<TSource, bool>. The parentheses surrounding the signature are also not needed, so they’re dropped too. Secondly, since lambda expressions by definition consist of only a single statement, the opening and closing braces for the method body aren’t needed and also get dropped. Finally, since lambda expressions by definition return a single result, the return keyword isn’t needed and is also dropped.

Besides offering an even terser syntax, lambda expressions provide two important additional benefits over anonymous methods. First, as just demonstrated, the data type of the parameter in the signature needn’t be specified since it’s inferred automatically by the compiler. That means that anonymous types (such as those often created by projections defined in the select clause of a LINQ query) can also be passed to lambda expressions. Anonymous types cannot be passed to any other type of method, so this capability was absolutely required to support anonymous types with LINQ. Second, lambda expressions can be used to build expression trees at runtime, which essentially means that your query code gets converted into query data which can then be processed by a particular LINQ implementation at runtime. For example, LINQ to SQL builds an expression tree from the lambda expressions in your query, and generates the appropriate T-SQL for querying SQL Server from the expression tree. Future posts will cover anonymous types and expression trees in greater detail.

To Sum It All Up

So this final lambda expression version invokes the Where method on the List<Customer> collection in custs. Because List<Customer> implements IEnumerable<T>, the Where extension method which is defined for any IEnumerable<T> is available to be invoked on it. The Where<Customer> method expects a single parameter of the generic delegate type Func<Customer, bool>. Such a parameter can be satisfied by the in-line lambda expression shown above, which accepts a Customer object as c =>, and returns a bool result by applying a filtering condition on the customer’s Country property and the calling method’s allCustomers variable.

And as expained at the beginning of this post, the method syntax query using our final lambda expression is exactly what the compiler would generate for us if we expressed it using this query syntax:

1 var q =
2     from c in custs
3     where c.Country == "USA" || allCustomers == true
4     select c;

I hope this helps clear up any confusion there is surrounding lambda expressions in .NET 3.5.

Happy coding!

This entry was posted in LINQ and tagged , , . Bookmark the permalink.

Leave a comment