New C# 3.0 Language Additions

To make LINQ seamlessly integrate with C#, significant enhancements were needed for the C# language. Virtually every significant enhancement to the C# language made in version 3.0 was made specifically to support LINQ. While all of these features have merit on their own, it is really the sum of the parts contributing to LINQ that makes C# 3.0 so noteworthy.

To truly understand much of the syntax of LINQ, it is necessary for me to cover some of the new C# 3.0 language features before proceeding with the workings of the components of LINQ. This chapter will cover the following language additions:

  • Lambda expressions

  • Expression trees

  • The keyword var, object and collection initialization, and anonymous types

  • Extension methods

  • Partial methods

  • Query expressions

In the examples in this chapter, I do not explicitly show which assemblies to reference or which namespaces to specify in your using directives when they are the ones I covered in Chapter 1. I do point out any new ones, but only in the first example that introduces them.

Lambda Expressions

In C# 3.0, Microsoft has added lambda expressions. Lambda expressions have been used in computer languages as far back as LISP, and were conceptualized in 1936 by Alonzo Church, an American mathematician. These expressions provide shorthand syntax for specifying an algorithm.

But before jumping immediately into lambda expressions, let's take a look at the evolution of specifying an algorithm as an argument to a method since that is the purpose of lambda expressions.

Using Named Methods

Prior to C# 2.0, when a method or variable was typed to require a delegate, a developer would have to create a named method and pass that name where the delegate was required.

As an example, consider the following situation. Let's pretend we have two developers: one is a common-code developer, and the other is an application developer. It isn't necessary that there be two different developers; I just need labels to delineate the two different roles. The common-code developer wants to create general-purpose code that can be reused throughout the project. The application developer will consume that general-purpose code to create an application. In this example scenario, the common-code developer wants to create a general-purpose method for filtering arrays of integers, but with the ability to specify the algorithm used to filter the array. First, he must declare the delegate. It will be prototyped to receive an int and return true if the int should be included in the filtered array.

So, he creates a utility class and adds the delegate and filtering method. Here is the common code:

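using System.Collections;

public class Common
{
  //  Returns true if the int should be included in the filtered array.
  public delegate bool IntFilter(int i);

  public static int[] FilterArrayOfInts(int[] ints, IntFilter filter)
  {
    //  Collect every int that the caller's filter accepts.
    ArrayList aList = new ArrayList();
    foreach (int i in ints)
    {
      if (filter(i))
      {
        aList.Add(i);
      }
    }
    return ((int[])aList.ToArray(typeof(int)));
  }
}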

The common-code developer will put both the delegate declaration and the FilterArrayOfInts method into a common library assembly, a dynamic link library (DLL), so that they can be used in multiple applications.

The FilterArrayOfInts method listed previously allows the application developer to pass in an array of integers and a delegate to his filter method and get back a filtered array.

Now let's assume the application developer wants to filter (in) just the odd integers. Here is his filter method, which is declared in his application code.

Example. The application developer's filter method
public class Application
{
  public static bool IsOdd(int i)
  {
    return ((i & 1) == 1);
  }
}

Based on the code in the FilterArrayOfInts method, this method will get called for every int in the array that gets passed in. This filter will only return true if the int passed in is odd. Listing 2-1 shows an example using the FilterArrayOfInts method, followed by the results.

Example. Calling the Common Library Filter Method
using System.Collections;

int[] nums = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

int[] oddNums = Common.FilterArrayOfInts(nums, Application.IsOdd);

foreach (int i in oddNums)
  Console.WriteLine(i);

Here are the results:

1
3
5
7
9

Notice that to pass the delegate as the second parameter of FilterArrayOfInts, the application developer just passes the name of the method. By simply creating another filter, he can filter differently. He could have a filter for even numbers, prime numbers, whatever criteria he wants. Delegates lend themselves to highly reusable code.
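
As a minimal sketch of that reuse, the application developer could add two more named filters and pass each one to the same common method. The MoreFilters class and its IsEven and IsPrime methods are hypothetical names that do not appear in the example above, and the Common class is repeated only so that the sketch compiles by itself.

```csharp
using System;
using System.Collections;

public class Common
{
  public delegate bool IntFilter(int i);

  public static int[] FilterArrayOfInts(int[] ints, IntFilter filter)
  {
    ArrayList aList = new ArrayList();
    foreach (int i in ints)
    {
      if (filter(i))
      {
        aList.Add(i);
      }
    }
    return ((int[])aList.ToArray(typeof(int)));
  }
}

//  Hypothetical application-side filters; the names are illustrative only.
public class MoreFilters
{
  public static bool IsEven(int i)
  {
    return ((i & 1) == 0);
  }

  public static bool IsPrime(int i)
  {
    if (i < 2)
      return false;
    //  Trial division up to the square root of i.
    for (int d = 2; d * d <= i; d++)
    {
      if (i % d == 0)
        return false;
    }
    return true;
  }
}

public class Program
{
  public static void Main()
  {
    int[] nums = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

    //  Same common method, different filter each time.
    int[] evens = Common.FilterArrayOfInts(nums, MoreFilters.IsEven);
    int[] primes = Common.FilterArrayOfInts(nums, MoreFilters.IsPrime);

    foreach (int i in evens)
      Console.WriteLine(i);     //  2, 4, 6, 8, 10 on separate lines
    foreach (int i in primes)
      Console.WriteLine(i);     //  2, 3, 5, 7 on separate lines
  }
}
```

Nothing in the common code had to change to support the new criteria, which is the point of passing the algorithm as a delegate.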

Using Anonymous Methods

That's all well and good, but it can get tedious writing all these filter methods and whatever other delegate methods you may need. Many of these methods will only get used in a single call, and it's a bother to create named methods for them all. Since C# 2.0, developers have had the ability to use anonymous methods to pass code inline as a substitute for a delegate. Anonymous methods allow the developer to specify the code right where the delegate would normally get passed. Instead of creating the IsOdd method, he may specify the filtering code right where the delegate would normally be passed. Listing 2-2 shows the same code from Listing 2-1 but uses an anonymous method instead.

Example. Calling the Filter Method with an Anonymous Method
int[] nums = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

int[] oddNums =
  Common.FilterArrayOfInts(nums, delegate(int i) { return ((i & 1) == 1); });

foreach (int i in oddNums)
  Console.WriteLine(i);

This is pretty cool. The application developer no longer has to declare a method anywhere. This is great for filtering logic code that isn't likely to get reused. As required, the output is the same as the previous example:

1
3
5
7
9

Using anonymous methods does have one drawback: they are kind of verbose and hard to read. If only there were a more concise way to write the method code.

Using Lambda Expressions

Lambda expressions are specified as a comma-delimited list of parameters followed by the lambda operator, followed by an expression or statement block. If there is more than one input parameter, enclose the input parameters in parentheses. In C#, the lambda operator is =>. Therefore, a lambda expression in C# looks like this:

(param1, param2, ...paramN) => expr

Or when needing more complexity, a statement block can be used:

(param1, param2, ...paramN) =>
{
  statement1;
  statement2;
  ...
  statementN;
  return(lambda_expression_return_type);
}

In this example, the data type returned at the end of the statement block must match the return type specified by the delegate. Here is an example lambda expression:

x => x

This lambda expression could be read as "x goes to x," or perhaps "input x returns x." It means that for input variable x, return x. This expression merely returns what is passed in. Since there is only a single input parameter, x, it does not need to be enclosed in parentheses. It is important to know that it is the delegate that is dictating what the type of x being input is, and what type must be returned. For example, if the delegate is defined as passing a string in but returning a bool, then x => x could not be used because if x going in is a string, then x being returned would be a string as well, but the delegate specified it must be bool. So with a delegate defined like that, the portion of the expression to the right of the lambda operator (=>) must evaluate to or return a bool, such as this:

x => x.Length > 0

This lambda expression could be read as "x goes to x.Length > 0," or perhaps "input x returns x.Length > 0." Since the right-hand portion of this expression does evaluate to a bool, the delegate had better specify that the method returns a bool, otherwise a compiler error will result.

The following lambda expression will attempt to return the length of the input argument. So the delegate had better specify a return type of int:

s => s.Length

If multiple parameters are passed into the lambda expression, separate them with commas and enclose them in parentheses like this:

(x, y) => x == y

Complex lambda expressions may even be specified with a statement block like this:

(x, y) =>
{
  if (x > y)
    return (x);
  else
    return (y);
}

What is important to remember is that the delegate is defining what the input types are and what the return type must be. So make sure your lambda expression matches the delegate definition.

Make sure your lambda expressions are written to accept the input types specified by the delegate definition and return the type the delegate defines to be returned.
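
To make the matching rule concrete, here is a small sketch that assigns each of the lambda expressions shown above to a delegate whose signature it satisfies. The LambdaShapes class and the four delegate types are hypothetical and exist only for this illustration.

```csharp
using System;

public class LambdaShapes
{
  //  Hypothetical delegate types, declared only for this illustration.
  public delegate bool StringTest(string s);
  public delegate int StringMeasure(string s);
  public delegate bool IntCompare(int x, int y);
  public delegate int IntPick(int x, int y);

  //  string in, bool out: the body must evaluate to a bool.
  public static StringTest NotEmpty = x => x.Length > 0;

  //  string in, int out.
  public static StringMeasure LengthOf = s => s.Length;

  //  Two parameters must be enclosed in parentheses.
  public static IntCompare AreEqual = (x, y) => x == y;

  //  A statement block; every return must yield the delegate's return type.
  public static IntPick Larger = (x, y) =>
  {
    if (x > y)
      return (x);
    else
      return (y);
  };

  //  This would not compile: x is a string, but StringTest must return bool.
  //  public static StringTest Broken = x => x;
}

public class Program
{
  public static void Main()
  {
    Console.WriteLine(LambdaShapes.NotEmpty("Joe"));
    Console.WriteLine(LambdaShapes.LengthOf("Joe"));
    Console.WriteLine(LambdaShapes.AreEqual(3, 4));
    Console.WriteLine(LambdaShapes.Larger(3, 4));
  }
}
```

In every case the parameter types are never written in the lambda expression itself; the compiler takes them from the delegate type on the left-hand side.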


To refresh your memory, here is the delegate declaration that the common code developer defined:

delegate bool IntFilter(int i);

The application developer's lambda expression must support an int passed in and a bool being returned. This can be inferred from the method he is calling and the purpose of the filter method, but it is important to remember the delegate is dictating this.

So the previous example, but using a lambda expression this time, would look like Listing 2-3.

Example. Calling the Filter Method with a Lambda Expression
int[] nums = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

int[] oddNums = Common.FilterArrayOfInts(nums, i => ((i & 1) == 1));

foreach (int i in oddNums)
  Console.WriteLine(i);

Wow, that's concise code. I know it may look a little funny because it is so new, but once you get used to it, it sure is readable and maintainable. As is required, the results are the same as the previous examples:

1
3
5
7
9

For a recap, here are the significant lines from the sample code for each approach:

int[] oddNums =   //  using named method
  Common.FilterArrayOfInts(nums, Application.IsOdd);

int[] oddNums =   //  using anonymous method
  Common.FilterArrayOfInts(nums, delegate(int i){return((i & 1) == 1);});

int[] oddNums =   // using lambda expression
  Common.FilterArrayOfInts(nums, i => ((i & 1) == 1));

I know that first line is actually shorter, but don't forget that there is a named method declared somewhere else defining what the method does. Of course, if that filtering logic is going to be reused in several places, or perhaps if the algorithm is complex and should only be trusted to a specialized developer, it may make more sense to create a named method to be consumed by other developers.

Complex or reused algorithms may be better served by named methods so they can be reused by any developer without that developer necessarily understanding the algorithm.


Whether named methods, anonymous methods, or lambda expressions are used is up to the developer. Use whatever makes the most sense for the situation at hand.

You will often take advantage of lambda expressions by passing them as arguments to your LINQ query operator calls. Since every LINQ query is likely to have unique or scarcely reused operator lambda expressions, this provides the flexibility of specifying your operator logic without having to create named methods for virtually every query.

Expression Trees

An expression tree is an efficient data representation, in tree form, of a query operator's lambda expression. These expression tree data representations can be evaluated, all simultaneously, so that a single query can be built and executed against a data source, such as a database.

In the majority of the examples I have discussed so far, the query's operators have been performed in a linear fashion. Let's examine the following code:

int[] nums = new int[] { 6, 2, 7, 1, 9, 3 };
IEnumerable<int> numsLessThanFour = nums
  .Where(i => i < 4)
  .OrderBy(i => i);

This query contains two operators, Where and OrderBy, that are expecting method delegates as their arguments. When this code is compiled, the .NET intermediate language (IL) code emitted for each of the query operators' lambda expressions is identical to what would be emitted for an equivalent anonymous method.

When this query is executed, the Where operator is called first, followed by the OrderBy operator.

This linear execution of the operators seems reasonable for this example, but you should consider a query against a very large data source, such as a database. Would it make sense for a SQL query to first call the database with the where clause only to turn around and order the results in a subsequent call? Of course this just isn't feasible for database queries, as well as potentially other types of queries. This is where expression trees become necessary. Since an expression tree allows the simultaneous evaluation and execution of all operators in a query, a single query can be made instead of a separate query for each operator.

So there now are two different things the compiler can generate for an operator's lambda expression, IL code or an expression tree. What determines whether an operator's lambda expression gets compiled into IL code or an expression tree? The operator's prototype will define which of these actions the compiler will take. If the operator is declared to accept a method delegate, IL code will be emitted. If the operator is declared to accept an expression of a method delegate, an expression tree is emitted.

As an example, let's look at two different implementations of the Where operator. The first is the Standard Query Operator that exists in the LINQ to Objects API, which is defined in the System.Linq.Enumerable class:

public static IEnumerable<T> Where<T>(
  this IEnumerable<T> source,
  Func<T, bool> predicate);

The second Where operator implementation exists in the LINQ to SQL API and is in the System.Linq.Queryable class:

public static IQueryable<T> Where<T>(
  this IQueryable<T> source,
  System.Linq.Expressions.Expression<Func<T, bool>> predicate);

As you can see, the first Where operator is declared to accept a method delegate, as specified by the Func delegate, and the compiler will generate IL code for this operator's lambda expression. I will cover the Func delegate in Chapter 3. For now just be aware that it is defining the signature of the delegate passed as the predicate argument. The second Where operator is declared to accept an expression tree (Expression), so the compiler will generate an expression tree data representation of the lambda expression.

The operators that accept an IEnumerable<T> sequence as their first argument are declared to accept a method delegate for their lambda expressions. The operators that accept an IQueryable<T> sequence as their first argument are declared to accept an expression tree.

NOTE

Extension methods on IEnumerable<T> sequences have IL code emitted by the compiler. Extension methods on IQueryable<T> sequences have expression trees emitted by the compiler.
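
The difference can be seen without any query operator at all. The following sketch, which assumes nothing beyond the System.Linq.Expressions namespace, assigns the same lambda expression once to a delegate and once to an expression tree. The DelegateVersusTree class and its member names are hypothetical.

```csharp
using System;
using System.Linq.Expressions;

public class DelegateVersusTree
{
  //  Compiled to IL: all you can do with it is invoke it.
  public static Func<int, bool> AsDelegate = i => i < 4;

  //  Compiled to a data structure that describes the lambda expression.
  public static Expression<Func<int, bool>> AsTree = i => i < 4;
}

public class Program
{
  public static void Main()
  {
    Console.WriteLine(DelegateVersusTree.AsDelegate(3));

    //  The tree can be inspected, which is what a provider does
    //  when it translates a query for its data source.
    Console.WriteLine(DelegateVersusTree.AsTree.Body.NodeType);
    Console.WriteLine(DelegateVersusTree.AsTree.Parameters[0].Name);

    //  A tree can also be turned into a delegate on demand.
    Func<int, bool> compiled = DelegateVersusTree.AsTree.Compile();
    Console.WriteLine(compiled(7));
  }
}
```

The source text of the lambda expression is identical in both declarations; only the declared type of the target decides what the compiler generates.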

Merely being a consumer of LINQ does not require the developer to be very cognizant of expression trees. It is the vendor's developer who adds LINQ capability to a data storage product who needs to fully understand expression trees. Because of this, I don't cover them in any detail in this book.

Keyword var, Object Initialization, and Anonymous Types

Be forewarned: it is nearly impossible to discuss the var keyword and implicit type inference without demonstrating object initialization or anonymous types. Likewise, it is nearly impossible to discuss object initialization or anonymous types without discussing the var keyword. All three of these C# language enhancements are very tightly coupled.

Before describing each of these three new language features in detail—because each will describe itself in terms of the other—allow me to introduce all three simultaneously. Let's examine the following statement:

var mySpouse = new { FirstName = "Vickey", LastName = "Rattz" };

In this example, I declare a variable named mySpouse using the var keyword. It is assigned the value of an anonymous type that is initialized using the new object initialization features. That one line of code is taking advantage of the var keyword, anonymous types, and object initialization.

You can detect that the line of code is using the var keyword because it is explicitly stated. You are able to detect there is an anonymous type because I use the new operator without specifying a named class. And you can see the anonymous object is being explicitly initialized, with values for FirstName and LastName, using the new object initialization feature.

In a nutshell, the var keyword allows the data type of an object to be inferred based on the data type with which it has been initialized. Anonymous types allow new class data types to be created on the fly. True to the word anonymous, these new data types have no name. You can't very well create an anonymous data type if you don't know what member variables it contains, and you can't know what members it contains unless you know what types those members are. Lastly, you won't know what data type those new members are unless they are initialized. The object initialization feature handles all that.

From that line of code, the compiler will create a new anonymous class type containing two public string members; the first being named FirstName and the second named LastName.

The Implicitly Typed Local Variable Keyword var

With the addition of anonymous types to C#, a new problem becomes apparent. If an object of an unnamed type, as in an anonymous type, is being instantiated, to what type of variable would you assign it? Consider the following code as an example:

// This code will not compile.
??? unnamedTypeVar = new {firstArg = 1, secondArg = "Joe" };

What variable type would you declare unnamedTypeVar to be? This is a problem. The folks at Microsoft chose to remedy this by creating a new keyword, var. This new keyword informs the compiler that it should implicitly infer the variable type from the variable's initializer. This means that a variable declared with the var keyword must have an initializer.

If you leave off an initializer, you will get a compiler error. Listing 2-4 shows some code that declares a variable with the keyword var but fails to initialize it.

Example. An Invalid Variable Declaration Using the var keyword
var name;

And here is the compiler error it produces:

Implicitly-typed local variables must be initialized

Because these variables are statically type checked at compile time, an initializer is required so the compiler can implicitly infer the type from it. Attempting to assign a value of a different data type elsewhere in the code will result in a compiler error. For example, let's examine the code in Listing 2-5.

Example. An Invalid Assignment to a Variable Declared Using the var Keyword
var name = "Joe";    //  So far so good.
name = 1;            //  Uh oh.
Console.WriteLine(name);

This code is going to fail to compile because the name variable is going to be implicitly inferred to be of type string; yet I attempt to assign an integer value of 1 to the variable. Here is the compiler error this code generates:

Cannot implicitly convert type 'int' to 'string'

As you can see, the compiler is enforcing the variable's type. Back to that original code example of an anonymous type assignment, using the var keyword, my code with an additional line to display the variable would look like Listing 2-6.

Example. An Anonymous Type Assigned to a Variable Declared with the var Keyword
var unnamedTypeVar = new {firstArg = 1, secondArg = "Joe" };
Console.WriteLine(unnamedTypeVar.firstArg + ". " + unnamedTypeVar.secondArg);

Here are the results of this code:

1. Joe

As you can see, using the var keyword, you get static type checking plus the flexibility to support anonymous types. This will become very important when I discuss projection type operators in the remainder of this book.

In these examples so far, usage of the var keyword has been mandatory because there is no alternative. If you are assigning an object of an anonymous class type to a variable, you have no choice but to assign it to a variable declared with the var keyword. However, it is possible to use var any time you declare a variable, as long as it is getting initialized properly. I recommend refraining from that indulgence though for the sake of maintainability. I feel like developers should always know the type of data they are working with, and while the actual data type may be known to you now, will it be when you revisit this code in six months? What about when another developer is responsible once you leave?

For the sake of maintainable code, refrain from using the var keyword just because it is convenient. Use it when necessary, such as when assigning an object of anonymous type to a variable.
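
As a brief sketch of what the compiler infers, the following code declares three variables with var and reports their run-time types. The VarInference class and its Describe method are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public class VarInference
{
  public static string Describe()
  {
    var count = 1;                       //  inferred as int
    var name = "Joe";                    //  inferred as string
    var names = new List<string>();      //  inferred as List<string>

    names.Add(name);
    //  names.Add(count);   //  would not compile: count is an int, not a string

    return count.GetType().Name + " "
      + name.GetType().Name + " "
      + names.GetType().Name;
  }
}

public class Program
{
  public static void Main()
  {
    Console.WriteLine(VarInference.Describe());   //  Int32 String List`1
  }
}
```

Each variable is as strongly typed as if its type had been written out, which is also why a reader who cannot see the initializer cannot tell what the type is.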


Object and Collection Initialization Expressions

Due to the need for the dynamic data types that anonymous types allow, there needed to be a change in the way objects and collections could be initialized. Because a lambda expression or an expression tree often has to produce its result in a single expression, object and collection initialization was simplified so that an object can be created and initialized on the fly within one expression.

Object Initialization

Object initialization allows you to specify the initialization values for the publicly accessible fields and properties of a class during instantiation. As an example, consider this class:

public class Address
{
  public string address;
  public string city;
  public string state;
  public string postalCode;
}

Prior to the object initialization feature added to C# 3.0, without a specialized constructor you would have to initialize an object of type Address as shown in Listing 2-7.

Example. Instantiating and Initializing the Class the Old Way
Address address = new Address();
address.address = "105 Elm Street";
address.city = "Atlanta";
address.state = "GA";
address.postalCode = "30339";

This will become very cumbersome in a lambda expression. Imagine you have queried the values from a data source and are projecting specific members into an Address object with the Select operator:

//  This code will not compile.
IEnumerable<Address> addresses = somedatasource
  .Where(a => a.State == "GA")
  .Select(a => new Address(???)???);

You just won't have a convenient way to get the members initialized in the newly constructed Address object. Have no fear: object initialization to the rescue. Now you may be saying that you could create a constructor that would allow you to pass all those initialization values in when the object is instantiated. Yes, you could, some of the time. But what a hassle that would be, wouldn't it? And how are you going to do that with an anonymous type created on the fly? Wouldn't it be much easier to just instantiate the object as shown in Listing 2-8?

Example. Instantiating and Initializing the Class the New Fancy-Pants Way
Address address = new Address {
                        address = "105 Elm Street",
                        city = "Atlanta",
                        state = "GA",
                        postalCode = "30339"
                      };

You can get away with that in a lambda expression. Also, remember these new object initialization capabilities can be used anywhere, not just with LINQ queries.

When using object initialization, the compiler instantiates the object using the class's parameterless constructor, then it initializes the named members with the specified values. Any members that are not specified will have the default value for their data type.
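
The following sketch shows that behavior by building the same object twice, once with an object initializer and once with the longhand statements it stands for. The Address class is repeated from above so that the block compiles by itself; the AddressFactory class and its method names are hypothetical.

```csharp
using System;

public class Address
{
  public string address;
  public string city;
  public string state;
  public string postalCode;
}

public class AddressFactory
{
  public static Address WithInitializer()
  {
    //  Only two of the four members are named.
    return new Address { city = "Atlanta", state = "GA" };
  }

  public static Address Longhand()
  {
    //  Roughly what the initializer above amounts to.
    Address temp = new Address();
    temp.city = "Atlanta";
    temp.state = "GA";
    return temp;
  }
}

public class Program
{
  public static void Main()
  {
    Address a = AddressFactory.WithInitializer();
    Address b = AddressFactory.Longhand();

    Console.WriteLine(a.city == b.city && a.state == b.state);   //  True
    Console.WriteLine(a.postalCode == null);                      //  True
  }
}
```

The members that were left out, address and postalCode, keep the default value for a string, which is null.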

Collection Initialization

As if the new object initialization enhancements were not enough, someone at Microsoft must have said, "What about collections?" Collection initialization allows you to specify the initialization values for a collection, just like you would for an object, as long as the collection type implements the System.Collections.IEnumerable interface and has an accessible Add method. This covers the generic collections that expose Add, such as List<T>, as well as legacy collections in the System.Collections namespace such as ArrayList. Collections without an Add method, such as Stack<T> and Queue<T>, cannot be initialized with collection initialization.

As an example of collection initialization, consider the code in Listing 2-9.

Example. An Example of Collection Initialization
using System.Collections.Generic;

List<string> presidents = new List<string> { "Adams", "Arthur", "Buchanan" };
foreach(string president in presidents)
{
  Console.WriteLine(president);
}

When running the example by pressing Ctrl+F5, you get the following results:

Adams
Arthur
Buchanan

In addition to using collection initialization with LINQ, it can be very handy for creating initialized collections in code where LINQ queries are not even present.
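
As a rough sketch of what a collection initializer stands for, the code below builds one list with an initializer and one with explicit Add calls, and then initializes a dictionary, where each inner pair of braces supplies the arguments for one Add call. The CollectionSamples class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public class CollectionSamples
{
  public static List<string> WithInitializer()
  {
    return new List<string> { "Adams", "Arthur", "Buchanan" };
  }

  public static List<string> Longhand()
  {
    //  Roughly what the initializer above amounts to.
    List<string> temp = new List<string>();
    temp.Add("Adams");
    temp.Add("Arthur");
    temp.Add("Buchanan");
    return temp;
  }

  public static Dictionary<string, int> NameLengths()
  {
    //  Each { key, value } pair becomes one call to Add(key, value).
    return new Dictionary<string, int>
    {
      { "Adams", 5 },
      { "Arthur", 6 },
      { "Buchanan", 8 }
    };
  }
}

public class Program
{
  public static void Main()
  {
    List<string> a = CollectionSamples.WithInitializer();
    List<string> b = CollectionSamples.Longhand();
    Console.WriteLine(a.Count == b.Count);                        //  True
    Console.WriteLine(CollectionSamples.NameLengths()["Arthur"]); //  6
  }
}
```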

Anonymous Types

Creating a new language level API for generic data query is made more difficult by the C# language's lack of ability to dynamically create new data types at compile time. If we want data queries to retrieve first-class language level elements, the language must have the ability to create first-class language level data elements, which for C# are classes. So the C# 3.0 language specification now includes the ability to dynamically create new unnamed classes and objects from those classes. This type of class is known as an anonymous type.

An anonymous type has no name and is generated by the compiler based on the initialization of the object being instantiated. Since the class has no type name, any variable that is assigned an object of an anonymous type needs some way to be declared. This is the purpose of the new C# 3.0 var keyword.

The anonymous type is invaluable when projecting new data types using the Select or SelectMany operators. Without anonymous types, predefined named classes would always have to exist for the purpose of projecting data into the predefined named classes when calling the Select or SelectMany operators. It would be very inconvenient to have to create named classes for every query.
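
Here is a minimal sketch of such a projection, using the Select operator from the System.Linq namespace to project each string into an anonymous type with two members. The Projection class, its Describe method, and the member names Name and Length are hypothetical choices made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Projection
{
  public static List<string> Describe(string[] names)
  {
    //  No named class exists for the projected shape; var is required here.
    var projected = names.Select(n => new { Name = n, Length = n.Length });

    List<string> lines = new List<string>();
    foreach (var item in projected)
    {
      //  The members are statically typed: Name is a string, Length an int.
      lines.Add(item.Name + " : " + item.Length);
    }
    return lines;
  }
}

public class Program
{
  public static void Main()
  {
    string[] names = { "Adams", "Arthur", "Buchanan" };
    foreach (string line in Projection.Describe(names))
      Console.WriteLine(line);
  }
}
```

Without the anonymous type, a small named class holding Name and Length would have to be declared just to carry the results of this one query.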

In the object initialization section of this chapter, I discussed the following object instantiation and initialization code:

Address address = new Address {
                        address = "105 Elm Street",
                        city = "Atlanta",
                        state = "GA",
                        postalCode = "30339"
                      };

If instead of using the named Address class I want to use an anonymous type, I would just omit the class name. However, you can't store the newly instantiated object in a variable of Address type because it is no longer a variable of type Address. It now has a generated type name known only to the compiler. So I have to change the data type of the address variable too. This again is what the var keyword is for, as demonstrated by Listing 2-10.

Example. Instantiating and Initializing an Anonymous Type Using Object Initialization
var address = new {
                address = "105 Elm Street",
                city = "Atlanta",
                state = "GA",
                postalCode = "30339"
              };

Console.WriteLine("address = {0} : city = {1} : state = {2} : zip = {3}",
  address.address, address.city, address.state, address.postalCode);

Console.WriteLine("{0}", address.GetType().ToString());

I added that last call to the Console.WriteLine method just so you can see the internal compiler-generated name for the anonymous class. Here are the results:

address = 105 Elm Street : city = Atlanta : state = GA : zip = 30339
<>f__AnonymousType5`4[System.String,System.String,System.String,System.String]

That anonymous class type certainly looks compiler-generated to me. Of course, your compiler generated anonymous class name could be different.

Extension Methods

An extension method is a static method of a static class that you can call as though it were an instance method of a different class. For example, you could create an extension method named ToDouble that is a static method in a static class you create named StringConversions, but that is called as though it were a method of an object of type string.
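
As a first look, a ToDouble extension method along those lines might be sketched as follows. The method body is an assumption made for illustration; in particular, parsing with the invariant culture is my choice here so that the result does not depend on the machine's regional settings.

```csharp
using System;
using System.Globalization;

public static class StringConversions
{
  //  The this modifier on the first parameter is what makes
  //  ToDouble callable as though it were a method of string.
  public static double ToDouble(this string s)
  {
    return double.Parse(s, CultureInfo.InvariantCulture);
  }
}

public class Program
{
  public static void Main()
  {
    //  Called like an instance method on a string object...
    double a = "2.5".ToDouble();

    //  ...but it is still an ordinary static method underneath.
    double b = StringConversions.ToDouble("2.5");

    Console.WriteLine(a == b);   //  True
  }
}
```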

Before I explain extension methods in detail, let's first review the problem that led to their creation by discussing static (class) versus instance (object) level methods. Instance level methods can only be called on instances of a class, otherwise known as objects. You cannot call an instance level method on the class itself. Likewise, static methods must be called on the class, as opposed to an instance of a class.

Instance (Object) vs. Static (Class) Methods Recap

The string class ToUpper method is an example of an instance level method. You cannot call ToUpper on the string class itself; you must call it on a string object.

In the code in Listing 2-11, I demonstrate this by calling the ToUpper method on the object named name.

Example. Calling an Instance Method on an Object
//  This code will compile.
string name = "Joe";
Console.WriteLine(name.ToUpper());

The previous code compiles, and when run produces the following output:

JOE

However, if I try to call the ToUpper method on the string class itself, I will get a compiler error because the ToUpper method is an instance level method and I am attempting to call it on the class, rather than the object. Listing 2-12 shows an example of an attempt to do this, and the compiler error generated by it.

Example. Trying to Call an Instance Method on a Class
//  This code will not even compile.
string.ToUpper();

Just trying to compile this code produces the following compiler error:

An object reference is required for the nonstatic field, method, or property
'string.ToUpper()'

This example seems a little hokey though since it couldn't possibly work because I never gave it any string value to convert to uppercase. Any attempt to do so though would result in trying to call some variation of the ToUpper method that does not exist because there is no prototype for the ToUpper method whose signature includes a string.

Contrast the ToUpper method with the string class Format method. This method is defined to be static. This requires the Format method to be called on the string class itself, rather than on an object of type string. First I will try to call it on an object with the code in Listing 2-13.

Example. Trying to Call a Class Method on an Object
string firstName = "Joe";
string lastName = "Rattz";
string name = firstName.Format("{0} {1}", firstName, lastName);
Console.WriteLine(name);

This code produces the following compiler error:

Member 'string.Format(string, object, object)' cannot be accessed with an instance
reference; qualify it with a type name instead

However, if instead I call the Format method on the string class itself, it compiles and works as desired, as demonstrated in Listing 2-14.

Example. Calling a Class Method on a Class
string firstName = "Joe";
string lastName = "Rattz";
string name = string.Format("{0} {1}", firstName, lastName);
Console.WriteLine(name);

The code produces the following results:

Joe Rattz

It is sometimes obvious from parts of the signature other than the static keyword itself that the method must be an instance-level method. For example, consider the ToUpper method. It doesn't have any arguments other than one overloaded version taking a CultureInfo object reference. So if it isn't relying on a string instance's internal data, what string would it convert to uppercase?

The Problem Solved by Extension Methods

So what is the problem, you ask? For this discussion, assume you are the developer responsible for designing a new way to query multitudes of objects. Let's say you decide to create a Where method to help with the where clauses. How would you do it?

Would you make the Where operator an instance method? If so, to what class would you add that Where method? You want the Where method to work for querying any collection of objects. There just isn't a logical class to add the Where method to. Taking this approach, you would have to modify a zillion different classes if you want universal data querying capability.

So now that you realize the method must be static, what is the problem? Think of your typical (SQL) query and how many where clauses you often have. Also consider the joins, grouping, and ordering.

Let's imagine that you have created the concept of a new data type, a sequence of generic data objects that we will call an Enumerable. It makes sense that the Where method would need to operate on an Enumerable (of data) and return another filtered Enumerable. In addition, the Where method will need to accept an argument allowing the developer to specify the exact logic used to filter data records from or into the Enumerable. This argument, that I will call the predicate, could be specified as a named method, an anonymous method, or a lambda expression.

The following three code examples in this section are hypothetical and will not compile.


Since the Where method requires an input Enumerable to filter, and the method is static, that input Enumerable must be specified as an argument to the Where method. It would appear something like the following:

static Enumerable Enumerable.Where(Enumerable input, LambdaExpression predicate) {
...
}

Ignoring for the moment the semantics of a lambda expression, calling the Where method would look something like the following:

Enumerable enumerable = {"one", "two", "three"};
Enumerable filteredEnumerable = Enumerable.Where(enumerable, lambdaExpression);

That doesn't look too ornery. But what happens when we need several where clauses? Since the Enumerable that the Where method is operating on must be an argument to the method, the result is that chaining methods together requires embedding them inside each other. Three where clauses suddenly change the code to the following:

Enumerable enumerable = {"one", "two", "three"};
Enumerable finalEnumerable =
  Enumerable.Where(Enumerable.Where(Enumerable.Where(enumerable, lX1), lX2), lX3);

You have to read the statement from the inside out. That gets hard to read in a hurry. Can you imagine what a complex query would look like? If only there were a better way.

The Solution

A nice solution would be if you could call the static Where method on each Enumerable object itself, rather than on the class. Then it would no longer be necessary to pass each Enumerable into the Where method because the Enumerable object would have access to its own internal Enumerable. That would change the syntax of the query proposed previously to something more like this:

Enumerable enumerable = {"one", "two", "three"};
Enumerable finalEnumerable = enumerable.Where(lX1).Where(lX2).Where(lX3);

The previous code and the following code example are hypothetical and will not compile.


This could even be rewritten as the following:

Enumerable enumerable = {"one", "two", "three"};
Enumerable finalEnumerable = enumerable
  .Where(lX1)
  .Where(lX2)
  .Where(lX3);

Wow, that's much easier to read. You can now read the statement from left to right, top to bottom. As you can see, this syntax is very easy to follow once you understand what it is doing. Because of this, you will often see LINQ queries written in this format in much of the LINQ documentation and in this book.

Ultimately, what you need is the ability to have a static method that you can call on a class instance. This is exactly what extension methods are and what they allow. They were added to C# to provide a syntactically elegant way to call a static method without having to pass the method's first argument explicitly. This allows the extension method to be called as though it were a method of the first argument, which makes chaining extension method calls far more readable than if the first argument were passed. Extension methods assist LINQ by allowing the Standard Query Operators to be called on the IEnumerable<T> interface.

NOTE

Extension methods are methods that, while static, can be called on an instance (object) of a class rather than on the class itself.
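For comparison with the hypothetical code shown earlier, here is a minimal compilable sketch that uses the real System.Linq.Enumerable class. It applies the same three filters twice: once by calling Where as an ordinary static method, nested inside out, and once by calling it as an extension method, chained left to right. The class name, the method names, and the three predicates are my own assumptions chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChainingDemo
{
  // Static-call style: each Where wraps the previous one, so the
  // statement has to be read from the inside out.
  public static IEnumerable<string> Nested(IEnumerable<string> input)
  {
    return Enumerable.Where(
      Enumerable.Where(
        Enumerable.Where(input, s => s.Length > 2),
        s => s.StartsWith("t")),
      s => s.EndsWith("e"));
  }

  // Extension-method style: the same three filters, read left to right.
  public static IEnumerable<string> Chained(IEnumerable<string> input)
  {
    return input
      .Where(s => s.Length > 2)
      .Where(s => s.StartsWith("t"))
      .Where(s => s.EndsWith("e"));
  }

  public static void Main()
  {
    string[] words = { "one", "two", "three" };
    Console.WriteLine(string.Join(", ", Nested(words).ToArray()));
    Console.WriteLine(string.Join(", ", Chained(words).ToArray()));
  }
}
```

Both calls print three, because both forms compile to the same static method calls. Only the way the code reads is different.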

Extension Method Declarations and Invocations

Specifying a method's first argument with the this keyword modifier will make that method an extension method.

The extension method will appear as an instance method of any object with the same type as the extension method's first argument's data type. For example, if the extension method's first argument is of type string, the extension method will appear as a string instance method and can be called on any string object.

Also keep in mind that extension methods can only be declared in non-generic, non-nested static classes.

Here is an example of an extension method:

namespace Netsplore.Utilities
{
  public static class StringConversions
  {
    public static double ToDouble(this string s) {
      return Double.Parse(s);
    }

    public static bool ToBool(this string s) {
      return Boolean.Parse(s);
    }
  }
}

Notice that both the class and every method it contains are static. Now you can take advantage of those extension methods by calling the static methods on the object instances as shown in Listing 2-15. Because the ToDouble method is static and its first argument specifies the this keyword, ToDouble is an extension method.

Example. Calling an Extension Method
using Netsplore.Utilities;

double pi = "3.1415926535".ToDouble();
Console.WriteLine(pi);

This produces the following results:

3.1415926535

It is important that you specify the using directive for the Netsplore.Utilities namespace, otherwise the compiler will not find the extension methods and you will get compiler errors such as the following:

'string' does not contain a definition for 'ToDouble' and no extension method
'ToDouble' accepting a first argument of type 'string' could be found (are you
missing a using directive or an assembly reference?)

As mentioned previously, attempting to declare an extension method inside a nonstatic class is not allowed. If you do so, you will see a compiler error like the following:

Extension methods must be defined in a non-generic static class

Extension Method Precedence

Normal object instance methods take precedence over extension methods when their signature matches the calling signature.
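A minimal sketch of this precedence rule follows. The Greeter class, its extension class, and the method names are hypothetical and exist only for this illustration. The extension method Greet has the same signature as the instance method, so the instance method wins. The extension method Shout has no instance counterpart, so it is the one called.

```csharp
using System;

public class Greeter
{
  // Instance method with the same calling signature as the extension below.
  public string Greet(string name)
  {
    return "instance: hello " + name;
  }
}

public static class GreeterExtensions
{
  // Not chosen for greeter.Greet(...) because the instance method matches.
  public static string Greet(this Greeter g, string name)
  {
    return "extension: hello " + name;
  }

  // No instance method named Shout exists, so this extension is used.
  public static string Shout(this Greeter g, string name)
  {
    return "extension: HELLO " + name.ToUpper();
  }
}

public static class PrecedenceDemo
{
  public static void Main()
  {
    Greeter greeter = new Greeter();
    Console.WriteLine(greeter.Greet("Joe"));  // instance: hello Joe
    Console.WriteLine(greeter.Shout("Joe"));  // extension: HELLO JOE

    // The hidden extension can still be reached as a plain static call.
    Console.WriteLine(GreeterExtensions.Greet(greeter, "Joe"));  // extension: hello Joe
  }
}
```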

Extension methods seem like a really useful concept, especially when you want to extend a class that you otherwise cannot, such as a sealed class or one for which you do not have the source code. The previous extension method examples all effectively add methods to the string class. Without extension methods, you couldn't do that because the string class is sealed.

Partial Methods

Recently added to C# 3.0, partial methods add a lightweight event-handling mechanism to C#. Forget the conclusions you are more than likely drawing about partial methods based on their name. About the only thing partial methods have in common with partial classes is that a partial method can only exist in a partial class. In fact, that is rule 1 for partial methods.

Before I get to all of the rules concerning partial methods let me tell you what they are. Partial methods are methods where the prototype or definition of the method is specified in the declaration of a partial class, but an implementation for the method is not provided in that same declaration of the partial class. In fact, there may not be any implementation for the method in any declaration of that same partial class. And if there is no implementation of the method in any other declaration for the same partial class, no IL code is emitted by the compiler for the declaration of the method, the call to the method, or the evaluation of the arguments passed to the method. It's as if the method never existed.

Some people do not like the term "partial method" because it is somewhat of a misnomer due to their behavior when compared to that of a partial class. Perhaps the method modifier should have been ghost instead of partial.

A Partial Method Example

Let's take a look at a partial class containing the definition of a partial method in the following class file named MyWidget.cs:

Example. The MyWidget Class File
public partial class MyWidget
{
  partial void MyWidgetStart(int count);
  partial void MyWidgetEnd(int count);

  public MyWidget()
  {
    int count = 0;
    MyWidgetStart(++count);
    Console.WriteLine("In the constructor of MyWidget.");
    MyWidgetEnd(++count);
    Console.WriteLine("count = " + count);
  }
}

In the MyWidget class declaration above, I have a partial class named MyWidget. The first two lines of code are partial method definitions. I have defined partial methods named MyWidgetStart and MyWidgetEnd that each accept an int input parameter and return void. It is another rule that partial methods must return void.

The next piece of code in the MyWidget class is the constructor. As you can see, I declare an int named count and initialize it to 0. I then call the MyWidgetStart method, write a message to the console, call the MyWidgetEnd method, and finally output the value of count to the console. Notice I am incrementing the value of count each time it is passed into a partial method. I am doing this to prove that if no implementation of a partial method exists, its arguments are not even evaluated.

In Listing 2-16 I instantiate a MyWidget object.

Example. Instantiating a MyWidget
MyWidget myWidget = new MyWidget();

Let's take a look at the output of this example by pressing Ctrl+F5:

In the constructor of MyWidget.
count = 0

As you can see, even after the MyWidget constructor has incremented its count variable twice, when it displays the value of count at the end of the constructor, it is still 0. This is because the code for the evaluation of the arguments to the unimplemented partial methods is never emitted by the compiler. No IL code was emitted for either of those two partial method calls.

Now let's add an implementation for the two partial methods:

Example. Another Declaration for MyWidget but Containing Implementations for the Partial Methods
public partial class MyWidget
{
  partial void MyWidgetStart(int count)
  {
    Console.WriteLine("In MyWidgetStart(count is {0})", count);
  }

  partial void MyWidgetEnd(int count)
  {
    Console.WriteLine("In MyWidgetEnd(count is {0})", count);
  }
}

Now that you have added this declaration, run Listing 2-16 again and look at the results:

In MyWidgetStart(count is 1)
In the constructor of MyWidget.
In MyWidgetEnd(count is 2)
count = 2

As you can see, not only are the partial method implementations getting called, the arguments passed are evaluated as well. You can see this because of the value of the count variable at the end of the output.

What Is the Point of Partial Methods?

So you may be wondering, what is the point? Others have said, "This is similar to using inheritance and virtual methods. Why corrupt the language with something similar?" To them I say "Take a chill-pill Jill." Partial methods are more efficient if you plan on allowing many potentially unimplemented hooks in the code. They allow code to be written with the intention of someone else extending it via the partial class paradigm but without the degradation in performance if they choose not to.

The case in point for which partial methods were probably added is the code generated for LINQ to SQL entity classes by the entity class generator tools. To make the generated entity classes more usable, partial methods have been added to them. For example, each mapped property of a generated entity class has a partial method that is called before the property is changed and another partial method that is called after the property is changed. This allows you to add another module declaring the same entity class, implement these partial methods, and be notified every time a property is about to be changed and after it is changed. How cool is that? And if you don't do it, the code is no bigger and no slower. Who wouldn't want that?
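To make the idea concrete, here is a minimal sketch of the pattern just described. It is not actual tool output. The Customer class, the Name property, and the hook names OnNameChanging and OnNameChanged are assumptions I chose for illustration. The first declaration stands in for generated code, and the second is the hand-written module that opts in to the notifications.

```csharp
using System;

// First declaration: stands in for tool-generated code.
public partial class Customer
{
  private string _name;

  // Hooks declared here and optionally implemented in another declaration.
  partial void OnNameChanging(string value);
  partial void OnNameChanged();

  public string Name
  {
    get { return _name; }
    set
    {
      OnNameChanging(value);  // call is removed if the hook is unimplemented
      _name = value;
      OnNameChanged();        // call is removed if the hook is unimplemented
    }
  }
}

// Second declaration: hand-written code that implements the hooks.
public partial class Customer
{
  partial void OnNameChanging(string value)
  {
    Console.WriteLine("Name changing from '{0}' to '{1}'", _name, value);
  }

  partial void OnNameChanged()
  {
    Console.WriteLine("Name changed to '{0}'", _name);
  }
}

public static class PartialHookDemo
{
  public static void Main()
  {
    Customer customer = new Customer();
    customer.Name = "Joe";
  }
}
```

If you delete the second declaration, the program still compiles and the property setter simply assigns the value, with no calls left behind for the two hooks.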

The Rules

It has been all fun and games up to here, but unfortunately, there are some rules that apply to partial methods. Here is a list:

  • Partial methods must only be defined and implemented in partial classes

  • Partial methods must specify the partial modifier

  • Partial methods are private but must not specify the private modifier or a compiler error will result

  • Partial methods must return void

  • Partial methods may be unimplemented

  • Partial methods may be static

  • Partial methods may have arguments

These rules are not too bad. For what we gain in terms of flexibility in the generated entity classes plus what we can do with them ourselves, I think C# has gained a nice feature.

Query Expressions

One of the conveniences that the C# language provides is the foreach statement. When you use foreach, the compiler translates it into a loop with calls to methods such as GetEnumerator and MoveNext. The simplicity the foreach statement provides for enumerating through arrays and collections has made it very popular and often used.
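As a rough sketch of that translation, the following code shows a foreach loop next to a hand-written loop that uses GetEnumerator, MoveNext, and Current. The class and method names are my own assumptions, and the exact code the compiler emits depends on the type being enumerated, so treat the second method as an approximation and not as the compiler's literal output.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachDemo
{
  // What you write.
  public static List<string> WithForeach(IEnumerable<string> source)
  {
    List<string> visited = new List<string>();
    foreach (string item in source)
    {
      visited.Add(item);
    }
    return visited;
  }

  // Roughly what the compiler produces for the loop above.
  public static List<string> Expanded(IEnumerable<string> source)
  {
    List<string> visited = new List<string>();
    IEnumerator<string> enumerator = source.GetEnumerator();
    try
    {
      while (enumerator.MoveNext())
      {
        string item = enumerator.Current;
        visited.Add(item);
      }
    }
    finally
    {
      enumerator.Dispose();
    }
    return visited;
  }

  public static void Main()
  {
    string[] names = { "Adams", "Bush", "Ford" };
    Console.WriteLine(string.Join(", ", WithForeach(names).ToArray()));
    Console.WriteLine(string.Join(", ", Expanded(names).ToArray()));
  }
}
```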

One of the features of LINQ that seems to attract developers is the SQL-like syntax available for LINQ queries. The first few LINQ examples in the first chapter of this book use this syntax. This syntax is provided via the new C# 3.0 language enhancement known as query expressions. Query expressions allow LINQ queries to be expressed in nearly SQL form, with just a few minor deviations.

To perform a LINQ query, it is not required to use query expressions. The alternative is to use standard C# dot notation, calling methods on objects and classes. In many cases, I find using the standard dot notation favorable for instructional purposes because I feel it is more demonstrative of what is actually happening and when. There is no compiler translating what I write into the standard dot notation equivalent. Therefore, many examples in this book do not use query expression syntax but instead opt for the standard dot notation syntax. However, there is no disputing the allure of query expression syntax. The familiarity it provides in formulating your first queries can be very enticing indeed.

To get an idea of what the two different syntaxes look like, Listing 2-17 shows a query using the standard dot notation syntax.

Example. A Query Using the Standard Dot Notation Syntax
string[] names = {
  "Adams", "Arthur", "Buchanan", "Bush", "Carter", "Cleveland",
  "Clinton", "Coolidge", "Eisenhower", "Fillmore", "Ford", "Garfield",
  "Grant", "Harding", "Harrison", "Hayes", "Hoover", "Jackson",
  "Jefferson", "Johnson", "Kennedy", "Lincoln", "Madison", "McKinley",
  "Monroe", "Nixon", "Pierce", "Polk", "Reagan", "Roosevelt", "Taft",
  "Taylor", "Truman", "Tyler", "Van Buren", "Washington", "Wilson"};
IEnumerable<string> sequence = names
  .Where(n => n.Length < 6)
  .Select(n => n);

foreach (string name in sequence)
{
  Console.WriteLine("{0}", name);
}

Listing 2-18 is the equivalent query using the query expression syntax:

Example. The Equivalent Query Using the Query Expression Syntax
string[] names = {
  "Adams", "Arthur", "Buchanan", "Bush", "Carter", "Cleveland",
  "Clinton", "Coolidge", "Eisenhower", "Fillmore", "Ford", "Garfield",
  "Grant", "Harding", "Harrison", "Hayes", "Hoover", "Jackson",
  "Jefferson", "Johnson", "Kennedy", "Lincoln", "Madison", "McKinley",
  "Monroe", "Nixon", "Pierce", "Polk", "Reagan", "Roosevelt", "Taft",
  "Taylor", "Truman", "Tyler", "Van Buren", "Washington", "Wilson"};
IEnumerable<string> sequence = from n in names
                               where n.Length < 6
                               select n;

foreach (string name in sequence)
{
  Console.WriteLine("{0}", name);
}

The first thing you may notice about the query expression example is that, unlike SQL, the from clause precedes the select clause. One of the compelling reasons for this change is to narrow the scope for IntelliSense. Without this inversion of the clauses, if you typed select followed by a space in the Visual Studio 2008 text editor, IntelliSense would have no idea what variables to display in its drop-down list. The scope of possible variables at this point is not restricted in any way. By specifying where the data is coming from first, IntelliSense has the scope of what variables to offer you for selection. Both of these examples provide the same results:

Adams
Bush
Ford
Grant
Hayes
Nixon
Polk
Taft
Tyler

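It is important to note that the query expression syntax only translates to the most common query operators: Where, Select, SelectMany, Join, GroupJoin, GroupBy, OrderBy, ThenBy, OrderByDescending, and ThenByDescending, as well as Cast when an enumeration variable is explicitly typed. Any other operator must be called using the standard dot notation.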
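Because operators outside that list have no query expression keyword, the two syntaxes are often mixed. The following minimal sketch wraps a query expression in parentheses and then calls the Take operator on it with dot notation. The class name, method name, and sample data are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MixedSyntaxDemo
{
  public static IEnumerable<string> FirstTwoShortNames(string[] names)
  {
    // Take has no query expression keyword, so it is called in dot
    // notation on the parenthesized query expression.
    return (from n in names
            where n.Length < 6
            select n).Take(2);
  }

  public static void Main()
  {
    string[] names = { "Adams", "Arthur", "Buchanan", "Bush", "Carter" };
    foreach (string name in FirstTwoShortNames(names))
    {
      Console.WriteLine(name);  // prints Adams, then Bush
    }
  }
}
```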

Query Expression Grammar

Your query expressions must adhere to the following rules:

  1. A query expression must begin with a from clause.

  2. The remainder of the query expression may then contain zero or more from, let, or where clauses. A from clause is a generator that declares one or more enumerator variables enumerating over a sequence or a join of multiple sequences. A let clause introduces a variable and assigns a value to it. A where clause filters elements from the sequence or join of multiple sequences into the output sequence.

  3. The remainder of the query expression may then be followed by an orderby clause which contains one or more ordering fields with optional ordering direction. Direction is either ascending or descending.

  4. The remainder of the query expression must then be followed by a select or group clause.

  5. The remainder of the query expression may then be followed by an optional continuation clause. A continuation clause is either the into clause, zero or more join clauses, or another repeating sequence of these numbered elements beginning with the clauses in No. 2. An into clause directs the query results into an imaginary output sequence, which functions as a from clause for a subsequent query expression beginning with the clauses in No. 2.
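To see the numbered rules working together, here is a minimal sketch of a single query that exercises each of them. The class name, variable names, and sample data are my own assumptions, and the comments refer to the rule numbers in the list above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GrammarDemo
{
  public static IEnumerable<string> Query(string[] names)
  {
    return from n in names              // rule 1: from clause
           let len = n.Length           // rule 2: let clause
           where len < 6                // rule 2: where clause
           orderby len descending, n    // rule 3: two orderings
           select n.ToUpper()           // rule 4: select clause
           into upper                   // rule 5: into continuation
           where upper.StartsWith("T")  // back to rule 2
           select upper;                // rule 4 again
  }

  public static void Main()
  {
    string[] names = { "Adams", "Taft", "Tyler", "Polk", "Truman", "Grant" };
    foreach (string name in Query(names))
    {
      Console.WriteLine(name);  // prints TYLER, then TAFT
    }
  }
}
```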

For a more technical yet less wordy description of the query expression syntax, use the following grammar diagram provided by Microsoft in the MSDN LINQ documentation. The suffix _opt marks an element as optional:

query-expression:
  from-clause   query-body

from-clause:
  from   type_opt   identifier   in   expression   join-clauses_opt

join-clauses:
  join-clause
  join-clauses   join-clause

join-clause:
  join   type_opt   identifier   in   expression   on   expression   equals
    expression
  join   type_opt   identifier   in   expression   on   expression   equals
    expression   into   identifier

query-body:
  from-let-where-clauses_opt   orderby-clause_opt   select-or-group-clause
    query-continuation_opt

from-let-where-clauses:
  from-let-where-clause
  from-let-where-clauses   from-let-where-clause

from-let-where-clause:
  from-clause
  let-clause
  where-clause

let-clause:
  let   identifier   =   expression

where-clause:
  where   boolean-expression

orderby-clause:
  orderby   orderings

orderings:
  ordering
  orderings   ,   ordering

ordering:
  expression   ordering-direction_opt

ordering-direction:
  ascending
  descending

select-or-group-clause:
  select-clause
  group-clause

select-clause:
  select   expression

group-clause:
  group   expression   by   expression

query-continuation:
  into   identifier   join-clauses_opt   query-body

Query Expression Translation

Now assuming you have created a syntactically correct query expression, the next issue becomes how the compiler translates the query expression into C# code. It must translate your query expression into the standard C# dot notation that I discuss in the query expression section. But how does it do this?

To translate a query expression, the compiler is looking for code patterns in the query expression that need to be translated. The compiler will perform several translation steps in a specific order to translate the query expression into standard C# dot notation. Each translation step is looking for one or more related code patterns. The compiler must repeatedly translate all occurrences of the code patterns for that translation step in the query expression before moving on to the next translation step. Likewise, each step operates on the assumption that the query has had the code patterns for all previous translation steps translated.

Transparent Identifiers

Some translations insert enumeration variables with transparent identifiers. In the translation step descriptions in the next section, a transparent identifier is identified with an asterisk (*). This should not be confused with the SQL selected field wildcard character, *. When translating query expressions, sometimes additional enumerations are generated by the compiler, and transparent identifiers are used to enumerate through them. The transparent identifiers only exist during the translation process and once the query expression is fully translated no transparent identifiers will remain in the query.

Translation Steps

Next I discuss the translation steps. In doing so, I use the variable letters shown in Table 2-1 to represent specific portions of the query.

Table 2-1. Translation Step Variables
Variable  Description                                    Example
c         A compiler-generated temporary variable        N/A
e         An enumerator variable                         from e in customers
f         Selected field element or new anonymous type   from e in customers select f
g         A grouped element                              from e in s group g by k
i         An imaginary into sequence                     from e in s into i
k         Grouped or joined key element                  from e in s group g by k
l         A variable introduced by let                   from e in s let l = v
o         An ordering element                            from e in s orderby o
s         Input sequence                                 from e in s
v         A value assigned to a let variable             from e in s let l = v
w         A where clause                                 from e in s where w

Allow me to provide a word of warning. The soon-to-be-described translation steps are quite complicated. Do not allow this to discourage you. You no more need to fully understand the translation steps to write LINQ queries than you need to know how the compiler translates the foreach statement to use it. They are here to provide additional translation information should you need it, which should be rarely, or never.

The translation steps are documented as code pattern translation. Oddly, even though I present the translation steps in the order the compiler performs them, I think the translation process is simpler to understand if you learn them in the reverse order. The reason is that when you look at the first translation step, it handles only the first code pattern translation and you are left with a lot of untranslated code patterns that you have yet to be introduced to. In my mind, this leaves a lot of unaccounted for gobbledygook. Since each translation step requires the previous translation step's code patterns to already be translated, by the time you get to the final translation step, there is no gobbledygook left. I think this makes the final translation step easier to understand than the first. And in my opinion, traversing backward through the translation steps is the easiest way to understand what is going on.

That said, here are the translation steps presented in the order in which the compiler performs them.

Select and Group Clauses with into Continuation Clause

If your query expression contains an into continuation clause, the following translation is made:



Here is an example:
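As a minimal sketch of this step (the class, method, and data names are my own assumptions and are not taken from the original listing), the first method below uses an into continuation, and the second shows the rewritten form, in which everything before into becomes a nested query that feeds a new from clause. In the notation of Table 2-1, the pattern is roughly from ... into i ... becoming from i in (from ...) ....

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IntoDemo
{
  // Query expression with an into continuation.
  public static IEnumerable<char> WithInto(string[] names)
  {
    return from n in names
           group n by n[0] into g
           where g.Count() > 1
           select g.Key;
  }

  // The continuation rewritten as a nested query feeding a new from clause.
  public static IEnumerable<char> Rewritten(string[] names)
  {
    return from g in (from n in names group n by n[0])
           where g.Count() > 1
           select g.Key;
  }

  public static void Main()
  {
    string[] names = { "Adams", "Arthur", "Bush", "Carter", "Cleveland" };
    Console.WriteLine(new string(WithInto(names).ToArray()));   // AC
    Console.WriteLine(new string(Rewritten(names).ToArray()));  // AC
  }
}
```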



Explicit Enumeration Variable Types

If your query expression contains a from clause that explicitly specifies an enumeration variable type, the following translation will be made:



Here is an example:



If your query expression contains a join clause that explicitly specifies an enumeration variable type, the following translation will be made:



Here is an example:



Explicitly typing enumeration variables is necessary when the enumerated data collection is one of the C# legacy data collections, such as ArrayList. The casting that is done when explicitly typing the enumeration variable converts the legacy collection into a sequence implementing IEnumerable<T> so that other query operators can be performed.
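The following minimal sketch illustrates the point with an ArrayList. The class, method, and data names are assumptions for illustration. The first method explicitly types the enumeration variable, and the second shows the dot notation I would expect the compiler to produce, where the Cast operator turns the legacy collection into an IEnumerable<string>.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class CastDemo
{
  // ArrayList only implements the nongeneric IEnumerable, so the
  // enumeration variable is typed explicitly as string.
  public static IEnumerable<string> Typed(ArrayList list)
  {
    return from string n in list
           where n.Length < 5
           select n;
  }

  // Equivalent dot notation: Cast<string>() supplies the element type.
  public static IEnumerable<string> Translated(ArrayList list)
  {
    return list.Cast<string>().Where(n => n.Length < 5);
  }

  public static void Main()
  {
    ArrayList list = new ArrayList();
    list.Add("Adams");
    list.Add("Bush");
    list.Add("Ford");
    Console.WriteLine(string.Join(", ", Typed(list).ToArray()));       // Bush, Ford
    Console.WriteLine(string.Join(", ", Translated(list).ToArray()));  // Bush, Ford
  }
}
```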


Join Clauses

If the query expression contains a from clause followed by a join clause without an into continuation clause followed by a select clause, the following translation takes place (t is a temporary compiler-generated variable):



Here is an example:
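As a minimal sketch (the names and data are my own assumptions), the following program joins an array of names to an array of lengths, first as a query expression and then as the equivalent call to the Join operator. The operator receives the second sequence, a key selector for each side, and a lambda that builds the selected element from the matched pair.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinDemo
{
  public static IEnumerable<string> AsQuery(string[] names, int[] lengths)
  {
    return from n in names
           join len in lengths on n.Length equals len
           select n + ":" + len;
  }

  public static IEnumerable<string> AsDotNotation(string[] names, int[] lengths)
  {
    return names.Join(
      lengths,
      n => n.Length,              // key from the first sequence
      len => len,                 // key from the second sequence
      (n, len) => n + ":" + len); // the selected element
  }

  public static void Main()
  {
    string[] names = { "Adams", "Bush", "Carter" };
    int[] lengths = { 4, 6 };
    Console.WriteLine(string.Join(", ", AsQuery(names, lengths).ToArray()));        // Bush:4, Carter:6
    Console.WriteLine(string.Join(", ", AsDotNotation(names, lengths).ToArray()));  // Bush:4, Carter:6
  }
}
```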



If the query expression contains a from clause followed by a join clause with an into continuation clause followed by a select clause, the following translation takes place (t is a temporary compiler-generated variable):



Here is an example:
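A minimal sketch of this case follows, again with names and data that are my own assumptions. A join with an into clause corresponds to the GroupJoin operator, whose final lambda receives each element of the first sequence together with the group of matching elements from the second.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupJoinDemo
{
  public static IEnumerable<string> AsQuery(int[] lengths, string[] names)
  {
    return from len in lengths
           join n in names on len equals n.Length into g
           select len + ": " + g.Count();
  }

  public static IEnumerable<string> AsDotNotation(int[] lengths, string[] names)
  {
    return lengths.GroupJoin(
      names,
      len => len,                          // key from the first sequence
      n => n.Length,                       // key from the second sequence
      (len, g) => len + ": " + g.Count()); // g holds the matching names
  }

  public static void Main()
  {
    int[] lengths = { 4, 5, 7 };
    string[] names = { "Adams", "Bush", "Ford", "Grant" };
    foreach (string line in AsQuery(lengths, names))
    {
      Console.WriteLine(line);  // 4: 2, then 5: 2, then 7: 0
    }
    foreach (string line in AsDotNotation(lengths, names))
    {
      Console.WriteLine(line);  // same three lines
    }
  }
}
```

Notice that the length 7 still appears in the output with an empty group, which is the main behavioral difference from a plain join.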



If the query expression contains a from clause followed by a join clause without an into continuation clause followed by something other than a select clause, the following translation takes place (* is a transparent identifier):



Notice that you now have a code pattern that matches the first code pattern in this translation step. Specifically, you have a query expression that contains a from clause followed by a join clause without an into continuation clause followed by a select clause. So the compiler will repeat this translation step.

If the query expression contains a from clause followed by a join clause with an into continuation clause followed by something other than a select clause, the following translation takes place (* is a transparent identifier):



This time notice that there is now a code pattern that matches the second code pattern in this translation step. Specifically, there is a query expression that contains a from clause followed by a join clause with an into continuation clause followed by a select clause. So the compiler will repeat this translation step.

Let and Where Clauses

If the query expression contains a from clause followed immediately by a let clause, the following translation takes place (* is a transparent identifier):



Here is an example (t is a compiler-generated identifier that is invisible and inaccessible to any code you write):
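As a minimal sketch (class, method, and data names are my own assumptions), the first method below uses a let clause, and the second approximates the translated form. In the second method I have written the identifier as a visible t so that the code compiles. The let variable travels alongside the original element inside an anonymous type produced by Select.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LetDemo
{
  public static IEnumerable<string> AsQuery(string[] names)
  {
    return from n in names
           let len = n.Length
           where len < 5
           select n + "=" + len;
  }

  // Approximation of the translation: the let variable is carried in an
  // anonymous type, and t plays the role of the hidden identifier.
  public static IEnumerable<string> AsDotNotation(string[] names)
  {
    return names
      .Select(n => new { n, len = n.Length })
      .Where(t => t.len < 5)
      .Select(t => t.n + "=" + t.len);
  }

  public static void Main()
  {
    string[] names = { "Adams", "Bush", "Ford" };
    Console.WriteLine(string.Join(", ", AsQuery(names).ToArray()));        // Bush=4, Ford=4
    Console.WriteLine(string.Join(", ", AsDotNotation(names).ToArray()));  // Bush=4, Ford=4
  }
}
```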



If the query expression contains a from clause followed immediately by a where clause, the following translation takes place:



Here is an example:



Multiple Generator (From) Clauses

If the query expression contains two from clauses followed by a select clause, the following translation takes place:



Here is an example (t is a temporary compiler-generated variable):
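A minimal sketch of this case, using names and data that are my own assumptions: two from clauses followed by a select correspond to a single call to the SelectMany operator, which pairs every element of the first sequence with every element of the second.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectManyDemo
{
  public static IEnumerable<string> AsQuery(string[] letters, int[] digits)
  {
    return from a in letters
           from d in digits
           select a + d;
  }

  public static IEnumerable<string> AsDotNotation(string[] letters, int[] digits)
  {
    return letters.SelectMany(
      a => digits,       // the second sequence, produced for each a
      (a, d) => a + d);  // the selected element
  }

  public static void Main()
  {
    string[] letters = { "A", "B" };
    int[] digits = { 1, 2 };
    Console.WriteLine(string.Join(", ", AsQuery(letters, digits).ToArray()));        // A1, A2, B1, B2
    Console.WriteLine(string.Join(", ", AsDotNotation(letters, digits).ToArray()));  // A1, A2, B1, B2
  }
}
```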



If the query expression contains two from clauses followed by something other than a select clause, the following translation takes place (* is a transparent identifier):



Here is an example (* is a transparent identifier):



Orderby Clauses

If the direction of the ordering is ascending, the following translations take place:



Here is an example:



If the direction of any of the orderings is descending, the translations will be to the OrderByDescending or ThenByDescending operators. Here is the same example as the previous, except this time the names are requested in descending order:
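A minimal sketch covering both directions follows. The class, method, and data names are assumptions for illustration. The first ordering maps to OrderBy or OrderByDescending, and each additional ordering maps to ThenBy or ThenByDescending.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderByDemo
{
  public static IEnumerable<string> AscendingQuery(string[] names)
  {
    return from n in names
           orderby n.Length, n
           select n;
  }

  public static IEnumerable<string> AscendingDot(string[] names)
  {
    return names.OrderBy(n => n.Length).ThenBy(n => n);
  }

  public static IEnumerable<string> DescendingQuery(string[] names)
  {
    return from n in names
           orderby n.Length descending, n descending
           select n;
  }

  public static IEnumerable<string> DescendingDot(string[] names)
  {
    return names.OrderByDescending(n => n.Length).ThenByDescending(n => n);
  }

  public static void Main()
  {
    string[] names = { "Polk", "Adams", "Taft", "Grant" };
    Console.WriteLine(string.Join(", ", AscendingQuery(names).ToArray()));   // Polk, Taft, Adams, Grant
    Console.WriteLine(string.Join(", ", AscendingDot(names).ToArray()));     // Polk, Taft, Adams, Grant
    Console.WriteLine(string.Join(", ", DescendingQuery(names).ToArray()));  // Grant, Adams, Taft, Polk
    Console.WriteLine(string.Join(", ", DescendingDot(names).ToArray()));    // Grant, Adams, Taft, Polk
  }
}
```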



Select Clauses

In the query expression, if the selected element is the same identifier as the sequence enumerator variable, meaning you are selecting the entire element that is stored in the sequence, the following translation takes place:



Here is an example:



If the selected element is not the same identifier as the sequence enumerator variable, meaning you are selecting something other than the entire element stored in the sequence such as a member of the element or an anonymous type constructed of several members of the element, the following translation takes place:



Here is an example:
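As a minimal sketch (names and data are my own assumptions), the following query selects a member of each element instead of the element itself, so the select clause becomes a call to the Select operator with a lambda that produces the selected value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectDemo
{
  public static IEnumerable<int> AsQuery(string[] names)
  {
    return from n in names
           select n.Length;
  }

  public static IEnumerable<int> AsDotNotation(string[] names)
  {
    return names.Select(n => n.Length);
  }

  public static void Main()
  {
    string[] names = { "Adams", "Bush" };
    foreach (int length in AsQuery(names))
    {
      Console.WriteLine(length);  // 5, then 4
    }
    foreach (int length in AsDotNotation(names))
    {
      Console.WriteLine(length);  // 5, then 4
    }
  }
}
```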



Group Clauses

In the query expression, if the grouped element is the same identifier as the sequence enumerator, meaning you are grouping the entire element stored in the sequence, the following translation takes place:



Here is an example:



If the grouped element is not the same identifier as the sequence enumerator, meaning you are grouping something other than the entire element stored in the sequence, the following translation takes place:



Here is an example:
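The following minimal sketch shows both group clause cases side by side. The class, method, and data names are my own assumptions. Grouping the whole element needs only a key selector, while grouping something other than the whole element adds a second lambda that produces the grouped value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupDemo
{
  // group e by k  becomes  GroupBy(e => k)
  public static IEnumerable<IGrouping<int, string>> WholeElement(string[] names)
  {
    return from n in names
           group n by n.Length;
  }

  public static IEnumerable<IGrouping<int, string>> WholeElementDot(string[] names)
  {
    return names.GroupBy(n => n.Length);
  }

  // group g by k  becomes  GroupBy(e => k, e => g)
  public static IEnumerable<IGrouping<int, string>> Projected(string[] names)
  {
    return from n in names
           group n.ToUpper() by n.Length;
  }

  public static IEnumerable<IGrouping<int, string>> ProjectedDot(string[] names)
  {
    return names.GroupBy(n => n.Length, n => n.ToUpper());
  }

  public static void Main()
  {
    string[] names = { "Adams", "Bush", "Ford", "Grant" };
    foreach (IGrouping<int, string> g in WholeElement(names))
    {
      Console.WriteLine(g.Key + ": " + string.Join(", ", g.ToArray()));
    }
    foreach (IGrouping<int, string> g in Projected(names))
    {
      Console.WriteLine(g.Key + ": " + string.Join(", ", g.ToArray()));
    }
  }
}
```

The first loop prints 5: Adams, Grant followed by 4: Bush, Ford, and the second prints the same groups with the names in uppercase.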



At this point all translation steps are completed and the query expression should be fully translated to standard dot notation syntax.
