In Hour 10, “Working with Arrays and Collections,” you learned how applications could work with data stored in collections. Applications also need to work with data stored in other data sources, such as SQL databases or XML files, or even accessed through a web service. Traditionally, queries against these different data sources required different syntax and performed no type checking at compile time.
For example, consider a collection of customers. How would you search that collection for all customers with a specific job title? Using what you have learned so far, you would need to write code that iterates over each item in the collection, examining the appropriate field and returning those items that match the job title for which you are searching. What would happen if the source of your customer data were to change and no longer be an in-memory collection but an XML file or data retrieved from a web service call? You would most likely need to rewrite your search logic to accommodate this new data source.
In this hour, you learn about Language Integrated Query (LINQ) and query expressions, which enable you to write a single query that works correctly for any supported data source.
Query expressions in the .NET Framework are part of a set of technologies called LINQ, which integrate query capabilities directly into the C# language. LINQ eliminates the language mismatch commonly found between working with data and working with objects by providing the same query language for the following data sources:
• XML documents
• Web services
• ADO.NET DataSets
• Any collections that support the IEnumerable or IEnumerable<T> interfaces
This enables a query to be a first-class language construct, just like arithmetic operations and control flow statements are first-class concepts in C#.
Query expressions in LINQ can query and transform data from any supported data source in a consistent fashion because they are written in terms of common query operations, such as filtering, ordering, and projecting, rather than in terms of the structure of a particular source. As a result, you can often change the structure of the underlying data being queried without needing to change the query itself.
Listing 13.1 shows a query against a collection of Contact objects. Assume for the moment that the list has been populated as a result of calling GetContacts.
using System;
using System.Collections.Generic;
using System.Linq;

class Contact
{
    public int Id { get; set; }
    public string Company { get; set; }
    public string LastName { get; set; }
    public string FirstName { get; set; }
    public string Address { get; set; }
    public string City { get; set; }
    public string StateProvince { get; set; }
    public DateTime DateOfBirth { get; set; }
}

IEnumerable<Contact> contacts = GetContacts();
var result =
    from contact in contacts
    select contact.FirstName;
foreach (var name in result)
{
    Console.WriteLine(name);
}
This simple query illustrates the declarative syntax, also called the query comprehension syntax, supported by the C# language. This syntax enables you to write queries using Structured Query Language (SQL)-like query syntax, providing a great deal of flexibility and expressiveness. Although all the variables in a query expression are strongly typed, in most cases you don’t need to provide the type explicitly because the compiler can infer it.
Note: Query Comprehension Syntax
If you are familiar with SQL, the query comprehension syntax used by LINQ will feel familiar because it uses some of the same keywords and offers many of the same advantages. The most noticeable difference is that the from clause comes before the select clause, rather than after it as it does in SQL. And although SQL is designed to handle only relational data, LINQ supports far more kinds of data structures.
Although the code shown in Listing 13.1 might look simple, a lot is actually going on. The first thing you should notice is the use of an implicitly typed variable named result, which is actually of type IEnumerable<string>. The value of the query expression (the code on the right side of the assignment operator) is a query, not the result of the query. The select clause returns an object that represents the operation of projecting a result (the contact.FirstName values) from a sequence (the contacts list). Because the projected values are strings, result must be an enumerable collection of strings. The query does not retrieve any data at this point; it simply returns an enumerable object that fetches the data later, when you iterate over it.
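A minimal runnable sketch of the same idea follows. It assumes C# 9 or later (top-level statements) and replaces GetContacts with a small hard-coded array of anonymous objects; the sample names are made up. Declaring result with its explicit type instead of var compiles, which confirms the type the compiler infers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data standing in for GetContacts().
var contacts = new[]
{
    new { FirstName = "Ann", LastName = "Lee" },
    new { FirstName = "Bob", LastName = "Diaz" },
};

// The same query as Listing 13.1, with the type that var would infer spelled out.
IEnumerable<string> result =
    from contact in contacts
    select contact.FirstName;

// Enumerating result is what actually runs the projection.
foreach (string name in result)
{
    Console.WriteLine(name);
}
```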
This query literally says “select the FirstName field from each element, called contact, in the data source specified by contacts.” You can think of the contact variable specified in the from clause as being similar to the iteration variable of a foreach statement. It corresponds to a read-only local variable, called a range variable, scoped only to the query expression. The in keyword specifies the data source containing the elements to be queried, and the select clause says to select only the contact.FirstName field for each element during the iteration.
Although this syntax works well for selecting a single field, it is common to select multiple fields or even to transform the data in some way, such as combining fields. Fortunately, LINQ enables these scenarios as well, using similar syntax. You actually have several options for performing these types of selections.
The first is simply to concatenate the fields in the select clause, thereby still returning a single value, as shown in Listing 13.2.
var result =
    from contact in contacts
    select contact.FirstName + " " + contact.LastName;
foreach (var name in result)
{
    Console.WriteLine(name);
}
Obviously, this form of selection works only in a limited number of cases. A more flexible approach is to return multiple fields, essentially returning a subset of data, as shown in Listing 13.3.
var result =
    from contact in contacts
    select new
    {
        Name = contact.LastName + ", " + contact.FirstName,
        DateOfBirth = contact.DateOfBirth
    };
foreach (var contact in result)
{
    Console.WriteLine("{0} born on {1}", contact.Name, contact.DateOfBirth);
}
In this case, you are still returning an IEnumerable<T>, but what is T? If you look at the select clause in Listing 13.3, you should notice it is returning a new type containing values built from the contact.LastName, contact.FirstName, and contact.DateOfBirth fields. This new type is actually an anonymous type containing properties named Name and DateOfBirth. The type is anonymous because it doesn’t have a name you can use in your code. You did not explicitly declare a new type that corresponds to the returned value; the compiler generated it for you.
The ability to create anonymous types in this manner is central to the way LINQ works and would not be possible without the type inference provided by var.
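The sketch below (C# 9 or later, hypothetical sample data) shows that the anonymous type produced by a select clause like the one in Listing 13.3 is still strongly typed: the DateOfBirth property is a DateTime, so its members are available without a cast. It also uses a projection initializer, writing contact.DateOfBirth without a property name, in which case the compiler reuses the member name for the new property.

```csharp
using System;
using System.Linq;

// Hypothetical sample data; DateOfBirth mirrors the property used in Listing 13.3.
var contacts = new[]
{
    new { FirstName = "Ann", LastName = "Lee", DateOfBirth = new DateTime(1980, 5, 1) },
    new { FirstName = "Bob", LastName = "Diaz", DateOfBirth = new DateTime(1975, 11, 23) },
};

// Each element of result is an instance of a compiler-generated anonymous type
// with two read-only properties: Name (string) and DateOfBirth (DateTime).
var result =
    from contact in contacts
    select new
    {
        Name = contact.LastName + ", " + contact.FirstName,
        contact.DateOfBirth   // property name inferred from the member access
    };

var first = result.First();

// Both properties are strongly typed, so DateTime members are available directly.
Console.WriteLine("{0} was born in {1}", first.Name, first.DateOfBirth.Year);
```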
Selecting data is important, but selecting data in this way provides no option to restrict what data is returned. Just as SQL provides a WHERE clause, LINQ provides a where clause that returns an enumerable collection containing only the elements that match the specified criteria. Listing 13.4 adds a where clause to a query similar to the one in Listing 13.3, restricting the results to only those contacts whose StateProvince value is equal to "FL".
var result =
    from contact in contacts
    where contact.StateProvince == "FL"
    select new { contact.FirstName, contact.LastName };
foreach (var name in result)
{
    Console.WriteLine(name.FirstName + " " + name.LastName);
}
The where clause is applied first, producing an enumerable collection of matching contacts. The select clause is then applied to that collection, producing a sequence of anonymous-type objects, each containing FirstName and LastName properties.
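Here is Listing 13.4 as a self-contained program. It is a sketch assuming C# 9 or later, with anonymous objects and made-up states standing in for the Contact list; only the two FL contacts reach the select clause.

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for GetContacts().
var contacts = new[]
{
    new { FirstName = "Ann", LastName = "Lee", StateProvince = "FL" },
    new { FirstName = "Bob", LastName = "Diaz", StateProvince = "WA" },
    new { FirstName = "Cy", LastName = "Park", StateProvince = "FL" },
};

// where filters the sequence; select then projects each surviving element.
var result =
    from contact in contacts
    where contact.StateProvince == "FL"
    select new { contact.FirstName, contact.LastName };

foreach (var name in result)
{
    Console.WriteLine(name.FirstName + " " + name.LastName);
}
```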
To support more complex scenarios, such as ordering or grouping the returned data, LINQ provides the orderby and group clauses. You can order data in either ascending (smallest to largest) or descending (largest to smallest) order. Because ascending is the default, you don’t need to specify it. Listing 13.5 shows the query from Listing 13.1 ordered by the LastName field.
var result =
    from contact in contacts
    orderby contact.LastName
    select contact.FirstName;
foreach (var name in result)
{
    Console.WriteLine(name);
}
You can order by multiple fields and can mix ascending and descending to create rather sophisticated orderby clauses, as shown in Listing 13.6.
var result =
    from contact in contacts
    orderby
        contact.LastName ascending,
        contact.FirstName descending
    select contact.FirstName;
foreach (var name in result)
{
    Console.WriteLine(name);
}
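To see how the two sort keys interact, the following sketch (C# 9 or later, hypothetical data) includes two contacts with the same last name. The first key orders the last names ascending; the second key only breaks ties, so the two Lee contacts come out with their first names in descending order.

```csharp
using System;
using System.Linq;

// Hypothetical sample data with a repeated last name to exercise the second key.
var contacts = new[]
{
    new { FirstName = "Ann", LastName = "Lee" },
    new { FirstName = "Bob", LastName = "Diaz" },
    new { FirstName = "Zoe", LastName = "Lee" },
};

// Sort by LastName ascending; ties are broken by FirstName descending.
var result =
    from contact in contacts
    orderby contact.LastName ascending, contact.FirstName descending
    select contact.FirstName + " " + contact.LastName;

foreach (var name in result)
{
    Console.WriteLine(name);
}
```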
Grouping data follows a similar pattern, but the group clause takes the place of the select clause. The difference when grouping data is that the result returned is an IEnumerable of IGrouping<TKey, TElement> objects, which you can think of as a list of lists. Each IGrouping also has a Key property that holds the value its elements share. This requires two nested foreach statements to access the results.
Listing 13.7 shows the same query as in Listing 13.1, but this time groups by the first character of the last name.
var result =
    from contact in contacts
    group contact by contact.LastName[0];
foreach (var group in result)
{
    Console.WriteLine("Last names starting with {0}", group.Key);
    foreach (var name in group)
    {
        Console.WriteLine(name.FirstName + " " + name.LastName);
    }
    Console.WriteLine();
}
If you need to refer to the result of a grouping operation, you can use the into keyword to create an identifier that can be queried further. This form of composition is called a query continuation.
Listing 13.8 performs the same query as Listing 13.7 but returns only those groups that have more than two entries.
var result =
    from contact in contacts
    group contact by contact.LastName[0] into namesGroup
    where namesGroup.Count() > 2
    select namesGroup;
foreach (var group in result)
{
    Console.WriteLine("Last names starting with {0}", group.Key);
    foreach (var name in group)
    {
        Console.WriteLine(name.FirstName + " " + name.LastName);
    }
    Console.WriteLine();
}
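The sketch below turns the grouping query with a continuation into a runnable program, assuming C# 9 or later and made-up sample names. The into keyword names each group namesGroup so that the where clause can filter whole groups; only the L group has more than two members.

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for GetContacts().
var contacts = new[]
{
    new { FirstName = "Ann", LastName = "Lee" },
    new { FirstName = "Bob", LastName = "Lopez" },
    new { FirstName = "Cy", LastName = "Lund" },
    new { FirstName = "Dee", LastName = "Diaz" },
};

// Group by the first letter of LastName, then keep only groups with more than two members.
var result =
    from contact in contacts
    group contact by contact.LastName[0] into namesGroup
    where namesGroup.Count() > 2
    select namesGroup;

foreach (var group in result)
{
    // group.Key is the char that every contact in this group shares.
    Console.WriteLine("Last names starting with {0}", group.Key);
    foreach (var name in group)
    {
        Console.WriteLine("  " + name.FirstName + " " + name.LastName);
    }
}
```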
LINQ also enables you to combine multiple data sources by joining them together on one or more common fields. Joining data is important for queries against data sources where their relationship cannot be followed directly. Unlike SQL, which supports joins using many different operators, join operations in LINQ are based on the equality of their keys.
Expanding on the earlier examples that used only the Contact class, you need at least two classes to perform join operations. The Contact class is shown again in Listing 13.9, along with a new JournalEntry class. Continue the assumption that the contacts list has been populated as a result of calling GetContacts and that the journal list has been populated as a result of calling GetJournalEntries.
class Contact
{
    public int Id { get; set; }
    public string Company { get; set; }
    public string LastName { get; set; }
    public string FirstName { get; set; }
    public string Address { get; set; }
    public string City { get; set; }
    public string StateProvince { get; set; }
    public DateTime DateOfBirth { get; set; }
}
class JournalEntry
{
    public int Id { get; set; }
    public int ContactId { get; set; }
    public string Description { get; set; }
    public string EntryType { get; set; }
    public DateTime Date { get; set; }
}
IEnumerable<Contact> contacts = GetContacts();
IEnumerable<JournalEntry> journal = GetJournalEntries();
The simplest join query in LINQ is the functional equivalent of an inner join in SQL and uses the join clause. Unlike joins in SQL, which can use many different operators, joins in LINQ can use only an equality comparison and are called equijoins.
Listing 13.10 shows a query against a list of Contact objects joined to a list of JournalEntry objects using the Contact.Id and JournalEntry.ContactId fields as the keys for the join.
var result =
    from contact in contacts
    join journalEntry in journal
        on contact.Id equals journalEntry.ContactId
    select new
    {
        contact.FirstName,
        contact.LastName,
        journalEntry.Date,
        journalEntry.EntryType,
        journalEntry.Description
    };
The join clause in Listing 13.10 creates a range variable named journalEntry, which is of type JournalEntry, and then uses the equals keyword to join the two data sources on matching keys.
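A runnable sketch of the inner join, assuming C# 9 or later, follows. The contacts and journal sequences are hypothetical anonymous objects that mirror only the Contact and JournalEntry properties the join needs. Park, who has no journal entries, is absent from the result, just as with an inner join in SQL.

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for GetContacts() and GetJournalEntries().
var contacts = new[]
{
    new { Id = 1, FirstName = "Ann", LastName = "Lee" },
    new { Id = 2, FirstName = "Bob", LastName = "Diaz" },
    new { Id = 3, FirstName = "Cy", LastName = "Park" },
};
var journal = new[]
{
    new { Id = 10, ContactId = 1, Description = "Intro call" },
    new { Id = 11, ContactId = 1, Description = "Follow-up email" },
    new { Id = 12, ContactId = 2, Description = "Meeting" },
};

// Inner equijoin: each contact is paired with every entry whose ContactId matches.
// Contacts with no matching entries produce no rows.
var result =
    from contact in contacts
    join journalEntry in journal
        on contact.Id equals journalEntry.ContactId
    select new { contact.LastName, journalEntry.Description };

foreach (var row in result)
{
    Console.WriteLine("{0}: {1}", row.LastName, row.Description);
}
```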
LINQ also has the concept of a group join, which has no corresponding SQL query. A group join uses the into keyword and creates results that have a hierarchical structure. Just as you did with the group clause, you need nested foreach statements to access the results.
When working with LINQ joins, order is important. The key from the outer data source (the one named in the from clause) must be on the left side of the equals keyword, and the key from the inner data source (the one named in the join clause) must be on the right. In this example, contacts is the outer data source and journal is the inner data source.
Fortunately, the compiler catches this kind of mistake. If you were to swap the expressions on either side of equals in the join clause, you would get a compiler error like the following:
The name 'journalEntry' is not in scope on the left side of 'equals'. Consider swapping the expressions on either side of 'equals'.
Another important thing to watch out for is that the join clause uses the equals contextual keyword, which is not the same as the equality (==) operator; you cannot use == in a join clause.
Listing 13.11 shows a query that joins contacts and journal and returns a result grouped by contact name. Each entry in the group has an enumerable collection of journal entries, represented by the JournalEntries property in the returned anonymous type.
var result =
    from contact in contacts
    join journalEntry in journal
        on contact.Id equals journalEntry.ContactId
        into journalGroups
    select new
    {
        Name = contact.LastName + ", " + contact.FirstName,
        JournalEntries = journalGroups
    };
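The following sketch (C# 9 or later, hypothetical data) shows what distinguishes a group join from an inner join: every contact appears exactly once, paired with a possibly empty sequence of matching entries, so Park is kept with zero entries rather than being dropped.

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for GetContacts() and GetJournalEntries().
var contacts = new[]
{
    new { Id = 1, FirstName = "Ann", LastName = "Lee" },
    new { Id = 2, FirstName = "Bob", LastName = "Diaz" },
    new { Id = 3, FirstName = "Cy", LastName = "Park" },
};
var journal = new[]
{
    new { Id = 10, ContactId = 1, Description = "Intro call" },
    new { Id = 11, ContactId = 1, Description = "Follow-up email" },
    new { Id = 12, ContactId = 2, Description = "Meeting" },
};

// Group join: one result per contact, each holding the sequence of its matching entries.
var result =
    from contact in contacts
    join journalEntry in journal
        on contact.Id equals journalEntry.ContactId
        into journalGroups
    select new
    {
        Name = contact.LastName + ", " + contact.FirstName,
        JournalEntries = journalGroups
    };

// Nested loops walk the hierarchical result.
foreach (var item in result)
{
    Console.WriteLine("{0} ({1} entries)", item.Name, item.JournalEntries.Count());
    foreach (var entry in item.JournalEntries)
    {
        Console.WriteLine("  " + entry.Description);
    }
}
```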
Although selecting and joining data often returns results in the right shape, a hierarchical shape can sometimes be cumbersome to work with. LINQ enables you to create queries that instead return flattened data, much as you would when querying a SQL data source.
Suppose you were to change the Contact and JournalEntry classes so that a List<JournalEntry> field named Journal is added to the Contact class and the ContactId property is removed from the JournalEntry class, as shown in Listing 13.12.
class Contact
{
    public int Id { get; set; }
    public string Company { get; set; }
    public string LastName { get; set; }
    public string FirstName { get; set; }
    public string Address { get; set; }
    public string City { get; set; }
    public string StateProvince { get; set; }
    public DateTime DateOfBirth { get; set; }
    public List<JournalEntry> Journal;
}
class JournalEntry
{
    public int Id { get; set; }
    public string Description { get; set; }
    public string EntryType { get; set; }
    public DateTime Date { get; set; }
}
IEnumerable<Contact> contacts = GetContacts();
You could then query the contacts collection to retrieve the list of journal entries for a specific contact, as shown in Listing 13.13.
var result =
    from contact in contacts
    where contact.Id == 1
    select contact.Journal;
foreach (var item in result)
{
    foreach (var journalEntry in item)
    {
        Console.WriteLine(journalEntry.Description);
    }
}
Although this works and returns the results, it still requires nested foreach statements to generate the proper results. Fortunately, LINQ provides a query syntax that enables the data to be returned in a flattened manner by supporting a select from more than one data source. The code in Listing 13.14 shows how this query would be written so that only a single foreach statement is required by using multiple from clauses.
var result =
    from contact in contacts
    from journalEntry in contact.Journal
    where contact.Id == 1
    select journalEntry;
foreach (var journalEntry in result)
{
    Console.WriteLine(journalEntry.Description);
}
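The sketch below, assuming C# 9 or later, flattens nested lists with two from clauses. To keep it short, each hypothetical contact's Journal is a List<string> of descriptions instead of a List<JournalEntry>. The where clause is placed before the second from so that the journals of contacts that are filtered out are never expanded; for this query the result is the same. The method-syntax form uses SelectMany, the standard query operator that a second from clause is translated into.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: each contact carries its own list of entries, as in Listing 13.12.
var contacts = new[]
{
    new { Id = 1, LastName = "Lee", Journal = new List<string> { "Intro call", "Follow-up email" } },
    new { Id = 2, LastName = "Diaz", Journal = new List<string> { "Meeting" } },
};

// Two from clauses flatten the nested lists into one sequence of strings.
var result =
    from contact in contacts
    where contact.Id == 1
    from entry in contact.Journal
    select entry;

// Equivalent method syntax: SelectMany flattens a sequence of sequences.
var viaMethods = contacts
    .Where(contact => contact.Id == 1)
    .SelectMany(contact => contact.Journal);

// A single foreach is enough because the result is already flat.
foreach (var entry in result)
{
    Console.WriteLine(entry);
}
```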
All the queries you have just seen use declarative query syntax; however, they could also have been written using standard query operator method calls, which are extension methods defined in the static Enumerable class in the System.Linq namespace. The compiler converts query expressions written in the declarative syntax into the equivalent query operator method calls.
As long as you include the System.Linq namespace with a using directive, you can see the standard query operator methods on any class that implements the IEnumerable<T> interface, as shown in Figure 13.5.
Although the declarative query syntax supports almost all query operations, some, such as Count or Max, have no equivalent query syntax and must be expressed as method calls. Because most of the query operator methods return an IEnumerable<T>, you can compose complex queries by chaining the method calls together. This is what the compiler does on your behalf when it compiles your declarative query expressions.
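As a sketch of mixing the two styles (C# 9 or later; the Age property and all data are hypothetical), a query expression can be wrapped in parentheses so that Count or Max can be called on it, or the whole query can be written as a method chain:

```csharp
using System;
using System.Linq;

// Hypothetical sample data; Age is not part of the Contact class above.
var contacts = new[]
{
    new { LastName = "Lee", StateProvince = "FL", Age = 44 },
    new { LastName = "Diaz", StateProvince = "WA", Age = 49 },
    new { LastName = "Park", StateProvince = "FL", Age = 31 },
};

// Count and Max have no query keyword, so call them on the parenthesized query.
int floridaCount = (from c in contacts where c.StateProvince == "FL" select c).Count();
int oldestInFlorida = (from c in contacts where c.StateProvince == "FL" select c.Age).Max();

// The same count written purely as chained method calls.
int floridaCount2 = contacts.Where(c => c.StateProvince == "FL").Count();

Console.WriteLine("{0} contacts in FL, oldest is {1}", floridaCount, oldestInFlorida);
```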
Listing 13.15 shows the same query from Listing 13.4 using method syntax rather than declarative syntax, and the output from both will be identical. The Where method corresponds to the where clause, whereas the Select method corresponds to the select clause.
Note: Declarative or Method Syntax
The choice of using the declarative syntax or the method syntax is entirely personal and depends on which one you find easier to read. No matter which one you choose, the result of executing the query will be the same.
var result = contacts
    .Where(contact => contact.StateProvince == "FL")
    .Select(contact => new { contact.FirstName, contact.LastName });
foreach (var name in result)
{
    Console.WriteLine(name.FirstName + " " + name.LastName);
}
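To check that both forms produce the same output, the sketch below (C# 9 or later, hypothetical data) runs the Listing 13.4 query in both syntaxes and compares the results with SequenceEqual. The comparison compiles and works because anonymous types with the same property names and types, in the same order and in the same assembly, share one compiler-generated type whose Equals compares property values.

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for GetContacts().
var contacts = new[]
{
    new { FirstName = "Ann", LastName = "Lee", StateProvince = "FL" },
    new { FirstName = "Bob", LastName = "Diaz", StateProvince = "WA" },
    new { FirstName = "Cy", LastName = "Park", StateProvince = "FL" },
};

// Declarative syntax (Listing 13.4).
var declarative =
    from contact in contacts
    where contact.StateProvince == "FL"
    select new { contact.FirstName, contact.LastName };

// Method syntax (Listing 13.15).
var methods = contacts
    .Where(contact => contact.StateProvince == "FL")
    .Select(contact => new { contact.FirstName, contact.LastName });

// Element-by-element comparison using the anonymous type's value-based Equals.
bool same = declarative.SequenceEqual(methods);
Console.WriteLine(same);
```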
In Listing 13.15, you might have noticed that the arguments passed to the Where and Select methods look different from what you have used before. These arguments actually contain code rather than data. In Hour 7, “Events and Event Handling,” you learned about delegates, which enable a method to be passed as an argument to other methods, and about anonymous methods, which enable you to write an unnamed inline statement block that can be executed in a delegate invocation.
The combination of these concepts is a lambda, which is an anonymous function that can contain expressions and statements. Lambdas enable you to write code normally written using an anonymous method or generic delegate in a more convenient and compact way.
Because lambdas are a more compact way to write a delegate, you can use them anywhere you would ordinarily have used a delegate. When a lambda is converted to a delegate type, its formal parameter types must match the delegate’s parameter types exactly, and its return type must be implicitly convertible to the delegate’s return type.
Although lambdas have no type, they are implicitly convertible to any compatible delegate type. That implicit conversion is what enables you to pass them as arguments without first assigning them to a delegate variable.
Lambdas in C# use the lambda operator (=>). If you think about a lambda in the context of a method, the left side of the operator specifies the formal parameter list, and the right side of the operator contains the method body. All the restrictions that apply to anonymous methods also apply to lambdas.
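The following sketch (C# 9 or later, hypothetical names) writes the same test three ways: as an anonymous method, as a statement lambda, and as an expression lambda. All three convert to the predefined Func<string, bool> delegate type, which is described later in this hour.

```csharp
using System;

// Anonymous method syntax from Hour 7.
Func<string, bool> isFloridaAnon = delegate (string state) { return state == "FL"; };

// Statement lambda with an explicitly typed parameter.
Func<string, bool> isFloridaStatement = (string state) => { return state == "FL"; };

// Expression lambda: the parameter type is inferred from the delegate type.
Func<string, bool> isFloridaExpression = state => state == "FL";

Console.WriteLine(isFloridaExpression("FL"));
```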
The argument to the Where method shown in Listing 13.15, contact => contact.StateProvince == "FL", is read as “contact goes to contact.StateProvince equals FL.”
Tip: Captured and Defined Variables
Lambdas also have the capability to “capture” variables, which can be local variables or parameters of the containing method. This enables the body of the lambda to access the captured variable by name. If the captured variable is a local variable, it must be definitely assigned before it can be used in the lambda. Captured parameters cannot be ref or out parameters.
Be careful, however, because variables that are captured by lambdas will not be eligible for garbage collection until the delegate that references them is itself eligible for garbage collection.
Any variables introduced within the lambda are not visible in the outer containing method. This also applies to the input parameter names, so you can use the same identifiers for multiple lambdas.
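The sketch below (C# 9 or later, hypothetical names) illustrates that a lambda captures the variable itself, not a copy of its value: changing threshold after the lambda is created changes the lambda’s result, and a lambda can also update a captured local such as counter.

```csharp
using System;

// threshold is captured by reference-like semantics: the lambda reads its current value.
int threshold = 10;
Func<int, bool> isLarge = n => n > threshold;

bool before = isLarge(15);   // 15 > 10
threshold = 20;
bool after = isLarge(15);    // 15 > 20, evaluated with the updated threshold

// A lambda can also modify a captured local variable.
int counter = 0;
Action increment = () => counter++;
increment();
increment();

Console.WriteLine("{0} {1} {2}", before, after, counter);
```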
When a lambda contains an expression on the right side of the operator, it is an expression lambda and returns the result of that expression. The basic form of an expression lambda is as follows:
(input parameters) => expression
If there is only one input parameter, the parentheses are optional. If you have any other number of input parameters, including none, the parentheses are required.
Just as generic methods can infer the type of their type parameter, lambdas can infer the type for their input parameters. If the compiler cannot infer the type, you can specify the type explicitly. Listing 13.16 shows different forms of expression lambdas.
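Func<double, double> square = x => Math.Pow(x, 2);
Func<double, double, double> power = (x, y) => Math.Pow(x, y);
Func<double> four = () => Math.Pow(2, 2);
Func<int, string, bool> isShorter = (int x, string s) => s.Length < x;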
If you consider the expression portion of an expression lambda as the body of a method, an expression lambda contains an implicit return statement that returns the result of the expression.
Caution: Expression Lambdas Containing Method Calls
Although most of the examples in Listing 13.16 call methods on the right side of the operator, if you create lambdas that will be used in another domain, such as SQL Server, you should avoid method calls that the query provider cannot translate, because arbitrary .NET methods have no meaning outside the boundary of the .NET Framework common language runtime.
A lambda that has one or more statements enclosed by curly braces on the right side is a statement lambda. The basic form of a statement lambda is as follows:
(input parameters) => { statement; }
Like expression lambdas, if there is only one input parameter, the parentheses are optional; otherwise, they are required. Statement lambdas also follow the same rules of type inference.
Although expression lambdas contain an implicit return statement, statement lambdas do not. You must explicitly specify the return statement in a statement lambda. The return statement causes only the implicit method represented by the lambda to return, not the enclosing method. Listing 13.17 shows different forms of statement lambdas.
Func<int, int> addOne = (x) => { return x + 1; };
CheckBox cb = new CheckBox();
cb.CheckedChanged += (sender, e) =>
{
    MessageBox.Show(cb.Checked.ToString());
};
Action<string> myDel = n =>
{
    string s = n + " " + "World";
    Console.WriteLine(s);
};
myDel("Hello");
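The following sketch (C# 9 or later, hypothetical names) shows that return in a statement lambda ends only the lambda invocation: the enclosing foreach loop keeps running after each call to classify returns.

```csharp
using System;
using System.Collections.Generic;

// A statement lambda needs an explicit return; that return exits only the lambda.
Func<int, string> classify = n =>
{
    if (n < 0)
    {
        return "negative";   // leaves the lambda, not the enclosing code
    }
    string parity = n % 2 == 0 ? "even" : "odd";
    return parity;
};

var labels = new List<string>();
foreach (int value in new[] { -3, 4, 7 })
{
    labels.Add(classify(value));   // the loop continues after each return
}
Console.WriteLine(string.Join(", ", labels));
```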
A statement lambda cannot contain a goto, break, or continue statement whose target is outside the scope of the lambda itself. Similarly, normal scoping rules prevent a branch into a nested lambda from an outer lambda.
Although lambdas are an integral component of LINQ, they can be used anywhere you can use a delegate. As a result, the .NET Framework provides many predefined delegates that can be used to represent a method that can be passed as a parameter without requiring you to first declare an explicit delegate type.
Because delegates that return a Boolean value are common, the .NET Framework defines a Predicate<in T> delegate, which is used by many of the methods in the Array and List<T> classes.
Although Predicate<T> defines a delegate that always returns a Boolean value, the Func family of delegates encapsulates a method that has the specified return type and 0 to 16 input parameters.
Whereas Predicate<T> and the Func delegates all have a return type, the Action family of delegates represents methods that have a void return type. Just like the Func delegates, the Action delegates accept from 0 to 16 input parameters.
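A short sketch of the three delegate families, assuming C# 9 or later and made-up values: a Predicate<int> is passed to List<T>.FindAll and Array.Find, a Func<int, int, int> computes a value (its last type argument is the return type), and an Action<string> performs an action without returning anything.

```csharp
using System;
using System.Collections.Generic;

var numbers = new List<int> { 3, 8, 12, 5, 20 };

// Predicate<int>: takes an int, returns bool.
Predicate<int> isEven = n => n % 2 == 0;
List<int> evens = numbers.FindAll(isEven);

// A lambda passed directly is converted to Predicate<int> by the compiler.
int firstOverTen = Array.Find(numbers.ToArray(), n => n > 10);

// Func<int, int, int>: two int inputs, int result.
Func<int, int, int> add = (a, b) => a + b;

// Action<string>: one input, void return; here it records messages in a list.
var log = new List<string>();
Action<string> record = message => log.Add(message);
record("sum = " + add(2, 3));

Console.WriteLine(string.Join(", ", evens));
```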
Unlike many traditional data query techniques, a LINQ query is not evaluated until you actually iterate over it. This approach is called lazy evaluation, or deferred execution. One advantage is that the data in the original collection can change between the time the query is defined and the time it is executed, and the query sees those changes. Ultimately, this means that each time you iterate over a query, you get the most up-to-date data.
Even though LINQ prefers lazy evaluation, some operators execute immediately. Aggregation functions such as Count, Max, and Average must iterate over all the elements, and element operators such as First stop as soon as they find a matching element. All of these return a single value and execute immediately, without an explicit foreach statement.
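The sketch below (C# 9 or later, hypothetical data) makes deferred execution visible. A counter inside the filter shows that defining the query calls nothing, while Count, as an aggregate, runs the query immediately. Because the query is re-run on each enumeration, an element added to the source afterward is included in the second count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var states = new List<string> { "FL", "WA" };
int evaluations = 0;

// Defining the query runs nothing: the lambda is stored, not called.
var floridaQuery = states.Where(s =>
{
    evaluations++;          // counts how many times the filter actually runs
    return s == "FL";
});
int evaluationsAfterDefining = evaluations;   // still 0

// Count is an aggregate, so it enumerates the query immediately.
int countBefore = floridaQuery.Count();

// Change the source after the query has been defined...
states.Add("FL");

// ...and the next enumeration sees the new element.
int countAfter = floridaQuery.Count();

Console.WriteLine("{0} {1} {2}", evaluationsAfterDefining, countBefore, countAfter);
```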
Tip: Deferred Execution and Chained Queries
Another advantage of deferred execution is that it enables queries to be efficiently chained together. Because query objects represent queries, not the results of those queries, they can easily be chained together or reused without causing potentially expensive data fetching operations.
You can also force immediate evaluation, sometimes called greedy evaluation, by iterating over the query with a foreach statement immediately after defining it or by calling the ToList or ToArray methods. ToList and ToArray also cache the results in a single collection object, so later changes to the source are not reflected in that collection.
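By contrast with deferred execution, the following sketch (C# 9 or later, hypothetical data) uses ToList to execute the query once and cache its results; the cached list does not change when the source does, while the deferred query still reflects the new element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var states = new List<string> { "FL", "WA" };
var floridaQuery = states.Where(s => s == "FL");

// ToList executes the query now and copies the results into a new List<string>.
List<string> snapshot = floridaQuery.ToList();

states.Add("FL");

// The snapshot is unaffected; the deferred query is re-run and sees the new element.
Console.WriteLine("snapshot: {0}, live query: {1}", snapshot.Count, floridaQuery.Count());
```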
LINQ takes the best ideas from functional languages such as Haskell and other research languages and brings them together to introduce a way to query data in a consistent manner, no matter what the original data source might be, using a simple declarative or method-based syntax. By enabling queries to be written in a source-agnostic fashion, LINQ enables access to a wide variety of data sources, including databases, XML files, and in-memory collections.
Using syntax similar to that used by SQL queries, the declarative syntax of LINQ enables a query to be a first-class language construct, just like arithmetic operations and control flow statements. LINQ is actually implemented as a set of extension methods on the IEnumerable<T> interface that accept lambdas as parameters. Lambdas, in the form of expression or statement lambdas, are a compact way to write anonymous delegates. When you first start with LINQ, you don’t need to use lambdas extensively, but as you become more familiar with them, you will find that they are extremely powerful.
Q. What is LINQ?
A. LINQ is a set of technologies that integrates query capabilities directly into the C# language and eliminates the language mismatch commonly found between working with data and working with objects by providing the same query language for each supported data source.
Q. What is a lambda expression?
A. A lambda expression represents a compact and concise way to write an anonymous delegate and can be used anywhere a traditional delegate can be used.
1. Is there a difference between the declarative and method syntax for LINQ?
2. When is a LINQ query executed?
3. What is the underlying delegate type for lambda expressions?
1. The choice of using the declarative syntax or the method syntax is entirely personal and depends on which one you find easier to read. No matter which one you choose, the result of executing the query will be the same.
2. By default, LINQ uses deferred execution of queries. This means that the query is not actually executed until the result is iterated over, for example with a foreach statement, or until a method such as Count or ToList forces immediate execution.
3. Lambda expressions are inherently typeless, so they have no underlying type; however, they can be implicitly converted to any compatible delegate type.
There are no exercises for this hour.