Visual Basic provides many tools for storing data. You can store data in individual variables, arrays, and collections. Generic collection classes even let you store data in a strongly typed way — so, for example, you know that a List(Of Student)
contains only Student
objects and nothing else.
However, those storage methods by themselves don't give you an easy way to manipulate their contents. For example, suppose you have a List(Of Student)
and you want to find the entry for the student with the highest average test score; or you want to find all students who are missing a test score; or you want to sort the students by midterm scores in ascending order.
None of these operations is very difficult. You can loop through the list looking for the items you want. Sorting is a little trickier depending on how you want to sort the items, but in any case you can accomplish all of these operations with a dozen or two lines of code.
Language-Integrated Query (LINQ) provides another method for manipulating these sorts of data collections. Instead of just giving you a way to search collections, however, LINQ enables a program to access data stored in databases, arrays, collections, or files in basically the same way.
LINQ provides four basic technologies that give you access to data stored in various places:
In this lesson you learn how to use LINQ to Objects. You learn how to extract data from lists, collections, and arrays, and how to process the results.
The other forms of LINQ are even more specialized than the topics that I want to cover in this part of the book, so I won't discuss them here. After you learn about LINQ to Objects, you will be able to learn about the others on your own. Microsoft's "Language-Integrated Query (LINQ)" page at msdn.microsoft.com/library/bb397926.aspx
is a good starting point for learning more about LINQ.
Using LINQ to process data takes three steps:
Create a data source.
Build a query to select data from the data source.
Execute the query and process the result.
You might expect the third step to be two separate steps, "Execute the query" and "Process the result." In practice, however, LINQ doesn't actually execute the query until it must — when the program tries to access the results. This is called deferred execution.
The EvenNumbers example program (which is available for download on the book's web site) uses these steps in the following code that displays the even numbers between 0 and 99:
' 1. Create the data source. Dim numbers(0 To 100) As Integer For i As Integer = 0 To 100 numbers(i) = i Next i ' 2. Build a query to select data from the data source. Dim evenQuery = From num As Integer In numbers Where (num Mod 2 = 0) Select num ' 3. Execute the query and process the result. For Each num As Integer In evenQuery lstEvenNumbers.Items.Add(num) Next num
First, the program creates the data source: an array containing the numbers 0 through 100. In this example the data source is quite simple, but in other programs it could be much more complex. Instead of an array of numbers, it could be a list of Customer
objects, or an array of Order
objects that contain lists of OrderItem
objects.
Second, the program builds a query to select the even numbers from the list. I explain queries in more detail later, but the following list describes the key pieces of this query:
Dim
— This declares the variable that represents the query. Notice that this code doesn't include an As
keyword and a data type. In this example, the result will be an IEnumerable(Of Integer)
, but in general the results of LINQ queries can have some very strange data types. Rather than try to figure out what a query will return, most developers omit the LINQ query's data type and let Visual Basic figure out what the data type is.
Visual Basic can figure out the query's data type if Option Infer
is On
, which it is by default. Unfortunately, if Option Infer
is On
, then code such as the following is also allowed.
Dim x = 10
In this case, Visual Basic sees that you are assigning the value 10 to the variable x
so it infers that x
must be an Integer
. That's fine if you want x
to be an Integer
but what if you want it to be a Short, Long
, or Double
? In that case, you must explicitly give the variable a data type.
To avoid possible confusion, I use inferred data types only for LINQ queries and in fact turn Option Infer Off
in files that don't need it.
evenQuery
— This is the name the code is giving to the query. You can think of it as a variable that represents the result LINQ will later produce.
From num As Integer In numbers
— This means the query will select data from the numbers
array. It will use the Integer
variable num
to range over the values in the array. Because num
ranges over the values, it is called the query's range variable. (If you omit As Integer
, the compiler will implicitly figure out its data type based on what's in the array.)
Where (num Mod 2 = 0)
— This is the query's where clause. It determines which items are selected from the array. This example selects the even numbers where num mod 2
is 0
.
Select num
— This tells the query to return whatever is in the range variable num
for the values that are selected. Often, you will want to return the value of the range variable, but you could return something else such as 2 * num
or a new object created with a constructor that takes num
as a parameter.
In its third step, the code loops through the result produced by the query, adding each selected value to a ListBox
.
The following sections provide more detailed descriptions of some of the key pieces of a LINQ query: where clauses, order by clauses, and select clauses.
Probably the most common reason to use LINQ is to filter the data with a where clause. The where clause can include normal Boolean expressions that use AndAlso, OrElse, Or, >
, and other Boolean and comparison operators. It can use the range variable and any properties or methods that it provides (if it's an object). It can even perform calculations and invoke functions.
The where clause is optional. If you omit it, the query selects all of the items in its range.
For example, the PrimeNumbers example program (which is available for download on the book's web site) uses the following query. It is similar to the earlier one that selects even numbers, except this one's where clause uses the IsPrime
function to select only prime numbers. (How the IsPrime
function works isn't important to this discussion, so it isn't shown here. Download the example program to see how it works.)
Dim evenQuery = From num As Integer In numbers Where IsPrime(num) Select num
The FindCustomers example program shown in Figure 38-1 (and available in this lesson's code download on the web site) demonstrates several where clauses.
The following code shows the Customer
class used by the FindCustomers program. It includes some auto-implemented properties and an overridden ToString
method that displays the Customer
's values.
Public Class Customer Public Property FirstName As String Public Property LastName As String Public Property Balance As Decimal Public Property DueDate As Date Public Overrides Function ToString() As String Return FirstName & " " & LastName & vbTab & Balance.ToString("C") & vbTab & DueDate.ToString("d") End Function End Class
The following code shows how the FindCustomers program displays the same customer data selected with different where clauses:
Dim billDate As Date = #4/1/2011# 'Dim billDate As Date = DateTime.Today Me.Text = "FindCustomers (" & billDate.ToString("d") & ")" ' Make the customers. Dim customers() As Customer = { New Customer() With {.FirstName = "Ann", .LastName = "Ashler", .Balance = 100, .DueDate = #3/10/2011#}, New Customer() With {.FirstName = "Bob", .LastName = "Boggart", .Balance = 150, .DueDate = #2/5/2011#}, New Customer() With {.FirstName = "Cindy", .LastName = "Carruthers", .Balance = −50, .DueDate = #3/17/2011#}, New Customer() With {.FirstName = "Dan", .LastName = "Dent", .Balance = −75, .DueDate = #2/10/2011#}, New Customer() With {.FirstName = "Eve", .LastName = "Edwards", .Balance = 0, .DueDate = #2/10/2011#}, New Customer() With {.FirstName = "Frank", .LastName = "Finkle", .Balance = −95, .DueDate = #3/9/2011#}, New Customer() With {.FirstName = "Gina", .LastName = "Garra", .Balance = −10, .DueDate = #3/30/2011#} } ' Display all customers. lstAll.DataSource = customers ' Display customers with negative balances. Dim negativeQuery = From cust As Customer In customers Where (cust.Balance > 0) Select cust ' Place before the Select clause: ' Order By cust.Balance Ascending, cust.FirstName lstNegativeBalance.DataSource = negativeQuery.ToArray() ' Display customers who owe at least $50. Dim owes50Query = From cust As Customer In customers Where (cust.Balance <= −50) Select cust lstOwes50.DataSource = owes50Query.ToArray() ' Display customers who owe at least $50 ' and are overdue at least 30 days. Dim overdueQuery = From cust As Customer In customers Where (cust.Balance <= −50) AndAlso (billDate.Subtract(cust.DueDate).TotalDays > 30) Select cust lstOverdue.DataSource = overdueQuery.ToArray()
The program starts by creating a DateTime
named billDate
and setting it equal to April 1, 2011. In a real application you would probably use the current date (commented out), but this program uses a specific date so it works well with the sample data no matter when you run it. The program displays the date in its title bar (so you can compare it to the Customers
' due dates) and creates an array of Customer
objects.
Next, the code sets the lstAll
control's DataSource
property to the array so that ListBox
displays all the Customer
objects. The Customer
class's overridden ToString
method makes it display each Customer
's name, balance, and due date.
The program then executes the following LINQ query:
' Display customers with negative balances. Dim negativeQuery = From cust As Customer In customers Where (cust.Balance > 0) Select cust lstNegativeBalance.DataSource = negativeQuery.ToArray()
This query's where clause selects Customers
with Balance
properties less than 0. The query returns an IEnumerable
, but a ListBox
's DataSource
property requires an IList
or IListSource
so the program calls the result's ToArray
method to convert it into an array that the DataSource
property can handle.
After displaying this result, the program executes two other LINQ queries and displays their results similarly. The first query selects Customers
who owe at least $50. The final query selects Customers
who owe at least $50 and who have a DueDate
older than 30 days.
Often the result of a query is easier to read if you sort the selected values. You can do this by inserting an order by clause between the where clause and the select clause.
The order by clause begins with the keywords Order By
followed by one or more values separated by commas that determine how the results are ordered.
Optionally, you can follow a value by the keyword Ascending
(the default) or Descending
to specify whether the results are ordered in ascending (1-2-3 or A-B-C) or descending (3-2-1 or C-B-A) order.
For example, the following query selects Customers
with negative balances and orders them such that the smallest (most negative) values come first:
Dim negativeQuery = From cust As Customer In customers Where (cust.Balance > 0) Order By cust.Balance Ascending Select cust
The following version orders the results first by balance; then, if two customers have the same balance, by last name:
Dim negativeQuery = From cust As Customer In customers Where (cust.Balance > 0) Order By cust.Balance Ascending, cust.FirstName Select cust
The select clause determines what data is pulled from the data source and stored in the result. All of the previous examples select the data over which they are ranging. For example, the FindCustomers example program ranges over an array of Customer
objects and selects certain Customer
objects.
Instead of selecting the objects in the query's range, a program can select only some properties of those objects, a result calculated from those properties, or even completely new objects. Selecting a new kind of data from the existing data is called projecting or transforming the data.
The FindStudents example program shown in Figure 38-2 (and available in this lesson's code download on the web site) uses the following simple Student
class:
Public Class Student Public Property FirstName As String Public Property LastName As String Public Property TestScores As List(Of Integer) End Class
The program uses the following query to select all of the students' names and test averages ordered by name:
' Select all students and their test averages ordered by name. Dim allStudents = From stu As Student In students
Order By stu.LastName, stu.FirstName Select stu.FirstName & " " & stu.LastName & vbTab & stu.TestScores.Average().ToString("0.00") lstAll.DataSource = allStudents.ToArray()
This query's select statement does not select the range variable stu
. Instead it selects a string that holds the student's first and last names and the student's test score average. (Notice how the code calls the TestScore
list's Average
method to get the average of the student's test scores.) The result of the query contains Strings
instead of Students
.
The program next uses the following code to list the students who have averages of at least 60, giving them passing grades:
' Select passing students ordered by name. Dim passingStudents = From stu As Student In students Order By stu.LastName, stu.FirstName Where (stu.TestScores.Average() >= 60) Select stu.FirstName & " " & stu.LastName lstPassing.DataSource = passingStudents.ToArray()
This code again selects a string instead of a Student
object. The code that selects failing students is similar, so it isn't shown here.
The program uses the following code to select students with averages below the class average:
' Select all scores and compute a class average. Dim allAverages = From stu As Student In students Order By stu.LastName, stu.FirstName Select stu.TestScores.Average() Dim classAverage As Double = allAverages.Average() ' Display the average. Me.Text = "FindStudents: Class Average = " & classAverage.ToString("0.00") ' Select students with average below the class average ordered by average. Dim belowAverageStudents = From stu As Student In students Order By stu.TestScores.Average() Where stu.TestScores.Average() > classAverage Select New With { .Name = stu.FirstName & " " & stu.LastName, .Average = stu.TestScores.Average() } For Each info In belowAverageStudents Debug.WriteLine(TypeName(info)) lstBelowAverage.Items.Add(info.Name & vbTab & info.Average.ToString("0.00")) Next info
This snippet starts by selecting all of the students' test score averages to get a list of Doubles
. The program calls that list's Average
function to get the class average.
Next, the code queries the student data again, this time selecting students with averages below the class average.
This query demonstrates a new kind of select clause that creates a list of objects. The new objects have two properties, Name
and Average
, that are given values by the select clause.
The data type of these new objects is created automatically and isn't given an explicit name in the code, so it is known as an anonymous type.
After creating the query, the code loops through its results, using each object's Name
and Average
property to display the selected students in a ListBox
. Notice that the code doesn't give an explicit data type to the looping variable info
, so it doesn't need to figure out what data type it really has.
Objects with anonymous data types actually have a true data type, just not one that you want to have to figure out. For example, you can add the following statement inside the previous code's For Each
loop to see what data type the objects actually have:
Debug.WriteLine(TypeName(info))
If you look in the Immediate window, you'll see that these objects have the following ungainly data type:
VB$AnonymousType_0(Of String,Double)
Though you can sort of see what's going on here (note that the object contains a string and a double), you probably wouldn't want to type this mess into your code even if you could. In this case, using an inferred type makes the code a lot easier to read.
LINQ provides plenty of other features that won't fit in this lesson. It lets you:
Group results to produce output lists that contain other lists
Get only a certain number of results or get results while a certain condition is true
Skip a certain number of results or skip results while a certain condition is true
Join objects selected from multiple data sources
Use aggregate functions such as Average
(which you've already seen), Count, Min, Max
, and Sum
In Lesson 32's Try It, you built a program that used the DirectoryInfo
class's GetFiles
method to search for files matching a pattern and containing a target string. For example, the program could search the directory hierarchy starting at C:VBProjects
to find files with the .vb
extension and containing the string "DirectoryInfo."
In this Try It, you modify that program to perform the same search with LINQ. Instead of writing code to loop through the files returned by GetFiles
and examining each one, you make LINQ examine the files for you.
You can download the code and resources for this Try It from the book's web page at www.wrox.com
or www.vb-helper.com/24hourvb.html
. You can find them in the Lesson38 folder of the download.
In this lesson:
Use the DirectoryInfo
object's GetFiles
method in the query's from clause.
In the query's where clause, use the File
class's ReadAllText
method to get the file's contents. Convert it to lowercase and use Contains
to see if the file holds the target string.
To display the files, convert the query into an array and then set the ListBox
's DataSource
property equal to the result.
You cannot clear a ListBox
while it has a DataSource
, so set DataSource = Nothing
before you clear the list. (Otherwise, the program will crash if you perform more than one search.)
Copy the program you built for Lesson 32's Try It (or download Lesson 32's version from the book's web site) and modify the code to use LINQ to search for files.
Copying the program is reasonably straightforward.
To use LINQ to search for files, modify the Search button's Click
event handler so it looks like the following.
' Search for files matching the pattern ' and containing the target string. Private Sub btnSearch_Click() Handles btnSearch.Click lstFiles.DataSource = Nothing lstFiles.Items.Clear() ' Get the file pattern and target string. Dim pattern As String = cboPattern.Text Dim target As String = txtTarget.Text.ToLower()
' Search for files. Dim dirinfo As New DirectoryInfo(txtDirectory.Text) Dim fileQuery = From info As FileInfo In dirinfo.GetFiles(pattern, SearchOption.AllDirectories) Where File.ReadAllText(info.FullName). ToLower().Contains(target) Select info.FullName ' Display the result. lstFiles.DataSource = fileQuery.ToArray() End Sub
Please select Lesson 38 on the DVD to view the video that accompanies this lesson.
Build a program that lists the names of the files in a directory together with their sizes, ordered with the biggest files first.
Copy the program you built for Exercise 1 and modify it so it searches for files in the directory hierarchy starting at the specified directory.
Use LINQ to make a program that lists the perfect squares between 0 and 999.
You can find solutions to this lesson's exercises in the Lesson38 folder inside the download available on the book's web site at www.wrox.com
or www.vb-helper.com/24hourvb.html
.