Chapter 38. Manipulating Data with LINQ to Objects

Visual Basic provides many tools for storing data. You can store data in individual variables, arrays, and collections. Generic collection classes even let you store data in a strongly typed way — so, for example, you know that a List(Of Student) contains only Student objects and nothing else.

However, those storage methods by themselves don't give you an easy way to manipulate their contents. For example, suppose you have a List(Of Student) and you want to find the entry for the student with the highest average test score; or you want to find all students who are missing a test score; or you want to sort the students by midterm scores in ascending order.

None of these operations is very difficult. You can loop through the list looking for the items you want. Sorting is a little trickier depending on how you want to sort the items, but in any case you can accomplish all of these operations with a dozen or two lines of code.

Language-Integrated Query (LINQ) provides another method for manipulating these sorts of data collections. Instead of just giving you a way to search collections, however, LINQ enables a program to access data stored in databases, arrays, collections, or files in basically the same way.

LINQ provides four basic technologies that give you access to data stored in various places:

  • LINQ to SQL — Data stored in SQL Server databases

  • LINQ to Dataset — Data stored in other databases

  • LINQ to XML — Data stored in XML (Extensible Markup Language) files

  • LINQ to Objects — Data stored in collections, lists, arrays, strings, files, and so forth

In this lesson you learn how to use LINQ to Objects. You learn how to extract data from lists, collections, and arrays, and how to process the results.

Note

The other forms of LINQ are even more specialized than the topics that I want to cover in this part of the book, so I won't discuss them here. After you learn about LINQ to Objects, you will be able to learn about the others on your own. Microsoft's "Language-Integrated Query (LINQ)" page at msdn.microsoft.com/library/bb397926.aspx is a good starting point for learning more about LINQ.

LINQ BASICS

Using LINQ to process data takes three steps:

  1. Create a data source.

  2. Build a query to select data from the data source.

  3. Execute the query and process the result.

You might expect the third step to be two separate steps, "Execute the query" and "Process the result." In practice, however, LINQ doesn't actually execute the query until it must — when the program tries to access the results. This is called deferred execution.

The EvenNumbers example program (which is available for download on the book's web site) uses these steps in the following code that displays the even numbers between 0 and 99:

' 1. Create the data source.
Dim numbers(0 To 100) As Integer
For i As Integer = 0 To 100
    numbers(i) = i
Next i

' 2. Build a query to select data from the data source.
Dim evenQuery =
    From num As Integer In numbers
    Where (num Mod 2 = 0)
    Select num

' 3. Execute the query and process the result.
For Each num As Integer In evenQuery
    lstEvenNumbers.Items.Add(num)
Next num

First, the program creates the data source: an array containing the numbers 0 through 100. In this example the data source is quite simple, but in other programs it could be much more complex. Instead of an array of numbers, it could be a list of Customer objects, or an array of Order objects that contain lists of OrderItem objects.

Second, the program builds a query to select the even numbers from the list. I explain queries in more detail later, but the following list describes the key pieces of this query:

  • Dim — This declares the variable that represents the query. Notice that this code doesn't include an As keyword and a data type. In this example, the result will be an IEnumerable(Of Integer), but in general the results of LINQ queries can have some very strange data types. Rather than try to figure out what a query will return, most developers omit the LINQ query's data type and let Visual Basic figure out what the data type is.

    Note

    Visual Basic can figure out the query's data type if Option Infer is On, which it is by default. Unfortunately, if Option Infer is On, then code such as the following is also allowed.

    Dim x = 10

    In this case, Visual Basic sees that you are assigning the value 10 to the variable x so it infers that x must be an Integer. That's fine if you want x to be an Integer but what if you want it to be a Short, Long, or Double? In that case, you must explicitly give the variable a data type.

    To avoid possible confusion, I use inferred data types only for LINQ queries and in fact turn Option Infer Off in files that don't need it.

  • evenQuery — This is the name the code is giving to the query. You can think of it as a variable that represents the result LINQ will later produce.

  • From num As Integer In numbers — This means the query will select data from the numbers array. It will use the Integer variable num to range over the values in the array. Because num ranges over the values, it is called the query's range variable. (If you omit As Integer, the compiler will implicitly figure out its data type based on what's in the array.)

  • Where (num Mod 2 = 0) — This is the query's where clause. It determines which items are selected from the array. This example selects the even numbers where num mod 2 is 0.

  • Select num — This tells the query to return whatever is in the range variable num for the values that are selected. Often, you will want to return the value of the range variable, but you could return something else such as 2 * num or a new object created with a constructor that takes num as a parameter.

In its third step, the code loops through the result produced by the query, adding each selected value to a ListBox.

The following sections provide more detailed descriptions of some of the key pieces of a LINQ query: where clauses, order by clauses, and select clauses.

WHERE CLAUSES

Probably the most common reason to use LINQ is to filter the data with a where clause. The where clause can include normal Boolean expressions that use AndAlso, OrElse, Or, >, and other Boolean and comparison operators. It can use the range variable and any properties or methods that it provides (if it's an object). It can even perform calculations and invoke functions.

Note

The where clause is optional. If you omit it, the query selects all of the items in its range.

For example, the PrimeNumbers example program (which is available for download on the book's web site) uses the following query. It is similar to the earlier one that selects even numbers, except this one's where clause uses the IsPrime function to select only prime numbers. (How the IsPrime function works isn't important to this discussion, so it isn't shown here. Download the example program to see how it works.)

Dim evenQuery =
    From num As Integer In numbers
    Where IsPrime(num)
    Select num

The FindCustomers example program shown in Figure 38-1 (and available in this lesson's code download on the web site) demonstrates several where clauses.

Figure 38-1

Figure 38.1. Figure 38-1

The following code shows the Customer class used by the FindCustomers program. It includes some auto-implemented properties and an overridden ToString method that displays the Customer's values.

Public Class Customer
    Public Property FirstName As String
    Public Property LastName As String
    Public Property Balance As Decimal
    Public Property DueDate As Date

    Public Overrides Function ToString() As String
        Return FirstName & " " & LastName & vbTab &
            Balance.ToString("C") & vbTab & DueDate.ToString("d")
    End Function
End Class

The following code shows how the FindCustomers program displays the same customer data selected with different where clauses:

Dim billDate As Date = #4/1/2011#
'Dim billDate As Date = DateTime.Today

Me.Text = "FindCustomers (" & billDate.ToString("d") & ")"

' Make the customers.
Dim customers() As Customer =
{
    New Customer() With {.FirstName = "Ann", .LastName = "Ashler",
        .Balance = 100, .DueDate = #3/10/2011#},
    New Customer() With {.FirstName = "Bob", .LastName = "Boggart",
        .Balance = 150, .DueDate = #2/5/2011#},
    New Customer() With {.FirstName = "Cindy", .LastName = "Carruthers",
        .Balance = −50, .DueDate = #3/17/2011#},
    New Customer() With {.FirstName = "Dan", .LastName = "Dent",
        .Balance = −75, .DueDate = #2/10/2011#},
    New Customer() With {.FirstName = "Eve", .LastName = "Edwards",
        .Balance = 0, .DueDate = #2/10/2011#},
    New Customer() With {.FirstName = "Frank", .LastName = "Finkle",
        .Balance = −95, .DueDate = #3/9/2011#},
    New Customer() With {.FirstName = "Gina", .LastName = "Garra",
        .Balance = −10, .DueDate = #3/30/2011#}
}

' Display all customers.
lstAll.DataSource = customers

' Display customers with negative balances.
Dim negativeQuery =
    From cust As Customer In customers
    Where (cust.Balance > 0)
    Select cust
' Place before the Select clause:
'   Order By cust.Balance Ascending, cust.FirstName
lstNegativeBalance.DataSource = negativeQuery.ToArray()

' Display customers who owe at least $50.
Dim owes50Query =
    From cust As Customer In customers
    Where (cust.Balance <= −50)
    Select cust
lstOwes50.DataSource = owes50Query.ToArray()

' Display customers who owe at least $50
' and are overdue at least 30 days.
Dim overdueQuery =
    From cust As Customer In customers
    Where (cust.Balance <= −50) AndAlso
          (billDate.Subtract(cust.DueDate).TotalDays > 30)
    Select cust
lstOverdue.DataSource = overdueQuery.ToArray()

The program starts by creating a DateTime named billDate and setting it equal to April 1, 2011. In a real application you would probably use the current date (commented out), but this program uses a specific date so it works well with the sample data no matter when you run it. The program displays the date in its title bar (so you can compare it to the Customers' due dates) and creates an array of Customer objects.

Next, the code sets the lstAll control's DataSource property to the array so that ListBox displays all the Customer objects. The Customer class's overridden ToString method makes it display each Customer's name, balance, and due date.

The program then executes the following LINQ query:

' Display customers with negative balances.
Dim negativeQuery =
    From cust As Customer In customers
    Where (cust.Balance > 0)
    Select cust
lstNegativeBalance.DataSource = negativeQuery.ToArray()

This query's where clause selects Customers with Balance properties less than 0. The query returns an IEnumerable, but a ListBox's DataSource property requires an IList or IListSource so the program calls the result's ToArray method to convert it into an array that the DataSource property can handle.

After displaying this result, the program executes two other LINQ queries and displays their results similarly. The first query selects Customers who owe at least $50. The final query selects Customers who owe at least $50 and who have a DueDate older than 30 days.

ORDER BY CLAUSES

Often the result of a query is easier to read if you sort the selected values. You can do this by inserting an order by clause between the where clause and the select clause.

The order by clause begins with the keywords Order By followed by one or more values separated by commas that determine how the results are ordered.

Optionally, you can follow a value by the keyword Ascending (the default) or Descending to specify whether the results are ordered in ascending (1-2-3 or A-B-C) or descending (3-2-1 or C-B-A) order.

For example, the following query selects Customers with negative balances and orders them such that the smallest (most negative) values come first:

Dim negativeQuery =
    From cust As Customer In customers
    Where (cust.Balance > 0)
    Order By cust.Balance Ascending
    Select cust

The following version orders the results first by balance; then, if two customers have the same balance, by last name:

Dim negativeQuery =
    From cust As Customer In customers
    Where (cust.Balance > 0)
    Order By cust.Balance Ascending, cust.FirstName
    Select cust

SELECT CLAUSES

The select clause determines what data is pulled from the data source and stored in the result. All of the previous examples select the data over which they are ranging. For example, the FindCustomers example program ranges over an array of Customer objects and selects certain Customer objects.

Instead of selecting the objects in the query's range, a program can select only some properties of those objects, a result calculated from those properties, or even completely new objects. Selecting a new kind of data from the existing data is called projecting or transforming the data.

The FindStudents example program shown in Figure 38-2 (and available in this lesson's code download on the web site) uses the following simple Student class:

Public Class Student
    Public Property FirstName As String
    Public Property LastName As String
    Public Property TestScores As List(Of Integer)
End Class
Figure 38-2

Figure 38.2. Figure 38-2

The program uses the following query to select all of the students' names and test averages ordered by name:

' Select all students and their test averages ordered by name.
Dim allStudents =
    From stu As Student In students
Order By stu.LastName, stu.FirstName
    Select stu.FirstName & " " & stu.LastName & vbTab &
        stu.TestScores.Average().ToString("0.00")
lstAll.DataSource = allStudents.ToArray()

This query's select statement does not select the range variable stu. Instead it selects a string that holds the student's first and last names and the student's test score average. (Notice how the code calls the TestScore list's Average method to get the average of the student's test scores.) The result of the query contains Strings instead of Students.

The program next uses the following code to list the students who have averages of at least 60, giving them passing grades:

' Select passing students ordered by name.
Dim passingStudents =
    From stu As Student In students
    Order By stu.LastName, stu.FirstName
    Where (stu.TestScores.Average() >= 60)
    Select stu.FirstName & " " & stu.LastName
lstPassing.DataSource = passingStudents.ToArray()

This code again selects a string instead of a Student object. The code that selects failing students is similar, so it isn't shown here.

The program uses the following code to select students with averages below the class average:

' Select all scores and compute a class average.
Dim allAverages =
    From stu As Student In students
    Order By stu.LastName, stu.FirstName
    Select stu.TestScores.Average()
Dim classAverage As Double = allAverages.Average()

' Display the average.
Me.Text = "FindStudents: Class Average = " & classAverage.ToString("0.00")

' Select students with average below the class average ordered by average.
Dim belowAverageStudents =
    From stu As Student In students
    Order By stu.TestScores.Average()
    Where stu.TestScores.Average() > classAverage
    Select New With
    {
        .Name = stu.FirstName & " " & stu.LastName,
        .Average = stu.TestScores.Average()
    }
For Each info In belowAverageStudents
    Debug.WriteLine(TypeName(info))
    lstBelowAverage.Items.Add(info.Name & vbTab & info.Average.ToString("0.00"))
Next info

This snippet starts by selecting all of the students' test score averages to get a list of Doubles. The program calls that list's Average function to get the class average.

Next, the code queries the student data again, this time selecting students with averages below the class average.

This query demonstrates a new kind of select clause that creates a list of objects. The new objects have two properties, Name and Average, that are given values by the select clause.

The data type of these new objects is created automatically and isn't given an explicit name in the code, so it is known as an anonymous type.

After creating the query, the code loops through its results, using each object's Name and Average property to display the selected students in a ListBox. Notice that the code doesn't give an explicit data type to the looping variable info, so it doesn't need to figure out what data type it really has.

Note

Objects with anonymous data types actually have a true data type, just not one that you want to have to figure out. For example, you can add the following statement inside the previous code's For Each loop to see what data type the objects actually have:

Debug.WriteLine(TypeName(info))

If you look in the Immediate window, you'll see that these objects have the following ungainly data type:

VB$AnonymousType_0(Of String,Double)

Though you can sort of see what's going on here (note that the object contains a string and a double), you probably wouldn't want to type this mess into your code even if you could. In this case, using an inferred type makes the code a lot easier to read.

LINQ provides plenty of other features that won't fit in this lesson. It lets you:

  • Group results to produce output lists that contain other lists

  • Get only a certain number of results or get results while a certain condition is true

  • Skip a certain number of results or skip results while a certain condition is true

  • Join objects selected from multiple data sources

  • Use aggregate functions such as Average (which you've already seen), Count, Min, Max, and Sum

TRY IT

In Lesson 32's Try It, you built a program that used the DirectoryInfo class's GetFiles method to search for files matching a pattern and containing a target string. For example, the program could search the directory hierarchy starting at C:VBProjects to find files with the .vb extension and containing the string "DirectoryInfo."

In this Try It, you modify that program to perform the same search with LINQ. Instead of writing code to loop through the files returned by GetFiles and examining each one, you make LINQ examine the files for you.

Note

You can download the code and resources for this Try It from the book's web page at www.wrox.com or www.vb-helper.com/24hourvb.html. You can find them in the Lesson38 folder of the download.

Lesson Requirements

In this lesson:

  • Copy the program you built for Lesson 32's Try It (or download Lesson 32's version from the book's web site) and modify the code to use LINQ to search for files.

Hints

  • Use the DirectoryInfo object's GetFiles method in the query's from clause.

  • In the query's where clause, use the File class's ReadAllText method to get the file's contents. Convert it to lowercase and use Contains to see if the file holds the target string.

  • To display the files, convert the query into an array and then set the ListBox's DataSource property equal to the result.

  • You cannot clear a ListBox while it has a DataSource, so set DataSource = Nothing before you clear the list. (Otherwise, the program will crash if you perform more than one search.)

Step-by-Step

  • Copy the program you built for Lesson 32's Try It (or download Lesson 32's version from the book's web site) and modify the code to use LINQ to search for files.

    1. Copying the program is reasonably straightforward.

    2. To use LINQ to search for files, modify the Search button's Click event handler so it looks like the following.

      ' Search for files matching the pattern
      ' and containing the target string.
      Private Sub btnSearch_Click() Handles btnSearch.Click
          lstFiles.DataSource = Nothing
          lstFiles.Items.Clear()
      
          ' Get the file pattern and target string.
          Dim pattern As String = cboPattern.Text
          Dim target As String = txtTarget.Text.ToLower()
      ' Search for files.
          Dim dirinfo As New DirectoryInfo(txtDirectory.Text)
          Dim fileQuery =
              From info As FileInfo
              In dirinfo.GetFiles(pattern, SearchOption.AllDirectories)
              Where File.ReadAllText(info.FullName).
                  ToLower().Contains(target)
              Select info.FullName
      
          ' Display the result.
          lstFiles.DataSource = fileQuery.ToArray()
      End Sub

Note

Please select Lesson 38 on the DVD to view the video that accompanies this lesson.

EXERCISES

  1. Build a program that lists the names of the files in a directory together with their sizes, ordered with the biggest files first.

  2. Copy the program you built for Exercise 1 and modify it so it searches for files in the directory hierarchy starting at the specified directory.

  3. Use LINQ to make a program that lists the perfect squares between 0 and 999.

    Note

    You can find solutions to this lesson's exercises in the Lesson38 folder inside the download available on the book's web site at www.wrox.com or www.vb-helper.com/24hourvb.html.

EXERCISES
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset