DataRow Set Operators

As you may recall, in the LINQ to Objects API, there are a handful of Standard Query Operators that exist for the purpose of making sequence set-type comparisons. I am referring to the Distinct, Except, Intersect, Union, and SequenceEqual operators. Each of these operators performs a set operation on two sequences.

For each of these set-type operators, determining sequence element equality is necessary to perform the appropriate set operation. These operators perform element comparisons by calling the GetHashCode and Equals methods on the elements. For a DataRow, this results in a reference comparison, which is not the desired behavior. This will result in the incorrect determination of element equality, thereby causing the operators to return erroneous results. Because of this, each of these operators has an additional prototype that I omitted in the LINQ to Objects chapters. This additional prototype allows an IEqualityComparer object to be provided as an argument. Conveniently, a comparer object has been provided for us specifically for these versions of the operators, System.Data.DataRowComparer.Default. This comparer class is in the System.Data namespace in the System.Data.Entity.dll assembly. This comparer determines element equality by comparing the number of columns and the static data type of each column, and using the IComparable interface on the column's dynamic data type if that type implements the interface; otherwise, it calls the System.Object's static Equals method.

Each of these additional operator prototypes is defined in the System.Linq.Enumerable static class just as the other prototypes of these operators are.

In this section, I provide some examples to illustrate the incorrect and, more importantly, correct way to make these sequence comparisons when working with DataSet objects.

Distinct

The Distinct operator removes duplicate rows from a sequence of objects. It returns an object that when enumerated, enumerates a source sequence of objects and returns a sequence of objects with the duplicate rows removed. Typically, this operator determines duplicates by calling each element's data type's GetHashCode and Equals methods. However, for DataRow type objects, this would cause an incorrect result.

Because I am going to call the additional prototype and provide the System.Data.DataRowComparer.Default comparer object, the element equality will be properly determined. With it, a row is deemed to be a duplicate by comparing DataRow objects using the number of columns in a row and the static data type of each column, and then using the IComparable interface on each column if its dynamic data type implements the IComparable interface, or calling the static Equals method in System.Object if it does not.

Prototypes

The Distinct operator has one prototype I will cover.

Example. The Distinct Prototype
public static IEnumerable<T> Distinct<T> (
  this IEnumerable<T> source,
  IEqualityComparer<T> comparer);

Examples

In the first example, I create a DataTable from an array of Student objects using my common GetDataTable method, and the array will have one duplicate in it. The record whose Id is equal to 1 is repeated in the array. I then display the DataTable. This proves that the record is in the DataTable twice. Then I remove any duplicate rows by calling the Distinct operator, and display the DataTable again, showing that the duplicate row has been removed. Listing 10-1 shows the code.

Example. The Distinct Operator with an Equality Comparer
Student[] students = {
  new Student { Id = 1, Name = "Joe Rattz" },
  new Student { Id = 6, Name = "Ulyses Hutchens" },
  new Student { Id = 19, Name = "Bob Tanko" },
  new Student { Id = 45, Name = "Erin Doutensal" },
  new Student { Id = 1, Name = "Joe Rattz" },
  new Student { Id = 12, Name = "Bob Mapplethorpe" },
  new Student { Id = 17, Name = "Anthony Adams" },

new Student { Id = 32, Name = "Dignan Stephens" }
};

DataTable dt = GetDataTable(students);

Console.WriteLine("{0}Before calling Distinct(){0}",
  System.Environment.NewLine);

OutputDataTableHeader(dt, 15);

foreach (DataRow dataRow in dt.Rows)
{
  Console.WriteLine("{0,-15}{1,-15}",
    dataRow.Field<int>(0),
    dataRow.Field<string>(1));
}

IEnumerable<DataRow> distinct =
  dt.AsEnumerable().Distinct(DataRowComparer.Default);

Console.WriteLine("{0}After calling Distinct(){0}",
  System.Environment.NewLine);

OutputDataTableHeader(dt, 15);

foreach (DataRow dataRow in distinct)
{
  Console.WriteLine("{0,-15}{1,-15}",
    dataRow.Field<int>(0),
    dataRow.Field<string>(1));
}

Notice that I use the AsEnumerable operator to get a sequence of DataRow objects from the DataTable because that is what I must call the Distinct operator on. Also notice that in the students array, the record with an Id equal to 1 is repeated.

You no doubt noticed that I call a method named Field on the DataRow object. For now, just understand that this is a helper method that makes obtaining a DataColumn object's value from a DataRow more convenient. I cover the Field<T> operator in depth later in the "DataRow Field Operators" section of this chapter.

Here are the results:

Before calling Distinct()

Id             Name
==============================
1              Joe Rattz
6              Ulyses Hutchens
19             Bob Tanko
45             Erin Doutensal
1              Joe Rattz
12             Bob Mapplethorpe
17             Anthony Adams
32             Dignan Stephens

After calling Distinct()

Id             Name
==============================
1              Joe Rattz
6              Ulyses Hutchens
19             Bob Tanko
45             Erin Doutensal
12             Bob Mapplethorpe
17             Anthony Adams
32             Dignan Stephens

Notice that in the results, before I call the Distinct operator, the record whose Id is 1 is repeated, and that after calling the Distinct operator, the second occurrence of that record has been removed.

For a second example, I am going to demonstrate the results if I had called the Distinct operator without specifying the comparer object. The code is shown in Listing 10-2.

Example. The Distinct Operator Without an Equality Comparer
Student[] students = {
  new Student { Id = 1, Name = "Joe Rattz" },
  new Student { Id = 6, Name = "Ulyses Hutchens" },
  new Student { Id = 19, Name = "Bob Tanko" },
  new Student { Id = 45, Name = "Erin Doutensal" },
  new Student { Id = 1, Name = "Joe Rattz" },
  new Student { Id = 12, Name = "Bob Mapplethorpe" },
  new Student { Id = 17, Name = "Anthony Adams" },
  new Student { Id = 32, Name = "Dignan Stephens" }
};

DataTable dt = GetDataTable(students);

Console.WriteLine("{0}Before calling Distinct(){0}",
  System.Environment.NewLine);

OutputDataTableHeader(dt, 15);

foreach (DataRow dataRow in dt.Rows)
{
  Console.WriteLine("{0,-15}{1,-15}",
    dataRow.Field<int>(0),
    dataRow.Field<string>(1));
}

IEnumerable<DataRow> distinct = dt.AsEnumerable().Distinct();

Console.WriteLine("{0}After calling Distinct(){0}",
  System.Environment.NewLine);

OutputDataTableHeader(dt, 15);

foreach (DataRow dataRow in distinct)
{

Console.WriteLine("{0,-15}{1,-15}",
    dataRow.Field<int>(0),
    dataRow.Field<string>(1));
}

The only difference between this code and the previous example is that the call to the Distinct operator does not have an equality comparer provided. Will it remove the duplicate row? Let's take a look:

Before calling Distinct()

Id             Name
==============================
1              Joe Rattz
6              Ulyses Hutchens
19             Bob Tanko
45             Erin Doutensal
1              Joe Rattz
12             Bob Mapplethorpe
17             Anthony Adams
32             Dignan Stephens

After calling Distinct()

Id             Name
==============================
1              Joe Rattz
6              Ulyses Hutchens
19             Bob Tanko
45             Erin Doutensal
1              Joe Rattz
12             Bob Mapplethorpe
17             Anthony Adams
32             Dignan Stephens

No, it did not remove the duplicate. As you can now see, these two examples are comparing rows differently.

Except

The Except operator produces a sequence of DataRow objects that are in the first sequence of DataRow objects that do not exist in the second sequence of DataRow objects. The operator returns an object that, when enumerated, enumerates the first sequence of DataRow objects collecting the unique elements, followed by enumerating the second sequence of DataRow objects, removing those elements from the collection that also occur in the first sequence. Lastly, it yields the remaining elements in the collection in the order they are collected.

To determine that elements from the same sequence are unique, and that one element in one sequence is or is not equal to an element in the other sequence, the operator must be able to determine whether two elements are equal. Typically, this operator determines element equality by calling each element's data type's GetHashCode and Equals methods. However, for DataRow type objects, this would cause an incorrect result.

Because I am going to call the additional prototype and provide the System.Data.DataRowComparer. Default comparer object, the element equality will be properly determined. With it, a row is deemed to be a duplicate by comparing DataRow objects using the number of columns in a row and the static data type of each column, and then using the IComparable interface on each column if its dynamic data type implements the IComparable interface, or calling the static Equals method in System.Object if it does not.

Prototypes

The Except operator has one prototype I will cover.

Example. The Except Prototype
public static IEnumerable<T> Except<T> (
  this IEnumerable<T> first,
  IEnumerable<T> second,
  IEqualityComparer<T> comparer);

Examples

In this example, I call the Except operator twice. The first time, I pass the System.Data.DataRowComparer.Default comparer object, so the results of the first query with the Except operator should be correct. The second time I call the Except operator I will not pass the comparer object. This causes the results of that query to be incorrect. Listing 10-3 shows the code.

Example. The Except Operator with and Without the Comparer Object
Student[] students = {
  new Student { Id = 1, Name = "Joe Rattz" },
  new Student { Id = 7, Name = "Anthony Adams" },
  new Student { Id = 13, Name = "Stacy Sinclair" },
  new Student { Id = 72, Name = "Dignan Stephens" }
};

Student[] students2 = {
  new Student { Id = 5, Name = "Abe Henry" },
  new Student { Id = 7, Name = "Anthony Adams" },
  new Student { Id = 29, Name = "Future Man" },
  new Student { Id = 72, Name = "Dignan Stephens" }
};

DataTable dt1 = GetDataTable(students);
IEnumerable<DataRow> seq1 = dt1.AsEnumerable();
DataTable dt2 = GetDataTable(students2);
IEnumerable<DataRow> seq2 = dt2.AsEnumerable();

IEnumerable<DataRow> except =
  seq1.Except(seq2, System.Data.DataRowComparer.Default);

Console.WriteLine("{0}Results of Except() with comparer{0}",
  System.Environment.NewLine);

OutputDataTableHeader(dt1, 15);

foreach (DataRow dataRow in except)
{
  Console.WriteLine("{0,-15}{1,-15}",

dataRow.Field<int>(0),
    dataRow.Field<string>(1));
}

except = seq1.Except(seq2);

Console.WriteLine("{0}Results of Except() without comparer{0}",
  System.Environment.NewLine);

OutputDataTableHeader(dt1, 15);

foreach (DataRow dataRow in except)
{
  Console.WriteLine("{0,-15}{1,-15}",
    dataRow.Field<int>(0),
    dataRow.Field<string>(1));
}

There isn't much to this example. I basically create two DataTable objects that are populated from the Student arrays. I create sequences from each DataTable object by calling the AsEnumerable method. I then call the Except operator on the two sequences and display the results of each. As you can see, the first time I call the Except operator, I pass the System.Data.DataRowComparer.Default comparer object. The second time I do not.

Let's look at the results of that code by pressing Ctrl+F5:

Results of Except() with comparer

Id             Name
==============================
1              Joe Rattz
13             Stacy Sinclair

Results of Except() without comparer

Id             Name
==============================
1              Joe Rattz
7              Anthony Adams
13             Stacy Sinclair
72             Dignan Stephens

As you can see, the Except operator called with the System.Data.DataRowComparer.Default comparer object is able to properly determine the element equality for the two sequences, whereas the Except operator without the comparer object does not identify any elements from the two sequences as being equal, which is not the desired behavior for this operator.

Intersect

The Intersect operator produces a sequence of DataRow objects that is the intersection of two sequences of DataRow objects. It returns an object that when enumerated enumerates the first sequence of DataRow objects collecting the unique elements, followed by enumerating the second sequence of DataRow objects, marking those elements occurring in both sequences. Lastly, it yields the marked elements in the order they are collected.

To determine that elements from the same sequence are unique, and that one element in one sequence is or is not equal to an element in the other sequence, the operator must be able to determine whether two elements are equal. Typically, this operator determines element equality by calling each element's data type's GetHashCode and Equals methods. However, for DataRow type objects, this would cause an incorrect result.

Because I am going to call the additional prototype and provide the System.Data.DataRowComparer.Default comparer object, the element equality will be properly determined. With it, a row is deemed to be a duplicate by comparing DataRow objects using the number of columns in a row and the static data type of each column, and then using the IComparable interface on each column if its dynamic data type implements the IComparable interface, or calling the static Equals method in System.Object if it does not.

Prototypes

The Intersect operator has one prototype I will cover.

Example. The Intersect Prototype
public static IEnumerable<T> Intersect<T> (
  this IEnumerable<T> first,
  IEnumerable<T> second,
  IEqualityComparer<T> comparer);

Examples

In this example, I use the same basic code I use in the Except example, except I will change the operator calls from Except to Intersect. Listing 10-4 shows that code.

Example. The Intersect Operator with and Without the Comparer Object
Student[] students = {
  new Student { Id = 1, Name = "Joe Rattz" },
  new Student { Id = 7, Name = "Anthony Adams" },
  new Student { Id = 13, Name = "Stacy Sinclair" },
  new Student { Id = 72, Name = "Dignan Stephens" }
};

Student[] students2 = {
  new Student { Id = 5, Name = "Abe Henry" },
  new Student { Id = 7, Name = "Anthony Adams" },
  new Student { Id = 29, Name = "Future Man" },
  new Student { Id = 72, Name = "Dignan Stephens" }
};

DataTable dt1 = GetDataTable(students);
IEnumerable<DataRow> seq1 = dt1.AsEnumerable();
DataTable dt2 = GetDataTable(students2);
IEnumerable<DataRow> seq2 = dt2.AsEnumerable();

IEnumerable<DataRow> intersect =
  seq1.Intersect(seq2, System.Data.DataRowComparer.Default);

Console.WriteLine("{0}Results of Intersect() with comparer{0}",
  System.Environment.NewLine);

OutputDataTableHeader(dt1, 15);

foreach (DataRow dataRow in intersect)
{
  Console.WriteLine("{0,-15}{1,-15}",
    dataRow.Field<int>(0),
    dataRow.Field<string>(1));
}

intersect = seq1.Intersect(seq2);

Console.WriteLine("{0}Results of Intersect() without comparer{0}",
  System.Environment.NewLine);

OutputDataTableHeader(dt1, 15);

foreach (DataRow dataRow in intersect)
{
  Console.WriteLine("{0,-15}{1,-15}",
    dataRow.Field<int>(0),
    dataRow.Field<string>(1));
}

There is nothing new here. I create a couple of DataTable objects from the two Student arrays and obtain sequences from them. I then call the Intersect operator first with the comparer object and then without. I display the results after each Intersect call. Let's look at the results of that code by pressing Ctrl+F5:

Results of Intersect() with comparer

Id             Name
==============================
7              Anthony Adams
72             Dignan Stephens

Results of Intersect() without comparer

Id             Name
==============================

As you can see, the Intersect operator with the comparer is able to properly determine the element equality from the two sequences, whereas the Intersect operator without the comparer did not identify any elements from the two sequences as being equal, which is not the desired behavior for this operator.

Union

The Union operator produces a sequence of DataRow objects that is the union of two sequences of DataRow objects. It returns an object that, when enumerated, enumerates the first sequence of DataRow objects, followed by the second sequence of DataRow objects, yielding any element that has not already been yielded.

To determine that elements have already been yielded, the operator must be able to determine whether two elements are equal. Typically, this operator determines element equality by calling each element's data type's GetHashCode and Equals methods. However, for DataRow type objects, this would cause an incorrect result.

Because I am going to call the additional prototype and provide the System.Data.DataRowComparer.Default comparer object, the element equality will be properly determined. With it, a row is deemed to be a duplicate by comparing DataRow objects using the number of columns in a row and the static data type of each column, and then using the IComparable interface on each column if its dynamic data type implements the IComparable interface, or calling the static Equals method in System.Object if it does not.

Prototypes

The Union operator has one prototype I will cover.

Example. The Union Prototype
public static IEnumerable<T> Union<T> (
  this IEnumerable<T> first,
  IEnumerable<T> second,
  IEqualityComparer<T> comparer);

Examples

In this example, I use the same basic code I use in the Intersect example, except I will change the operator calls from Intersect to Union. Listing 10-5 shows that code.

Example. The Union Operator with and Without the Comparer Object
Student[] students = {
  new Student { Id = 1, Name = "Joe Rattz" },
  new Student { Id = 7, Name = "Anthony Adams" },
  new Student { Id = 13, Name = "Stacy Sinclair" },
  new Student { Id = 72, Name = "Dignan Stephens" }
};

Student[] students2 = {
  new Student { Id = 5, Name = "Abe Henry" },
  new Student { Id = 7, Name = "Anthony Adams" },
  new Student { Id = 29, Name = "Future Man" },
  new Student { Id = 72, Name = "Dignan Stephens" }
};

DataTable dt1 = GetDataTable(students);
IEnumerable<DataRow> seq1 = dt1.AsEnumerable();
DataTable dt2 = GetDataTable(students2);
IEnumerable<DataRow> seq2 = dt2.AsEnumerable();

IEnumerable<DataRow> union =
  seq1.Union(seq2, System.Data.DataRowComparer.Default);

Console.WriteLine("{0}Results of Union() with comparer{0}",
  System.Environment.NewLine);

OutputDataTableHeader(dt1, 15);

foreach (DataRow dataRow in union)
{
  Console.WriteLine("{0,-15}{1,-15}",
    dataRow.Field<int>(0),
    dataRow.Field<string>(1));
}

union = seq1.Union(seq2);

Console.WriteLine("{0}Results of Union() without comparer{0}",
  System.Environment.NewLine);

OutputDataTableHeader(dt1, 15);

foreach (DataRow dataRow in union)
{
  Console.WriteLine("{0,-15}{1,-15}",
    dataRow.Field<int>(0),
    dataRow.Field<string>(1));
}

Again, there is nothing new here. I create a couple of DataTable objects from the two Student arrays and obtain sequences from them. I then call the Union operator first with the comparer object and then without. I display the results after each Union call. Here are the results:

Results of Union() with comparer

Id             Name
==============================
1              Joe Rattz
7              Anthony Adams
13             Stacy Sinclair
72             Dignan Stephens
5              Abe Henry
29             Future Man

Results of Union() without comparer

Id             Name
==============================
1              Joe Rattz
7              Anthony Adams
13             Stacy Sinclair
72             Dignan Stephens
5              Abe Henry
7              Anthony Adams
29             Future Man
72             Dignan Stephens

Notice that the results of the Union operator with the comparer object are correct, but the results of the Union operator without the comparer object are not.

SequenceEqual

The SequenceEqual operator compares two sequences of DataRow objects to determine whether they are equal. It enumerates two source sequences, comparing the corresponding DataRow objects. If the two source sequences have the same number of records, and if all the corresponding DataRow objects are equal, true is returned. Otherwise, the two sequences are not equal, and false is returned.

This operator must be able to determine whether two elements are equal. Typically, this operator determines element equality by calling each element's data type's GetHashCode and Equals methods. However, for DataRow type objects, this would cause an incorrect result.

Because I am going to call the additional prototype and provide the System.Data.DataRowComparer.Default comparer object, the element equality will be properly determined. With it, a row is deemed to be a duplicate by comparing DataRow objects using the number of columns in a row and the static data type of each column, and then using the IComparable interface on each column if its dynamic data type implements the IComparable interface, or calling the static Equals method in System.Object if it does not.

Prototypes

The SequenceEqual operator has one prototype I will cover.

Example. The SequenceEqual Prototype
public static bool SequenceEqual<T> (
  this IEnumerable<T> first,
  IEnumerable<T> second,
  IEqualityComparer<T> comparer);

Examples

In this example of the SequenceEqual operator, I build two identical sequences of DataRow objects and compare them first with the SequenceEqual operator with a comparer object followed by a comparison with the SequenceEqual operator without a comparer object. Because of the way equality comparisons are handled by the two different operator calls, the SequenceEqual operator call with the comparer object returns that the two sequences are equal, while the SequenceEqual operator call without the comparer object returns that the two sequences are not equal. Listing 10-6 shows the code.

Example. The SequenceEqual Operator with and Without a Comparer Object
Student[] students = {
  new Student { Id = 1, Name = "Joe Rattz" },
  new Student { Id = 7, Name = "Anthony Adams" },
  new Student { Id = 13, Name = "Stacy Sinclair" },
  new Student { Id = 72, Name = "Dignan Stephens" }
};

DataTable dt1 = GetDataTable(students);
IEnumerable<DataRow> seq1 = dt1.AsEnumerable();
DataTable dt2 = GetDataTable(students);
IEnumerable<DataRow> seq2 = dt2.AsEnumerable();

bool equal = seq1.SequenceEqual(seq2, System.Data.DataRowComparer.Default);
Console.WriteLine("SequenceEqual() with comparer : {0}", equal);

equal = seq1.SequenceEqual(seq2);
Console.WriteLine("SequenceEqual() without comparer : {0}", equal);

Not much to discuss here except that the first call should indicate that the two sequences are equal, while the second should indicate that they are not. The results are exactly as expected:

SequenceEqual() with comparer : True
SequenceEqual() without comparer : False

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset