Parallelizing Loops

Both GenerateAESKeys and GenerateMD5Hashes represent an opportunity to run iterations in parallel. They generate the input data to simplify the example and perform the same operation for each piece. Thus, they represent a data parallelism scenario. It is possible to refactor the loops to run the operations in parallel. This way, instead of running both subroutines in parallel, each one can take full advantage of parallelism and automatically scale according to the number of existing logical cores.

Parallel.For

You can think of refactoring an existing For loop to take advantage of parallelism as a simple replacement of For with Parallel.For. Unfortunately, it isn't as simple as that.

The following code snippets refactor the subroutines shown in the preceding section, showing the code for both the original loops and the new code with the refactored loops using the imperative syntax to implement the data parallelism offered by Parallel.For. The new methods, ParallelGenerateAESKeys and ParallelGenerateMD5Hashes, try to take advantage of all the cores available, relying on the work done under the hood by Parallel.For to optimize its behavior according to the existing hardware at runtime.

1. The original GenerateAESKeys subroutine with the sequential For loop, and its parallelized version
  • Original sequential For version (code file: Listing02.sln):
 Sub GenerateAESKeys()
     Dim sw = Stopwatch.StartNew()
     Dim aesM As New AesManaged()
     For i As Integer = 1 To NUM_AES_KEYS
         aesM.GenerateKey()
         Dim result = aesM.Key
         Dim hexString = ConvertToHexString(result)
         ' Console.WriteLine(ConvertToHexString(result))
     Next
     Console.WriteLine("AES: " + sw.Elapsed.ToString())
 End Sub
  • Parallelized version using Parallel.For (code file: Listing03.sln):
 Sub ParallelGenerateAESKeys()
     Dim sw = Stopwatch.StartNew()
     Parallel.For(1, NUM_AES_KEYS + 1, Sub(i As Integer)
                           Dim aesM As New AesManaged()
                           aesM.GenerateKey()
                           Dim result = aesM.Key
                           Dim hexString = ConvertToHexString(result)
                           ' Console.WriteLine(ConvertToHexString(result))
                       End Sub)
     Console.WriteLine("AES: " + sw.Elapsed.ToString())
 End Sub
2. The original GenerateMD5Hashes subroutine with the sequential For loop, and its parallelized version
  • Original sequential For version (code file: Listing02.sln):
 Sub GenerateMD5Hashes()
     Dim sw = Stopwatch.StartNew()
     Dim md5M As MD5 = MD5.Create()
     For i As Integer = 1 To NUM_MD5_HASHES
         Dim data = Encoding.Unicode.GetBytes(i.ToString())
         Dim result = md5M.ComputeHash(data)
         Dim hexString = ConvertToHexString(result)
         ' Console.WriteLine(ConvertToHexString(result))
     Next
     Console.WriteLine("MD5: " + sw.Elapsed.ToString())
 End Sub
  • Parallelized version using Parallel.For (code file: Listing03.sln):
Sub ParallelGenerateMD5Hashes()
    Dim sw = Stopwatch.StartNew()
    Parallel.For(1, NUM_MD5_HASHES + 1, Sub(i As Integer)
                    Dim md5M As MD5 = MD5.Create()
                    Dim data = Encoding.Unicode.GetBytes(i.ToString())
                    Dim result = md5M.ComputeHash(data)
                    Dim hexString = ConvertToHexString(result)
                    ' Console.WriteLine(ConvertToHexString(result))
                End Sub)
    Console.WriteLine("MD5: " + sw.Elapsed.ToString())
End Sub

The most basic version of the class function Parallel.For has the following parameters:

  • fromInclusive—The first number for the iteration range (Integer or Long).
  • toExclusive—The number before which the iteration will stop, an exclusive upper bound (Integer or Long). The iteration range will be from fromInclusive up to toExclusive - 1. It is very important to pay attention to this parameter because the classic For loop defines the iteration range using an inclusive upper bound. Thus, when converting a For loop to a Parallel.For loop, the original upper bound has to be converted to the original upper bound plus 1.
  • body—The delegate to be invoked, once per iteration, and without a predefined execution plan. It can be of the type Action(Of Integer) or Action(Of Long), depending on the type used in the iteration range definition.
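
As a minimal sketch of this bound conversion, the following code counts the iterations executed by a sequential For loop and by the equivalent Parallel.For call. ITEMS is a hypothetical constant introduced only for this illustration; it is not part of the chapter's listings. Both counters should end at 100.

```vbnet
Imports System
Imports System.Threading
Imports System.Threading.Tasks

Module BoundsSketch
    ' Hypothetical constant, used only for this illustration
    Const ITEMS As Integer = 100

    Sub Main()
        ' Sequential loop: both bounds are inclusive
        Dim sequentialCount As Integer = 0
        For i As Integer = 1 To ITEMS
            sequentialCount += 1
        Next

        ' Parallel loop: the upper bound is exclusive, so pass ITEMS + 1
        Dim parallelCount As Integer = 0
        Parallel.For(1, ITEMS + 1,
                     Sub(i As Integer)
                         ' Interlocked keeps the shared counter consistent
                         Interlocked.Increment(parallelCount)
                     End Sub)

        Console.WriteLine("Sequential iterations: " & sequentialCount)
        Console.WriteLine("Parallel iterations:   " & parallelCount)
    End Sub
End Module
```

Passing ITEMS instead of ITEMS + 1 as the second argument would silently skip the last iteration, which is the mistake this parameter invites.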

Note
Parallel.For supports neither floating-point values nor steps. It works with Integer and Long values and it runs adding 1 in each iteration. In addition, it partitions the iteration range according to the available hardware resources at run time and runs the body in parallel tasks. Thus, there are no guarantees made about the order in which the iterations are executed. For example, in an iteration from 1 to 101 – 1 (100 inclusive), the iteration number 50 could begin running before the iteration number 2, which could also be executing in parallel, because the time it takes to run each iteration is unknown and variable. Because the loop could be split into many parallel iterations, it's impossible to predict the execution order. The code has to be prepared for parallel execution, and it must avoid undesired side effects generated by parallel and concurrent executions.
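
The lack of ordering guarantees can be observed with a short sketch. It assumes a ConcurrentQueue (from System.Collections.Concurrent) as a thread-safe way to record each iteration number as it runs; the module and variable names are illustrative only.

```vbnet
Imports System
Imports System.Collections.Concurrent
Imports System.Linq
Imports System.Threading.Tasks

Module OrderSketch
    Sub Main()
        ' Thread-safe queue that records iteration numbers as they execute
        Dim started As New ConcurrentQueue(Of Integer)()

        Parallel.For(1, 101,
                     Sub(i As Integer)
                         started.Enqueue(i)
                     End Sub)

        ' Every iteration from 1 to 100 ran exactly once...
        Console.WriteLine("Iterations recorded: " & started.Count)
        ' ...but the recorded order can differ from 1, 2, 3 and between runs
        Console.WriteLine("First ten recorded: " &
                          String.Join(", ", started.Take(10)))
    End Sub
End Module
```

The count is always 100, whereas the first ten recorded values depend on how the range was partitioned and scheduled in that particular run.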

In addition, Parallel.For can return a ParallelLoopResult value because parallelized loops, like any parallelized code, are more complex than sequential loops. Because execution is not sequential, you cannot access a variable to determine where the loop stopped its execution. In fact, many chunks are running in parallel.

Refactoring an Existing Sequential Loop

The code discussed in this section is from code file Listing03.sln. The code in the previous section showed the original GenerateAESKeys subroutine with the sequential For loop. It is a good practice to create a new subroutine, function, or method with a different name when refactoring sequential code to create a parallelized version. In this case, ParallelGenerateAESKeys is the new subroutine.

The original For loop's iteration range definition is as follows:

For i As Integer = 1 To NUM_AES_KEYS

This means that it will run the loop body NUM_AES_KEYS times, from 1 (inclusive) to NUM_AES_KEYS (inclusive).

It is necessary to translate this definition to a Parallel.For, adding 1 to NUM_AES_KEYS because it is an exclusive upper bound:

Parallel.For(1, NUM_AES_KEYS + 1,

The third parameter is the delegate. In this case, this loop doesn't use the iteration variable. However, the code uses multiline lambda expression syntax to define a subroutine with an Integer parameter (i) that is going to work as the iteration variable, holding the current number:

Parallel.For(1, NUM_AES_KEYS + 1, Sub(i As Integer)

An End Sub) replaces the previous Next statement.

The preceding code was prepared to run alone, or perhaps with other methods running in parallel. However, each iteration was not designed to run in parallel with other iterations of the same loop body. Using Parallel.For changes the rules. The code has some problems that need to be solved. The sequential iterations shared the following three local variables:

1. aesM
2. result
3. hexString

The loop body has code that changes the values of these variables in each iteration—for example, the following lines:

aesM.GenerateKey()
Dim result = aesM.Key
Dim hexString = ConvertToHexString(result)

First, the key generated by calling the GenerateKey method of the AesManaged instance, stored in aesM, is held in the Key property. Then, the code assigns the value stored in this property to the result variable. Finally, the last line assigns the product of converting it to a hexadecimal string to hexString, the third local variable. It is hard to predict the results of running this code in parallel or concurrently, because the iterations would interfere with each other. For example, one part of the code could generate a new key, which would be stored in the aesM.Key property just before another part of the code running in parallel reads that property. Therefore, the value read from the aesM.Key property could be corrupted or belong to a different iteration.

One possible solution could be using synchronization structures to protect each value and state that is changing. However, that's not appropriate in this case, because it would add more code and more synchronization overhead. There is another solution that is more scalable: refactoring the loop body, transferring these local variables as local variables inside the subroutine acting as a delegate. In order to do this, it is also necessary to create an instance of AesManaged inside the loop body. This way, it is not going to be shared by all the parallel iterations. This change adds more instructions to run for each iteration, but it removes the undesirable side effects and creates safe and stateless parallel code. The following lines show the new body. The highlighted lines of code are the variables moved inside the delegate:

Sub(i As Integer)
    Dim aesM As New AesManaged()
    
    aesM.GenerateKey()
    Dim result = aesM.Key
    Dim hexString = ConvertToHexString(result)
    ' Console.WriteLine(ConvertToHexString(result))
End Sub)
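
A different technique, not used in this chapter's listings, is the Parallel.For overload that accepts thread-local state. The sketch below assumes that one AesManaged instance per task is acceptable, so the instance is created once per task instead of once per iteration. SKETCH_KEYS and the generated counter are hypothetical names introduced for the illustration.

```vbnet
Imports System
Imports System.Security.Cryptography
Imports System.Threading
Imports System.Threading.Tasks

Module LocalStateSketch
    ' Hypothetical constant for this sketch; not the chapter's NUM_AES_KEYS
    Const SKETCH_KEYS As Integer = 1000

    Sub Main()
        Dim generated As Integer = 0

        Parallel.For(Of AesManaged)(
            1, SKETCH_KEYS + 1,
            Function() New AesManaged(),
            Function(i As Integer, loopState As ParallelLoopState,
                     aesM As AesManaged) As AesManaged
                ' aesM belongs to the current task only, so no other
                ' iteration can change its Key while this one reads it
                aesM.GenerateKey()
                Dim result As Byte() = aesM.Key
                Interlocked.Increment(generated)
                Return aesM
            End Function,
            Sub(aesM As AesManaged)
                ' Runs once per task, after its last iteration
                aesM.Dispose()
            End Sub)

        Console.WriteLine("Keys generated: " & generated)
    End Sub
End Module
```

Whether this is faster than creating an instance in every iteration depends on the cost of the constructor and has to be measured; the per-iteration approach shown above remains the simpler refactoring.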

A very similar problem has to be solved in order to transform the original loop body found in GenerateMD5Hashes. The code in file Listing02.sln showed the original subroutine with the sequential For loop. In this case, ParallelGenerateMD5Hashes is the new subroutine. It was necessary to use the same aforementioned refactoring technique, because it is not known whether the MD5 instance holds internal states that could generate problems. It is safer to create a new independent instance for each iteration. The following lines show the new body. The highlighted lines of code are the variables moved inside the delegate:

Sub(i As Integer)
    Dim md5M As MD5 = MD5.Create()
    Dim data = Encoding.Unicode.GetBytes(i.ToString())
    Dim result = md5M.ComputeHash(data)
    Dim hexString = ConvertToHexString(result)
    ' Console.WriteLine(ConvertToHexString(result))
End Sub)

Measuring Scalability

Replace the Main subroutine with the following new version, launching first ParallelGenerateAESKeys and then ParallelGenerateMD5Hashes (code file: Listing03.sln):

Sub Main()
    Dim sw = Stopwatch.StartNew()
    ParallelGenerateAESKeys()
    ParallelGenerateMD5Hashes()
    Console.WriteLine(sw.Elapsed.ToString())
    ' Display the results and wait for the user to press a key
    Console.ReadLine()
End Sub

Now, ParallelGenerateAESKeys and ParallelGenerateMD5Hashes need approximately 7.5 seconds to run because each one takes full advantage of both cores offered by the microprocessor. Thus, the speedup achieved is 14 / 7.5 = 1.87x over the sequential version. It is better than the previous performance gain achieved using Parallel.Invoke (1.56x), because the time wasted in that version is now used to run the loops, using parallel chunks in an attempt to load-balance the work done by each core. ParallelGenerateAESKeys takes 2.2 seconds and ParallelGenerateMD5Hashes takes 5.3 seconds.

Using Parallel.For to parallelize this code has another advantage: The same code can scale when executed with more than two cores. The sequential version of this application running on a computer with a specific quad-core microprocessor needs approximately 11 seconds to run. It is necessary to measure the time needed to run the sequential version again, because each hardware configuration will provide different results with both sequential and parallel code.

In order to measure the achieved speedup, you will always need a baseline calculated on the same hardware configuration. The version optimized using Parallel.For needs approximately 4.1 seconds to run. Each subroutine takes full advantage of the four cores offered by the microprocessor. Thus, the speedup achieved is 11 / 4.1 = 2.68x over the sequential version. ParallelGenerateAESKeys takes 1.30 seconds and ParallelGenerateMD5Hashes takes 2.80 seconds.
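
The speedup arithmetic quoted above can be reproduced with a small helper. This is only a sketch: the Speedup function is a hypothetical name, and the timings are the ones reported in the text, not new measurements.

```vbnet
Imports System

Module SpeedupSketch
    ' Speedup = sequential time / parallel time, both measured on the same hardware
    Function Speedup(sequentialSeconds As Double, parallelSeconds As Double) As Double
        Return sequentialSeconds / parallelSeconds
    End Function

    Sub Main()
        ' Timings quoted in the text for the dual-core and quad-core machines
        Console.WriteLine("Dual-core: {0:F2}x", Speedup(14.0, 7.5))
        Console.WriteLine("Quad-core: {0:F2}x", Speedup(11.0, 4.1))
    End Sub
End Module
```

Dividing a baseline taken on one machine by a parallel time taken on another would produce a meaningless ratio, which is why each configuration needs its own sequential measurement.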


Note
The parallelized code is capable of scaling as the number of cores increases. That didn't happen with the Parallel.Invoke version. However, it doesn't mean that the parallelized code will offer a linear speedup. In fact, most of the time, there is a limit to the scalability — that is, once it reaches a certain number of cores, the parallelized algorithms won't achieve additional speedup.
In this case, it was necessary to change the code for the loop's body used in each iteration. Thus, there is an additional overhead in each iteration that wasn't part of each sequential iteration, and calling delegates is more expensive than calling direct methods. In addition, Parallel.For and its underlying work add overhead to distribute and coordinate the execution of different chunks with parallel iterations. This is why the speedup is not near 4x and is approximately 2.68x when running with four cores. Typically, the parallelized algorithms won't offer a linear speedup. Furthermore, serial and hardware architecture-related bottlenecks can make it very difficult to scale beyond a certain number of cores.
It is very important to measure speedup in order to determine whether the overhead added to parallelize the code brings present and potentially future (further scalability) performance benefits.

The diagram shown in Figure 19.6 represents one of the possible execution flows, taking advantage of the four cores. Each box shown inside a method represents a chunk, automatically created by Parallel.For at run time.

Figure 19.6 Representation of the execution flows that take advantage of four cores with Parallel.For

Parallel.ForEach

Sometimes, refactoring an existing For loop, as previously explained, can be a very complex task, and the changes to the code could generate too much overhead for each iteration, reducing the overall performance. Another useful alternative is to partition all the data to be processed into parts that can be run as smaller loops in parallel, defining a custom partitioner, a tailored mechanism to split the input data into specific pieces that overrides the default partitioning mechanism. It is possible to use a Parallel.ForEach loop with a custom partitioner in order to create new versions of the sequential loops with a simpler refactoring process.

The code in file Listing05.sln shows the new code with the refactored loops using the imperative syntax to implement data parallelism offered by Parallel.ForEach, combined with a sequential For loop and a custom partitioner created with System.Collections.Concurrent.Partitioner. The new methods, ParallelPartitionGenerateAESKeys and ParallelPartitionGenerateMD5Hashes, also try to take advantage of all the cores available, relying on the work done under the hood by Parallel.ForEach and the range partitioning performed to distribute smaller sequential loops inside as many parallel loops as available cores. The code also optimizes its behavior according to the existing hardware at run time.

The code uses another important namespace for TPL, the new System.Collections.Concurrent namespace. This namespace offers access to useful collections prepared for concurrency and custom partitioners introduced for the first time in .NET Framework 4. Therefore, it is a good idea to import this namespace by using Imports System.Collections.Concurrent to work with the new examples (code file: Listing05.sln):

Sub ParallelPartitionGenerateAESKeys()
    Dim sw = Stopwatch.StartNew()
    Parallel.ForEach(Partitioner.Create(1, NUM_AES_KEYS + 1),
             Sub(range)
                 Dim aesM As New AesManaged()
                 Debug.WriteLine("Range ({0}, {1}. Time: {2})",
                 range.Item1, range.Item2, Now().TimeOfDay)
                 For i As Integer = range.Item1 To range.Item2 - 1
                     aesM.GenerateKey()
                     Dim result = aesM.Key
                     Dim hexString = ConvertToHexString(result)
                     ' Console.WriteLine("AES: " +
                     '     ConvertToHexString(result))
                 Next
             End Sub)
    Console.WriteLine("AES: " + sw.Elapsed.ToString())
End Sub

Sub ParallelPartitionGenerateMD5Hashes()
    Dim sw = Stopwatch.StartNew()
    Parallel.ForEach(Partitioner.Create(1, NUM_MD5_HASHES + 1),
             Sub(range)
                 Dim md5M As MD5 = MD5.Create()
                 For i As Integer = range.Item1 To range.Item2 - 1
                     Dim data = Encoding.Unicode.GetBytes(i.ToString())
                     Dim result = md5M.ComputeHash(data)
                     Dim hexString = ConvertToHexString(result)
                     ' Console.WriteLine(ConvertToHexString(result))
                 Next
             End Sub)
    Console.WriteLine("MD5: " + sw.Elapsed.ToString())
End Sub

The class function Parallel.ForEach offers 20 overloads. The definition used in this code file has the following parameters:

  • source—The partitioner that provides the data source split into multiple partitions.
  • body—The delegate to be invoked, once per iteration, and without a predefined execution plan. It receives each defined partition as a parameter—in this case, Tuple(Of Integer, Integer).

In addition, Parallel.ForEach can return a ParallelLoopResult value. The information offered in this structure is covered in detail later in this chapter.

Working with Partitions in a Parallel Loop

The code in file Listing02.sln showed the original GenerateAESKeys subroutine with the sequential For loop. The highlighted lines of code shown in file Listing05.sln represent the same sequential For loop. The only line that changes is the For definition, which takes into account the lower bound and the upper bound of the partition assigned by range.Item1 and range.Item2:

For i As Integer = range.Item1 To range.Item2 - 1

In this case, it is easier to refactor the sequential loop because there is no need to move local variables. The only difference is that instead of working with the entire source data, it splits it into many independent and potentially parallel partitions. Each one works with a sequential inner loop.

The following call to the Partitioner.Create method defines the partitions as the first parameter for Parallel.ForEach:

Partitioner.Create(1, NUM_AES_KEYS + 1)

This line splits the range from 1 to NUM_AES_KEYS into many partitions with an upper bound and a lower bound, creating a Tuple(Of Integer, Integer). However, it doesn't specify the number of partitions to create. ParallelPartitionGenerateAESKeys includes a line to write the lower and upper bounds of each generated partition and the actual time when it starts to run the sequential loop for this range.

Debug.WriteLine("Range ({0}, {1}. Time: {2})", 
                range.Item1, range.Item2, Now().TimeOfDay)    

Replace the Main subroutine with the following new version, launching first ParallelPartitionGenerateAESKeys and then ParallelPartitionGenerateMD5Hashes (code file: Listing05.sln):

Sub Main()
    Dim sw = Stopwatch.StartNew()
    ParallelPartitionGenerateAESKeys()
    ParallelPartitionGenerateMD5Hashes()
    Console.WriteLine(sw.Elapsed.ToString())
    ' Display the results and wait for the user to press a key
    Console.ReadLine()
End Sub

As shown in the following lines, the partitioner creates 13 ranges. Thus, the Parallel.ForEach will run 13 sequential inner For loops with ranges. However, they don't start at the same time, because that wouldn't be a good idea with four cores available. The parallelized loop tries to load-balance the execution, taking into account the available hardware resources. The highlighted line shows the complexity added by both parallelism and concurrency. If you take into account the time, the first partition that reaches the sequential inner For loop is (66667, 133333) and not (1, 66667). Remember that the upper bound values shown in the following output are exclusive.

Range (133333, 199999. Time: 15:45:38.2205775)
Range (66667, 133333. Time: 15:45:38.2049775)
Range (266665, 333331. Time: 15:45:38.2361775)
Range (199999, 266665. Time: 15:45:38.2205775)
Range (1, 66667. Time: 15:45:38.2205775)
Range (333331, 399997. Time: 15:45:39.0317789)
Range (399997, 466663. Time: 15:45:39.0317789)
Range (466663, 533329. Time: 15:45:39.1097790)
Range (533329, 599995. Time: 15:45:39.2345793)
Range (599995, 666661. Time: 15:45:39.3281794)
Range (666661, 733327. Time: 15:45:39.9365805)
Range (733327, 799993. Time: 15:45:40.0145806)
Range (799993, 800001. Time: 15:45:40.1705809)

In addition, the order in which the data appears in the debug output is different because there are many concurrent calls to WriteLine. In fact, when measuring speedups, it is very important to comment out these Debug.WriteLine lines, placed before the inner loop begins, because they affect the overall time by generating a bottleneck.

This new version using Parallel.ForEach with custom partitions needs approximately the same time as the previous Parallel.For version to run.

Optimizing Partitions According to Number of Cores

It is possible to tune the generated partitions in order to match them with the number of logical cores found at run time. System.Environment.ProcessorCount offers the number of logical cores or logical processors detected by the operating system. Hence, it is possible to use this value to calculate the desired range size for each partition and use it as a third parameter for the call to Partitioner.Create, using the following formula:

((numberOfElements / numberOfLogicalCores) + 1)

ParallelPartitionGenerateAESKeys can use the following code to create the partitions:

Partitioner.Create(1,
    NUM_AES_KEYS,
    (CInt(NUM_AES_KEYS / Environment.ProcessorCount) + 1))

A very similar line can also help to improve ParallelPartitionGenerateMD5Hashes:

Partitioner.Create(1, 
    NUM_MD5_HASHES, 
    (CInt(NUM_MD5_HASHES / Environment.ProcessorCount) + 1))

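As shown in the following lines, the partitioner now creates four ranges because the desired range size is CInt(800000 / 4) + 1 = 200001. Thus, Parallel.ForEach will run four sequential inner For loops with ranges, according to the number of available logical cores. Because the second argument of Partitioner.Create is an exclusive upper bound, the last range shown ends at 800000 and does not include element 800000; passing NUM_AES_KEYS + 1, as in the earlier listings, covers the full range.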

Range (1, 200002. Time: 16:32:51.3754528)
Range (600004, 800000. Time: 16:32:51.3754528)
Range (400003, 600004. Time: 16:32:51.3754528)
Range (200002, 400003. Time: 16:32:51.3754528)
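
The range-size formula can be checked without running any parallel work. The following sketch assumes 800,000 elements and four logical cores, hard-coded here instead of reading Environment.ProcessorCount, and enumerates the ranges produced by Partitioner.Create on a single thread. It passes numberOfElements + 1 as the exclusive upper bound so that the last element is included, which differs from the call shown above.

```vbnet
Imports System
Imports System.Collections.Concurrent

Module RangeSizeSketch
    Sub Main()
        ' Assumed values for illustration only
        Dim numberOfElements As Integer = 800000
        Dim numberOfLogicalCores As Integer = 4 ' Environment.ProcessorCount at run time

        ' ((numberOfElements / numberOfLogicalCores) + 1)
        Dim rangeSize As Integer = CInt(numberOfElements / numberOfLogicalCores) + 1
        Console.WriteLine("Range size: " & rangeSize)

        ' The upper bound is exclusive, so + 1 keeps the last element in range
        Dim ranges As OrderablePartitioner(Of Tuple(Of Integer, Integer)) =
            Partitioner.Create(1, numberOfElements + 1, rangeSize)

        ' Enumerate the ranges on the calling thread, without Parallel.ForEach
        For Each range As Tuple(Of Integer, Integer) In ranges.GetDynamicPartitions()
            Console.WriteLine("Range ({0}, {1})", range.Item1, range.Item2)
        Next
    End Sub
End Module
```

Each printed upper bound equals the next range's lower bound, so the ranges tile the interval with no gaps or overlaps.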

Now, ParallelPartitionGenerateAESKeys and ParallelPartitionGenerateMD5Hashes need approximately 3.40 seconds to run, because each one generates as many partitions as cores available and uses a sequential loop in each delegate; therefore, it reduces the previously added overhead. Thus, the speedup achieved is 11 / 3.4 = 3.24x over the sequential version. The reduced overhead makes it possible to reduce the time from 4.1 seconds to 3.4 seconds.


Note
Most of the time, the load-balancing schemes used by TPL under the hood are very efficient. However, you know your designs, code, and algorithms better than TPL at run time. Therefore, considering the capabilities offered by modern hardware architectures and using many of the features included in TPL, you can improve overall performance, reducing unnecessary overhead introduced by the first loop parallelization without the custom partitioner.

The diagram shown in Figure 19.7 represents one of the possible execution flows with the numbers for the lower and upper bounds for each partition, taking advantage of the four cores with the optimized partitioning scheme.

Figure 19.7 Lower and upper bounds for each partition when executed on four cores

Working with IEnumerable Sources of Data

Parallel.ForEach is also useful to refactor existing For Each loops that iterate over a collection that exposes an IEnumerable interface. The simplest definition of the class function Parallel.ForEach, used in the following code (code file: Listing08.sln) to generate a new version of the MD5 hashes generation subroutine, ParallelForEachGenerateMD5Hashes, has the following parameters:

  • source—The collection that exposes an IEnumerable interface and provides the data source.
  • body—The delegate to be invoked, once per iteration, and without a predefined execution plan. It receives each element of the source collection—in this case, an Integer.

Private Function GenerateMD5InputData() As IEnumerable(Of Integer)
    Return Enumerable.Range(1, NUM_AES_KEYS)
End Function

Sub ParallelForEachGenerateMD5Hashes()
    Dim sw = Stopwatch.StartNew()
    Dim inputData = GenerateMD5InputData()

    Parallel.ForEach(inputData, Sub(number As Integer)
                     Dim md5M As MD5 = MD5.Create()
                     Dim data = 
                       Encoding.Unicode.GetBytes(number.ToString())
                     Dim result = md5M.ComputeHash(data)
                     Dim hexString = ConvertToHexString(result)
                     ' Console.WriteLine(ConvertToHexString(result))
                 End Sub)
    Debug.WriteLine("MD5: " + sw.Elapsed.ToString())
End Sub

The GenerateMD5InputData function returns a sequence of Integer numbers from 1 to NUM_AES_KEYS (inclusive). Instead of using the loop to control the numbers for the iteration, the code in the ParallelForEachGenerateMD5Hashes subroutine saves this sequence in the inputData local variable.

The following line calls Parallel.ForEach with the source (inputData) and a multiline lambda delegate subroutine, receiving the number for each iteration:

Parallel.ForEach(inputData, Sub(number As Integer)

The line that prepares the input data for the hash computing method also changes to use the value found in number:

Dim data = Encoding.Unicode.GetBytes(number.ToString())

Note
In this case, performance is poor compared with the other versions. However, when each iteration performs time-consuming operations, Parallel.ForEach over an IEnumerable collection can improve performance. It should be obvious that this isn't an optimal implementation, because the code has to iterate the 900,000 items of a sequence. It does it in parallel, but it takes more time than running loops with less overhead. It also consumes more memory. The example isn't intended to be a best practice for this case. The idea is to understand the different opportunities offered by the Parallel class methods and to be able to evaluate them.

Exiting from Parallel Loops

If you want to interrupt a sequential loop, you can use Exit For, which works for both For and For Each loops. Interrupting a parallel loop requires more complex code, because exiting the delegate body sub or function only ends the current iteration; it has no effect on the parallel loop's execution, as the delegate is called again for each new iteration. In addition, because it is a delegate, it is disconnected from the traditional loop structure.
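
A short sketch illustrates why leaving the delegate is not enough. It assumes a ten-iteration loop in which only iteration 5 executes Exit Sub; the completed counter is a hypothetical name used to count the iterations that reach the end of the body.

```vbnet
Imports System
Imports System.Threading
Imports System.Threading.Tasks

Module ExitSubSketch
    Sub Main()
        Dim completed As Integer = 0

        Parallel.For(1, 11,
                     Sub(i As Integer)
                         If i = 5 Then
                             ' Leaves only this invocation of the delegate;
                             ' the other nine iterations still run
                             Exit Sub
                         End If
                         Interlocked.Increment(completed)
                     End Sub)

        Console.WriteLine("Iterations that ran to the end: " & completed)
    End Sub
End Module
```

The counter ends at 9 rather than 4, which shows that Exit Sub behaves like skipping one iteration, not like Exit For.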

The following code (code file: Listing09.sln) shows a new version of the ParallelForEachGenerateMD5Hashes subroutine, called ParallelForEachGenerateMD5HashesBreak. Now, the loopResult local variable saves the result of calling the Parallel.ForEach class function. Moreover, the delegate body subroutine receives a second parameter—a ParallelLoopState instance:

    Dim loopResult = Parallel.ForEach(
        inputData,
        Sub(number As Integer, loopState As ParallelLoopState)

Private Sub DisplayParallelLoopResult(
    ByVal loopResult As ParallelLoopResult)
    Dim text As String
    If loopResult.IsCompleted Then
        text = "The loop ran to completion."
    Else
        If loopResult.LowestBreakIteration.HasValue = False Then
            text = "The loop ended prematurely with a Stop statement."
        Else
            text = "The loop ended by calling the Break statement."
        End If
    End If
    Console.WriteLine(text)
End Sub

Sub ParallelForEachGenerateMD5HashesBreak()
    Dim sw = Stopwatch.StartNew()
    Dim inputData = GenerateMD5InputData()

    Dim loopResult = Parallel.ForEach(
        inputData, 
        Sub(number As Integer, 
            loopState As ParallelLoopState)
             'If loopState.ShouldExitCurrentIteration Then
             '    Exit Sub
             'End If
             Dim md5M As MD5 = MD5.Create()
             Dim data = Encoding.Unicode.GetBytes(number.ToString())
             Dim result = md5M.ComputeHash(data)
             Dim hexString = ConvertToHexString(result)
             If (sw.Elapsed.Seconds > 3) Then
                 loopState.Break()
                 Exit Sub
             End If
             ' Console.WriteLine(ConvertToHexString(result))
         End Sub)
    DisplayParallelLoopResult(loopResult)
    Console.WriteLine("MD5: " + sw.Elapsed.ToString())
End Sub

Private Function GenerateMD5InputData() As IEnumerable(Of Integer)
    Return Enumerable.Range(1, NUM_AES_KEYS)
End Function

Understanding ParallelLoopState

The instance of ParallelLoopState (loopState) offers two methods to cease the execution of a Parallel.For or Parallel.ForEach:

1. Break—Communicates that the parallel loop should cease the execution of iterations beyond the current one, as soon as possible. Iterations with a lower index than the one that called Break are still executed.
2. Stop—Communicates that the parallel loop should cease the execution as soon as possible

Note
Using these methods doesn't guarantee that the execution will stop immediately, because parallel loops are complex, and sometimes it is difficult to cease the execution of all the parallel and concurrent iterations. Iterations that are already running are allowed to finish. The difference between Break and Stop is that the former still runs the iterations that precede the current one in the source order and prevents only later iterations from starting, whereas the latter prevents any new iteration from starting.

The code shown previously from file Listing09.sln calls the Break method if the elapsed time is more than 3 seconds:

If (sw.Elapsed.Seconds > 3) Then
    loopState.Break()
    Exit Sub
End If

It is very important to note that the code in the multiline lambda is accessing the sw variable that is defined in ParallelForEachGenerateMD5HashesBreak. It reads the value of the Seconds read-only property.

It is also possible to check the value of the ShouldExitCurrentIteration read-only property in order to make decisions when the current or other concurrent iterations make requests to stop the parallel loop execution. The code in file Listing09.sln shows a few commented lines that check whether ShouldExitCurrentIteration is True:

If loopState.ShouldExitCurrentIteration Then
   Exit Sub
End If

If the property is true, then it exits the subroutine, avoiding the execution of unnecessary iterations. The lines are commented because in this case an additional iteration isn't a problem; therefore, it isn't necessary to add this additional instruction to each iteration.

Analyzing the Results of a Parallel Loop Execution

Once the Parallel.ForEach finishes its execution, loopResult has information about the results, in a ParallelLoopResult structure.

The DisplayParallelLoopResult subroutine shown in the code file Listing09.sln receives a ParallelLoopResult structure, evaluates its read-only properties, and outputs the results of executing the Parallel.ForEach loop to the console. Table 19.1 explains the three possible results of the loop in this example.

Table 19.1 ParallelLoopResult Read-only Properties

Condition | Description
IsCompleted = True | The loop ran to completion.
IsCompleted = False And LowestBreakIteration.HasValue = False | The loop ended prematurely with a Stop statement.
IsCompleted = False And LowestBreakIteration.HasValue = True | The loop ended by calling the Break statement. The LowestBreakIteration property holds the value of the lowest iteration that called the Break statement.
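
The three rows of the table can be reproduced with a small sketch. It assumes a 100-iteration Parallel.For in which only iteration 50 calls Break or Stop; the Describe helper and the module name are hypothetical and not part of the chapter's listings.

```vbnet
Imports System
Imports System.Threading.Tasks

Module LoopResultSketch
    ' Prints the two properties that Table 19.1 is based on
    Sub Describe(name As String, loopResult As ParallelLoopResult)
        Console.WriteLine("{0}: IsCompleted = {1}, LowestBreakIteration.HasValue = {2}",
                          name, loopResult.IsCompleted,
                          loopResult.LowestBreakIteration.HasValue)
    End Sub

    Sub Main()
        ' No Break or Stop request: the loop runs to completion
        Dim completed As ParallelLoopResult =
            Parallel.For(1, 101,
                         Sub(i As Integer, loopState As ParallelLoopState)
                             ' No work needed for this illustration
                         End Sub)
        Describe("Completed", completed)

        ' Only iteration 50 requests a Break
        Dim broken As ParallelLoopResult =
            Parallel.For(1, 101,
                         Sub(i As Integer, loopState As ParallelLoopState)
                             If i = 50 Then loopState.Break()
                         End Sub)
        Describe("Break", broken)
        If broken.LowestBreakIteration.HasValue Then
            Console.WriteLine("Lowest break iteration: " &
                              broken.LowestBreakIteration.Value)
        End If

        ' Only iteration 50 requests a Stop
        Dim stopped As ParallelLoopResult =
            Parallel.For(1, 101,
                         Sub(i As Integer, loopState As ParallelLoopState)
                             If i = 50 Then loopState.Stop()
                         End Sub)
        Describe("Stop", stopped)
    End Sub
End Module
```

Because only one iteration calls Break in this sketch, LowestBreakIteration holds 50; when several iterations call Break concurrently, it holds the lowest of their indexes.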

Note
It is very important to analyze the results of a parallel loop execution, because continuation with the next statement doesn't mean that it completed all the iterations. Thus, it is necessary to check the values of the ParallelLoopResult properties or to include customized control mechanisms inside the loop bodies. Again, converting sequential code to parallel and concurrent code isn't just replacing a few loops. It is necessary to understand a very different programming paradigm and new structures prepared for this new scenario.

Catching Parallel Loop Exceptions

As many iterations run in parallel, many exceptions can occur in parallel. The classic exception management techniques used in sequential code aren't useful with parallel loops.

When the code inside the delegate that is being called in each parallelized iteration throws an exception that isn't captured inside the delegate, it becomes part of a set of exceptions, handled by the new System.AggregateException class.

You have already learned how to handle exceptions in your sequential code in Chapter 6: “Exception Handling and Debugging.” You can apply almost the same techniques. The only difference is when an exception is thrown inside the loop body, which is a delegate. The following code (code file: Listing10.sln) shows a new version of the ParallelForEachGenerateMD5Hashes subroutine, called ParallelForEachGenerateMD5HashesExceptions. Now, the body throws a TimeoutException if the elapsed time is more than three seconds:

If sw.Elapsed.TotalSeconds > 3 Then
    Throw New TimeoutException(
        "Parallel.ForEach is taking more than 3 seconds to complete.")
End If

Sub ParallelForEachGenerateMD5HashesExceptions()
    Dim sw = Stopwatch.StartNew()
    Dim inputData = GenerateMD5InputData()
    Dim loopResult As ParallelLoopResult

    Try
        loopResult = Parallel.ForEach(inputData,
            Sub(number As Integer, loopState As ParallelLoopState)
                'If loopState.ShouldExitCurrentIteration Then
                '    Exit Sub
                'End If
                Dim md5M As MD5 = MD5.Create()
                Dim data = Encoding.Unicode.GetBytes(number.ToString())
                Dim result = md5M.ComputeHash(data)
                Dim hexString = ConvertToHexString(result)
                ' TotalSeconds measures the whole elapsed time; the Seconds
                ' property is only the seconds component (0 to 59).
                If sw.Elapsed.TotalSeconds > 3 Then
                    Throw New TimeoutException(
                        "Parallel.ForEach is taking more than 3 seconds to complete.")
                End If
                ' Console.WriteLine(ConvertToHexString(result))
            End Sub)
    Catch ex As AggregateException
        ' Exceptions thrown by the iterations arrive wrapped together.
        For Each innerEx As Exception In ex.InnerExceptions
            Debug.WriteLine(innerEx.ToString())
            ' Do something considering the innerEx Exception
        Next
    End Try
    DisplayParallelLoopResult(loopResult)
    Debug.WriteLine("MD5: " + sw.Elapsed.ToString())
End Sub

A Try...Catch...End Try block encloses the call to Parallel.ForEach. Nevertheless, the line that catches the exceptions is

Catch ex As AggregateException

instead of the classic

Catch ex As Exception

An AggregateException contains one or more exceptions that occurred during the execution of parallel and concurrent code. However, this class isn't specifically for parallel computing; it can be used to represent one or more errors that occur during application execution. Once it is captured, it is possible to iterate through each individual exception contained in its InnerExceptions read-only collection of Exception.

In this case, the Parallel.ForEach without the custom partitioner will display the contents of many exceptions. The loop result will look like the loop was stopped by a call to the Stop method. However, because it is possible to catch the AggregateException, you can make decisions based on the problems that made it impossible to complete all the iterations. Here, a sequential For Each loop retrieves all the information about each Exception in InnerExceptions. The following code (code file: Listing11.sln) converts each exception to a string and sends it to the Debug output; the sample output that follows shows the first two exceptions:

Catch ex As AggregateException
    For Each innerEx As Exception In ex.InnerExceptions
        Debug.WriteLine(innerEx.ToString())
        ' Do something considering the innerEx Exception
    Next
End Try

The output looks like this:

System.TimeoutException: Parallel.ForEach is taking
more than 3 seconds to complete.
   at ConsoleApplication3.Module1.
_Closure$____2._Lambda$____9(Int32 number, 
ParallelLoopState loopState) in 
C:\Users\Public\Documents\ConsoleApplication3\
ConsoleApplication3\Module1.vb:line 255
   at System.Threading.Tasks.Parallel.<>c____DisplayClass32`2.
<PartitionerForEachWorker>b____30()
   at System.Threading.Tasks.Task.InnerInvoke()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c____DisplayClass7.
<ExecuteSelfReplicating>b____6(Object )
System.TimeoutException: Parallel.ForEach is taking
more than 3 seconds to complete.
   at ConsoleApplication3.Module1._Closure$____2.
_Lambda$____9(Int32 number, ParallelLoopState loopState) in 
C:\Users\Public\Documents\ConsoleApplication3\
ConsoleApplication3\Module1.vb:line 255
   at System.Threading.Tasks.Parallel.<>c____DisplayClass32`2.
<PartitionerForEachWorker>b____30()
   at System.Threading.Tasks.Task.InnerInvoke()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c____DisplayClass7.
<ExecuteSelfReplicating>b____6(Object )

Note
As you can see in the previous lines, the two exceptions display the same information to the Debug output. However, most of the time you will use a more sophisticated exception management technique, and you will provide more information about the iteration that is generating the problem. This example focuses on the differences between an AggregateException and the traditional Exception. It doesn't promote the practice of writing information about errors to the Debug output as a complete exception management technique.
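One way to record which iteration caused a problem is to catch the exception inside the loop body and wrap it together with the input value. The following is only a sketch under stated assumptions: ProcessItem, ProcessAll, and the module name IterationContextSketch are hypothetical and don't appear in the book's listings, and ProcessItem is a stand-in for the real per-item work. Unlike the previous listing, this variation lets the loop continue after a failure instead of ending it.

```vbnet
Imports System
Imports System.Collections.Concurrent
Imports System.Collections.Generic
Imports System.Threading.Tasks

Module IterationContextSketch
    ' Hypothetical stand-in for the per-item work; fails for negative input.
    Sub ProcessItem(number As Integer)
        If number < 0 Then
            Throw New ArgumentOutOfRangeException("number")
        End If
    End Sub

    ' Runs ProcessItem for every input and returns one exception per
    ' failed iteration, each one naming the input that caused it.
    Function ProcessAll(inputData As IEnumerable(Of Integer)) As List(Of Exception)
        ' ConcurrentQueue is safe to fill from many iterations at once.
        Dim failures As New ConcurrentQueue(Of Exception)()
        Parallel.ForEach(inputData,
            Sub(number As Integer)
                Try
                    ProcessItem(number)
                Catch ex As Exception
                    ' Keep the original exception as InnerException and
                    ' add the input value to the message.
                    failures.Enqueue(New InvalidOperationException(
                        "Iteration failed for input " & number.ToString(), ex))
                End Try
            End Sub)
        Return New List(Of Exception)(failures)
    End Function

    Sub Main()
        For Each failure In ProcessAll(New Integer() {1, -2, 3, -4})
            Console.WriteLine(failure.Message)
        Next
    End Sub
End Module
```

Because iterations run concurrently, the order of the collected failures isn't guaranteed. If the caller should still see a single failure, the collected list can be passed to the AggregateException constructor and thrown after the loop ends.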
