Specifying the Desired Degree of Parallelism

TPL methods always try to achieve the best results using all the available logical cores. Sometimes, however, you don't want a parallel loop to use every available core. You might have better plans for the remaining cores, or you might want to leave one core free so that the application stays responsive and that core can run another part of the code in parallel. In these cases, you want to specify the maximum degree of parallelism for a parallel loop.

ParallelOptions

TPL enables you to specify a different maximum desired degree of parallelism by creating an instance of the new ParallelOptions class and changing the value of its MaxDegreeOfParallelism property. The code in file Listing12.sln shows a new version of the two well-known subroutines that use Parallel.For: ParallelGenerateAESKeysMaxDegree and ParallelGenerateMD5HashesMaxDegree.

Now, they receive an Integer with the maximum desired degree of parallelism, maxDegree. Each subroutine creates a local instance of ParallelOptions and assigns the value received as a parameter to its MaxDegreeOfParallelism property. The ParallelOptions instance is then passed to each parallel loop as a new parameter before the body. This way, the loop won't be optimized to take advantage of all the available cores, which is the default behavior (MaxDegreeOfParallelism = -1). Instead, it will be optimized as if the total number of available cores were equal to the maximum degree of parallelism specified in the property (code file: Listing12.sln):

' Requires Imports for System.Diagnostics, System.Security.Cryptography,
' and System.Threading.Tasks at the top of the file.
Private Sub ParallelGenerateAESKeysMaxDegree(ByVal maxDegree As Integer)
    ' Limit the loop to at most maxDegree concurrent operations.
    Dim parallelOptions As New ParallelOptions()
    parallelOptions.MaxDegreeOfParallelism = maxDegree
    Dim sw = Stopwatch.StartNew()
    Parallel.For(1,
                 NUM_AES_KEYS + 1,
                 parallelOptions,
                 Sub(i As Integer)
                     Dim aesM As New AesManaged()
                     aesM.GenerateKey()
                     Dim result = aesM.Key
                     Dim hexString = ConvertToHexString(result)
                     ' Console.WriteLine(ConvertToHexString(result))
                 End Sub)
    Console.WriteLine("AES: " + sw.Elapsed.ToString())
End Sub

' Also requires an Imports statement for System.Text (Encoding).
Sub ParallelGenerateMD5HashesMaxDegree(ByVal maxDegree As Integer)
    ' Limit the loop to at most maxDegree concurrent operations.
    Dim parallelOptions As New ParallelOptions()
    parallelOptions.MaxDegreeOfParallelism = maxDegree
    Dim sw = Stopwatch.StartNew()
    Parallel.For(1,
                 NUM_MD5_HASHES + 1,
                 parallelOptions,
                 Sub(i As Integer)
                     Dim md5M As MD5 = MD5.Create()
                     Dim data = Encoding.Unicode.GetBytes(i.ToString())
                     Dim result = md5M.ComputeHash(data)
                     Dim hexString = ConvertToHexString(result)
                     ' Console.WriteLine(ConvertToHexString(result))
                 End Sub)
    Console.WriteLine("MD5: " + sw.Elapsed.ToString())
End Sub

Note
It is not advisable to hard-code static values for the desired degree of parallelism, because doing so can limit scalability when more cores are available. Use these options carefully. It is best to work with values relative to the number of available logical cores, or at least to take that number into account, so that the code is prepared for further scalability.

This way, it is possible to call both subroutines with a dynamic value, considering the number of logical cores at runtime:

        ParallelGenerateAESKeysMaxDegree(Environment.ProcessorCount - 1)
        ParallelGenerateMD5HashesMaxDegree(Environment.ProcessorCount - 1)

Both Parallel.For loops are going to try to work with the number of logical cores minus 1. If the code runs on a quad-core microprocessor without HyperThreading, each loop will use at most three cores.
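One caveat with Environment.ProcessorCount - 1 is that it evaluates to 0 on a machine with a single logical core, and MaxDegreeOfParallelism accepts only -1 or a positive value, so the assignment would throw an ArgumentOutOfRangeException. The following is a minimal sketch of a guard. The module and the helper GetLeaveOneFreeDegree are hypothetical names used for illustration and are not part of Listing12.sln:

```vb
Imports System
Imports System.Threading.Tasks

Module DegreeHelper
    ' Hypothetical helper (not in Listing12.sln): leaves one logical core
    ' free, but never returns less than 1.
    Function GetLeaveOneFreeDegree(ByVal logicalCores As Integer) As Integer
        Return Math.Max(1, logicalCores - 1)
    End Function

    Sub Main()
        Dim options As New ParallelOptions()
        ' Safe even when Environment.ProcessorCount is 1.
        options.MaxDegreeOfParallelism = GetLeaveOneFreeDegree(Environment.ProcessorCount)
        Console.WriteLine("Max degree: " & options.MaxDegreeOfParallelism.ToString())
    End Sub
End Module
```

With a guard like this, the two subroutines could be called with GetLeaveOneFreeDegree(Environment.ProcessorCount) instead of the raw subtraction.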

The following is not a best practice for final code. However, sometimes you want to know whether two parallelized subroutines offer better performance if they are executed at the same time, limiting the number of cores for each one. You can test this situation using the following line (code file: Listing12.sln):

Parallel.Invoke(
    Sub() ParallelGenerateAESKeysMaxDegree(2),
    Sub() ParallelGenerateMD5HashesMaxDegree(2))

The two subroutines will be launched in parallel, and each will try to optimize its execution to use two of the four cores of a quad-core microprocessor. The obvious drawback of the previous line is that it uses a static number of cores. Nonetheless, this is just for performance testing purposes.

ParallelOptions also offers two additional properties to control more advanced options:

1. CancellationToken: Allows assigning a System.Threading.CancellationToken in order to propagate notification that parallel operations should be canceled. The usage of this property is covered in detail later in this chapter.
2. TaskScheduler: Allows assigning a customized System.Threading.Tasks.TaskScheduler instance. It is usually not necessary to define a customized task scheduler to schedule parallel tasks unless you are working with very specific algorithms.
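As a brief preview of the first property, the following minimal sketch assigns a token to ParallelOptions and cancels the loop from inside its own body. The module name, the loop bounds, and the cancellation trigger (iteration 10) are assumptions chosen only for illustration. In real code, the cancellation request would normally come from outside the loop:

```vb
Imports System
Imports System.Threading
Imports System.Threading.Tasks

Module CancellationSketch
    Sub Main()
        Using cts As New CancellationTokenSource()
            Dim options As New ParallelOptions()
            options.CancellationToken = cts.Token
            ' TaskScheduler is left at its default value here.

            Try
                Parallel.For(0, 1000, options,
                             Sub(i As Integer)
                                 ' Illustrative trigger: request cancellation from inside the loop.
                                 If i = 10 Then cts.Cancel()
                                 ' Simulate a small amount of CPU-bound work.
                                 Thread.SpinWait(10000)
                             End Sub)
                Console.WriteLine("Loop completed.")
            Catch ex As OperationCanceledException
                ' Parallel.For throws this exception when the token is canceled.
                Console.WriteLine("Loop canceled.")
            End Try
        End Using
    End Sub
End Module
```

Iterations that are already running when cancellation is requested are allowed to finish. The loop simply stops starting new ones and then throws OperationCanceledException.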

Understanding Hardware Threads and Logical Cores

The Environment.ProcessorCount property provides the number of logical cores. However, sometimes the number of logical cores, also known as hardware threads, is different from the number of physical cores.

For example, an Intel Core i7 microprocessor with six physical cores offering HyperThreading technology doubles the number to twelve logical cores. Therefore, in this case, Environment.ProcessorCount is twelve, not six. The operating system also works with twelve logical processors.
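A quick way to check what TPL sees on a given machine is to print the property. This is a minimal sketch, and the module name is a hypothetical one chosen for illustration. Note that the Environment class reports only logical cores and does not expose the number of physical cores:

```vb
Imports System

Module LogicalCoreReport
    Sub Main()
        ' Environment.ProcessorCount counts logical cores (hardware threads),
        ' so a six-core microprocessor with HyperThreading reports 12.
        Dim logicalCores As Integer = Environment.ProcessorCount
        Console.WriteLine("Logical cores (hardware threads): " & logicalCores.ToString())
    End Sub
End Module
```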

All the code created with TPL runs using multiple software threads. Threads are the low-level lanes used to run many parts of code in parallel, taking advantage of the presence of multiple cores in the underlying hardware. However, most of the time, the code running in these lanes has some imperfections. It waits for I/O data or other threads to finish, or it causes latency as it waits for data to be fetched from the different caches available in the microprocessor or the system memory. This means that there are idle execution units.

HyperThreading technology offers an increased degree of instruction-level parallelism, by duplicating the architectural states for each physical core in order to mitigate the imperfections of the parallel code by starting the execution of a second thread when the first one is waiting. This way, it appears to be a microprocessor with two times the real number of physical cores.


Note
Logical cores are not the same as real physical cores. Although this technique sometimes improves performance through increased instruction-level parallelism when each physical core has two threads with independent instruction streams, if the software threads don't have many data dependencies, the performance improvements could be less than expected. It depends on the application.

As TPL uses the number of hardware threads, or logical cores, to optimize its execution, sometimes certain algorithms won't offer the expected scalability as more cores appear because they aren't real physical cores.

For example, if an algorithm offered a 6.5x speedup when executed with eight physical cores, it might offer a more modest 4.5x speedup on a microprocessor with four physical cores and eight logical cores with HyperThreading technology.
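One way to read these figures is as speedup per logical core: 6.5 / 8 is about 0.81 with eight physical cores, whereas 4.5 / 8 is about 0.56 with eight logical cores on four physical ones. The following sketch performs just that arithmetic. The helper EfficiencyPerLogicalCore is a hypothetical name, and the speedup values are the illustrative ones from the example above, not measurements:

```vb
Imports System
Imports System.Globalization

Module SpeedupSketch
    ' Hypothetical helper: speedup divided by the number of logical cores in use.
    Function EfficiencyPerLogicalCore(ByVal speedup As Double, ByVal logicalCores As Integer) As Double
        Return speedup / logicalCores
    End Function

    Sub Main()
        ' Illustrative figures taken from the example in the text.
        Dim physical = EfficiencyPerLogicalCore(6.5, 8)
        Dim hyperThreaded = EfficiencyPerLogicalCore(4.5, 8)
        Console.WriteLine("8 physical cores: " & physical.ToString("F2", CultureInfo.InvariantCulture))
        Console.WriteLine("4 physical / 8 logical cores: " & hyperThreaded.ToString("F2", CultureInfo.InvariantCulture))
    End Sub
End Module
```

To produce real figures, measure the elapsed time of the sequential and parallel versions with Stopwatch, as the listings in this chapter do, and divide the sequential time by the parallel time to obtain the speedup.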
