Transforming Sequential Code to Parallel Code

Until recently, most Visual Basic code was written with a sequential and synchronous execution approach. Therefore, a lot of algorithms have been designed with neither concurrency nor parallelism in mind. Typically, you won't find algorithms that can be completely converted to fully parallelized and perfectly scalable code. It could happen, but it represents an ideal situation and it isn't the most common scenario.

When you have sequential code and you want to take advantage of potential parallelism to achieve better performance, you have to find hotspots. Then you can convert them to parallel code, measure speedups, identify potential scalability, and ensure that you haven't introduced new bugs while transforming the existing sequential code to parallel code.


Note
A hotspot is a part of the code that takes significant time to run. You can achieve speedups if it is split into two or more pieces running in parallel. If part of the code doesn't take significant time to run, the overhead introduced by TPL could reduce the performance improvement to worthless or even make the parallelized code run slower than the sequential version. Once you begin working with the different options offered by TPL, it is going to be easier for you to detect the hotspots in sequential code.

Detecting Hotspots

The code discussed in this section is from code file Listing01.sln and shows an example of a very simple console application that runs two sequential subroutines:

1. GenerateAESKeys—This runs a For loop to generate the number of AES keys specified by the NUM_AES_KEYS constant. It uses the GenerateKey method provided by the System.Security.Cryptography.AesManaged class. Once the key is generated, it stores the results of converting the Byte array into a hexadecimal string representation (ConvertToHexString) in the hexString local variable.
2. GenerateMD5Hashes—This runs a For loop to compute a number of hashes, using the Message-Digest algorithm 5 (MD5 algorithm), specified by the NUM_MD5_HASHES constant. It uses a number (i variable) converted to string in order to call the ComputeHash method provided by the System.Security.Cryptography.MD5 class. Once the hash is generated, it stores the results of converting the Byte array into a hexadecimal string representation (ConvertToHexString) in the hexString local variable.

The highlighted lines of code that follow are the ones added to measure the time it takes to run each subroutine, and the total elapsed time. It starts a new Stopwatch, calling its StartNew method at the beginning of each method, and then it writes the elapsed time to the Console output.

Imports System
Imports System.Text
Imports System.Security.Cryptography
’ This import will be used later to run code in parallel
Imports System.Threading.Tasks

Module Module1

    Private Const NUM_AES_KEYS As Integer = 800000
    Private Const NUM_MD5_HASHES As Integer = 900000

    Function ConvertToHexString(ByRef byteArray() As Byte) As String
        ' Convert the byte array to hexadecimal string
        Dim sb As New StringBuilder()

        For i As Integer = 0 To (byteArray.Length() - 1)
            sb.Append(byteArray(i).ToString("X2"))
        Next

        Return sb.ToString()
    End Function

    Sub GenerateAESKeys()
        Dim sw = Stopwatch.StartNew()
        Dim aesM As New AesManaged()
        For i As Integer = 1 To NUM_AES_KEYS
            aesM.GenerateKey()
            Dim result = aesM.Key
            Dim hexString = ConvertToHexString(result)
            ' Console.WriteLine(ConvertToHexString(result))
        Next
        Console.WriteLine("AES: " + sw.Elapsed.ToString())
    End Sub

    Sub GenerateMD5Hashes()
        Dim sw = Stopwatch.StartNew()
        Dim md5M As MD5 = MD5.Create()

        For i As Integer = 1 To NUM_MD5_HASHES
            Dim data = Encoding.Unicode.GetBytes(i.ToString())
            Dim result = md5M.ComputeHash(data)
            Dim hexString = ConvertToHexString(result)
            ' Console.WriteLine(ConvertToHexString(result))
        Next
        Console.WriteLine("MD5: " + sw.Elapsed.ToString())
    End Sub

    Sub Main()
        Dim sw = Stopwatch.StartNew()
        GenerateAESKeys()
        GenerateMD5Hashes()
        Console.WriteLine(sw.Elapsed.ToString())
        ' Display the results and wait for the user to press a key
        Console.ReadLine()
    End Sub
End Module

The For loop in the GenerateAESKeys subroutine doesn't use its controlled variable (i) in its code because it just controls the number of times it generates a random AES key. However, the For loop in the GenerateMD5Hashes subroutine uses its controlled variable (i) to convert it to a string. Then, it uses this string as the input data to call the method that computes its hash, as shown here:

For i As Integer = 1 To NUM_MD5_HASHES
    Dim data = Encoding.Unicode.GetBytes(i.ToString())
    Dim result = md5M.ComputeHash(data)
    Dim hexString = ConvertToHexString(result)
    'Console.WriteLine(ConvertToHexString(result))
Next

The lines of code that write the generated keys and hashes to the default console output appear commented in code file Listing01.sln because these operations would generate a bottleneck that would distort the accuracy of the time measurement.

Figure 19.3 shows the sequential execution flow for this application and the time it takes to run each of the two aforementioned subroutines in a specific computer with a dual-core microprocessor.

Figure 19.3 Sequential execution flow of two subroutines

19.3

GenerateAESKeys and GenerateMD5Hashes need approximately 14 seconds to run. The first one takes 6 seconds and the latter 8 seconds. Of course, these times will vary considerably according to the underlying hardware configuration.

There is no interaction between these two subroutines. Thus, they are completely independent from each other. As the subroutines run one after the other, in a sequential way, they aren't taking advantage of the parallel processing capabilities offered by the additional core(s). Therefore, these two subroutines represent a clear hotspot where parallelism could help to achieve a significant speedup over sequential execution. For example, it is possible to run both subroutines in parallel using Parallel.Invoke.

Measuring Speedups Achieved by Parallel Execution

Replace the Main subroutine shown in the simple console application with the following new version, launching both GenerateAESKeys and GenerateMD5Hashes in parallel, using Parallel.Invoke (code file: Snippet02.sln):

Sub Main()
    Dim sw = Stopwatch.StartNew()
    Parallel.Invoke(Sub() GenerateAESKeys(), Sub() GenerateMD5Hashes())
    Console.WriteLine(sw.Elapsed.ToString())
    ' Display the results and wait for the user to press a key
    Console.ReadLine()
End Sub

Figure 19.4 shows the parallel execution flow for the new version of this application and the time it takes to run each of the two subroutines in a specific computer with a dual-core microprocessor.

Figure 19.4 Parallel execution flow for two subroutines with a dual-core microprocessor

19.4

Now, GenerateAESKeys and GenerateMD5Hashes need approximately nine seconds to run because they take advantage of both cores offered by the microprocessor. Thus, it is possible to calculate the speedup achieved using the following formula:

Speedup = (Serial execution time)/(Parallel execution time)

In the preceding example, 14/9 = 1.56 times faster, usually expressed as a 1.56x speedup over the sequential version. GenerateAESKeys takes more time than GenerateMD5Hashes to run, nine seconds versus six seconds. However, Parallel.Invoke doesn't continue with the next line until all the delegates finish their execution. Therefore, during approximately three seconds, the application is not taking advantage of one of the cores, as shown in Figure 19.5.

Figure 19.5 Example of inefficient scheduling when using Parallel.Invoke on a dual-core microprocessor

19.5

In addition, if this application runs on a computer with a quad-core microprocessor, its speedup over the sequential version would be nearly the same, as it won't scale to take advantage of the two additional cores found in the underlying hardware.

In this section, you saw how it is possible to detect hotspots by adding some code to measure the elapsed time to run certain methods. By changing just a few lines of code, a noticeable improvement in speed was achieved. Now it is time to learn other TPL structures that can help to achieve better results and offer improved scalability when the number of available cores increases.


Note
There is no need to initialize TPL in order to begin working with its classes and methods. TPL does a lot of work under the hood and does its best to optimize its scheduling mechanisms to take advantage of the underlying hardware at runtime. However, choosing the right structure to parallelize a hotspot is a very important task.

Understanding Parallel

Now, uncomment the lines that send output to the console in both GenerateAESKeys and GenerateMD5Hashes:

Console.WriteLine(ConvertToHexString(result))

Writing to the console will generate a bottleneck for the parallel execution. However, this time, there is no need to measure accurate times. Instead, you can view the output to determine that both methods are running in parallel. The following lines show a sample console output generated by this application. The highlighted lines, the shorter hexadecimal strings, correspond to the MD5 hashes. The others represent AES keys. Each AES key takes less time to generate than each MD5 hash. Remember that the code creates 800,000 AES keys (NUM_AES_KEYS) and 900,000 MD5 hashes (NUM_MD5_HASHES). Depending on your environment, the code might take a lot of time to complete the execution.

0364DBC9A8FA3EAC793FC53AAE6D0193484087634C3033C470D96C72F89D7254
E410BCB82B36729CB7CCCCDFE30746F2DF141CC8275790360E2ED731F8C7113D
66CF85EA8FC77746A7C4A116F68D802D7167AE9E7C5FB0B6B85D44B8929386DE
0421897DCF492380BADF872205AE32D94632C60022A4E965652524D7023C59AD
C3BEF1DFFF5A9CAB11BFF8EA3F7DEFC97D91562A358DB56477AD445ACB4F1DE3
AF521D65489CA5C69517E32E652D464676E5F2487E438124DBF9ACF4157301AA
A641EB67C88A29985CFB0B2097B12CFB9296B4659E0949F20271984A3868E0B3
D7A05587DFDFD0C49BEF613F2EB78A43
90BF115C60B2DECA60C237F3D06E42EE
B3519CBA0137FD814C09371836F90322
1415C19F7F93306D35186721AF6B8DDE56427BB9AF29D22E37B34CB49E96BB49
208B73D3E6468F48B950E5F5006DDF30FE7A1B3BCC46489F7722BD98D54079D7
ACD0312DFF1BF29ECA2721DAFA9B20AB5FBDBD20E76C150C5CCE4026990C9D26
EB68C902145439F2A66514B9D89E9A958F18EE15D491014D3DCB312781F277D1
9DB8ABF087C78091F1E77AC769FF175A
F3EFB2804A969D890AFABCE17E84B26E
B342A8A253003754B752B85C67DA1560F30CD36A1AA759A0010E1F8E5045CBB5
9681656DC08F29AB1911A1CCCFBE6B468D1DF7B9D8722324E5E2BB4A314EC649
7DE56E111213655F54D6F8656238CA5E
196D194BA2B786EADD1B6852645C67C5
BA7AC6B878064E98D98336CA5DE45DEC
875DAB451CCE3B5FBD8E5091BAD1A8ED7DB2FF8C9E3EEA834C6DEA7C2467F27E
C1AA2CB88AB669317CB90CD842BF01DB26C6A655D10660AF01C37ECC7AEDA267
66E1F4F56E04FC9BFF225F68008A129D93F9B277ADAB43FF764FB87FFD098B78

Now, comment the lines that send output to the console in both GenerateAESKeys and GenerateMD5Hashes again.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset