Parallel execution

When you are confident that you have really eked out the maximum performance from your ForEach loop, there might still be room for improvement through parallelization. Usually, your beefy developer/operations machine laughs at your scripts, so you might as well use some more CPU cycles and some additional memory.

The most common approach to parallelization is to use the job cmdlets and work with the resulting System.Management.Automation.Job objects. This yields good results but requires you to manage the jobs and their output properly. Since each job runs in its own process, you will not be able to change variables in the caller's scope from within a running job.
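To illustrate, here is a minimal sketch of the job pattern. Note in particular that the job only receives a copy of the environment, so changes made inside the job never reach the caller's scope:

```powershell
# Jobs run in a separate process: changes made inside never reach the caller
$counter = 0
$job = Start-Job -ScriptBlock {
    $counter++   # creates and increments a job-local variable
    "Job saw counter = $counter"
}
$result = $job | Wait-Job | Receive-Job
$job | Remove-Job

$result   # 'Job saw counter = 1'
$counter  # still 0 in the caller's scope
```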

When working remotely with Invoke-Command, parallelization happens out of the box. It can be controlled by modifying the ThrottleLimit parameter, which defaults to 32 parallel sessions.

With Windows PowerShell, there has been the option of using workflows to parallelize for quite some time now. Since the Windows Workflow Foundation will not be included in PowerShell Core, workflows are not a viable option there.
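For completeness, here is a sketch of the workflow approach. It runs only in Windows PowerShell 5.1, the foreach -parallel keyword is valid only inside a workflow body, and the node names are placeholders:

```powershell
# Windows PowerShell 5.1 only - workflows do not exist in PowerShell Core
workflow Get-LastBootTime
{
    param([string[]] $ComputerName)

    # foreach -parallel runs the loop body concurrently for every input
    foreach -parallel ($computer in $ComputerName)
    {
        Get-CimInstance -PSComputerName $computer -ClassName Win32_OperatingSystem |
            Select-Object -Property CSName, LastBootUpTime
    }
}

# Hypothetical node names
Get-LastBootTime -ComputerName ('NODE01', 'NODE02')
```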

The last option you have to step up your game is PowerShell runspaces. Managing runspaces, however, involves liberal use of .NET calls. We will explore the benefits of each of these possibilities.

In addition to all that, there are also great community-driven PowerShell modules out there that do the heavy lifting and help you parallelize sensibly. We would like to highlight the SplitPipeline module, which we will return to at the end of this section. But first, let's look at the built-in options, starting with Invoke-Command:

$machines = 1..32 | ForEach-Object { 'NODE{0:d2}' -f $_ }
$scriptBlock = {
    Get-WinEvent -FilterHashtable @{
        LogName = 'System'
        ID      = 6005
    } -MaxEvents 50
}

$startInvoke1 = Get-Date
$events = Invoke-Command -ComputerName $machines -ScriptBlock $scriptBlock
$endInvoke1 = Get-Date

$startInvoke2 = Get-Date
$events = Invoke-Command -ComputerName $machines -ScriptBlock $scriptBlock -ThrottleLimit 16
$endInvoke2 = Get-Date

Write-Host ('ThrottleLimit 32: {0}s' -f ($endInvoke1 - $startInvoke1).TotalSeconds)
Write-Host ('ThrottleLimit 16: {0}s' -f ($endInvoke2 - $startInvoke2).TotalSeconds)

The first example uses Invoke-Command with its default throttle limit of 32 parallel connections. The events are returned quite quickly, especially when compared to querying each machine sequentially, for example with Get-WinEvent's own ComputerName parameter.

Dividing the throttle limit in half yields a higher execution time, as expected. The job cmdlets offer another way to parallelize, as seen in the following example:

$start = Get-Date
$jobs = 1..50 | ForEach-Object {
    Start-Job -ScriptBlock { Start-Sleep -Seconds 1 }
}
$jobs | Wait-Job
$end = Get-Date

$jobs | Remove-Job
$end - $start # Not THAT parallel
Write-Host ('It took {0}s to sleep 50*1s in 50 jobs' -f ($end - $start).TotalSeconds)

Using jobs can also be an effective way to parallelize things. In this case, you get 50 parallel jobs, and it might be a good idea to bound them by the amount of resources you have available, which is exactly what runspaces allow. The job cmdlets let you queue new jobs, wait for all or just a subset of them, and in the end retrieve the results that were produced with Receive-Job. The following example uses a runspace pool instead:

# Grab the logical processors to set the upper boundary for our runspace pool
$proc = Get-CimInstance -ClassName CIM_Processor
$runspacepool = [runspacefactory]::CreateRunspacePool(1, $proc.NumberOfLogicalProcessors, $Host)
$runspacepool.Open()

# We need to collect the handles to query them later on
$handles = New-Object -TypeName System.Collections.ArrayList

# Queue 1000 jobs
$start = Get-Date
1..1000 | ForEach-Object {
    $posh = [powershell]::Create()
    $posh.RunspacePool = $runspacepool

    # Add your script and arguments. Note that your script block may of course have parameters
    $null = $posh.AddScript({
        param
        (
            [int]$Seconds
        )
        Start-Sleep @PSBoundParameters
    })
    $null = $posh.AddArgument(1)

    [void]($handles.Add($posh.BeginInvoke()))
}

# Poll until every queued invocation reports completion
while (($handles | Where-Object IsCompleted -eq $false).Count)
{
    Start-Sleep -Milliseconds 100
}
$end = Get-Date

Write-Host ('It took {0}s to sleep 1000*1s in up to {1} parallel runspaces' -f ($end - $start).TotalSeconds, $proc.NumberOfLogicalProcessors)

# When done: Clean up
$runspacepool.Close()
$runspacepool.Dispose()

Using runspaces and a runspace pool, we can parallelize our scripts easily as well. As long as .NET code does not send a chill down your spine, go for it. The sample creates a runspace pool that will hold all PowerShell runspaces. The pool has boundaries: we want at least one runspace and no more than the number of our logical processors.

There are more nuances to parallel execution than there is room for in this book. The boundaries mentioned in the previous paragraph are not set in stone and can be increased for other workloads.
Read more about it here: https://docs.microsoft.com/en-us/dotnet/standard/parallel-programming

This means that the load will be shared among runspaces, and all we are doing is queuing new PowerShell instances to use our runspace pool. As soon as a script in the pool has finished processing, the runspace frees up, allowing the next script block to run.
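One thing the sample above does not do is retrieve any output: only the wait handles are stored, so EndInvoke can never be called. A minimal sketch of the full round trip keeps each PowerShell instance together with its handle (the doubling script is just an illustration):

```powershell
$pool = [runspacefactory]::CreateRunspacePool(1, 4)
$pool.Open()

# Keep each PowerShell instance together with its wait handle
$invocations = foreach ($i in 1..10)
{
    $posh = [powershell]::Create()
    $posh.RunspacePool = $pool
    $null = $posh.AddScript({ param([int]$Number) $Number * 2 }).AddArgument($i)
    [pscustomobject]@{
        PowerShell = $posh
        Handle     = $posh.BeginInvoke()
    }
}

$results = foreach ($invocation in $invocations)
{
    # EndInvoke blocks until this invocation is done and returns its output
    $invocation.PowerShell.EndInvoke($invocation.Handle)
    $invocation.PowerShell.Dispose()
}

$pool.Close()
$pool.Dispose()
$results   # 2, 4, 6, ... 20
```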

After queuing all 1,000 script blocks, we simply wait for them to finish their work and then clean up after ourselves. Finally, let's return to the SplitPipeline module mentioned at the beginning of this section:

# Sample from github.com/nightroman/SplitPipeline
Measure-Command { 1..10 | . {process{ $_; sleep 1 }} }
Measure-Command { 1..10 | Split-Pipeline {process{ $_; sleep 1 }} }
Measure-Command { 1..10 | Split-Pipeline -Count 10 {process{ $_; sleep 1 }} }

# A practical example: Hash calculation
Measure-Command { Get-ChildItem -Path $PSHome -File -Recurse | Get-FileHash } #2.3s
Measure-Command { Get-ChildItem -Path $PSHome -File -Recurse | Split-Pipeline {process{Get-FileHash -Path $_.FullName}} } # 0.6s

SplitPipeline leverages PowerShell runspaces just like we did in the previous example and packages everything neatly into one cmdlet, Split-Pipeline.
