In this chapter, you will learn to:
Now that you have your vSphere environment running the way you designed it, you want to know how it is faring over time. For this, you can use the built-in statistical data. When you are using a vCenter Server, you will have access to aggregated data from the last year. When you are using stand-alone ESXi servers, the data at your disposal is available for a much more limited amount of time.
Statistical data is an indispensable source of information that allows you to answer questions like these:
To understand what is available and how you can use it, you need to grasp some basic concepts related to the statistical data in your vSphere environment.
When you are using stand-alone ESXi servers, you have access to statistical data, but what is offered is limited in time. You can get real-time data, which spans 20-second intervals, and aggregated data, which is aggregated over 5-minute intervals. The data for both intervals is kept on the ESXi server itself.
If you add a vCenter Server, you will get more, as shown in Figure 16-1.
The vCenter Server keeps the statistical data in four historical intervals (HIs), also known as statistical intervals. The data is transferred from the ESXi server into Historical Interval 1 (HI1) on the vCenter Server. This is done by the vCenter Server Agent that runs on each ESXi server that is added to the vCenter.
The historical intervals HI2, HI3, and HI4 are populated through aggregation. The aggregation process is done through three scheduled database jobs that are created when you install the vCenter Server and its database, as shown in Figure 16-2. This screenshot was taken from a Microsoft SQL Server, but similar scheduled jobs are present in whatever database engine you select to host the vCenter database.
With the function shown in Listing 16-1, you can get a closer look at what the aggregation jobs on the SQL Server that hosts the vCenter database are doing.
Listing 16-1: Listing the vCenter Server aggregation jobs
function Get-AggregationJob {
<#
.SYNOPSIS
Returns the SQL jobs that perform vCenter statistical data
aggregation
.DESCRIPTION
The function takes all SQL jobs in the "Stats Rollup" category
and returns key data for each of the jobs
.PARAMETER SqlServer
Name of the SQL server where the vSphere database is hosted
.EXAMPLE
Get-AggregationJob 'serverA'
#>
Param(
[parameter(Mandatory = $true,
HelpMessage = 'Enter the name of the vCenter SQL server')]
[string]$SqlServer)
$SMO = 'Microsoft.SqlServer.SMO'
[System.Reflection.Assembly]::LoadWithPartialName($SMO) |
Out-Null
$SMOSrv = 'Microsoft.SqlServer.Management.Smo.Server'
$sqlSRv = New-Object ($SMOSrv) $sqlServer
$sqlSrv.JobServer.Jobs |
Where-Object {$_.Category -eq 'Stats Rollup'} | Foreach-Object {
$object = [ordered]@{
Name = $_.Name
Description = $_.Description
LastRun = $_.LastRunDate
NextRun = $_.NextRunDate
LastRunResult = $_.LastRunOutcome
'Schedule(s)' = $_.JobSchedules | Foreach-Object {$_.Name}
}
New-Object PSObject -Property $object
}
}
The function produces an output listing similar to this:
Get-AggregationJob -SqlServer vcenter6
Name : Past Day stats rollupVC6DB
Description : This job is to roll up 5 min stats and should run every 30 mins
LastRun : 6/21/2015 1:30:00 AM
NextRun : 6/21/2015 2:00:00 AM
LastRunResult : Succeeded
Schedule(s) : 30 min schedule
Name : Past Month stats rollupVC6DB
Description : This job is to roll up Past Month stats and should run every day
LastRun : 1/1/0001 12:00:00 AM
NextRun : 6/21/2015 2:15:00 AM
LastRunResult : Unknown
Schedule(s) : Daily schedule
Name : Past Week stats rollupVC6DB
Description : This job is to roll up past week stats and should run every 2 hours
LastRun : 6/21/2015 1:45:00 AM
NextRun : 6/21/2015 3:45:00 AM
LastRunResult : Succeeded
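In the output above, the Past Month job reports a LastRun of 1/1/0001, which is the .NET minimum date, together with an Unknown result. That combination suggests the job has not run yet. The following is a minimal sketch of how such jobs could be flagged. The function name Get-NeverRunJob is hypothetical, and the sample objects are modeled on the output above rather than read from a live SQL Server.

```powershell
# Hypothetical helper: flag aggregation jobs that have never run.
# Input objects are assumed to carry the Name and LastRun properties
# that Get-AggregationJob returns.
function Get-NeverRunJob {
    param(
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [psobject]$Job
    )
    process {
        # 1/1/0001 12:00:00 AM equals [datetime]::MinValue
        if ($Job.LastRun -eq [datetime]::MinValue) {
            $Job
        }
    }
}

# Sample data modeled on the listing above (not live data)
$sampleJobs = @(
    [pscustomobject]@{ Name = 'Past Day stats rollupVC6DB';   LastRun = [datetime]'2015-06-21T01:30:00' }
    [pscustomobject]@{ Name = 'Past Month stats rollupVC6DB'; LastRun = [datetime]::MinValue }
    [pscustomobject]@{ Name = 'Past Week stats rollupVC6DB';  LastRun = [datetime]'2015-06-21T01:45:00' }
)

$neverRun = @($sampleJobs | Get-NeverRunJob)
$neverRun | Select-Object -ExpandProperty Name
```

Against a real environment, the same filter could presumably be fed directly from Get-AggregationJob.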
An ESXi server gathers statistical data over a 20-second interval. This interval is called the real-time interval. On ESXi servers, these 20-second intervals are aggregated into 5-minute intervals. Unmanaged ESXi servers keep the aggregated data for approximately 1 day.
Managed ESXi server(s), those connected to a vCenter Server, send the 5-minute interval data to the vCenter Server. This is done through the vCenter Agent that runs on a managed ESXi server. The vCenter Server aggregates the data from these initial 5-minute intervals into longer intervals. These intervals are called historical intervals. Historical interval data is stored in the vCenter Server database.
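On the vCenter Server you find the following four default historical intervals:
Historical Interval 1 (Past day): 5-minute samples, kept for 1 day
Historical Interval 2 (Past week): 30-minute samples, kept for 1 week
Historical Interval 3 (Past month): 2-hour samples, kept for 30 days
Historical Interval 4 (Past year): 1-day samples, kept for 1 year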
The settings for the historical intervals can be consulted and configured from the vSphere Client. See Figure 16-3.
You can, of course, also use PowerCLI to report the historical interval settings:
Get-StatInterval
This produces a listing similar to the following:
Name        Sampling Period Secs  Storage Time Secs
----        --------------------  -----------------
Past day                     300              86400
Past week                   1800             604800
Past month                  7200            2592000
Past year                  86400           31536000
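To get a feel for what these settings mean, you can divide the storage time by the sampling period. The result is the number of samples kept per counter instance in each interval. The sketch below hard-codes the values from the listing above; the property names SamplingPeriodSecs and StorageTimeSecs are assumed to match those on the objects that Get-StatInterval returns.

```powershell
# Interval settings copied from the Get-StatInterval listing above
$intervals = @(
    [pscustomobject]@{ Name = 'Past day';   SamplingPeriodSecs = 300;   StorageTimeSecs = 86400 }
    [pscustomobject]@{ Name = 'Past week';  SamplingPeriodSecs = 1800;  StorageTimeSecs = 604800 }
    [pscustomobject]@{ Name = 'Past month'; SamplingPeriodSecs = 7200;  StorageTimeSecs = 2592000 }
    [pscustomobject]@{ Name = 'Past year';  SamplingPeriodSecs = 86400; StorageTimeSecs = 31536000 }
)

$summary = $intervals | ForEach-Object {
    [pscustomobject]@{
        Name              = $_.Name
        # Retention expressed in days instead of seconds
        RetentionDays     = (New-TimeSpan -Seconds $_.StorageTimeSecs).TotalDays
        # Samples kept for one counter instance over the full retention
        SamplesPerCounter = $_.StorageTimeSecs / $_.SamplingPeriodSecs
    }
}

$summary | Format-Table -AutoSize
```

With the default settings this yields 288, 336, 360, and 365 samples per counter instance for the four intervals.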
You can change the parameters for one or more historical intervals. Listing 16-2 shows how you could change, for example, how long the statistical data is kept in the Past year interval. Using this script, the default of 1 year is changed to 1 year and 1 month (365 + 31 days).
Listing 16-2: Changing historical interval parameters
$newInterval = New-TimeSpan -Days (365 + 31)
$targetInterval = 'Past year'
Get-StatInterval -Name $targetInterval |
Set-StatInterval -StorageTimeSecs $newInterval.TotalSeconds `
-Confirm:$false
The length of the retention period has to be specified in seconds. The TimeSpan object has a TotalSeconds property that you can use to retrieve the total number of seconds for the new retention period.
Make sure you know what you are doing when changing these intervals. Changing the parameters may have a serious impact on the size of the vCenter Server database. It could also mean that you lose data you wanted to keep the moment the aggregation job on the SQL server fires. Remember, if you shorten the retention period of an HI, the next time the aggregation jobs run they will remove the data that falls outside the new retention period. Also note that you cannot specify just any value for these parameters. The accepted values are restricted; consult VMware documentation to determine the accepted values for your vSphere version.
The statistics level defines which metrics are available in a specific historical interval. (In the real-time interval, all metrics, except for the ones that are aggregated on the vCenter, are available.) The four statistics levels shown in Table 16-1 are available.
Table 16-1: Statistic levels
Level | Content |
1 | Basic metrics. Device metrics excluded. Only average rollups. |
2 | All metrics except those for devices. Maximum and minimum rollups excluded. |
3 | All metrics, maximum and minimum rollups excluded. |
4 | All metrics. |
You can change the statistics level through the vSphere Client.
Your first reaction may be to set all the historical intervals to level 4 with the idea “you never have enough input.” But reconsider for a minute. Is it really useful to know, for example, what the daily minimum and maximum memory usage was? If you do not think you have a use case for this data, set the statistics level for the historical interval to level 2. This removes the data for minima and maxima during the aggregation. And most important, it will save space on your vCenter Server database, make the statistic queries faster, and make the aggregation job a bit faster.
If you are looking to change the statistics levels from PowerCLI, you won’t find a parameter on the Set-StatInterval cmdlet to do this in the current build. But with the help of the SDK method called UpdatePerfInterval, it is quite easy, as shown in Listing 16-3.
Listing 16-3: Changing the statistics level
function Set-StatIntervalLevel {
<#
.SYNOPSIS
Change the statistics level of a Historical Interval
.DESCRIPTION
The function changes the statistics level of the Historical Interval
specified in the Interval parameter to the level specified in Level.
The new statistics level cannot be higher than the statistics
level of the previous Historical Interval
.PARAMETER Interval
The Historical Interval for which you want to change the level
.PARAMETER Level
New statistics level
.EXAMPLE
Set-StatIntervalLevel -Level 3 `
>> -Interval (Get-StatInterval -Name 'Past week')
.EXAMPLE
Get-StatInterval -Name 'Past day' | `
>> Set-StatIntervalLevel -Level 4
#>
[CmdletBinding(SupportsShouldProcess = $true,
ConfirmImpact='High')]
Param(
[parameter(ValueFromPipeline = $true, Mandatory = $true,
HelpMessage = "Enter the name of the interval")]
[VMware.VimAutomation.Types.StatInterval]$Interval,
[parameter(Mandatory = $true,
HelpMessage = `
'Enter the new level of the Historical Interval')]
[string]$Level)
Begin{
$si = Get-View ServiceInstance
$perfMgr = Get-View $si.content.perfManager
}
Process{
$intervalSDK = $perfMgr.historicalInterval | `
Where-Object {$_.Name -eq $Interval.Name}
$intervalSDK.Level = $level
$msg = @(
"$((Get-Date).ToString())",
"$($MyInvocation.MyCommand)",
"Changing interval '$Interval' to level $Level"
)
Write-Verbose ($msg -join "`t")
if($PSCmdlet.ShouldProcess($Interval.Name,"Change statistics level to $Level")){
$perfMgr.UpdatePerfInterval($intervalSDK)
}
}
End{}
}
With the Set-StatIntervalLevel function, it is now very easy to change the statistics level from within a script. If you want to automate your vCenter Server setup, use something like the code in Listing 16-4 to automate the level part of the statistics. The results are shown in Figure 16-4.
Note that the vCenter Server does not allow a higher statistics level for a historical interval than the statistics level used in the preceding historical interval. For example, you can’t specify a statistics level 3 for HI4 when HI3 has a statistics level of 2.
Listing 16-4: Changing the statistics level of a historical interval
Get-StatInterval -Name 'Past day' | Set-StatIntervalLevel -Level 4 -Confirm:$false
Get-StatInterval -Name 'Past week' | Set-StatIntervalLevel -Level 2 -Confirm:$false
Get-StatInterval -Name 'Past month' | Set-StatIntervalLevel -Level 2 -Confirm:$false
Get-StatInterval -Name 'Past year' | Set-StatIntervalLevel -Level 1 -Confirm:$false
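Because the vCenter Server rejects a level that is higher than the level of the preceding historical interval, it can help to validate a planned set of levels before you apply them. The following is a minimal sketch of such a check. The function name Test-StatLevelOrder is hypothetical, and the function works on plain integers, so it does not need a vCenter connection.

```powershell
# Hypothetical helper: returns $true when no historical interval has a
# higher statistics level than the interval that precedes it.
function Test-StatLevelOrder {
    param(
        # Levels in interval order: Past day, Past week, Past month, Past year
        [Parameter(Mandatory = $true)]
        [ValidateRange(1, 4)]
        [int[]]$Level
    )
    for ($i = 1; $i -lt $Level.Count; $i++) {
        if ($Level[$i] -gt $Level[$i - 1]) {
            return $false
        }
    }
    return $true
}

# The levels used in Listing 16-4
$valid = Test-StatLevelOrder -Level 4, 2, 2, 1
# Past month (3) is higher than Past week (2), so this plan is rejected
$invalid = Test-StatLevelOrder -Level 4, 2, 3, 1
```

The order in which you apply the changes matters as well. When you raise levels, start with the Past day interval; when you lower them, start with the Past year interval.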
Several managed entities provide utilization and other performance metrics. These managed entities are as follows:
Each of the performance providers that generate the statistical data has its own set of performance counters. Each performance counter is identified by a unique ID, which you will also find in the statistical data. Note that these IDs are not necessarily the same in different vSphere environments.
The performance counters are organized in groups based on the resources they cover (Table 16-2).
Table 16-2: Performance counter groups
Group | Description |
Cluster Services | Performance for clusters using DRS and/or HA |
CPU | CPU utilization |
Disk I/O Counters | I/O performance |
Storage Utilization Counters | Storage utilization |
Management Agent | Consumption of resources by the various management agents |
Memory | All memory statistics for guest and host |
Network | Network utilization for pNIC, vNIC, and other network devices |
Resource Scheduler | CPU-load-history statistics about resource pools and virtual machines |
System | Overall system availability |
Virtual Machine Operations | Virtual machine power and provisioning operations in a cluster or datacenter |
Host-based Replication | Host-based replication protection |
Power | Power resources |
Storage Capacity | Utilization |
A good source of information for the available metrics is the PerformanceManager entry in the VMware vSphere API Reference documentation. For each of the groups listed in Table 16-2, you will find a list of the available metrics.
But you can also use PowerCLI to compile a list of metrics yourself. First, it is important to know that there are only metrics for the following entities in your vSphere environment: clusters, hosts, virtual machines, and resource pools. Second, the Get-StatType cmdlet returns only the names of the metrics. (See Listing 16-5.)
Listing 16-5: The Get-StatType cmdlet, which returns metric names
Get-StatType -Entity (Get-VMHost | Select-Object -First 1)
cpu.usage.average
cpu.usage.minimum
cpu.usage.maximum
cpu.usagemhz.average
cpu.usagemhz.minimum
cpu.usagemhz.maximum
cpu.reservedCapacity.average
cpu.wait.summation
cpu.ready.summation
cpu.idle.summation
cpu.used.summation
cpu.capacity.provisioned.average
cpu.capacity.usage.average
cpu.capacity.demand.average
cpu.capacity.contention.average
cpu.corecount.provisioned.average
cpu.corecount.usage.average
cpu.corecount.contention.average
With the following code (Listing 16-6), you can capture the metrics in text files. Notice the -Unique parameter on the Sort-Object cmdlet; this ensures there won’t be any duplicate entries due to the different instances (as you’ll learn later in this chapter) that can be present for a specific metric.
Listing 16-6: Retrieving the metrics
# Cluster metrics
Get-StatType -Entity (Get-Cluster | Select-Object -First 1) | `
Out-File 'Cluster-metrics.txt'
# Host metrics
Get-StatType -Entity (Get-VMHost | Select-Object -First 1) | `
Sort-Object -Unique | `
Out-File 'Host-metrics.txt'
# Virtual machine metrics
Get-StatType -Entity (Get-VM | Select-Object -First 1) | `
Sort-Object -Unique | `
Out-File 'VM-metrics.txt'
# Resource pool metrics
Get-StatType -Entity (Get-ResourcePool | `
Select-Object -First 1) | `
Sort-Object -Unique | `
Out-File 'Resource-pool-metrics.txt'
While the resulting text files give you a complete list of the available metrics, there is still a lot of the available information that stays hidden. By using an SDK method, you can produce a better and more detailed report of the available metrics. The function in Listing 16-7 uses the SDK methods to return more details about the available metrics for an entity.
Listing 16-7: Listing available metrics
function Get-StatTypeDetail {
<#
.SYNOPSIS
Returns available metrics for an entity
.DESCRIPTION
The function returns the available metrics for a specific
entity. Entities can be ESXi hosts, clusters, resource
pools or virtual machines.
The function can return the available metrics for all the
historical intervals together or for the realtime interval
.PARAMETER Entity
The entity for which the metrics should be returned
.PARAMETER Realtime
Switch to select the realtime metrics
.EXAMPLE
Get-StatTypeDetail -Entity (Get-VM 'Guest1')
.EXAMPLE
Get-StatTypeDetail -Entity (Get-VMHost 'esx1') -Realtime
.EXAMPLE
Get-VM 'Guest1' | Get-StatTypeDetail
#>
[CmdletBinding()]
Param(
[parameter(ValueFromPipeline = $true, Mandatory = $true,
HelpMessage = 'Enter an entity')]
[VMware.VimAutomation.ViCore.Impl.V1.Inventory.InventoryItemImpl[]]
$Entity,
[switch]$Realtime)
Begin{
# Create performance counter hashtables
$si = Get-View ServiceInstance
$perfMgr = Get-View $si.Content.perfManager
$pcTable = New-Object Hashtable
$keyTable = New-Object Hashtable
foreach($pC in $perfMgr.PerfCounter){
if($pC.Level -ne 99){
$pCKey = $pC.GroupInfo.Key, $pC.NameInfo.Key, $pC.RollupType `
-join "."
if(!$pctable.ContainsKey($pCKey.ToLower())){
$pctable.Add($pCKey,$pC.Key)
$keyTable.Add($pC.Key, $pC)
}
}
}
$metricslist = @()
}
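Process{
# Get the metrics
$Entity | Foreach-Object {
Write-Verbose "Type $($_.GetType().Name)"
# Handle the Realtime switch: use the provider's refresh rate
# as the interval ID, otherwise query all intervals
$numinterval = $null
if($Realtime){
$provSum = $perfMgr.QueryPerfProviderSummary($_.ExtensionData.MoRef)
$numinterval = $provSum.RefreshRate
}
$metrics = $perfMgr.QueryAvailablePerfMetric(
$_.ExtensionData.MoRef,
$null,
$null,
$numinterval)
$metricsNoInstances = $metrics | Where-Object {$_.Instance -eq ''}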
foreach($pmId in $metricsNoInstances){
$pC = $keyTable[$pmId.CounterId]
$row = [ordered]@{
CounterId = $pc.Key
Group = $pC.GroupInfo.Key
Name = $pC.NameInfo.Key
Rollup = $pC.RollupType
Id = $pC.Key
Level = $pC.Level
Type = $pC.StatsType
Unit = $pC.UnitInfo.Key
Description = $pc.NameInfo.Summary
Entity = $_.ExtensionData.GetType().Name
MetricName = $pC.GroupInfo.Key, $pC.NameInfo.Key, `
$pC.RollupType -join "."
}
$metricslist += New-Object -TypeName PSObject -Property $row
}
}
}
End{
$metricslist | Sort-Object -Property Entity,Group,Name,Rollup
}
}
A run of the Get-StatTypeDetail function returns detailed information about the available metrics for the entity. Note that if you use the -Realtime switch, the entity should be accessible. An ESXi host, for example, has to be in the Connected state.
Get-StatTypeDetail `
>>-Entity (Get-VMHost | Select-Object -First 1) -Realtime |
>>Select CounterId, Name, Group, Rollup |
>>ft -AutoSize -Force
>>
CounterId Name Group Rollup
--------- ---- ----- ------
215 cpufairness clusterServices latest
216 memfairness clusterServices latest
19 capacity.contention cpu average
18 capacity.demand cpu average
15 capacity.provisioned cpu average
17 capacity.usage cpu average
22 corecount.contention cpu average
20 corecount.provisioned cpu average
21 corecount.usage cpu average
390 coreUtilization cpu average
391 coreUtilization cpu maximum
392 coreUtilization cpu minimum
397 costop cpu summation
Instance is the last concept we want to introduce before we show some practical examples. The official definition of an instance comes from the VMware vSphere API Reference documentation: “An identifier that is derived from configuration names for the device associated with the metric. It identifies the instance of the metric with its source.”
Let’s try to make this a bit more understandable through an example.
Take the CPU-related metrics for a host. If the host is, for example, equipped with a quad-core CPU, there will be four instances for each CPU-related metric: 0, 1, 2, and 3. In this case, each instance corresponds with the numeric position of the core within the CPU block. And there will be an additional instance, the so-called aggregate, which is the metric averaged over all the other instances.
Each instance gets a unique identifier, which is included in the returned statistical data. The aggregate instance is always represented by a blank identifier.
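The blank identifier gives you a simple way to separate the aggregate from the individual instances. The sketch below uses made-up sample objects that only mimic the MetricId, Instance, and Value properties of returned statistical data for a two-core host; it is an illustration, not output from a real system.

```powershell
# Made-up samples that mimic statistical data for a two-core host
$samples = @(
    [pscustomobject]@{ MetricId = 'cpu.usage.average'; Instance = '';  Value = 15.0 }
    [pscustomobject]@{ MetricId = 'cpu.usage.average'; Instance = '0'; Value = 10.0 }
    [pscustomobject]@{ MetricId = 'cpu.usage.average'; Instance = '1'; Value = 20.0 }
)

# The aggregate carries a blank instance identifier
$aggregate = @($samples | Where-Object { $_.Instance -eq '' })

# Everything else belongs to an individual instance (here: a core)
$perInstance = @($samples | Where-Object { $_.Instance -ne '' })

"Aggregate samples   : $($aggregate.Count)"
"Per-instance samples: $($perInstance.Count)"
```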
If you want to list the available instances for a metric on a specific entity, you will have to make use of an SDK method called QueryAvailablePerfMetric, as shown in Listing 16-8.
Listing 16-8: Listing available instances
function Get-StatInstance {
<#
.SYNOPSIS
Returns the available instances for a specific metric and entity
.DESCRIPTION
The function returns all the available instances for a metric on
an entity. The entity can be an ESXi host, a cluster, a
resource pool or a virtual machine.
.PARAMETER Entity
The entity or entities for which the instances should be returned
.PARAMETER Stat
The metric or metrics for which the instances should be returned
.PARAMETER Realtime
Switch to select the realtime metrics
.EXAMPLE
Get-StatInstance -Entity (Get-VM 'Guest1') `
>> -Stat "cpu.usage.average"
.EXAMPLE
Get-StatInstance -Entity $esx -Stat 'cpu.usage.average' `
>> -Realtime
.EXAMPLE
Get-VMHost MyEsx | Get-StatInstance `
>> -Stat 'disk.devicelatency.average'
#>
[CmdletBinding()]
Param(
[parameter(ValueFromPipeline = $true, Mandatory = $true,
HelpMessage = 'Enter an entity')]
[PSObject[]]$Entity,
[parameter(Mandatory=$true,
HelpMessage = 'Enter a metric')]
[string[]]$Stat,
[switch]$Realtime)
begin{
# Create performance counter hashtables
$si = Get-View ServiceInstance
$perfMgr = Get-View $si.content.perfManager
$pcTable = New-Object Hashtable
foreach($pC in $perfMgr.PerfCounter){
if($pC.Level -ne 99){
$pKeyComponents = $pC.GroupInfo.Key,
$pC.NameInfo.Key,
$pC.RollupType
$pCKey = $pKeyComponents -join '.'
$pCKey = $pCKey.ToLower()
if(!$pctable.ContainsKey($pCKey)){
$pctable.Add($pcKey,$pC.Key)
}
}
}
}
process{
$entSDK = $entity | Get-View
# Get the metrics for each entity
$entSDK | Foreach-Object {
# Handle the Realtime switch
$numinterval = $null
if($Realtime){
$provSum = $perfMgr.QueryPerfProviderSummary($_.MoRef)
$numinterval = $provSum.RefreshRate
}
# Assign (not append) so metrics from earlier entities are not repeated
$metrics = $perfMgr.QueryAvailablePerfMetric($_.MoRef,
$null,
$null,
$numinterval)
# Check if stat is valid
foreach($st in $stat){
if(!$pcTable.ContainsKey($st.ToLower())){
Throw "-Stat parameter $st is invalid."
}
foreach($metric in $metrics){
if($metric.CounterId -eq $pcTable[$st.ToLower()]){
$obj = [ordered]@{
StatName = $st
Instance = $metric.Instance
}
New-Object PSObject -Property $obj
}
}
}
}
}
end{}
}
With the Get-StatInstance function, you can list the instances that are available for one or more metrics (Listing 16-9).
Listing 16-9: Listing the available instances for a metric
Get-StatInstance `
>> -Entity (Get-VMHost | Select -First 1) `
>> -Stat cpu.usage.average
>>
StatName Instance
-------- --------
cpu.usage.average
cpu.usage.average 0
cpu.usage.average 1
Get-VMHost | Select -First 1 | `
>> Get-StatInstance -Stat disk.kernelLatency.average
>>
StatName Instance
-------- --------
disk.kernelLatency.average mpx.vmhba1:C0:T0:L0
disk.kernelLatency.average eui.0307cf5fa7c2eb72
disk.kernelLatency.average eui.840b6fa6de6c389b
When you’re comfortable with these basic concepts, you can start working with the statistical data.
The PowerCLI module offers several cmdlets for use with performance data. Let’s have a quick look:
Get-Command -Noun Stat* -Module VMware*
CommandType Name Version Source
----------- ---- ------- ------
Cmdlet Get-Stat 6.0.0.0 VMware.VimAutomation.Core
Cmdlet Get-StatInterval 6.0.0.0 VMware.VimAutomation.Core
Cmdlet Get-StatType 6.0.0.0 VMware.VimAutomation.Core
Cmdlet New-StatInterval 6.0.0.0 VMware.VimAutomation.Core
Cmdlet Remove-StatInterval 6.0.0.0 VMware.VimAutomation.Core
Cmdlet Set-StatInterval 6.0.0.0 VMware.VimAutomation.Core
The following list gives a short description of what each of these cmdlets is used for:
Get-Stat returns the statistical data for one or more specific objects.
Get-StatInterval returns the available statistics intervals.
Get-StatType returns the available metrics for a specific object.
Set-StatInterval changes settings on the specified statistics interval.
Remove-StatInterval is obsolete, unless you’re still on Virtual Center 2.0.
New-StatInterval is obsolete, unless you’re still on Virtual Center 2.0.
The most important cmdlet in this list is the Get-Stat cmdlet. It gives you access to the statistical data that is stored on your ESXi servers and on the vCenter Server(s).
Let’s have a look at the important parameters you can use with the Get-Stat cmdlet:
-Entity This parameter specifies one or more objects for which you want to retrieve the statistics. You can pass the name of the entity, but be aware that there will be execution time overhead since the cmdlet logic will have to fetch that object for the entity.
-Stat This parameter specifies one or more metrics for which you want to retrieve the statistics. If you want to retrieve more than one metric, specify them as a string array. The names of the metrics are not case sensitive.
-Start This defines the beginning of the time range for which you want to retrieve statistics.
-Finish Specifies the end of the time range for which you want to retrieve statistics.
-Realtime This parameter specifies that you want to retrieve real-time statistics. These come directly from your ESXi server(s) and by default use a 20-second interval.
The Get-Stat cmdlet returns the statistical data as one or more FloatSampleImpl objects. To have a good grasp of what you can do with the returned data, you should know what is present in the FloatSampleImpl object. With the Get-Member cmdlet, it is easy to check what properties are available in the object; see Listing 16-10.
Listing 16-10: Using the Get-Stat and Get-Member cmdlets to obtain statistical data
Get-Stat -Entity (Get-VMHost | `
>> Select -First 1) -Stat "mem.usage.average" `
>> -MaxSamples 1 | Select-Object *
>>
Value : 17.95
Timestamp : 6/19/2015 2:00:00 AM
MetricId : mem.usage.average
Unit : %
Description : Memory usage as percentage of total configured or available memory
Entity : esx1.local.test
EntityId : HostSystem-host-10
IntervalSecs : 86400
Instance :
Uid : /VIServer=locallucd@vcenter6:443/VMHost=`
HostSystem-host-10/FloatSample=mem.usage.average\635702760000000000/
Get-Stat -Entity (Get-VMHost | `
>> Select -First 1) -Stat "mem.usage.average" `
>> -MaxSamples 1 | Get-Member -MemberType Property
>>
Type: VMware.VimAutomation.ViCore.Impl.V1.Stat.FloatSampleImpl
Name MemberType Definition
---- ---------- ----------
Description Property System.String Description {get;set;}
Entity Property VMware.VimAutomation.Types.VIObject E…
EntityId Property System.String EntityId {get;}
Instance Property System.String Instance {get;set;}
IntervalSecs Property System.Int32 IntervalSecs {get;set;}
MetricId Property System.String MetricId {get;set;}
Timestamp Property System.DateTime Timestamp {get;set;}
Uid Property string Uid {get;}
Unit Property System.String Unit {get;set;}
Value Property System.Single Value {get;}
Most of these properties are self-explanatory and use a basic type like string, int32, or DateTime. The exception is the Entity property. This property contains the actual automation object, in other words, the object that cmdlets like Get-VM or Get-VMHost would return. This behavior can be useful in your reporting scripts if you need to include certain properties of the entity.
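Calculated properties on Select-Object are one way to pull entity properties into a report row. The sketch below uses a made-up sample object with a nested Entity object; the Name and ConnectionState properties are assumed to exist on the host object that the Entity property holds, and the values are illustrative only.

```powershell
# Made-up sample that mimics a returned statistics object,
# including a nested Entity object with host properties
$sample = [pscustomobject]@{
    Timestamp = [datetime]'2015-06-19T02:00:00'
    MetricId  = 'mem.usage.average'
    Value     = 17.95
    Unit      = '%'
    Entity    = [pscustomobject]@{
        Name            = 'esx1.local.test'
        ConnectionState = 'Connected'
    }
}

# Calculated properties reach into the Entity object
$report = $sample | Select-Object Timestamp, MetricId, Value, Unit,
    @{ Name = 'HostName'; Expression = { $_.Entity.Name } },
    @{ Name = 'State';    Expression = { $_.Entity.ConnectionState } }

$report | Format-List
```

Against live data, the same Select-Object statement could presumably be placed directly after a Get-Stat call.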
Now that you have everything set up as you want, it’s time to start producing some reports. The first problem you will encounter is the selection of the metrics for your reports. Given the huge number of available metrics in vSphere, this book is not going to describe in detail what is available or when to use a specific metric. But the Get-StatReference function might help you. The function creates an HTML page containing all the metrics that are available on the vCenter Server where you are connected. Because you can easily re-create this page, you will always have access to the latest list of metrics. Listing 16-11 contains the Get-StatReference function.
Listing 16-11: The Get-StatReference function
function Get-StatReference {
<#
.SYNOPSIS
Creates an HTML reference of all the available metrics
.DESCRIPTION
The function returns a simple HTML page which contains all the
available metrics in the environment where you are connected.
.EXAMPLE
Get-StatReference | Out-File "$env:temp\metricRef.html"
#>
Begin{
# In API 4.0 there is a bug.
# There are 4 duplicate metrics that only differ in the case
# These are excluded with the -notcontains condition
$badMetrics = 'mem.reservedcapacity.average',
'cpu.reservedcapacity.average',
'managementAgent.swapin.average',
'managementAgent.swapout.average'
$si = Get-View ServiceInstance
$perfMgr = Get-View $si.content.perfManager
}
Process{
# Create performance counter hashtables
$metricRef = foreach($pC in $perfMgr.PerfCounter){
if($pC.Level -ne 99){
$pKeyComponents = $pC.GroupInfo.Key,
$pC.NameInfo.Key,
$pC.RollupType
$pCKey = $pKeyComponents -join '.'
if($badMetrics -notcontains $pCKey){
$pCKey = $pCKey.ToLower()
New-Object PSObject -Property @{
Metric = $PCKey
Level = $pC.Level
Unit = $pC.UnitInfo.Label
Description = $pC.NameInfo.Summary
}
}
}
}
}
End{
$metricRef | Sort-Object -Property Metric | `
ConvertTo-Html -Property Metric,Level,Unit,Description
}
}
You produce the HTML page as follows:
Get-StatReference | Out-File metricRef.html
Stats Toolbox is an alternative for exploring metrics:
www.lucd.info/2014/09/02/stats-toolbox/
This PowerShell script offers a GUI with several features to work with vSphere performance metrics, including searching for a metric in the VMTN Communities and generating the Get-Stat-based code.
Besides the VMware documentation, several sources are available where you can learn which metric should be used for what. A good starting point is the Performance Community on VMTN.
There are several techniques (such as defining the correct time range, selecting the correct intervals, and the like) that you will use regularly when you are working with statistical data. It will be worth your while to spend a bit of time practicing these basic techniques.
The -Start and -Finish parameters determine for which time period you want to retrieve the statistics.
Get-Stat -Entity (Get-VMHost |
>> Select -First 1) -Stat 'mem.usage.average' `
>> -Start '06/21/2015 17:15:20' -Finish '06/21/2015 17:16:00'
>>
MetricId Timestamp Value Unit Instance
-------- --------- ----- ---- --------
mem.usage.average 6/21/2015 5:16:00 PM 18.05 %
mem.usage.average 6/21/2015 5:15:40 PM 18.05 %
mem.usage.average 6/21/2015 5:15:20 PM 18.05 %
The date and time format you use for these parameters depends on the regional settings of the workstation where you run the Get-Stat cmdlet. You can check which format to use with the Get-Culture cmdlet. The seconds can be left out of the time portion.
(Get-Culture).DateTimeFormat.ShortDatePattern
M/d/yyyy
(Get-Culture).DateTimeFormat.ShortTimePattern
h:mm tt
The next run of the Get-Stat cmdlet was on a system with a customized en-GB setting. Notice how the date portion is entered in a different format:
Get-Stat -Entity (Get-VMHost |
>> Select -First 1) -Stat 'mem.usage.average' `
>> -Start '21-06-2015 17:15' -Finish '21-06-2015 17:16'
MetricId Timestamp Value Unit Instance
--------- ---------- ----- ---- --------
mem.usage.average 21-06-2015 17:16:00 9,46 %
mem.usage.average 21-06-2015 17:15:40 9,46 %
mem.usage.average 21-06-2015 17:15:20 9,46 %
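If a script has to run on workstations with different regional settings, one option is to avoid date strings that depend on the culture. The following sketch builds the DateTime values with the .NET ParseExact method and the invariant culture. The format string is a choice made for this example, and the Get-Stat call in the comment assumes a variable $esx that holds a host object.

```powershell
# Parse dates with an explicit format, independent of regional settings
$invariant = [System.Globalization.CultureInfo]::InvariantCulture
$format    = 'yyyy-MM-dd HH:mm:ss'

$Start  = [datetime]::ParseExact('2015-06-21 17:15:20', $format, $invariant)
$Finish = [datetime]::ParseExact('2015-06-21 17:16:00', $format, $invariant)

# The DateTime objects could then be passed on as-is, for example:
# Get-Stat -Entity $esx -Stat 'mem.usage.average' -Start $Start -Finish $Finish
```

Because the parameters then receive DateTime objects instead of strings, no culture-dependent conversion is needed at the call site.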
The DateTime constructor, which is inherited from the .NET Framework, is handy to create DateTime objects with the use of variables. Notice how you can use specific properties from the DateTime object returned by the Get-Date cmdlet to populate variables.
$year = (Get-Date).Year
$day = 21
$hour = 18
New-Object DateTime($year,5,$day,$hour,30,0)
>>
Thursday, May 21, 2015 6:30:00 PM
You can use the Get-Date cmdlet with its parameters in a similar way:
Get-Date -Year $year -Day $day -Hour $hour
Sunday, June 21, 2015 6:29:35 PM
An often overlooked point is which intervals you need to include in your calculations. Suppose you want to produce a report that covers from 17:50 until 18:00 and that you must use real-time data. Which intervals do you need to include?
To answer this question, you first need to understand the Timestamp property. Does the timestamp give you the time the interval started, stopped, or some value smack in the middle of the interval?
According to the SDK Reference, Timestamp represents “the time at which the sample was collected.” In other words, Timestamp is at the end of the measured interval. So for our example, we don’t want the interval with the timestamp 17:50, because that contains the measured data for the interval from 17:49:40 until 17:50:00. The first interval we use will have a timestamp of 17:50:20.
What about the end of the requested interval? It is quite clear that we need to stop with the data that has a Timestamp of 18:00:00, because that interval measured data from 17:59:40 until 18:00:00. This is an important concept that we will return to later.
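The reasoning above can be turned into a small calculation and filter. The sketch below is a minimal illustration that uses generated timestamps instead of real statistics; the function name Select-SampleInWindow is hypothetical.

```powershell
$intervalSecs = 20
$reportStart  = [datetime]'2015-06-21T17:50:00'
$reportFinish = [datetime]'2015-06-21T18:00:00'

# Timestamp marks the END of a sample, so the first sample that lies
# fully inside the report window ends one interval after the start
$firstTimestamp = $reportStart.AddSeconds($intervalSecs)
$sampleCount    = ($reportFinish - $reportStart).TotalSeconds / $intervalSecs

# Hypothetical filter: keep samples whose Timestamp is later than the
# start and not later than the finish of the report window
function Select-SampleInWindow {
    param(
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [psobject]$Sample,
        [Parameter(Mandatory = $true)]
        [datetime]$Start,
        [Parameter(Mandatory = $true)]
        [datetime]$Finish
    )
    process {
        if ($Sample.Timestamp -gt $Start -and $Sample.Timestamp -le $Finish) {
            $Sample
        }
    }
}

# Generated timestamps from 17:49:40 up to and including 18:00:20
$first   = [datetime]'2015-06-21T17:49:40'
$samples = 0..32 | ForEach-Object {
    [pscustomobject]@{ Timestamp = $first.AddSeconds($_ * $intervalSecs) }
}

$kept = @($samples | Select-SampleInWindow -Start $reportStart -Finish $reportFinish)
"Expected $sampleCount samples, kept $($kept.Count)"
```

A 10-minute window holds 600 / 20 = 30 real-time samples; the samples stamped 17:49:40, 17:50:00, and 18:00:20 fall outside it.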
When you’re scripting the creation of your statistical reports, you don’t want to use hard-coded dates and times. Luckily PowerShell provides multiple methods and properties on the DateTime type to allow you to get exactly the time interval you want.
It’s handy to know how to get the specifications for the -Start and -Finish parameters for recurring time frames. Armed with this knowledge, you’ll find it easy to roll your own via Listing 16-12.
Listing 16-12: Scripting recurring time frames
# Midnight until now
$todayMidnight = (Get-Date -Hour 0 -Minute 0 -Second 0)
$Start = $todayMidnight.AddSeconds(1)
$Finish = Get-Date
# Yesterday
$todayMidnight = (Get-Date -Hour 0 -Minute 0 -Second 0)
$Start = $todayMidnight.AddDays(-1).AddSeconds(1)
$Finish = $todayMidnight
# Day before yesterday
$todayMidnight = (Get-Date -Hour 0 -Minute 0 -Second 0)
$Start = $todayMidnight.AddDays(-2).AddSeconds(1)
$Finish = $todayMidnight.AddDays(-1)
# Previous week
$todayMidnight = (Get-Date -Hour 0 -Minute 0 -Second 0)
$endOfWeek = $todayMidnight.`
AddDays(-$todayMidnight.DayOfWeek.value__ +1)
$Start = $endOfWeek.AddDays(-7).AddSeconds(1)
$Finish = $endOfWeek
# "x" months back
$monthsBack = <number-of-months-back>
$todayMidnight = (Get-Date -Hour 0 -Minute 0 -Second 0)
$xMonthsAgoMidnight = $todayMidnight.AddMonths(-$monthsBack + 1)
$endOfMonth = $xMonthsAgoMidnight.`
AddDays(-$xMonthsAgoMidnight.Day)
$Start = $endOfMonth.AddMonths(-1).AddDays(1).AddSeconds(1)
$Finish = $endOfMonth
Grouping Your Data
When you collect statistical data, an important time-saver is the use of the Group-Object cmdlet. This cmdlet offers so many features that it should definitely be in your PowerShell tool belt.
The script in Listing 16-13 shows how to use grouping to produce a report for an individual server. This simple example produces a report that shows the average transmit rate per physical adapter (pNIC) on a specific ESXi server over the previous day. In this case, the Instance property is used for the grouping. The aggregate data is skipped by filtering out that group, characterized by an empty string in the Instance property.
Listing 16-13: Average transmit rate per pNIC on a single server
$esxName = "esx1.test.local"
$metric = "net.transmitted.average"
# Define the time frame
$todayMidnight = (Get-Date -Hour 0 -Minute 0 -Second 0)
$Start = $todayMidnight.AddDays(-1).AddSeconds(1)
$Finish = $todayMidnight
# Get the entity
$esxImpl = Get-VMHost -Name $esxName
# Produce the report
Get-Stat -Entity $esxImpl -Stat $metric -Start $Start `
-Finish $Finish | Group-Object -Property Instance | `
Where {$_.Name -ne ""} | %{
New-Object PsObject -Property @{
pNIC = $_.Name
AvgKbps = [Math]::Round(($_.Group | `
Measure-Object -Property Value -Average).Average, 1)
TotalKbps = [Math]::Round(($_.Group | `
Measure-Object -Property Value -Sum).Sum, 1)
}
}
The script produces a simple table that lists the average and total transmit rate per physical NIC for the previous day, like the one shown in the code that follows. This table allows you to see if your load balancing is performing as expected.
TotalKbps pNIC AvgKbps
--------- ---- -------
282 vmnic0 5,9
35338 vmnic1 736,2
5 vmnic6 0,1
0 vmnic7 0
In this case, vmnic0 and vmnic7 were in a port group without load balancing active.
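If you want to experiment with the grouping logic of Listing 16-13 without a vCenter connection, a sketch along the following lines may help. The sample objects are made up and only mimic the Instance and Value properties of Get-Stat output. The row with an empty Instance stands for the host-wide aggregate.

```powershell
# Made-up samples that mimic the Instance and Value properties
$samples = @(
    [pscustomobject]@{ Instance = 'vmnic0'; Value = 4 }
    [pscustomobject]@{ Instance = 'vmnic0'; Value = 8 }
    [pscustomobject]@{ Instance = 'vmnic1'; Value = 700 }
    [pscustomobject]@{ Instance = 'vmnic1'; Value = 800 }
    [pscustomobject]@{ Instance = '';       Value = 756 }
)

# Group per instance, drop the aggregate group (empty name),
# and compute average and sum in a single Measure-Object call
$report = $samples |
    Group-Object -Property Instance |
    Where-Object { $_.Name -ne '' } |
    ForEach-Object {
        $measure = $_.Group | Measure-Object -Property Value -Average -Sum
        [pscustomobject]@{
            pNIC      = $_.Name
            AvgKbps   = [Math]::Round($measure.Average, 1)
            TotalKbps = [Math]::Round($measure.Sum, 1)
        }
    }

$report
```

Asking Measure-Object for -Average and -Sum in one call is a small variation on the listing; it walks through each group once instead of twice.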
If you want to run the previous example against multiple ESXi servers, you have to introduce a second level of grouping: the ESXi hostname.
The script in Listing 16-14 produces a table similar to the one in the previous section, but it will also include a column with the ESXi hostname.
There are some noteworthy points in this script:
You can use wildcard patterns with the Get-VMHost cmdlet for the -Name parameter. In this script, the following were used:
[01] means that in that position there can be a 0 or a 1.
All the ESXi servers are passed in a single call to the Get-Stat cmdlet.
The Group-Object cmdlet uses the Values property to store an array with the specific values that were used for that group. The order of the elements in the Values property corresponds with the order used in the -Property parameter.
Listing 16-14: Average transmit rate per pNIC over several servers
$esxName = "esx[01][01289]*"
$metric = "net.transmitted.average"
# Define the time frame
$todayMidnight = (Get-Date -Hour 0 -Minute 0 -Second 0)
$Start = $todayMidnight.AddDays(-1).AddSeconds(1)
$Finish = $todayMidnight
# Get the entity
$esxImpl = Get-VMHost -Name $esxName
# Produce the report
Get-Stat -Entity $esxImpl -Stat $metric -Start $Start `
-Finish $Finish | Group-Object -Property Entity,Instance | `
Where {$_.Values[1] -ne ""} | %{
New-Object PsObject -Property @{
ESXname = $_.Values[0]
pNIC = $_.Values[1]
AvgKbps = [Math]::Round(($_.Group | `
Measure-Object -Property Value -Average).Average, 1)
TotalKbps = [Math]::Round(($_.Group | `
Measure-Object -Property Value -Sum).Sum, 1)
}
}
The script in Listing 16-14 produces a simple table (similar to the following) that lists the average and total transmit rate per pNIC for the previous day on each server. This table allows you to see how your load balancing is performing across several servers.
TotalKbps ESXname pNIC AvgKbps
--------- ------- ---- -------
282 esx08.test.local vmnic0 5,9
35338 esx08.test.local vmnic1 736,2
5 esx08.test.local vmnic6 0,1
0 esx08.test.local vmnic7 0
210 esx09.test.local vmnic0 4,4
13368 esx09.test.local vmnic1 278,5
0 esx09.test.local vmnic6 0
1 esx09.test.local vmnic7 0
175 esx10.test.local vmnic0 3,6
26463 esx10.test.local vmnic1 551,3
0 esx10.test.local vmnic6 0
0 esx10.test.local vmnic7 0
186 esx11.test.local vmnic0 3,9
12857 esx11.test.local vmnic1 267,9
0 esx11.test.local vmnic6 0
4460 esx11.test.local vmnic7 92,9
287 esx12.test.local vmnic0 6
17960 esx12.test.local vmnic1 374,2
0 esx12.test.local vmnic6 0
1 esx12.test.local vmnic7 0
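To see which hostnames the pattern in Listing 16-14 selects, you can try it against a few names with the -like operator. This is a sketch under the assumption that Get-VMHost interprets the -Name pattern the same way -like does; the candidate names are made up.

```powershell
# The pattern from Listing 16-14
$pattern = 'esx[01][01289]*'

# Made-up host names to test against the pattern
$candidates = 'esx08.test.local', 'esx10.test.local', 'esx12.test.local',
              'esx13.test.local', 'esx1.test.local'

# -like uses PowerShell's wildcard syntax: [..] matches one character from
# the set, * matches zero or more characters
$matching = $candidates | Where-Object { $_ -like $pattern }
$matching
```

Only esx08, esx10, and esx12 pass: esx13 fails on the second character class, and esx1.test.local fails because the dot is not in that class either.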
If you investigate the Values array closely, you will notice that the first element is the complete VMHostImpl object, not just the name property. In this sample script, that didn’t make a difference, but if you want to use only the ESXi host’s name, just replace the line containing the Group-Object cmdlet with this line:
-Finish $Finish | Group-Object -Property {$_.Entity.Name},Instance | `
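The effect on the Values array can be reproduced with made-up data. In the sketch below, the Entity property holds a small stand-in object with a Name property; it is not a real VMHostImpl object, and the sample values are invented.

```powershell
# Made-up samples; Entity is a stand-in for the object Get-Stat returns
$esx08 = [pscustomobject]@{ Name = 'esx08.test.local' }
$esx09 = [pscustomobject]@{ Name = 'esx09.test.local' }
$samples = @(
    [pscustomobject]@{ Entity = $esx08; Instance = 'vmnic0'; Value = 5 }
    [pscustomobject]@{ Entity = $esx08; Instance = 'vmnic0'; Value = 7 }
    [pscustomobject]@{ Entity = $esx09; Instance = 'vmnic0'; Value = 4 }
    [pscustomobject]@{ Entity = $esx09; Instance = '';       Value = 4 }
)

# Two grouping criteria: a code block returning the name, then Instance
$groups = $samples | Group-Object -Property { $_.Entity.Name }, Instance

$rows = $groups | Where-Object { $_.Values[1] -ne '' } | ForEach-Object {
    [pscustomobject]@{
        ESXname = $_.Values[0]   # first criterion: now a plain string
        pNIC    = $_.Values[1]   # second criterion: the instance
        Samples = $_.Count
    }
}
$rows
```

Because the first criterion returns the name, Values[0] is a string rather than the whole entity object.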
As we explained in the previous section, you are not obliged to use actual level 1 properties as a grouping criterion; you can also use a code block, in which you can perform whatever computation you want and return a value to the Group-Object cmdlet.
This feature comes in quite handy when you are working with statistical data. Assume you want a report on the pNICs but you only want to see a line in the report when the transmission rate goes above a certain threshold. That would make it easy to see, for example, at which points in time your pNICs have to deal with a heavier transmission rate. Listing 16-15 details how to extract potentially problematic transmit rates.
Listing 16-15: Extracting potentially problematic transmit rates
$esxName = "esx1.test.local"
$metric = "net.transmitted.average"
# Define the time frame
$todayMidnight = (Get-Date -Hour 0 -Minute 0 -Second 0)
$Start = $todayMidnight.AddDays(-1).AddSeconds(1)
$Finish = $todayMidnight
# Transmission groups
$codeBlock = {
if($_.Instance -ne ""){
if($_.Value -lt 750){"OK"}
elseif($_.Value -lt 1500){"Investigate"}
else{"Problem"}
}
}
# Get the entity
$esxImpl = Get-VMHost -Name $esxName
# Produce the report
Get-Stat -Entity $esxImpl -Stat $metric -Start $Start `
-Finish $Finish | Group-Object -Property $codeBlock | `
Where {$_.Name -eq "Problem"} | %{
$_.Group | %{
New-Object PsObject -Property @{
ESXname = $_.Entity.Name
Time = $_.Timestamp
pNIC = $_.Instance
TransmittedKbps = $_.Value
}
}
}
The script in Listing 16-15 produces an output table similar to the one that follows. Because the script deals with yesterday’s data, it falls within Historical Interval 2, which has a reporting interval of 30 minutes. That means that in the 30 minutes before 00:30 there was an average transmission rate on vmnic1 of 11342 Kbps. That could be the backup window!
TransmittedKbps ESXname pNIC Time
--------------- ------- ---- ----
11342 esx10.test.local vmnic1 20-06-2015 00:30:00
1527 esx10.test.local vmnic1 20-06-2015 12:30:00
4123 esx10.test.local vmnic1 20-06-2015 23:30:00
As you saw in the previous section, you pass a code block to the -Property parameter of the Group-Object cmdlet. This script uses another PowerShell feature where you can store a code block in a variable and use it later to pass as a parameter to a cmdlet. This feature is handy when you have to use, for example, the same set of group selection criteria in multiple reports. A good use for this feature could be grouping your statistical data to show business-hour and non-business-hour activity.
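A business-hours criterion of that kind could look like the following sketch. The 09:00 to 17:00 window and the sample data are assumptions made for the example; adjust the hours to your own definition of a business day.

```powershell
# Reusable grouping criterion: label a sample by the hour of its Timestamp.
# The 09:00-17:00 window is an assumption made for this sketch.
$businessHours = {
    if ($_.Timestamp.Hour -ge 9 -and $_.Timestamp.Hour -lt 17) { 'Business hours' }
    else { 'Outside business hours' }
}

# Made-up samples, one every 6 hours over a single day
$day = New-Object DateTime(2015,6,20)
$samples = 0, 6, 12, 18 | ForEach-Object {
    [pscustomobject]@{ Timestamp = $day.AddHours($_); Value = 100 + $_ }
}

# The same variable can be passed to Group-Object in any number of reports
$byPeriod = $samples | Group-Object -Property $businessHours
$byPeriod | Select-Object Name, Count
```

Only the 12:00 sample lands in the business-hours group; the other three end up in the second group.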
Nested groups allow you to use the groups resulting from the first Group-Object cmdlet as input to a second Group-Object cmdlet. As an example, say you want to get two separate reports on the average transmission rates, one for the backup administrator and one for the vSphere administrator. Now, if you want to report over an interval that is not one of those predefined historical intervals, you will have to do the required calculations in your script. As so often is the case, the Group-Object cmdlet comes to the rescue, as shown in Listing 16-16.
Listing 16-16: Extracting potentially problematic transmit rates per time period
$esxName = 'esx1.test.local'
$metric = 'net.transmitted.average'
# Define the time frame
$todayMidnight = (Get-Date -Hour 0 -Minute 0 -Second 0)
$Start = $todayMidnight.AddDays(-1).AddSeconds(1)
$Finish = $todayMidnight
# Transmission groups
$codeBlock1 = {
if($_.Instance -ne ''){
if($_.Value -gt 750){'Problem'}
else{'No problem'}
}
}
$codeBlock2 = {
if($_.Timestamp.Hour -le 1 -or $_.Timestamp.Hour -ge 23){
"Backup window"
}
else{
'Outside backup window'
}
}
# Get the entity
$esxImpl = Get-VMHost -Name $esxName
# Produce the report
Get-Stat -Entity $esxImpl -Stat $metric -Start $Start `
-Finish $Finish | Group-Object -Property $codeBlock1 | `
Where {$_.Name -eq "Problem"} | %{
$_.Group | Group-Object -Property $codeBlock2 | %{
if($_.Name -eq 'Outside backup window'){
Write-Host "`n==> Report for the vSphere admin <==`n"
}
else{
Write-Host "`n==> Report for the backup admin <==`n"
}
$_.Group | %{
Write-Host $_.Timestamp $_.Entity.Name $_.Instance $_.Value
}
}
}
Since the script was written for demonstration purposes, we did not include any fancy output formatting. As you can see in the output that follows, the second Group-Object cmdlet smoothly allows the script to produce two separate reports based on the time frame:
==> Report for the backup admin <==
20-06-2015 23:30:00 esx1.test.local vmnic1 4123
20-06-2015 00:30:00 esx1.test.local vmnic1 11342
==> Report for the vSphere admin <==
20-06-2015 12:30:00 esx1.test.local vmnic1 1527
20-06-2015 12:00:00 esx1.test.local vmnic1 819
20-06-2015 09:30:00 esx1.test.local vmnic1 954
20-06-2015 09:00:00 esx1.test.local vmnic1 886
20-06-2015 06:00:00 esx1.test.local vmnic1 1067
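If you would rather have objects than console text, the same nesting can emit one object per sample, tagged with the intended audience. The following is a sketch on made-up samples; the Audience property and the variable names are hypothetical and not part of Listing 16-16.

```powershell
# Made-up samples with the properties the listing relies on
$day = New-Object DateTime(2015,6,20)
$samples = @(
    [pscustomobject]@{ Timestamp = $day.AddMinutes(30); Instance = 'vmnic1'; Value = 11342 }
    [pscustomobject]@{ Timestamp = $day.AddHours(9);    Instance = 'vmnic1'; Value = 886 }
    [pscustomobject]@{ Timestamp = $day.AddHours(12);   Instance = 'vmnic1'; Value = 819 }
    [pscustomobject]@{ Timestamp = $day.AddHours(15);   Instance = 'vmnic1'; Value = 12 }
)

$rateGroup   = { if ($_.Value -gt 750) { 'Problem' } else { 'No problem' } }
$windowGroup = {
    if ($_.Timestamp.Hour -le 1 -or $_.Timestamp.Hour -ge 23) { 'Backup window' }
    else { 'Outside backup window' }
}

# First level: keep the problem samples; second level: split by time window
$report = $samples | Group-Object -Property $rateGroup |
    Where-Object { $_.Name -eq 'Problem' } |
    ForEach-Object { $_.Group | Group-Object -Property $windowGroup } |
    ForEach-Object {
        $audience = if ($_.Name -eq 'Backup window') { 'backup admin' } else { 'vSphere admin' }
        foreach ($sample in $_.Group) {
            [pscustomobject]@{
                Audience  = $audience
                Timestamp = $sample.Timestamp
                pNIC      = $sample.Instance
                Value     = $sample.Value
            }
        }
    }
$report
```

Objects like these can be filtered on Audience and piped to Export-Csv or Format-Table, which Write-Host output does not allow.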
As we explained earlier in the “Understand Some Basic Concepts” section, there are four statistics intervals and one real-time interval, each with its own interval duration. However, the statistics intervals vCenter Server provides are not always the intervals you want to use in your reports.
There are several ways to produce reports with user-defined intervals. Listing 16-17 shows one of the methods. The script generates a report with the average CPU busy metric over a business day for all of the previous week’s business days.
Listing 16-17: Report with user-defined intervals
$esxName = 'esx1.test.local'
$metric = 'cpu.usage.average'
# Define the time frame
$todayMidnight = (Get-Date -Hour 0 -Minute 0 -Second 0)
$endOfWeek = $todayMidnight.`
AddDays(-$todayMidnight.DayOfWeek.value__ +1)
$Start = $endOfWeek.AddDays(-7).AddSeconds(1)
$Finish = $endOfWeek
$workingDays = 'Monday','Tuesday','Wednesday','Thursday','Friday'
# Use New-Object to create a DateTime Object from which we will
# be able to use the TimeofDay property
$businessStart = New-Object DateTime(1,1,1,9,00,0) # 09:00 AM
$businessEnd = New-Object DateTime(1,1,1,17,30,0) # 05:30 PM
# Group per day
$codeBlock1 = {
$_.Timestamp.Day
}
# Aggregate value for Business hours
$codeBlock2 = {
if($_.Instance -eq '' -and
$workingDays -contains $_.Timestamp.DayOfWeek -and
$_.Timestamp.TimeOfDay -gt $businessStart.TimeOfDay -and
$_.Timestamp.TimeOfDay -lt $businessEnd.TimeOfDay){
$true
}
else{
$false
}
}
# Get the entity
$esxImpl = Get-VMHost -Name $esxName
# Produce the report
Get-Stat -Entity $esxImpl -Stat $metric -Start $Start `
-Finish $Finish | `
Group-Object -Property $codeBlock1,$codeBlock2 | `
Where {$_.Values[1] -eq $true} | %{
New-Object PsObject -Property @{
ESXname = ($_.Group | Select-Object -First 1).Entity.Name
Time = ($_.Group | Select-Object -First 1).Timestamp.Date
AvgCPU = [Math]::Round(($_.Group | `
Measure-Object -Property Value -Average).Average, 1)
}
}
The script produces the following output:
AvgCPU ESXname Time
------ ------- ----
21,6 esx10.test.local 15-06-2015 00:00:00
18,1 esx10.test.local 16-06-2015 00:00:00
20,3 esx10.test.local 17-06-2015 00:00:00
20,5 esx10.test.local 18-06-2015 00:00:00
17,7 esx10.test.local 19-06-2015 00:00:00
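The same idea works for any interval length. As a sketch, the code block below assigns each sample to a two-hour bucket; the bucket size, the variable names, and the samples are all made up for illustration.

```powershell
# Made-up samples: one value every 30 minutes for 4 hours,
# each stamped at the END of the period it measured
$start = New-Object DateTime(2015,6,15,9,0,0)
$samples = 1..8 | ForEach-Object {
    [pscustomobject]@{ Timestamp = $start.AddMinutes(30 * $_); Value = 10 * $_ }
}

# Hypothetical bucket size for this sketch
$bucketHours = 2

# Grouping criterion: the start of the bucket the sample belongs to.
# One second is subtracted first so that a sample stamped exactly on a
# bucket boundary is counted in the bucket it actually measured.
$bucket = {
    $t = $_.Timestamp.AddSeconds(-1)
    $t.Date.AddHours($t.Hour - ($t.Hour % $bucketHours))
}

$report = $samples | Group-Object -Property $bucket | ForEach-Object {
    [pscustomobject]@{
        IntervalStart = $_.Values[0]
        AvgValue      = ($_.Group | Measure-Object -Property Value -Average).Average
    }
}
$report
```

With these samples, the buckets starting at 08:00, 10:00, and 12:00 average 15, 45, and 75, respectively.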
In PowerShell v3 the concept of workflows was introduced. One of the nice features that PowerShell workflows offer is the concept of parallelism. In other words, you can run multiple instances of the same code at the same time. This allows you to split up single-threaded, long-running jobs into many smaller jobs, each running against a specific instance or entity.
In Listing 16-18, we retrieve the statistical data for two ESXi nodes. A retrieval task is started for each of the ESXi nodes.
Listing 16-18: Parallel jobs
workflow Get-StatData {
Param(
[string]$vcenter,
[string[]]$names,
[string]$session
)
foreach -parallel($name in $names){
$stats = InlineScript{
Add-PSSnapin VMware*
Connect-VIServer -Server $Using:vcenter `
-Session $Using:session | Out-Null
Get-Stat -Entity $Using:name -Realtime `
-Stat cpu.usage.average -MaxSamples 5
}
$stats
}
}
$esxNames = 'esx1.local.test','esx2.local.test'
$data = Get-StatData -names $esxNames -vcenter 'vcenter6' `
-session $global:DefaultVIServer.SessionSecret
$data | Group-Object -Property {$_.Entity}
The code launches a Get-Stat cmdlet in a separate thread against each of the ESXi nodes. Data returns to the caller as one data collection, as demonstrated in our resulting groups.
Count Name Group
----- ---- -----
15 esx2.local.test {0.38, 0.71, 0.64, 0.37...}
15 esx1.local.test {1.39, 0.42, 0.36, 0.55...}
This is, of course, a constructed and meaningless example, but besides retrieving the statistical data, you could also handle the data in each thread. You could do some calculations for each ESXi node, produce the report, and get all the results back in one block after the call to the workflow.
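Workflows exist only in Windows PowerShell; they are not available in PowerShell 6 and later. On those versions, background jobs offer a comparable pattern. The sketch below is a swapped-in technique, not the workflow from Listing 16-18, and each job returns made-up values instead of calling Get-Stat. Background jobs run in separate processes, so a real script would have to load PowerCLI and connect to the vCenter Server inside each job.

```powershell
# Names to process; in a real script each job would connect to vCenter and
# call Get-Stat for its entity. Here each job returns made-up values.
$esxNames = 'esx1.local.test', 'esx2.local.test'

$jobs = foreach ($name in $esxNames) {
    Start-Job -ArgumentList $name -ScriptBlock {
        param($entityName)
        # Placeholder for the per-entity retrieval and calculation
        1..5 | ForEach-Object {
            [pscustomobject]@{ Entity = $entityName; Value = $_ }
        }
    }
}

# Wait for all jobs, collect their output as one collection, then clean up
$data = $jobs | Wait-Job | Receive-Job
$jobs | Remove-Job

$data | Group-Object -Property Entity | Select-Object Name, Count
```

As with the workflow, the results of all jobs come back as one collection that you can group per entity.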
As we’ve said many times before, with PowerShell there is always more than one way to do something. At times, you’ll find that the path you chose was not the optimal path. When you are working with lots of statistical data, the execution time of your script can become a critical factor. The following list shows some ways to improve execution time.
Get-Stat cmdlet.
Group-Object.
As you learned in the “Understand Some Basic Concepts” section, the data for HI4 is kept for one year by default. You can change the retention time to five years, but that will increase the size of your vCenter Server database, with a possible performance impact.
If you want to be able to run a statistics report on older data, there is another solution. At regular points in time, collect the oldest statistical data you want to use later and store it in an external file. PowerShell provides the Export-Clixml cmdlet to store PowerShell objects in an external file. Later, you can use the Import-Clixml cmdlet to retrieve the objects and bring them back into your PowerShell session.
The script in Listing 16-19 exports some basic metrics to an external file.
Listing 16-19: Exporting statistical data
$entityNames = 'esx[01][01289]*'
$metrics = 'cpu.usage.average','mem.usage.average',
'disk.usage.average','net.usage.average'
$archiveLocation = 'C:\Archive'
# Time range - 12 months ago
$monthsBack = 12
$todayMidnight = (Get-Date -Hour 0 -Minute 0 -Second 0)
$xMonthsAgoMidnight = $todayMidnight.AddMonths(-$monthsBack + 1)
$endOfMonth = $xMonthsAgoMidnight.`
AddDays(-$xMonthsAgoMidnight.Day)
$Start = $endOfMonth.AddMonths(-1).AddDays(1).AddSeconds(1)
$Finish = $endOfMonth
# All entities
$entities = Get-VMHost -Name $entityNames
$archiveFilename = 'Stat-' + $Start.ToString("MM_dd_yyyy") + '-' + `
$Finish.ToString("MM_dd_yyyy") + '.xml'
# Export statistics
Get-Stat -Entity $entities -Stat $metrics -Start $Start `
-Finish $Finish | Export-Clixml `
-Path ($archiveLocation + '\' + $archiveFilename)
The following script reads the archive file back into a PowerShell array. You can work with that imported array just as you would if you had created it with the Get-Stat cmdlet.
$archiveFilename = 'C:\Archive\Stat-01_01_2015-31_01_2015.xml'
$stats = Import-Clixml -Path $archiveFilename
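You can try the round trip without a vCenter connection. The sketch below writes a few made-up sample objects to a temporary file (the filename is hypothetical) and reads them back.

```powershell
# Made-up samples standing in for Get-Stat output
$day = New-Object DateTime(2015,1,1)
$stats = 1..3 | ForEach-Object {
    [pscustomobject]@{
        Timestamp = $day.AddMinutes(30 * $_)
        MetricId  = 'cpu.usage.average'
        Value     = 1.5 * $_
    }
}

# Write to a temporary file (hypothetical name) and read it back
$archive = Join-Path ([System.IO.Path]::GetTempPath()) 'Stat-roundtrip-example.xml'
$stats | Export-Clixml -Path $archive
$restored = Import-Clixml -Path $archive
Remove-Item -Path $archive

# The restored objects keep their property values and basic types
$average = ($restored | Measure-Object -Property Value -Average).Average
$average
$restored[0].Timestamp.GetType().Name
```

Keep in mind that imported objects are deserialized copies: property values such as Timestamp and Value survive, but methods of the original object types are generally not available.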