The items we have used so far were collecting data from some Zabbix agent or SNMP device. It is also possible to reuse this data in some calculation, store the result and treat it as a normal item to be used for graphs, triggers, and other purposes. Zabbix offers two types of such items:
We will start with calculated items that require typing in a formula. We are already monitoring total and used disk space. If we additionally wanted to monitor free disk space, we could query the agent for this information. This is where calculated items come in—if the agent or device does not expose a specific view of the data, or if we would like to avoid querying monitored hosts, we can do the calculation from the already retrieved values. To create a calculated item that would compute the free disk space, navigate to Configuration | Hosts, click on Items next to A test host, and then click on Create item. Fill in the following information:
- Name: Diskspace on / (free)
- Key: calc.vfs.fs.size[/,free]
- Formula: last("vfs.fs.size[/,total]")-last("vfs.fs.size[/,used]")
- Units: B
- Update interval: 1800
When done, click on the Add button at the bottom.
All the referenced items must exist. We cannot enter keys here and have them gather data by extension from the calculated item. Values to compute the calculated item are retrieved from the Zabbix server caches or the database; no connections are made to the monitored devices.
With this item added, let's go to the Latest data page. As the interval was set to 1,800 seconds, we might have to wait a bit longer to see the value, but eventually it should appear:
The interval we used, 1,800 seconds, was not matched to the intervals of the referenced items: the total and used disk space items were collecting data every 3,600 seconds. Calculated items are not connected to the data collection of the referenced items in any way. A calculated item is not evaluated when the referenced items get values; it follows its own schedule, which is completely independent from the schedules of the referenced items and is semi-random. If the referenced items stopped collecting data, our calculated item would keep on using their latest values, as we used the last() function. If only one of them stopped collecting data, we would base our calculation on one recent and one outdated value. Our calculated item could also produce very incorrect results if evaluated at the wrong time, when one referenced item has changed significantly but the other has not received a new value yet; unfortunately, there is no easy solution to that. The custom scheduling discussed in Chapter 3, Monitoring with Zabbix Agents and Basic Protocols, could help here, but it could also introduce performance issues by polling values in uneven batches, and it would be more complicated to manage. It is suggested to be used only as an exception.
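As a plain illustration (Python, not Zabbix configuration; the cached numbers are made up), a calculated item's evaluation just reads whatever each referenced item last stored, on its own schedule:

```python
# Illustrative sketch (not Zabbix code): a calculated item evaluates on its
# own schedule, reading whatever value each referenced item last cached.
value_cache = {
    "vfs.fs.size[/,total]": 100_000,  # hypothetical: refreshed an hour ago
    "vfs.fs.size[/,used]":   40_000,  # hypothetical: refreshed just now
}

def last(key):
    """Mimics the last() function: returns the most recent cached value."""
    return value_cache[key]

# The calculated item evaluates its formula against the cache only;
# it never polls the monitored host itself.
free = last("vfs.fs.size[/,total]") - last("vfs.fs.size[/,used]")
print(free)  # 60000 - correct only while both cached values are still fresh
```

If one of the cached values goes stale, the subtraction still succeeds; it just silently uses the outdated number.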
We might also want to compute the total of incoming and outgoing traffic on an interface, and a calculated item would work well here. The formula would be like this:
last(net.if.in[enp0s8])+last(net.if.out[enp0s8])
Did you spot how we quoted the item keys in the first example, but not here? The reason is that calculated item formula entries follow a syntax of function(key,function_parameter_1,function_parameter_2...). The item keys we referenced for the disk space item had commas in them, like this: vfs.fs.size[/,total]. If we did not quote the keys, Zabbix would interpret them as the key being vfs.fs.size[/ with a function parameter of total]. That would not work.
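The comma ambiguity is easy to demonstrate with a tiny, hypothetical parameter splitter in Python; real Zabbix parsing is more involved, but the quoting rule it illustrates is the same:

```python
# Illustrative sketch: a parser splits function parameters on commas, so a
# comma inside an unquoted item key is taken as a parameter separator.
def split_params(params):
    """Naive comma split that only respects commas inside double quotes."""
    parts, current, in_quotes = [], "", False
    for ch in params:
        if ch == '"':
            in_quotes = not in_quotes
            current += ch
        elif ch == "," and not in_quotes:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    return parts

# Unquoted: the key is cut at the comma, exactly as described in the text.
print(split_params('vfs.fs.size[/,total]'))    # ['vfs.fs.size[/', 'total]']
# Quoted: the whole key survives as a single parameter.
print(split_params('"vfs.fs.size[/,total]"'))  # ['"vfs.fs.size[/,total]"']
```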
The items we referenced had relatively simple keys—one or two parameters, no quoting. When the referenced items get more complicated, it is a common mistake to get quoting wrong. That in turn makes the item not work properly or at all. Let's look at the formula that we used to calculate free disk space:
last("vfs.fs.size[/,total]")-last("vfs.fs.size[/,used]")
The referenced item keys had no quoting. But what if the keys have the filesystem parameter quoted like this:
vfs.fs.size["/",total]
We would have to escape the inner quotes with backslashes:
last("vfs.fs.size[\"/\",total]")-last("vfs.fs.size[\"/\",used]")
The more quoting the referenced items have, the more complicated the calculated item formula gets. If such a calculated item does not seem to work properly for you, check the escaping very, very carefully. Quite often, users have even reported such behavior as a bug, only for it to turn out to be a misunderstanding about the quoting.
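If you ever generate such formulas programmatically, a small helper that escapes the inner quotes can save a lot of squinting. This is a hypothetical Python sketch, not part of Zabbix:

```python
# Hypothetical helper: wrap an item key for use in a calculated item formula,
# escaping any double quotes the key itself contains with backslashes.
def quote_key(key):
    return '"' + key.replace('"', '\\"') + '"'

formula = "last({})-last({})".format(
    quote_key('vfs.fs.size["/",total]'),
    quote_key('vfs.fs.size["/",used]'),
)
print(formula)
# last("vfs.fs.size[\"/\",total]")-last("vfs.fs.size[\"/\",used]")
```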
The calculated items we have created so far referenced items on a single host or template; we just supplied item keys to the functions. We may also reference items from multiple hosts in a calculated item, in which case the formula syntax changes slightly. The only thing we have to do is prefix the item key with the hostname, separated by a colon, the same as in trigger expressions:
function(host:item_key)
Let's configure an item that would compute the average CPU load on both of our hosts. Navigate to Configuration | Hosts, click on Items next to A test host, and click on Create item. Fill in the following:
- Name: Average system load for both servers
- Key: calc.system.cpu.load.avg
- Formula: (last(A test host:system.cpu.load)+last(Another host:system.cpu.load))/2
When done, click on the Add button at the bottom.
When we referenced items in triggers, those triggers were associated with the hosts the items came from. Calculated items also reference items, but they are always created on a single, specific host. The item we created will reside on A test host only. This means that such an item could also reside on a host that is not included in the formula at all; for example, a calculation across a cluster could be done on a meta-host that holds cluster-wide items but is not directly monitored itself.
Let's see whether this item works in Monitoring | Latest data. Make sure both of our hosts are shown and expand all entries. Look for three values—CPU load both for A test host and Another host, as well as Average system load for both servers:
The value seems to be properly calculated. It could now be used like any normal item, maybe by including it and the individual CPU load items from both hosts in a single graph. But if we look at the values, the system loads for the individual hosts are 0.94 and 0.01, while the average is shown as 0.46. Calculated manually, it should be 0.475, or 0.48 if rounded to two decimal places. Why such a difference? Data for the two referenced items arrives at different intervals, and the calculated value is computed at yet another time. While the value itself is correct, it might not match the exact average of the values visible at any given moment. Here, both CPU load items had some values and the calculated average was correctly computed; then one or both of the CPU load items received new values, which the calculated item has not picked up yet.
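The sequence of events can be sketched in Python; the 0.91 earlier value is hypothetical, chosen so that the stored average comes out near 0.46:

```python
# Illustrative sketch of the mismatch described above: the calculated item
# averages whatever values are cached at its own evaluation time, and a
# referenced item may receive a newer value before we view Latest data.
cache = {"A test host": 0.91, "Another host": 0.01}  # values at evaluation time

stored_average = (cache["A test host"] + cache["Another host"]) / 2  # ~0.46

cache["A test host"] = 0.94  # a fresh value arrives after the calculation
current_average = (cache["A test host"] + cache["Another host"]) / 2  # ~0.475

print(stored_average, current_average)
```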
We discuss a few additional aspects regarding calculated items in Chapter 12, Automating Configuration.
The calculated items allowed us to write a specific formula, referencing exact individual items. This worked well for small-scale calculations, but the CPU load item we created last would be very hard to create and maintain for dozens of hosts—and impossible for hundreds. If we want to calculate something for the same item key across many hosts, we would probably opt for aggregate items. They would allow us to find out the average load on a cluster, or the total available disk space for a group of file servers, without naming each item individually. Same as the calculated items, the result would be a normal item that could be used in triggers or graphs.
To find out what we can use in such a situation, go to Configuration | Hosts, select Linux servers in the Group dropdown, click on Items next to A test host, then click on Create item. Now we have to figure out which item type to use. Expand the Type dropdown and look for an entry named Zabbix aggregate. That's the one we need, so choose it and click on Select next to the Key field. Currently, the key is listed as grpfunc, but that's just a placeholder: click on it. We have to replace it with an actual group function, one of grpsum, grpmin, grpmax, or grpavg. We'll calculate the average for several hosts, so change it to grpavg. This key, or group function, takes several parameters:
- group: As the name suggests, the host group name goes here. Enter Linux servers for this parameter.
- key: The key of the item to be used in the calculations. Enter system.cpu.load here.
- func: A function used to retrieve data from the individual items on the hosts. While multiple functions are available, in this case we want to find out what the latest load is. Enter last for this field.
- param: A parameter for the preceding function, following the same rules as normal function parameters (specifying either seconds or a value count, prefixed with #). The function we used, last(), can be used without a parameter, so simply remove the last comma and the placeholder that follows it.

For individual item data, the following functions are supported:
| Function | Details |
|---|---|
| avg | Average value |
| count | Number of values |
| last | Last value |
| max | Maximum value |
| min | Minimum value |
| sum | Sum of values |
For aggregate items, two levels of functions are available. They are nested: first, the function specified as the func parameter gathers the required data from each host in the group. Then grpfunc (grpavg in our case) calculates the final result from all the intermediate results retrieved by func.

The final item key should be grpavg[Linux servers,system.cpu.load,last].
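The two nested levels can be sketched in Python with made-up history data: last is applied per host first, and then the group function averages the per-host results, as grpavg[Linux servers,system.cpu.load,last] would:

```python
# Illustrative sketch (hypothetical data) of the two nested levels in an
# aggregate item: func (here last) is applied to each host's item history,
# then grpfunc (here an average, as with grpavg) combines per-host results.
history = {  # hypothetical recent values of system.cpu.load per host
    "A test host":  [0.10, 0.20, 0.25],
    "Another host": [0.50, 0.40, 0.75],
}

def last(values):
    """Level-1 function: the most recent value of one host's item."""
    return values[-1]

# Level 1: func per host; Level 2: grpavg across the per-host results.
per_host = [last(values) for values in history.values()]
group_average = sum(per_host) / len(per_host)
print(group_average)  # 0.5
```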
To finish the item configuration, fill in the following:

- Name: Average system load for Linux servers
The final item configuration should look like this:
When done, click on the Add button at the bottom. Go to Monitoring | Latest data, make sure all hosts are shown and look for the three values again—CPU load both for A test host and Another host, as well as Average system load for Linux servers:
Same as before, the computed average across both hosts does not match our result if we look at the values on individual hosts—and the reason is exactly the same as with the calculated items.
As the key parameters indicate, an aggregate item can be calculated for a host group—there is no way to pick individual hosts. Creating a new group is required if arbitrary hosts must have an aggregate item calculated for them. We discussed other benefits from careful host group planning in Chapter 5, Managing Hosts, Users, and Permissions.
We used the grpavg aggregate function to find out the average load for a group of servers, but the other group functions, grpsum, grpmin, and grpmax, work in exactly the same way.
This way, a limited set of functions can be applied across a large number of hosts. While less flexible than calculated items, it is much more practical in case we want to do such a calculation for a group that includes hundreds of hosts. Additionally, a calculated item has to be updated whenever a host or item is to be added or removed from the calculations. An aggregate item will automatically find all the relevant hosts and items. Note that only enabled items on enabled hosts will be considered.
Nothing limits the usage of aggregate items to servers; they can also be used with any other class of devices, such as calculating the average CPU load for a group of switches monitored over SNMP.
The basic syntax allows us to specify one host group. Although we mentioned earlier that aggregating across arbitrary hosts would require creating a new group, there is one more possibility: an aggregate item may reference several host groups. If we modified our aggregate item key to also include hosts in a "Solaris servers" group, it would look like this:
grpavg[[Linux servers,Solaris servers],system.cpu.load,last]
That is, multiple groups can be specified as comma-delimited entries in square brackets. If any host appears in several of those groups, the item from that host would be included only once in the calculation. There is no strict limit on the host group count here, although both readability and overall item key length limit—2,048 characters—should be taken into account.
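The deduplication across overlapping groups behaves like a set union; a hypothetical Python sketch (the group memberships are made up):

```python
# Illustrative sketch: when an aggregate item references several host groups,
# a host present in more than one group contributes only once.
groups = {
    "Linux servers":   ["A test host", "Another host"],
    "Solaris servers": ["Another host", "Solaris box"],
}

# Union with deduplication: "Another host" is in both groups but counted once.
hosts = set()
for members in groups.values():
    hosts.update(members)

print(sorted(hosts))  # ['A test host', 'Another host', 'Solaris box']
```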