Reusing data on the server

The items we have used so far were collecting data from some Zabbix agent or SNMP device. It is also possible to reuse this data in some calculation, store the result and treat it as a normal item to be used for graphs, triggers, and other purposes. Zabbix offers two types of such items:

  • Calculated items require writing exact formulas and referencing each individual item. They are more flexible than the aggregate items, but are not feasible over a large number of items and have to be manually adjusted if the items to be included in the calculation change.
  • Aggregate items operate on items that share the same key across a host group. Minimum, maximum, sum, or average can be computed. They cannot be used on multiple items on the same host, but if hosts are added to the group or removed from it, no adjustments are required for the aggregate item.

Calculated items

We will start with calculated items that require typing in a formula. We are already monitoring total and used disk space. If we additionally wanted to monitor free disk space, we could query the agent for this information. This is where calculated items come in—if the agent or device does not expose a specific view of the data, or if we would like to avoid querying monitored hosts, we can do the calculation from the already retrieved values. To create a calculated item that would compute the free disk space, navigate to Configuration | Hosts, click on Items next to A test host, and then click on Create item. Fill in the following information:

  • Name: Diskspace on / (free)
  • Type: Calculated
  • Key: calc.vfs.fs.size[/,free]
  • Formula: last("vfs.fs.size[/,total]")-last("vfs.fs.size[/,used]")
  • Units: B
  • Update interval: 1800

When done, click on the Add button at the bottom.

Tip

We chose a key that would not clash with the native key in case somebody decides to use that later, but we are free to use any key for calculated items.

All the referenced items must exist. We cannot enter keys here and have them gather data by extension from the calculated item. Values to compute the calculated item are retrieved from the Zabbix server caches or the database; no connections are made to the monitored devices.
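The calculation itself is plain arithmetic over values the server already holds. The following is a minimal Python sketch of that idea, not Zabbix code: the `last_values` dictionary and the `free_disk_space` function are hypothetical stand-ins for the server's value cache and the formula evaluation, and the byte counts are invented.

```python
# Hypothetical stand-in for the latest values the server already holds.
last_values = {
    "vfs.fs.size[/,total]": 50_000_000_000,  # bytes, made-up sample
    "vfs.fs.size[/,used]": 32_500_000_000,   # bytes, made-up sample
}

def free_disk_space(cache):
    """Mimic last(total) - last(used) using already collected values only."""
    try:
        total = cache["vfs.fs.size[/,total]"]
        used = cache["vfs.fs.size[/,used]"]
    except KeyError as missing:
        # A referenced item that does not exist gives nothing to compute from.
        raise ValueError(f"referenced item not found: {missing}") from None
    return total - used

print(free_disk_space(last_values))  # 17500000000
```

Nothing in the sketch contacts a monitored host; it only reads what was collected earlier, which is the point the paragraph above makes.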

With this item added, let's go to the Latest data page. As the interval was set to 1,800 seconds, we might have to wait a bit longer to see the value, but eventually it should appear:

[Figure: Calculated items]

Tip

If the item turns unsupported, check the error message and make sure the formula you typed in is correct.

The interval we used, 1,800 seconds, does not match the intervals of the two referenced items. The total and used disk space items collect data every 3,600 seconds, but a calculated item is not connected to the data collection of the items it references in any way. It is not evaluated when the referenced items get values; it follows its own schedule, which is completely independent of theirs and is semi-random. If the referenced items stopped collecting data, our calculated item would keep using the latest available values, because we used the last() function. If only one of them stopped collecting data, we would base our calculation on one recent and one outdated value. Likewise, the calculated item could produce very incorrect results if it is evaluated at the wrong time, when one of the referenced items has changed significantly but the other has not received a new value yet. Unfortunately, there is no easy solution to that. Custom scheduling, discussed in Chapter 3, Monitoring with Zabbix Agents and Basic Protocols, could help here, but it could also introduce performance issues by polling values in uneven batches, and it would be more complicated to manage. It should be used only as an exception.
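The effect of independent schedules can be shown with a small simulation. This is only a sketch under stated assumptions: the timestamps and values are invented, and the `last` helper is a simplified model of "most recent value at evaluation time", not the server's implementation.

```python
from bisect import bisect_right

def last(history, now):
    """Return the most recent value with a timestamp <= now, or None."""
    timestamps = [t for t, _ in history]
    index = bisect_right(timestamps, now)
    return history[index - 1][1] if index else None

# Made-up (timestamp_seconds, value) samples; each item collects every 3,600 s,
# but the two items are not collected at the same moment.
total = [(0, 100), (3600, 100)]
used = [(600, 40), (4200, 70)]

def free_at(now):
    """Evaluate last(total) - last(used) at an arbitrary moment."""
    return last(total, now) - last(used, now)

# The calculated item runs on its own schedule, every 1,800 seconds.
for now in (1800, 3600, 5400):
    print(now, free_at(now))
```

At 3,600 seconds the total has just been refreshed, while the used value is still the one collected at 600 seconds, so the result mixes a fresh value with an older one.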

Tip

The free disk space that we calculated might not match the "available" diskspace reported by system tools. Many filesystems and operating systems reserve some space which does not count as used, but counts against the available disk space.

We might also want to compute the total of incoming and outgoing traffic on an interface, and a calculated item would work well here. The formula would be like this:

last(net.if.in[enp0s8])+last(net.if.out[enp0s8])

Tip

Did you spot how we quoted item keys in the first example, but not here? The reason is that calculated item formula entries follow a syntax of function(key,function_parameter_1,function_parameter_2...). The item keys we referenced for the disk space item had commas in them like this—vfs.fs.size[/,total]. If we did not quote the keys, Zabbix would interpret them as the key being vfs.fs.size[/ with a function parameter of total]. That would not work.
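To see why the quotes matter, here is a toy Python splitter. It is a deliberately simplified model and not Zabbix's parser: it only separates arguments on commas, once ignoring quotes and once honoring them, and it does not handle backslash escapes.

```python
def split_naive(args):
    """Split on every comma, paying no attention to quotes."""
    return args.split(",")

def split_quoted(args):
    """Split only on commas that are outside double quotes."""
    parts, current, in_quotes = [], "", False
    for char in args:
        if char == '"':
            in_quotes = not in_quotes  # toggle state and drop the quote itself
        elif char == "," and not in_quotes:
            parts.append(current)
            current = ""
        else:
            current += char
    parts.append(current)
    return parts

print(split_naive('vfs.fs.size[/,total]'))     # ['vfs.fs.size[/', 'total]']
print(split_quoted('"vfs.fs.size[/,total]"'))  # ['vfs.fs.size[/,total]']
```

The unquoted key is cut in two at its own comma, while the quoted key survives as a single argument.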

Quoting in calculated items

The items we referenced had relatively simple keys—one or two parameters, no quoting. When the referenced items get more complicated, it is a common mistake to get quoting wrong. That in turn makes the item not work properly or at all. Let's look at the formula that we used to calculate free disk space:

last("vfs.fs.size[/,total]")-last("vfs.fs.size[/,used]")

The referenced item keys had no quoting. But what if the keys have the filesystem parameter quoted like this:

vfs.fs.size["/",total]

We would have to escape the inner quotes with backslashes:

last("vfs.fs.size[\"/\",total]")-last("vfs.fs.size[\"/\",used]")

The more quoting the referenced items have, the more complicated the calculated item formula gets. If such a calculated item does not seem to work properly for you, check the escaping very, very carefully. Quite often users have even reported some behavior as a bug which turns out to be a misunderstanding about the quoting.
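When formulas are generated by a script rather than typed by hand, the escaping can be left to a small helper. The sketch below is an assumption-laden illustration: `quote_key` and `last_of` are hypothetical helper names, and the helper assumes that double quotes are the only characters that need a backslash.

```python
def quote_key(key):
    """Wrap an item key in double quotes, escaping any quotes inside it."""
    return '"' + key.replace('"', '\\"') + '"'

def last_of(key):
    """Build a last() call around a quoted item key."""
    return "last(" + quote_key(key) + ")"

formula = last_of('vfs.fs.size["/",total]') + "-" + last_of('vfs.fs.size["/",used]')
print(formula)
```

Printing the result instead of reading it from Python source makes it easier to compare with what ends up in the Formula field, since the source adds its own layer of escaping.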

Referencing items from multiple hosts

The calculated items we have created so far referenced items on a single host or template. We just supplied item keys to the functions. We may also reference items from multiple hosts in a calculated item, in which case the formula syntax changes slightly. The only thing we have to do is prefix the item key with the hostname, separated by a colon, the same as in trigger expressions:

function(host:item_key)

Let's configure an item that would compute the average CPU load on both of our hosts. Navigate to Configuration | Hosts, click on Items next to A test host, and click on Create item. Fill in the following:

  • Name: Average system load for both servers
  • Type: Calculated
  • Key: calc.system.cpu.load.avg
  • Formula: (last(A test host:system.cpu.load)+last(Another host:system.cpu.load))/2
  • Type of information: Numeric (float)

When done, click on the Add button at the bottom.

For triggers, when we referenced items, those triggers were associated with the hosts which the items came from. Calculated items also reference items, but they are always created on a single, specific host. The item we created will reside on A test host only. This means that such an item could also reside on a host which is not included in the formula—for example, some calculation across a cluster could be done on a meta-host which holds cluster-wide items but is not directly monitored itself.
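Because every host has to be named explicitly, the formula grows with each host added. A short Python sketch makes that visible; `average_formula` is a hypothetical helper, and it assumes the item key has no parameters, so no quoting is needed.

```python
def average_formula(hosts, key):
    """Build (last(host1:key)+last(host2:key)+...)/N for the given hosts."""
    if not hosts:
        raise ValueError("at least one host is required")
    terms = "+".join(f"last({host}:{key})" for host in hosts)
    return f"({terms})/{len(hosts)}"

print(average_formula(["A test host", "Another host"], "system.cpu.load"))
# (last(A test host:system.cpu.load)+last(Another host:system.cpu.load))/2
```

Whenever a host is added or removed, both the list of terms and the divisor change, so the formula has to be regenerated and saved again.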

Let's see whether this item works in Monitoring | Latest data. Make sure both of our hosts are shown and expand all entries. Look for three values—CPU load both for A test host and Another host, as well as Average system load for both servers:

[Figure: Referencing items from multiple hosts]

Tip

You can filter by "load" in the item names to see only relevant entries.

The value seems to be properly calculated. It could now be used like any normal item, maybe by including it and the individual CPU load items from both hosts in a single graph. But if we look at the values, the system loads for the individual hosts are 0.94 and 0.01, while the average is shown as 0.46. Calculated manually, it should be (0.94 + 0.01) / 2 = 0.475, or 0.48 when rounded to two decimal places. Why the difference? Data for the two items that the calculated item depends on comes in at different times, and the calculated value is computed at yet another time. So while the value itself is correct, it might not match the exact average of the values seen at any given moment. Here, both CPU load items had some values and the average was correctly computed from them. Then one or both of the CPU load items received new values, but the calculated item has not yet been updated with them.

Tip

We discuss a few additional aspects regarding calculated items in Chapter 12, Automating Configuration.

Aggregate items

The calculated items allowed us to write a specific formula, referencing exact individual items. This worked well for small-scale calculations, but the CPU load item we created last would be very hard to create and maintain for dozens of hosts—and impossible for hundreds. If we want to calculate something for the same item key across many hosts, we would probably opt for aggregate items. They would allow us to find out the average load on a cluster, or the total available disk space for a group of file servers, without naming each item individually. Same as the calculated items, the result would be a normal item that could be used in triggers or graphs.

To find out what we can use in such a situation, go to Configuration | Hosts, select Linux servers in the Group dropdown and click on Items next to A test host, then click on Create item. Now we have to figure out what item type to use. Expand the Type dropdown and look for an entry named Zabbix aggregate. That's the one we need, so choose it and click on Select next to the Key field. Currently, the key is listed as grpfunc, but that's just a placeholder—click on it. We have to replace it with the actual group key—one of grpsum, grpmin, grpmax, or grpavg. We'll calculate the average for several hosts, so change it to grpavg. This key, or group function, takes several parameters:

  • group: As the name suggests, the host group name goes here. Enter Linux servers for this parameter.
  • key: The key for the item to be used in calculations. Enter system.cpu.load here.
  • func: A function used to retrieve data from individual items on hosts. While multiple functions are available, in this case, we'll want to find out what the latest load is. Enter last for this field.
  • param: A parameter for the function above, following the same rules as normal function parameters (specifying either seconds or value count, prefixed with #). The function we used, last(), can be used without a parameter, thus simply remove the last comma and the placeholder that follows it.

For individual item data, the following functions are supported:

Function  Details
avg       Average value
count     Number of values
last      Last value
max       Maximum value
min       Minimum value
sum       Sum of values

For aggregate items, two levels of functions are available. They are nested—first, the function specified as the func parameter gathers the required data from all hosts in the group. Then, grpfunc (grpavg in our case) calculates the final result from all the intermediate results retrieved by func.
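The nesting can be modeled in a few lines of Python. This is a conceptual sketch only: the host names come from our examples, the load values are invented, and the optional param (a time period or value count) is left out for brevity.

```python
from statistics import mean

# Inner functions, applied to each host's own recent values.
ITEM_FUNCS = {
    "avg": mean, "count": len, "last": lambda values: values[-1],
    "max": max, "min": min, "sum": sum,
}
# Outer functions, applied to the per-host results.
GROUP_FUNCS = {"grpavg": mean, "grpmax": max, "grpmin": min, "grpsum": sum}

def aggregate(group_func, func, values_by_host):
    """First apply func per host, then group_func across the hosts."""
    per_host = [ITEM_FUNCS[func](values) for values in values_by_host.values()]
    return GROUP_FUNCS[group_func](per_host)

# Made-up CPU load values, oldest first.
loads = {"A test host": [0.5, 1.0, 0.75], "Another host": [0.25, 0.25, 0.5]}

print(aggregate("grpavg", "last", loads))  # 0.625
print(aggregate("grpmax", "avg", loads))   # 0.75
```

With last as the inner function, only the newest value from each host takes part; with avg, each host first contributes its own average and the group function then works on those averages.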

Note

All the referenced items must exist. We cannot enter keys here and have them gather data by extension from the aggregate item. Values to compute the aggregate item are retrieved from the Zabbix server caches or the database; no connections are made to the monitored devices.

The final item key should be grpavg[Linux servers,system.cpu.load,last].

Tip

If the referenced item key had parameters, we would have to quote it.
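As an illustration of that quoting rule, the following Python sketch assembles an aggregate key from its parts. The `aggregate_key` helper and the "File servers" group are hypothetical, and the helper assumes that a key with parameters is wrapped in double quotes, with any embedded double quotes escaped by a backslash.

```python
def aggregate_key(group_func, group, item_key, func):
    """Assemble grpfunc[group,key,func], quoting a key that has parameters."""
    if "[" in item_key:
        # Escape embedded double quotes, then wrap the whole key in quotes.
        item_key = '"' + item_key.replace('"', '\\"') + '"'
    return f"{group_func}[{group},{item_key},{func}]"

print(aggregate_key("grpavg", "Linux servers", "system.cpu.load", "last"))
print(aggregate_key("grpsum", "File servers", "vfs.fs.size[/,free]", "last"))
```

The first call reproduces the key used in this section; the second shows a parameterized key that would otherwise be split at its inner comma.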

To finish the item configuration, fill in the following:

  • Name: Average system load for Linux servers
  • Type of information: Numeric (float)

The final item configuration should look like this:

[Figure: Aggregate items]

When done, click on the Add button at the bottom. Go to Monitoring | Latest data, make sure all hosts are shown and look for the three values again—CPU load both for A test host and Another host, as well as Average system load for Linux servers:

[Figure: Aggregate items]

Tip

You can filter by "load" in the item names again.

Same as before, the computed average across both hosts does not match our result if we look at the values on individual hosts—and the reason is exactly the same as with the calculated items.

Tip

As the key parameters indicate, an aggregate item can be calculated for a host group—there is no way to pick individual hosts. Creating a new group is required if arbitrary hosts must have an aggregate item calculated for them. We discussed other benefits from careful host group planning in Chapter 5, Managing Hosts, Users, and Permissions.

We used the grpavg aggregate function to find out the average load for a group of servers, but there are other functions:

Function  Details
grpmax    Maximum value is reported. One could find out what the maximum SQL queries per second are across a group of database servers.
grpmin    Minimum value is reported. The minimum free space for a group of file servers could be determined.
grpsum    Values for the whole group are summed. Total number of HTTP sessions could be calculated for a group of web servers.

This way, a limited set of functions can be applied across a large number of hosts. While less flexible than calculated items, it is much more practical in case we want to do such a calculation for a group that includes hundreds of hosts. Additionally, a calculated item has to be updated whenever a host or item is to be added or removed from the calculations. An aggregate item will automatically find all the relevant hosts and items. Note that only enabled items on enabled hosts will be considered.

Aggregate items are not limited to servers. They can be used on any other class of devices as well, for example to calculate the average CPU load for a group of switches monitored over SNMP.

Aggregating across multiple groups

The basic syntax allows us to specify one host group. Although we mentioned earlier that aggregating across arbitrary hosts would require creating a new group, there is one more possibility—an aggregate item may reference several host groups. If we modified our aggregate item key to also include hosts in a "Solaris servers" group, it would look like this:

grpavg[[Linux servers,Solaris servers],system.cpu.load,last]

That is, multiple groups can be specified as comma-delimited entries in square brackets. If any host appears in several of those groups, the item from that host would be included only once in the calculation. There is no strict limit on the host group count here, although both readability and overall item key length limit—2,048 characters—should be taken into account.
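The "included only once" rule amounts to taking the union of the groups before calculating. The Python sketch below models that; the group membership, the host "sol1", and the load values are all invented for illustration.

```python
def hosts_in_groups(group_names, membership):
    """Union of hosts across groups; a host in several groups appears once."""
    hosts = set()
    for name in group_names:
        hosts |= membership.get(name, set())
    return hosts

def group_average(group_names, membership, last_value):
    """Average the latest value over the deduplicated set of hosts."""
    hosts = hosts_in_groups(group_names, membership)
    if not hosts:
        raise ValueError("no hosts found in the given groups")
    return sum(last_value[host] for host in hosts) / len(hosts)

membership = {
    "Linux servers": {"A test host", "Another host"},
    "Solaris servers": {"Another host", "sol1"},  # "sol1" is hypothetical
}
last_value = {"A test host": 1.0, "Another host": 0.5, "sol1": 3.0}

print(group_average(["Linux servers", "Solaris servers"], membership, last_value))  # 1.5
```

If "Another host" were counted once per group, the result would be 1.25 instead of 1.5, which is why the deduplication matters.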

Tip

Both calculated and aggregate items can reuse values from any other item, including calculated and aggregate items. They can also be used in triggers, graphs, network map labels, and anywhere else where other items can be used.
