Nagios provides a very powerful feature called adaptive monitoring that allows the modification of various check-related parameters on the fly. This is done by sending a command to the Nagios external command pipe.
The first thing that can be changed on the fly is the command that Nagios executes, along with the arguments passed to it (the equivalent of the check_command directive in the object definition). To do that, use the CHANGE_HOST_CHECK_COMMAND or CHANGE_SVC_CHECK_COMMAND command. These require the hostname, or the hostname and service description, followed by the check command as arguments.
This can be used to change how hosts or services are checked, or to modify only the parameters that are passed to the check commands. For example, a check for ping latency can be modified based on whether a primary or a backup connection is used. The following example changes the check command of a service, replacing both the command and its parameters:
[1206096000] CHANGE_SVC_CHECK_COMMAND;linux1;PING;check_ping!500.0,50%
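The line above can also be produced and submitted by a script. The following is a minimal Python sketch, not an official Nagios API: the function names format_command and send_command are made up for illustration, and the COMMAND_FILE path is an assumption (the real location is whatever the command_file directive in nagios.cfg specifies).

```python
import time

# Assumed location of the external command pipe; the real path is set by the
# command_file directive in nagios.cfg and varies between installations.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def format_command(name, *args, timestamp=None):
    """Build one external command line: [time] NAME;arg1;arg2;..."""
    if timestamp is None:
        timestamp = int(time.time())
    fields = ";".join([name, *(str(a) for a in args)])
    return f"[{timestamp}] {fields}\n"


def send_command(line, command_file=COMMAND_FILE):
    """Write a formatted command line to the command pipe."""
    # Opening a named pipe for writing blocks until a reader (Nagios) has it open.
    with open(command_file, "w") as pipe:
        pipe.write(line)


if __name__ == "__main__":
    # Only prints the line; call send_command(line) to actually submit it.
    line = format_command(
        "CHANGE_SVC_CHECK_COMMAND", "linux1", "PING", "check_ping!500.0,50%"
    )
    print(line, end="")
```

Keeping the formatting separate from the writing makes it possible to inspect or log a command before it is sent to the pipe.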
A similar possibility is to change the custom variables that are used later in a check command. Consider the following command and service definitions:
define command {
    command_name    check-ping
    command_line    $USER1$/check_ping -H $HOSTADDRESS$ -p $_SERVICEPACKETS$ -w $_SERVICEWARNING$ -c $_SERVICECRITICAL$
}

define service {
    host_name               linux2
    service_description     PING
    use                     ping
    check_command           check-ping
    _PACKETS                5
    _WARNING                100.0,40%
    _CRITICAL               300.0,60%
}
This example is very similar to the one we saw earlier. The main benefit is that parameters can be set independently—for example, one event handler might modify the number of packets to send while another one can modify the warning and/or critical state limits.
The following is an example that modifies the warning level for the PING service on the linux1 host:
[1206096000] CHANGE_CUSTOM_SVC_VAR;linux1;PING;_WARNING;500.0,50%
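Because each custom variable is changed by its own command, a script can switch a whole set of thresholds at once, for instance when traffic moves from a primary to a backup connection. The sketch below only builds the command lines; the PROFILES table, its values other than those taken from the examples above, and the function name custom_var_commands are hypothetical.

```python
# Hypothetical threshold profiles; the "backup" critical value is illustrative
# only and should be replaced with limits that suit the actual link.
PROFILES = {
    "primary": {"_WARNING": "100.0,40%", "_CRITICAL": "300.0,60%"},
    "backup": {"_WARNING": "500.0,50%", "_CRITICAL": "1000.0,80%"},
}


def custom_var_commands(host, service, profile, timestamp):
    """Return one CHANGE_CUSTOM_SVC_VAR line per variable in the profile."""
    return [
        f"[{timestamp}] CHANGE_CUSTOM_SVC_VAR;{host};{service};{name};{value}"
        for name, value in PROFILES[profile].items()
    ]


if __name__ == "__main__":
    for line in custom_var_commands("linux1", "PING", "backup", 1206096000):
        print(line)
```

Each returned line still has to be written to the external command pipe, followed by a newline, for Nagios to act on it.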
It is also possible to modify event handlers on the fly. This can be used to enable or disable scripts that try to resolve a problem. To do this, you need to use the CHANGE_HOST_EVENT_HANDLER and CHANGE_SVC_EVENT_HANDLER commands.
In order to set an event handler command for the Apache2 service mentioned previously in this section, you need to send the following command:
[1206096000] CHANGE_SVC_EVENT_HANDLER;localhost;webserver;restart-apache2
Please note that setting an empty event handler disables any previous event handlers for this host or service. The same comment also applies for modifying the check command definition. In case you are modifying commands or event handlers, please make sure that the corresponding command definitions actually exist; otherwise, Nagios might reject your modifications.
Another feature that you can use to fine-tune the execution of checks is the ability to modify the time period during which a check should be performed. This is done with the CHANGE_HOST_CHECK_TIMEPERIOD and CHANGE_SVC_CHECK_TIMEPERIOD commands. Similar to the previous commands, these accept the host, or host and service names, and the new time period to be set. See the following example:
[1206096000] CHANGE_SVC_CHECK_TIMEPERIOD;localhost;webserver;workinghours
As is the case with command names, you need to make sure that the time period you are requesting to be set exists in the Nagios configuration. Otherwise, Nagios will ignore this command and leave the current check time period.
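Since Nagios silently keeps the current time period when the requested one does not exist, a script may prefer to fail early instead. The following sketch assumes the caller already has a set of the time period names defined in the configuration; how that set is obtained is outside its scope, and the function name timeperiod_command is hypothetical.

```python
def timeperiod_command(host, service, timeperiod, known_timeperiods, timestamp):
    """Return a CHANGE_SVC_CHECK_TIMEPERIOD line, or raise if the period is unknown.

    known_timeperiods is a caller-supplied collection of time period names
    that exist in the Nagios configuration (an assumption of this sketch).
    """
    if timeperiod not in known_timeperiods:
        raise ValueError(f"time period {timeperiod!r} is not defined")
    return f"[{timestamp}] CHANGE_SVC_CHECK_TIMEPERIOD;{host};{service};{timeperiod}"


if __name__ == "__main__":
    known = {"24x7", "workinghours"}
    print(timeperiod_command("localhost", "webserver", "workinghours", known, 1206096000))
```

The same guard could be applied to check commands and event handlers by passing a set of defined command names instead.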
Nagios also allows modifying the intervals between checks, both for normal checks and for retries during soft states. This is done through the CHANGE_NORMAL_HOST_CHECK_INTERVAL, CHANGE_RETRY_HOST_CHECK_INTERVAL, CHANGE_NORMAL_SVC_CHECK_INTERVAL, and CHANGE_RETRY_SVC_CHECK_INTERVAL commands. All of these commands require passing the host, or the host and service names, as well as the interval that should be set.
A typical example of when intervals would be modified on the fly is when the priority of a host or service depends on other conditions in your network. One example is a failover server, which only runs if the primary server is down; it should be monitored more often once the primary server fails. Another is a host that performs scheduled backups: making sure that the host and all of the services on it are working properly is very important just before the backups run, whereas during idle time its priority might be much lower.
An example to modify the normal interval for a host to every 15 minutes is as follows:
[1206096000] CHANGE_NORMAL_HOST_CHECK_INTERVAL;backupserver;15
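The decision to tighten or relax an interval can be captured in a small function. This is only a sketch under stated assumptions: the function name failover_interval_command is hypothetical, the relaxed interval of 15 comes from the example above, and the tighter value of 1 is an illustrative choice rather than a recommendation.

```python
def failover_interval_command(
    primary_is_up,
    timestamp,
    host="backupserver",
    idle_interval=15,    # relaxed interval while the primary server is healthy
    active_interval=1,   # illustrative tighter interval once the primary fails
):
    """Return a CHANGE_NORMAL_HOST_CHECK_INTERVAL line for the given host."""
    interval = idle_interval if primary_is_up else active_interval
    return f"[{timestamp}] CHANGE_NORMAL_HOST_CHECK_INTERVAL;{host};{interval}"


if __name__ == "__main__":
    print(failover_interval_command(primary_is_up=True, timestamp=1206096000))
    print(failover_interval_command(primary_is_up=False, timestamp=1206096000))
```

A function like this could be called from an event handler attached to the primary server, so that the interval changes whenever the state of the primary changes.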
It is also possible to modify how many checks need to be performed before a state is considered to be hard. The commands for this are CHANGE_MAX_HOST_CHECK_ATTEMPTS and CHANGE_MAX_SVC_CHECK_ATTEMPTS. The following is an example command that sets the maximum number of check attempts for a host to 5:
[1206096000] CHANGE_MAX_HOST_CHECK_ATTEMPTS;linux1;5
There are many more commands that allow the fine tuning of monitoring and check settings on the fly. It is recommended that you get acquainted with all of the external commands that your version of Nagios supports, as mentioned in the section introducing the external commands pipe.