The Nagios plugin check_multi is a convenient tool for executing multiple checks within a single check command, which then produces one overall return state and a combined output. In this recipe, we will show you how to set it up and use it to quickly monitor a list of important system metrics on your clients.
It is assumed that you've gone through this chapter recipe by recipe, so by now you should have a Nagios server running and a client computer that you want to monitor, which our Nagios server can already reach externally via its NRPE service. The client computer needs an installation of the CentOS 7 operating system with root privileges, a console-based text editor of your choice, and a connection to the Internet in order to download additional packages. The client computer will have the IP address 192.168.1.8.
The check_multi Nagios plugin is available from GitHub, so we will begin this recipe by installing the git program:
yum install git
Next, install the check_multi plugin by compiling it from the source:
cd /tmp; git clone git://github.com/flackem/check_multi; cd /tmp/check_multi
./configure --with-nagios-name=nagios --with-nagios-user=nagios --with-nagios-group=nagios --with-plugin-path=/usr/lib64/nagios/plugins --libexecdir=/usr/lib64/nagios/plugins/
make all; make install; make install-config
Next, install another plugin, check_mem, which is not available in the CentOS 7 Nagios plugin RPMs:
cd /tmp; git clone https://github.com/justintime/nagios-plugins.git
cp /tmp/nagios-plugins/check_mem/check_mem.pl /usr/lib64/nagios/plugins/
Now, create a check_multi command file that will contain all the client checks that you want to combine in a single run; open the following file:
vi /usr/local/nagios/etc/check_multi/check_multi.cmd
Put in the following content:
command[ sys_load::check_load ] = check_load -w 5,4,3 -c 10,8,6
command[ sys_mem::check_mem ] = check_mem.pl -w 10 -c 5 -f -C
command[ sys_users::check_users ] = check_users -w 5 -c 10
command[ sys_disks::check_disk ] = check_disk -w 5% -c 2% -X nfs
command[ sys_procs::check_procs ] = check_procs
Test the new command file by running it locally on the client:
/usr/lib64/nagios/plugins/check_multi -f /usr/local/nagios/etc/check_multi/check_multi.cmd
If everything is correct, the output should begin with OK - 5 plugins checked. Next, we will install this new command in the NRPE service on our client so that the Nagios server is able to execute it remotely by calling its name. Open the NRPE configuration file:
vi /etc/nagios/nrpe.cfg
Add the following line after the last # command line to expose a new command called check_multicmd to our Nagios server:
command[check_multicmd]=/usr/lib64/nagios/plugins/check_multi -f /usr/local/nagios/etc/check_multi/check_multi.cmd
Next, restart the NRPE service so that it picks up the change:
systemctl restart nrpe
Now, test whether we can execute the check_multicmd command that we defined in the last step from our Nagios server. Log in as root on the Nagios server and type the following command (change the IP address of your client, 192.168.1.8, appropriately):
/usr/lib64/nagios/plugins/check_nrpe -H 192.168.1.8 -c "check_multicmd"
Next, define the command on the Nagios server. Open the following file:
vi /etc/nagios/objects/commands.cfg
Add a new command named check_nrpe_multi, which we can use in any service definition:
define command {
    command_name check_nrpe_multi
    command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c "check_multicmd"
}
Now, create a new server file for the client we want to monitor:
vi /etc/nagios/servers/192.168.1.8.cfg
Put in the following content:
define host {
    use linux-server
    host_name host1
    address 192.168.1.8
    contact_groups unix-admins
}
define service {
    use generic-service
    host_name host1
    check_command check_nrpe_multi
    normal_check_interval 15
    service_description check_nrpe_multi service
}
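Next, define the contact group and the contact that the host definition refers to. Open the following file:
vi /etc/nagios/objects/contacts.cfg
Add the following definitions:
define contactgroup {
    contactgroup_name unix-admins
    alias Unix Administrators
}
define contact {
    contact_name pelz
    use generic-contact
    alias Oliver Pelz
    contactgroups unix-admins
    email [email protected]
}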
Finally, restart the Nagios service:
systemctl restart nagios
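Since a syntax error in any of the edited files can keep Nagios from starting, it can be useful to validate the configuration before restarting. The following is a minimal sketch, not part of the original recipe: verify_then_restart is a hypothetical helper name, and the main configuration path /etc/nagios/nagios.cfg is an assumption based on the directory layout used in this recipe. It relies on the nagios -v option, which verifies a configuration file without starting the daemon.

```shell
#!/bin/sh
# verify_then_restart: hypothetical helper (not shipped with Nagios).
# Runs a verification command and only runs the restart command
# if the verification exits with status 0.
verify_then_restart() {
    verify_cmd=$1
    restart_cmd=$2
    if sh -c "$verify_cmd"; then
        sh -c "$restart_cmd"
    else
        echo "Configuration check failed; service not restarted." >&2
        return 1
    fi
}

# Intended use on the Nagios server (the config path is an assumption):
# verify_then_restart "nagios -v /etc/nagios/nagios.cfg" "systemctl restart nagios"
```

Passing both commands as arguments keeps the helper reusable, for example for other services that offer a configuration test of their own.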
We started this recipe by installing the check_multi and check_mem plugins from their authors' GitHub repositories; both are plain command-line tools. Nagios performs checks by running such external commands, and it uses the command's return code, along with its output, to decide whether the check was successful. Nagios has a very flexible architecture that can easily be extended using plugins, add-ons, and extensions. A central place to search for all kinds of extensions is https://exchange.nagios.org/. Next, we added a new command file for check_multi, in which we put five different system check_ commands. These checks act as a starting point for customizing your monitoring and cover system load, memory consumption, logged-in users, free disk space, and processes. All available check_ commands can be found at /usr/lib64/nagios/plugins/check_*. As you can see in our command file, the parameters of those check_ commands vary widely, and explaining them all is beyond the scope of this recipe. Most of them set the threshold values at which a check enters a certain state, for example, the CRITICAL state. To get more information about a specific command, run it with the --help parameter. For example, to find out what the parameters in the check_load -w 5,4,3 -c 10,8,6 command do, run /usr/lib64/nagios/plugins/check_load --help. You can easily add any number of check commands from existing plugins to the command file, or you can download and install new plugins if you like. A number of example command files are also shipped with the check_multi plugin; they are very useful for learning, so please have a look at /usr/local/nagios/etc/check_multi/*.cmd.
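To make the role of return codes and thresholds more concrete, here is a minimal sketch of a plugin-style shell function. It follows the standard Nagios plugin convention of exit status 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN. The name check_threshold and its three positional arguments are hypothetical and chosen only for illustration; real plugins such as check_load take their thresholds through -w and -c options instead.

```shell
#!/bin/sh
# check_threshold: hypothetical, minimal plugin-style function (not a real plugin).
# Usage: check_threshold VALUE WARN CRIT
# Higher values are worse; all arguments must be non-negative integers.
check_threshold() {
    # Missing or non-numeric input yields UNKNOWN (status 3).
    for arg in "$1" "$2" "$3"; do
        case "$arg" in
            ''|*[!0-9]*)
                echo "UNKNOWN - usage: check_threshold VALUE WARN CRIT"
                return 3 ;;
        esac
    done
    if [ "$1" -ge "$3" ]; then
        echo "CRITICAL - value is $1 (critical threshold $3)"
        return 2
    elif [ "$1" -ge "$2" ]; then
        echo "WARNING - value is $1 (warning threshold $2)"
        return 1
    fi
    echo "OK - value is $1"
    return 0
}
```

For instance, check_threshold 7 5 10 reports a WARNING and returns status 1, because 7 has reached the warning threshold but not the critical one.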
Afterwards, we verified the new command file by passing it to the check_multi command with the -f parameter in a dry run locally on the client. Its output contains the individual outputs of all five checks, just as if you had run the commands one by one. If any single check fails, the overall check_multi result fails as well. Next, we defined a new NRPE command called check_multicmd in the NRPE config file so that it can be executed from the Nagios server, and we tested it from the Nagios server in the following step. For the test to be successful, we expect the same results as when calling the command on the client itself. Afterwards, we defined this command in commands.cfg on the Nagios server so that we can reuse it in any service definition by referencing the command's name, check_nrpe_multi. Next, we created a new server file named after the IP address of the client we want to monitor, 192.168.1.8.cfg (you can name it anything you like as long as it has the .cfg extension and resides in that directory). It contains exactly one host definition and one or more service definitions, which are linked to the host by using the same host_name value in the host and in the service definitions.
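The idea that one failing check makes the combined result fail can be sketched in a few lines of shell. This is only an illustration and not check_multi's actual implementation: worst_state and state_name are hypothetical helper names, and the severity ordering used here (CRITICAL over WARNING over UNKNOWN over OK) is an assumption made for this example.

```shell
#!/bin/sh
# worst_state: hypothetical helper, not part of check_multi.
# Takes plugin exit statuses (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN) and
# prints the most severe one, assuming CRITICAL > WARNING > UNKNOWN > OK.
worst_state() {
    worst=0
    rank=0
    for rc in "$@"; do
        case "$rc" in
            0) r=0 ;;
            3) r=1 ;;
            1) r=2 ;;
            2) r=3 ;;
            *) rc=3; r=1 ;;   # treat any unexpected status as UNKNOWN
        esac
        if [ "$r" -gt "$rank" ]; then
            rank=$r
            worst=$rc
        fi
    done
    echo "$worst"
}

# state_name: maps an exit status to its Nagios state label.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}
```

With five child results of which one is critical, state_name "$(worst_state 0 0 2 0 0)" prints CRITICAL. The same state_name helper can be used to label the exit status ($?) after running check_nrpe by hand.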
In the host definition, we set contact_groups to a contact group that is defined, together with a contact entry, in the contacts.cfg file. These will be used to send notification e-mails if the checked service has any errors. The most important line in the service definition is check_command check_nrpe_multi, which executes the command that we created before as our one and only check. The normal_check_interval value is also important, as it defines how often the service is checked under normal conditions. Here, it gets checked every 15 minutes. You can add as many service definitions to a host as you like.
Now, go to your Nagios web frontend to inspect your new host and service. Here, go to the Hosts tab, where you will see the new host, host1, that you defined in this recipe, and it should give you information about its status. If you click on the Services tab, you will see the check_nrpe_multi service. It should show the Status as Pending, OK, or CRITICAL, depending on the success of the single checks. If you click on its check_nrpe_multi link, you will see details about the checks.
Here in this chapter, we could only show you the very basics of Nagios, and there is always more to learn, so please read the official Nagios Core documentation at https://www.nagios.org, or check out the book Learning Nagios 4, Packt Publishing, by Wojciech Kocjan.