So far, we have discussed using web-based applications to view the current status and manage things such as downtimes or comments.
There are also multiple tools that let us perform the same operations from the command line in a convenient way.
One tool that provides an easy way to manage Nagios and view its data from the command line is nagios_commander.
This is a shell script that communicates with Nagios through its web interface, using HTTP-based authentication. Because it communicates over the network, the script can be run on any machine, not only on the machine where Nagios is running. It can also be used to manage multiple Nagios instances from a single machine.
All that is needed is to have the curl command available on your machine. For Ubuntu-based distributions, we'll need to run the following command:
root@ubuntu:~# apt-get -y install bsdmainutils curl
For CentOS, RHEL, and Oracle Linux, the command is:
[root@centos ~]# yum install -y curl
Next, all that we have to do is download the nagios_commander script using the following commands:
root@ubuntu:~# curl -sSL https://raw.github.com/brandoconnor/nagios_commander/master/nagios_commander.sh >/usr/local/bin/nagios_commander
root@ubuntu:~# chmod 0755 /usr/local/bin/nagios_commander
After that, nagios_commander is ready to use.
The command takes the URL of the Nagios web interface, the username, and the password from the command line using the -n, -u, and -p arguments, respectively:
# nagios_commander -n 127.0.0.1/nagios -u nagiosadmin -p nagiosadmin -q list -h
Hostname    Status
linuxbox01  UP
localhost   UP
The preceding command lists all hosts on our Nagios instance and prints their status. The -q list -h options request the list of hosts; they are described in more detail later in this section.
It is also a good idea to create an alias or a helper script that will not require passing the location, username, and password on each invocation.
# alias ncmd='nagios_commander -n 127.0.0.1/nagios -u nagiosadmin -p nagiosadmin'
To be able to use it in all shells and not just the current one, the alias can be put in shell initialization scripts, such as .bash_aliases in your home directory if you are using the bash shell.
This way we can simply call:
# ncmd -q list -h
This should return the same result as the original command we invoked earlier.
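The alias shown earlier stores the password in the shell initialization file. As an alternative, a small shell function can read the connection details from environment variables. This is only a minimal sketch: the function name ncmd_env and the variables NAGIOS_URL, NAGIOS_USER, and NAGIOS_PASS are hypothetical names chosen for illustration and are not defined by nagios_commander; only the -n, -u, and -p flags come from the tool itself.

```shell
# Hypothetical helper: ncmd_env and the NAGIOS_* variables are illustrative names.
# Only the -n, -u and -p flags are taken from nagios_commander itself.
ncmd_env() {
    # Refuse to run when the credentials are missing.
    if [ -z "${NAGIOS_USER:-}" ] || [ -z "${NAGIOS_PASS:-}" ]; then
        echo "NAGIOS_USER and NAGIOS_PASS must be set" >&2
        return 1
    fi
    # Fall back to the local instance when NAGIOS_URL is unset.
    nagios_commander -n "${NAGIOS_URL:-127.0.0.1/nagios}" \
        -u "$NAGIOS_USER" -p "$NAGIOS_PASS" "$@"
}
```

With the variables exported, calling ncmd_env -q list -h behaves like the aliased call. Note that the password is still passed to nagios_commander as a command-line argument, so it remains visible in the process list while the command runs.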
nagios_commander allows specifying a context in which a command is run. If no context is specified (or an empty value is given for the -h option), the context is global.
It is possible to run commands for specific hosts and hostgroups using the -h and -H options, respectively. The -h option selects a specific host, and the -H option selects a specific hostgroup. For example, the -q list -h localhost command indicates that services for the host localhost should be shown.
# ncmd -q list -h localhost
Fetching services and health on localhost
--- Service State ---
Current+Load     OK
Current+Users    OK
HTTP             OK
PING             OK
Root+Partition   OK
SSH              OK
Swap+Usage       OK
Total+Processes  OK
Similarly, the -H option can be used to list the status of all hosts inside a hostgroup:
# ncmd -q list -H linux-servers
Hostname    Status
localhost   UP
linuxbox01  UP
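Because the listing is plain text, it can be filtered with standard tools. The following sketch assumes that the output consists of one header line followed by one host per line, with the status in the second column, as in the listings above. The function name hosts_not_up is hypothetical.

```shell
# Hypothetical filter: prints the names of hosts whose status is not UP.
# Assumes a header line, then "<hostname> <status>" on each following line.
hosts_not_up() {
    awk 'NR > 1 && NF >= 2 && $2 != "UP" { print $1 }'
}
```

Piping a host listing into it, for example ncmd -q list -H linux-servers | hosts_not_up, would print nothing while every host in the hostgroup is UP.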
The -s option allows specifying the service to run the query or command against. Similarly, the -S option can be used to run a command against a service group. These are only used when running commands to manage downtimes and/or acknowledge problems.
The -q option allows us to get information from Nagios. The following table shows the available query types:
Command | Contexts | Description
list | global, host | Lists all hosts or services, depending on the context
host_downtime | host | Lists host downtimes for all hosts or a specific host/hostgroup
service_downtime | service | Lists all service downtimes for all hosts, a specific host/hostgroup, or a specific service/service group only
notifications | global | Shows whether notification sending is enabled
event_handlers | global | Shows whether running event handlers is enabled
active_svc_checks | global | Shows whether performing active service checks is enabled
active_host_checks | global | Shows whether performing active host checks is enabled
passive_svc_checks | global | Shows whether accepting passive service check results is enabled
passive_host_checks | global | Shows whether accepting passive host check results is enabled
Event handlers and notifications are described in more detail in Chapter 8, Notifications and Events. The concept of passive checks is explained in more detail in Chapter 9, Passive Checks and NRDP.
The -c option allows us to change Nagios settings and/or manage host and service downtimes from the command line. The first argument is the action to perform and the second argument is the scope. The flag also takes a third argument when Nagios settings are to be changed.
To change a Nagios setting, the action must be set and the scope must be one of notifications, event_handlers, active_svc_checks, active_host_checks, passive_svc_checks, or passive_host_checks. The third argument should be either enable or disable. For example, to disable or enable sending notifications we can run:
# ncmd -c set notifications disable
# ncmd -c set notifications enable
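The two commands are often used as a pair around a maintenance task. The following is a minimal sketch of a wrapper that disables notifications, runs an arbitrary command, and then enables notifications again. The function name run_without_notifications is hypothetical, and the sketch assumes that ncmd is available as a shell function or a script on the PATH, because shell aliases are not expanded inside scripts by default.

```shell
# Hypothetical helper: run a command with notifications disabled, then re-enable them.
# Assumes ncmd is callable as a function or script (aliases are not expanded in scripts).
run_without_notifications() {
    # Give up if notifications cannot be disabled.
    ncmd -c set notifications disable || return 1
    "$@"
    rwn_status=$?
    # Re-enable notifications even if the wrapped command failed.
    ncmd -c set notifications enable
    # Report the exit status of the wrapped command.
    return "$rwn_status"
}
```

A call such as run_without_notifications ./upgrade.sh (where upgrade.sh stands for any maintenance script of your own) keeps the disable and enable steps together, so notifications are not left disabled by accident.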
Another possibility is to manage downtimes. In this case, the action should be set, del, or ack to add a downtime, delete it, or acknowledge a problem, respectively. The -h, -H, -s, and -S options can be used to specify the host, hostgroup, service, or service group the downtime is related to.
When adding a downtime or acknowledging a problem, a comment must also be specified; a downtime additionally requires its planned duration. The -C option specifies the comment, and the -t option specifies the time in minutes.
For example, to add a downtime of two hours for the localhost host, we can use:
# ncmd -c set downtime -C "Planned downtime" -t 120 -h localhost
We can then check the downtime by running the following command:
# ncmd -q host_downtime
The output of the preceding command will be as follows:
Hostname   Downtime-id  End_date_and_time   Author        Comment
localhost  1            2-14-2016 20:49:46  Nagios Admin  Planned downtime
The downtime id is the unique identifier of a downtime. In order to delete a downtime, we need to know its id and delete it using the del action:
# ncmd -c del downtime -d 1
This will delete the downtime with id 1.
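Looking up the id can also be scripted. The sketch below assumes the host_downtime output format shown earlier, that is, a header line followed by rows with the host name in the first column and the downtime id in the second. The function name downtime_ids_for_host is hypothetical.

```shell
# Hypothetical filter: prints the downtime ids that belong to the given host.
# Assumes a header line, then rows starting with "<hostname> <downtime-id>".
downtime_ids_for_host() {
    awk -v host="$1" 'NR > 1 && $1 == host { print $2 }'
}
```

For the downtime listed above, ncmd -q host_downtime | downtime_ids_for_host localhost would print 1, and each printed id could then be passed to ncmd -c del downtime -d.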
The ack action can be invoked in order to acknowledge a problem. The command itself does not require any additional argument; the only required flag is -C to indicate the comment for the acknowledgement, as shown here:
# ncmd -c ack -h localhost -s SSH -C "SSH upgrade in progress, will be up soon"
This will add a new acknowledgement for service SSH on localhost.
Another command-line-based tool is nagios-cli, which provides a shell-like interface for Nagios. This is an open source project; its homepage is http://nagios-cli.maze.io/ and its source code is on GitHub at https://github.com/tehmaze/nagios-cli. The tool reads the Nagios status file and sends commands using the Nagios pipe, so it has to be run on the same machine or container where the Nagios service is running.
To install nagios-cli, we first need to install the prerequisites, which include Python, the pip tool for installing Python packages, the readline library, and the development packages for those, as well as Git to be able to retrieve nagios-cli itself.
On Debian and Ubuntu, the command to install the prerequisites is:
root@ubuntu:~# apt-get -y install patch python python-pip libpython-dev libncurses-dev libreadline-dev git
For CentOS, RHEL, and Oracle Linux, the command is:
[root@centos ~]# yum install -y patch python python-devel python-pip git readline-devel
Installing nagios-cli also requires some of the prerequisites for building Nagios. If the machine where nagios-cli will be run does not have them, it is recommended that you install them as well. The dependencies for different Linux distributions are described in more detail in Chapter 2, Installing Nagios 4.
The next step is to install the readline Python package by running pip:
# pip install readline
After that, we can retrieve the nagios-cli source package by running the following command:
# git clone https://github.com/tehmaze/nagios-cli.git
This will retrieve the latest version of source code in a new directory called nagios-cli. We now need to install it by running:
# cd nagios-cli ; python setup.py install
This will install the nagios-cli binary into the /usr/local/bin directory. Next, we need to create a configuration file in /etc/nagios/nagios-cli.cfg with the following contents:
[nagios]
log = /var/nagios
command_file = %(log)s/rw/nagios.cmd
log_file = %(log)s/nagios.log
object_cache_file = %(log)s/objects.cache
status_file = %(log)s/status.dat
This tells nagios-cli where the Nagios data is kept. Next, we can run the tool using the following command:
# nagios-cli -c /etc/nagios/nagios-cli.cfg
This will start the interactive shell. Like any other Unix shell, it accepts commands followed by arguments separated by spaces. It also supports tab-based completion of arguments, such as for the host and service commands, where it will complete host and service names, respectively.
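If the shell starts but shows no hosts, or fails to submit commands, one thing worth checking is whether the paths named in the configuration file actually exist. The following is a minimal sketch that assumes the layout from the configuration shown earlier, with everything under /var/nagios. The function name check_nagios_cli_paths is hypothetical and is not part of nagios-cli.

```shell
# Hypothetical check: verifies the files that the nagios-cli configuration points to.
# The default directory mirrors the "log" value from the example configuration.
check_nagios_cli_paths() {
    cnp_dir=${1:-/var/nagios}
    cnp_rc=0
    # The command file must be a named pipe created by the Nagios daemon.
    if [ ! -p "$cnp_dir/rw/nagios.cmd" ]; then
        echo "missing command pipe: $cnp_dir/rw/nagios.cmd" >&2
        cnp_rc=1
    fi
    # The remaining files only need to be readable.
    for cnp_file in nagios.log objects.cache status.dat; do
        if [ ! -r "$cnp_dir/$cnp_file" ]; then
            echo "not readable: $cnp_dir/$cnp_file" >&2
            cnp_rc=1
        fi
    done
    return "$cnp_rc"
}
```

Running check_nagios_cli_paths without arguments checks /var/nagios; a different base directory can be passed as the first argument if your installation keeps its data elsewhere.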
The tool also provides the help command, which lists all currently available commands. For example:
nagios > help
Global commands:
.. EOF about configure exit help host license quit tail
Local commands:
list ls
We can now issue the ls or list command to list hosts. For example:
nagios > ls
linuxbox01 localhost
This will list all the hosts currently configured in Nagios. In this example, this includes linuxbox01 and localhost.
To change the context to a specific host, simply call the host command with the name of a host. Within a host context, the ls or list command will list all services, and the service command can be used to change the context to a specific service of that host. The .. and EOF commands go back to the preceding context, and ultimately to the global context.
For example:
nagios > host localhost
nagios (host) localhost> ls
Current-Load Current-Users HTTP PING Root-Partition SSH Swap-Usage Total-Processes
nagios (host) localhost> service SSH
nagios (host) localhost SSH> ..
nagios (host) localhost> ..
nagios >
When in the context of a host or service, the status command will report information about the current host and/or service, as shown here:
nagios (host) localhost> status
host name : localhost
current state : OK
plugin output : PING OK - Packet loss = 0%, RTA = 0.10 ms
(...)
service : SSH OK
service : Swap Usage OK
service : Total Processes OK
nagios (host) localhost> service SSH
nagios (host) localhost SSH> status
host name : localhost
service description : SSH
current state : OK
(...)
The preceding example shows only partial output from the status commands.
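nagios-cli takes this information from the Nagios status file. For a quick look without the interactive shell, the same file can be read directly. The sketch below assumes the usual status.dat layout of hoststatus { ... } blocks that contain key=value lines; the function name host_states is hypothetical. For hosts, a current_state of 0 means UP, 1 means DOWN, and 2 means UNREACHABLE.

```shell
# Hypothetical reader: prints "<host_name> <current_state>" for each hoststatus block.
# Assumes the status.dat layout of "hoststatus {" ... "}" blocks with key=value lines.
host_states() {
    awk '
        /^hoststatus[ \t]*[{]/ { in_host = 1; name = ""; state = ""; next }
        in_host && /^[ \t]*host_name=/     { sub(/^[ \t]*host_name=/, ""); name = $0 }
        in_host && /^[ \t]*current_state=/ { sub(/^[ \t]*current_state=/, ""); state = $0 }
        in_host && /^[ \t]*[}]/            { print name, state; in_host = 0 }
    ' "$1"
}
```

With the configuration used in this section, the file to pass would be /var/nagios/status.dat.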
The check and acknowledge commands can be used to schedule a check and to acknowledge a problem for the current host or service. For example:
nagios (host) localhost SSH> check
Service check scheduled
nagios (host) localhost SSH> acknowledge
comment : Reinstalling service
sticky [Yn]: n
notify [Yn]: n
persistent [Yn]: n
Service problem acknowledged
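nagios-cli submits such requests through the Nagios command pipe. For comparison, a forced service check can also be submitted by hand using the standard SCHEDULE_FORCED_SVC_CHECK external command. The sketch below only formats the command line; the function name forced_svc_check_cmd is hypothetical, and this is not claimed to be exactly what nagios-cli itself writes to the pipe.

```shell
# Hypothetical formatter for a Nagios external command line.
# Arguments: host name, service description, optional epoch timestamp (default: now).
forced_svc_check_cmd() {
    fsc_time=${3:-$(date +%s)}
    # Format: [submission time] COMMAND;host;service;requested check time
    printf '[%s] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%s\n' \
        "$fsc_time" "$1" "$2" "$fsc_time"
}
```

Redirecting the output to the command file from the configuration in this section, for example forced_svc_check_cmd localhost SSH > /var/nagios/rw/nagios.cmd, would hand the request to Nagios.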
The following table shows key commands available in nagios-cli:
Command | Contexts | Description
help | Always | Provides a list of commands valid in the current scope
host | Always | Changes the context to a specific host
service | Host | Changes the context to a specific service in the current host
.. | Host or service | Returns to the preceding context, that is, from the service context to the host context or from host to global
EOF | Host or service | Returns to the preceding context, that is, from the service context to the host context or from host to global
status | Host or service | Prints the detailed status for a host or service
check | Host or service | Forces a check to be made for a host or service
acknowledge | Host or service | Acknowledges a problem for a host or service