





















































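In this article by Tom Ryder, author of the book Nagios Core Administration Cookbook, Second Edition, we will cover installing a plugin, removing a plugin, creating a new command, customizing an existing command, and writing a new plugin from scratch.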
Nagios Core is perhaps best thought of as a monitoring framework rather than a monitoring tool. Its modular design allows any kind of program that returns appropriate values based on some kind of check to be used as a check_command option for a host or service. This is where the concepts of commands and plugins come into play.
For Nagios Core, a plugin is any program that can be used to gather information about a host or service. To ensure that a host is responding to ping requests, we'd use a plugin such as check_ping, which, when run against a hostname or address (whether by Nagios Core or not), returns a status code to whatever called it, based on whether a response to the ping request was received within a certain period of time. This status code and any accompanying message are what Nagios Core uses to establish the state that a host or service is in.
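To make the status-code idea concrete, here is a minimal Python sketch of what a caller such as Nagios Core does with a plugin's result. The mapping of exit codes 0 to 3 onto OK, WARNING, CRITICAL, and UNKNOWN follows the usual Nagios plugin convention; the run_plugin helper and the stand-in command are hypothetical names used only for illustration, not part of Nagios Core.

```python
import subprocess
import sys

# Exit codes conventionally used by Nagios plugins
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(argv, timeout=10):
    """Run a plugin-like command; return (state name, first line of output)."""
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    # Any exit code outside 0-3 is treated as UNKNOWN in this sketch
    state = STATES.get(result.returncode, "UNKNOWN")
    lines = result.stdout.strip().splitlines()
    message = lines[0] if lines else ""
    return state, message


# Stand-in for a real plugin (an assumption for the demo): a one-line Python
# program that prints a message and exits with status 1 (WARNING)
fake_plugin = [
    sys.executable,
    "-c",
    "import sys; print('WARNING: slow reply'); sys.exit(1)",
]
state, message = run_plugin(fake_plugin)
print(state, "-", message)
```

On a server with the Nagios Plugins set installed, the same helper could be pointed at a real plugin path instead of the stand-in command.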
Plugins are generally just like any other program on a Unix-like system; they can be run from the command line, are subject to permissions and owner restrictions, can be written in any language, can read variables from their environment, and can take parameters and options to modify how they work. Most importantly, they are entirely separate from Nagios Core itself (even if programmed by the same people), and the way that they're used by the application can be changed.
To allow for additional flexibility in how plugins are used, Nagios Core uses these programs according to the terms of a command definition. A command for a specific plugin defines the way in which that plugin is used, including its location in the filesystem, any parameters that should be passed to it, and any other options. In particular, parameters and options often include thresholds for the WARNING and CRITICAL states.
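As a rough illustration of how thresholds turn a measurement into a state, the following Python sketch compares a value against a warning and a critical limit. The evaluate function, the sample round-trip times, and the threshold values are all hypothetical; real plugins define their own options and comparison rules.

```python
def evaluate(value, warning, critical):
    """Return a state name for a measurement where larger values are worse."""
    if warning > critical:
        raise ValueError("warning threshold must not exceed critical threshold")
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "OK"


# Hypothetical round-trip times in milliseconds, checked against thresholds
for rtt_ms in (35.0, 250.0, 900.0):
    print(rtt_ms, evaluate(rtt_ms, warning=100.0, critical=500.0))
```

Checking the critical threshold first means a value that exceeds both limits is reported with the more severe state.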
Nagios Core is usually downloaded and installed alongside a set of plugins called Nagios Plugins, available at https://nagios-plugins.org/, which this article assumes you have installed. These plugins were chosen because, as a set, they cover the most common needs of a monitoring infrastructure quite well. They include checks for common services, such as web, mail, and DNS services, as well as more generic checks, such as whether a TCP or UDP port is accessible and open on a server. It's possible that these will cover most, if not all, of our monitoring needs. If they don't, Nagios Core lets us use existing plugins in novel ways with custom command definitions, add third-party plugins written by contributors on the Nagios Exchange website, or, in some special cases, write custom plugins ourselves from scratch.
In this recipe, we'll install a custom plugin that we retrieved from Nagios Exchange onto a Nagios Core server so that we can use it in a Nagios Core command, and hence check a service with it.
You should have a Nagios Core 4.0 or newer server running with a few hosts and services configured already, and you should have found an appropriate plugin to install to solve some particular monitoring needs. Your Nagios Core server should have Internet connectivity to allow you to download the plugin directly from the website.
In this example, we'll use check_rsync, which is available on the Web at https://exchange.nagios.org/directory/Plugins/Network-Protocols/Rsync/check_rsync/details.
This particular plugin is quite simple, consisting of a single Perl script with only very basic dependencies. If you want to install this script as an example, the server will also need to have a Perl interpreter installed, for example, in /usr/bin/perl.
This example will also include directly testing a server named troy.example.net that runs an rsync(1) daemon.
We can download and install a new plugin using the following steps:
OK: Rsync is up
If all of this works, then the plugin is now installed and working correctly.
Because Nagios Core plugins are programs in themselves, all that installing a plugin really amounts to is saving a program or script into an appropriate directory, in this case, /usr/local/nagios/libexec, where all the other plugins live. It's then available to be used the same way as any other plugin.
The next step once the plugin is working is defining a command for it in the Nagios Core configuration so that it can be used to monitor hosts and/or services. This can be done with the Creating a new command recipe in this article.
If we inspect the Perl script, we can see a little bit of how it works. It works like any other Perl script except perhaps for the fact that its return values are defined in a hash called %ERRORS, and the return values it chooses depend on what happens when it tries to check the rsync(1) process. This is the most important part of implementing a plugin for Nagios Core.
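The same pattern can be sketched in Python: a table of state names and exit codes, plus a function that picks one depending on how the check went. This is not the logic of check_rsync itself. The run_check function and the two probe functions are hypothetical stand-ins for a real service test.

```python
# Python counterpart of a %ERRORS-style hash: state names mapped to exit codes
ERRORS = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def run_check(probe):
    """Call probe() and translate the outcome into (exit code, output line).

    probe is a hypothetical callable standing in for the real service test:
    it returns a short description on success and raises ConnectionError
    when the service cannot be reached.
    """
    try:
        detail = probe()
    except ConnectionError as exc:
        return ERRORS["CRITICAL"], f"CRITICAL: {exc}"
    except Exception as exc:  # anything unexpected means we cannot tell
        return ERRORS["UNKNOWN"], f"UNKNOWN: {exc}"
    return ERRORS["OK"], f"OK: {detail}"


def service_up():
    return "Rsync is up"


def service_down():
    raise ConnectionError("connection refused")


for probe in (service_up, service_down):
    code, line = run_check(probe)
    print(code, line)
```

A real plugin would print the line and then pass the code to sys.exit() so that the caller receives it as the process exit status.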
Installation procedures for different plugins vary. In particular, many plugins are written in languages like C, and hence, they need to be compiled. One such plugin is the popular check_nrpe plugin. Rather than simply being saved into a directory and made executable, these sorts of plugins often follow the usual pattern of configuration, compilation, and installation:
$ ./configure
$ make
# make install
For many plugins that are built in this style, the final step of make install will often install the compiled plugin into the appropriate directory for us. In general, if instructions are included with the plugin, it pays to read them to see how best to install it.
In this recipe, we'll remove a plugin that we no longer need as part of our Nagios Core installation. Perhaps it's not working correctly, the service it monitors is no longer available, or there are security or licensing concerns with its usage.
You should have a Nagios Core 4.0 or newer server running with a few hosts and services configured already and have a plugin that you would like to remove from the server. In this instance, we'll remove the now unneeded check_rsync plugin from our Nagios Core server.
We can remove a plugin from our Nagios Core instance using the following steps:
define command {
command_name check_rsync
command_line $USER1$/check_rsync -H $HOSTADDRESS$
}
Using a tool, such as grep(1), can be a good way to find mentions of the command and plugin:
Nagios Core plugins are simply external programs that the server uses to perform checks of hosts and services. If a plugin is no longer needed, all that we need to do is remove references to it in our configuration, if any, and delete it from /usr/local/nagios/libexec.
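Before deleting a plugin, it helps to know every place the configuration still mentions it. The following Python sketch does a grep-like search over .cfg files. The find_references helper is hypothetical, and the demonstration runs against a temporary directory; on a real server you would point it at your configuration directory, which is assumed here to be /usr/local/nagios/etc for a default source install.

```python
import os
import tempfile


def find_references(config_dir, name):
    """Yield (path, line number, line) for every mention of name in *.cfg files."""
    for root, _dirs, files in os.walk(config_dir):
        for filename in sorted(files):
            if not filename.endswith(".cfg"):
                continue
            path = os.path.join(root, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if name in line:
                        yield path, number, line.rstrip("\n")


# Demonstration against a throwaway directory rather than a live configuration
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "commands.cfg"), "w", encoding="utf-8") as cfg:
        cfg.write("define command {\n")
        cfg.write("    command_name check_rsync\n")
        cfg.write("    command_line $USER1$/check_rsync -H $HOSTADDRESS$\n")
        cfg.write("}\n")
    hits = list(find_references(demo_dir, "check_rsync"))
    for path, number, line in hits:
        print(f"{os.path.basename(path)}:{number}: {line.strip()}")
```

If the search returns nothing for both the command name and the plugin name, the plugin file can be removed without leaving dangling references.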
There's not usually any harm in leaving the plugin's program on the server even if Nagios Core isn't using it. It doesn't slow anything down or cause any other problems, and it may be needed later. Nagios Core plugins are generally quite small programs and should not really cause disk space concerns on a modern server.
In this recipe, we'll create a new command for a plugin that was just installed into the /usr/local/nagios/libexec directory in the Nagios Core server. This will define the way in which Nagios Core should use the plugin, and thereby allow it to be used as part of a service definition.
You should have a Nagios Core 4.0 or newer server running with a few hosts and services configured already and have a plugin installed for which you'd like to define a new command so that you can use it as part of a service definition. In this instance, we'll define a command for an installed check_rsync plugin.
We can define a new command in our configuration as follows:
define command {
command_name check_rsync
command_line $USER1$/check_rsync -H $HOSTADDRESS$
}
If the validation passed and the server restarted successfully, we should now be able to use the check_rsync command in a service definition.
The configuration we added to the commands.cfg file in the preceding steps defines a new command called check_rsync, which specifies a method for using the plugin of the same name to monitor a service. This enables us to use check_rsync as a value for the check_command directive in a service declaration, which might look like this:
define service {
use generic-service
host_name troy.example.net
service_description RSYNC
check_command check_rsync
}
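Only two directives are required for command definitions, command_name and command_line, and we've defined both.
This particular command line also uses two macros: $USER1$, which expands to the directory where the plugins are kept, in this case /usr/local/nagios/libexec, and $HOSTADDRESS$, which expands to the address of the host being checked.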
So, if we used the command in a service, checking the rsync(1) server on troy.example.net, the completed command might look like this:
$ /usr/local/nagios/libexec/check_rsync -H troy.example.net
We can run this straight from the command line ourselves as the nagios user to see what kind of results it returns:
$ /usr/local/nagios/libexec/check_rsync -H troy.example.net
OK: Rsync is up
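The substitution that turns a command_line into the completed command can be sketched in a few lines of Python. This is only an illustration of the idea, not how Nagios Core implements it. The expand_macros function is hypothetical, and leaving unknown macros untouched is a choice made for this sketch.

```python
import re


def expand_macros(command_line, macros):
    """Replace each $NAME$ token with its value; unknown macros are left as-is."""
    return re.sub(
        r"\$([A-Z0-9_]+)\$",
        lambda match: macros.get(match.group(1), match.group(0)),
        command_line,
    )


# Values taken from the example in this article
macros = {
    "USER1": "/usr/local/nagios/libexec",
    "HOSTADDRESS": "troy.example.net",
}
command_line = "$USER1$/check_rsync -H $HOSTADDRESS$"
expanded = expand_macros(command_line, macros)
print(expanded)
```

With the values shown, the result is the same command we ran by hand: /usr/local/nagios/libexec/check_rsync -H troy.example.net.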
A plugin can be used for more than one command. If we wanted to check a particular rsync(1) module named backup, we could write another command called check_rsync_backup as follows:
define command {
command_name check_rsync_backup
command_line $USER1$/check_rsync -H $HOSTADDRESS$ -m backup
}
Alternatively, if one or more of our rsync(1) servers were running on an alternate port, say, port 5873, we could define a separate command check_rsync_altport for that:
define command {
command_name check_rsync_altport
command_line $USER1$/check_rsync -H $HOSTADDRESS$ -p 5873
}
Commands can thus be defined as precisely as we need them to be. We explore this in more detail in the Customizing an existing command recipe in this article.
In this recipe, we'll customize an existing command definition. There are a number of reasons why you might want to do this. A common one is that a check is overzealous, sending notifications for WARNING or CRITICAL states that aren't actually terribly worrisome. Another is that a check is too "forgiving" and doesn't flag hosts or services as having problems when it would actually be appropriate to do so.
Another reason is to account for peculiarities in your own network. For example, if you run HTTP daemons on the alternative port 8080 on a large number of hosts that you need to check, it would be convenient to have a check_http_altport command available. We can do this by copying and altering the definition for the vanilla check_http command.
You should have a Nagios Core 4.0 or newer server running with a few hosts and services configured already. You should also already be familiar with the relationship between services, commands, and plugins.
We can customize an existing command definition as follows:
define command {
command_name check_http
command_line $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
}
define command {
command_name check_http_altport
command_line $USER1$/check_http -I $HOSTADDRESS$ -p 8080 $ARG1$
}
If the validation passed and the server restarted successfully, we should now be able to use the check_http_altport command, which is based on the original check_http command, in a service definition.
The configuration we added to the commands.cfg file in the preceding steps reproduces the command definition for check_http, but changes it in two ways: it renames the command from check_http to check_http_altport, and it adds the -p 8080 option to the command line so that the check is made on port 8080.
The check_http_altport command can now be used as a check command in the same way as the check_http command. For example, a service definition that checks whether the sparta.example.net host is running an HTTP daemon on port 8080 might look something like this:
define service {
use generic-service
host_name sparta.example.net
service_description HTTP_8080
check_command check_http_altport
}
This recipe's title implies that we should customize the existing commands by editing them in-place, and indeed, this works fine if we really do want to do things this way. Instead of copying the command definition, we can just add -p 8080 or any other customization to the command line and change the original command.
However, this is bad practice in most cases, mostly because it can break existing monitoring and can be potentially confusing to other administrators of the Nagios Core server. If we have a special case for monitoring, in this case, checking a nonstandard port for HTTP, then it's wise to create a whole new command based on the existing one with the customisations we need.
Particularly if you share monitoring configuration duties with someone else on your team, changing the command can break the monitoring for anyone who had set up the services using the check_http command before you changed it, meaning that their checks would all start failing because port 8080 would be checked instead.
There is no limit to the number of commands you can define, so you can be very liberal in defining as many alternative commands as you need. It's a good idea to give them instructive names that say something about what they do as well as to add explanatory comments to the configuration file. You can add a comment to the file by prefixing it with a # character:
#
# 'check_http_altport' command definition. This is to keep track of the
# servers that have administrative panels running on an alternative port
# to confer special privileges to a separate instance of Apache HTTPD
# that we don't want to confer to the one for running public-facing
# websites.
#
define command {
command_name check_http_altport
command_line $USER1$/check_http -I $HOSTADDRESS$ -p 8080 $ARG1$
}
Even given the very useful standard plugins in the Nagios Plugins set and the large number of custom plugins available on Nagios Exchange, occasionally, as our monitoring setup becomes more refined, we may well find that there is some service or property of a host that we would like to check, but for which there doesn't seem to be any suitable plugin available. Every network is different, and sometimes, the plugins that others have generously donated their time to make for the community don't quite cover all your bases. Generally, the more specific your monitoring requirements get, the less likely it is for there to be a plugin available that does exactly what you need.
In this example, we'll deal with a very particular problem that we'll assume can't be dealt with effectively by any known Nagios Core plugins, and we'll write one ourselves using Perl. Here's the example problem.
Our Linux security team wants to be able to automatically check whether any of our servers are running kernels that have known exploits. However, they're not worried about every vulnerable kernel, only certain ones. They have provided us with the version numbers of three kernels that have small vulnerabilities that they're not particularly worried about but that do need patching, and one they're extremely worried about.
Let's say the minor vulnerabilities are in the kernels with version numbers 2.6.19, 2.6.24, and 3.0.1. The serious vulnerability is in the kernel with version number 2.6.39. Note that these version numbers in this case are arbitrary and don't necessarily reflect any real kernel vulnerabilities!
The team could log in to all of the servers individually to check them, but the servers are of varying ages and access methods, and they are managed by different people. They would also have to check manually more than once because it's possible that a naive administrator could upgrade to a kernel that's known to be vulnerable in an older release, and they also might want to add other vulnerable kernel numbers for checking later on.
So, the team have asked us to solve the problem with Nagios Core monitoring, and we've decided that the best way to do it is to write our own plugin, check_vuln_kernel, that checks the output of uname(1) for a kernel version string. It then returns a CRITICAL state if the version matches the seriously vulnerable kernel, a WARNING state if it matches one of the less vulnerable kernels, and an OK state otherwise.
For the purposes of this example, we'll only monitor the Nagios Core server; however, via NRPE, we'd be able to install this plugin on the other servers that require this monitoring, and it would work just fine there as well.
While this problem is very specific, we'll approach it in a very general way, which you'll be able to adapt to any situation where a Nagios plugin needs to gather some output, check that output for the presence or absence of certain patterns, and return an appropriate status based on those checks.
All that this means is that if you're able to do this, you'll be able to monitor anything effectively from Nagios Core!
You should have a Nagios Core 4.0 or newer server running with a few hosts and services configured already. You should also already be familiar with the relationship between services, commands, and plugins.
You should have Perl installed, at least version 5.10. This will include the required POSIX module. You should also have the Perl modules Nagios::Plugin (or Monitoring::Plugin) and Readonly installed. On Debian-like systems, you can install them with the following:
# apt-get install libnagios-plugin-perl libreadonly-perl
On RPM-based systems, such as CentOS or Fedora Core, the following command should work:
# yum install perl-Nagios-Plugin perl-Readonly
This will be a rather long recipe that ties in a lot of Nagios Core concepts. You should be familiar with all the following concepts:
Some familiarity with Perl would also be helpful, but it is not required. We'll include comments to explain what each block of code is doing in the plugin.
We can write, test, and implement our example plugin as follows:
#!/usr/bin/env perl

# Use strict Perl style
use strict;
use warnings;
use utf8;

# Require at least Perl v5.10
use 5.010;

# Require a few modules, including Nagios::Plugin
use Nagios::Plugin;
use POSIX;
use Readonly;

# Declare some constants with patterns that match bad kernels; the trailing
# [^\d] requires a non-digit after the version so that 2.6.390 doesn't match
Readonly::Scalar my $CRITICAL_REGEX => qr/^2[.]6[.]39[^\d]/msx;
Readonly::Scalar my $WARNING_REGEX =>
    qr/^(?:2[.]6[.](?:19|24)|3[.]0[.]1)[^\d]/msx;

# Run POSIX::uname() to get the kernel version string
my @uname   = uname();
my $version = $uname[2];

# Create a new Nagios::Plugin object
my $np = Nagios::Plugin->new();

# If we couldn't get the version, bail out with UNKNOWN
if ( !$version ) {
    $np->nagios_die('Could not read kernel version string');
}

# Exit with CRITICAL if the version string matches the critical pattern
if ( $version =~ $CRITICAL_REGEX ) {
    $np->nagios_exit( CRITICAL, $version );
}

# Exit with WARNING if the version string matches the warning pattern
if ( $version =~ $WARNING_REGEX ) {
    $np->nagios_exit( WARNING, $version );
}

# Exit with OK if neither of the patterns matched
$np->nagios_exit( OK, $version );
VULN_KERNEL OK: 3.16.0-4-amd64
We should now be able to use the plugin in a command, and hence in a service check just like any other command.
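The code we added earlier in the new plugin file, check_vuln_kernel, is actually quite simple: it reads the kernel version string with POSIX::uname(), exits with UNKNOWN if no version could be read, exits with CRITICAL or WARNING if the version matches the corresponding pattern, and exits with OK otherwise.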
It also prints the status as a string along with the kernel version number, if it was able to retrieve one.
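For readers more comfortable with Python, the same decision logic can be sketched using only the standard library. This is an illustrative port under stated assumptions, not the plugin from this recipe. The classify_kernel function is a hypothetical name, platform.release() is used in place of POSIX::uname(), and the patterns use a negative lookahead (?!\d), which, unlike the Perl patterns, also matches a version string that ends right after the version number.

```python
import platform
import re

ERRORS = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}

# Same example version numbers as the Perl plugin; (?!\d) stops 2.6.39 from
# matching 2.6.390 while still matching a bare "2.6.39"
CRITICAL_RE = re.compile(r"^2\.6\.39(?!\d)")
WARNING_RE = re.compile(r"^(?:2\.6\.(?:19|24)|3\.0\.1)(?!\d)")


def classify_kernel(version):
    """Return (exit code, output line) for a kernel version string."""
    if not version:
        return ERRORS["UNKNOWN"], "VULN_KERNEL UNKNOWN: Could not read kernel version string"
    if CRITICAL_RE.search(version):
        return ERRORS["CRITICAL"], f"VULN_KERNEL CRITICAL: {version}"
    if WARNING_RE.search(version):
        return ERRORS["WARNING"], f"VULN_KERNEL WARNING: {version}"
    return ERRORS["OK"], f"VULN_KERNEL OK: {version}"


# Classify the kernel of the machine running this sketch
code, line = classify_kernel(platform.release())
print(line)
```

To use this as an actual plugin, the script would finish by calling sys.exit(code) so that Nagios Core receives the status as the exit code.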
We might set up a command definition for this plugin, as follows:
define command {
command_name check_vuln_kernel
command_line $USER1$/check_vuln_kernel
}
In turn, we might set up a service definition for that command, as follows:
define service {
use local-service
host_name localhost
service_description VULN_KERNEL
check_command check_vuln_kernel
}
If the kernel was not vulnerable, the service's appearance in the web interface might be something like this:
However, if the monitoring server itself happened to be running a vulnerable kernel, it might look more like this (and send consequent notifications, if configured to do so):
This may be a simple plugin, but its structure can be generalised to all sorts of monitoring tasks. If we can figure out the correct logic to return the status we want in an appropriate programming language, then we can write a plugin to do basically anything.
A plugin like this can just as effectively be written in C for improved performance, but we'll assume for simplicity's sake that high performance is not required for this plugin. We can instead use a language that's better suited to quick ad hoc scripts like this one, in this case, Perl. The utils.sh file, also in /usr/local/nagios/libexec, allows us to write plugins in shell script if we'd prefer that.
If you prefer Python, the nagiosplugin library should meet your needs for both Python 2 and Python 3. Ruby users may like the nagiosplugin gem.
If you write a plugin that you think could be generally useful for the Nagios community at large, consider putting it under a free software license and submitting it to the Nagios Exchange so that others can benefit from your work. Community contribution and support is what has made Nagios Core such a great monitoring platform in such wide use.
Any plugin you publish in this way should conform to the Nagios Plugin Development Guidelines. At the time of writing, these are available at https://nagios-plugins.org/doc/guidelines.html.
You may find older Nagios Core plugins written in Perl using the utils.pm file instead of Nagios::Plugin or Monitoring::Plugin. This will work fine, but Nagios::Plugin is recommended, as it includes more functionality out of the box and tends to be easier to use.
In this article, we learned how to install a custom plugin retrieved from Nagios Exchange onto a Nagios Core server so that we can use it in a Nagios Core command, how to remove a plugin that we no longer need, how to create a new command and customize an existing one, and how to write a new plugin from scratch.