In this article written by Tom Ryder, author of the book Nagios Core Administration Cookbook, Second Edition, we will cover the following topics:

Installing a plugin
Removing a plugin
Creating a new command
Customizing an existing command
Writing a new plugin from scratch


Introduction

Nagios Core is perhaps best thought of as a monitoring framework and less as a monitoring tool. Its modular design allows any program that returns appropriate values based on some kind of check to be used as the check_command for a host or service. This is where the concepts of commands and plugins come into play.

For Nagios Core, a plugin is any program that can be used to gather information about a host or service. To ensure that a host is responding to ping requests, we'd use a plugin, such as check_ping, which, when run against a hostname or address (whether by Nagios Core or not), returns a status code to whatever called it, based on whether a response to the ping request was received within a certain period of time. This status code and any accompanying message are what Nagios Core uses to establish the state that a host or service is in.
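To make that convention concrete, here is a minimal POSIX shell sketch of the caller's side of it. Under the standard plugin convention for service checks, exit status 0 means OK, 1 means WARNING, 2 means CRITICAL and 3 means UNKNOWN. The helpers `nagios_state` and `run_check` are hypothetical names used only for illustration; Nagios Core performs this interpretation internally.

```shell
#!/bin/sh
# Hypothetical helper: translate a plugin's exit status into the service
# state name that the standard plugin convention assigns to it.
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "OUT_OF_RANGE($1)" ;;  # not part of the 0-3 convention
    esac
}

# Hypothetical wrapper: run a plugin command, then print the state derived
# from its exit status alongside the first line of its output.
run_check() {
    output=$("$@") && status=0 || status=$?
    first_line=$(printf '%s\n' "$output" | head -n 1)
    printf '%s (%s)\n' "$(nagios_state "$status")" "$first_line"
}
```

For example, assuming the standard Nagios Plugins are installed in the default location, `run_check /usr/local/nagios/libexec/check_ping -H localhost -w 100,20% -c 500,60%` would print the derived state followed by the first line of check_ping's output.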

Plugins are generally just like any other program on a Unix-like system; they can be run from the command line, are subject to permission and ownership restrictions, can be written in any language, can read variables from their environment, and can take parameters and options to modify how they work. Most importantly, they are entirely separate from Nagios Core itself (even if programmed by the same people), and the way that they're used by the application can be changed.

To allow for additional flexibility in how plugins are used, Nagios Core uses these programs according to the terms of a command definition. A command for a specific plugin defines the way in which that plugin is used, including its location in the filesystem, any parameters that should be passed to it, and any other options. In particular, parameters and options often include thresholds for the WARNING and CRITICAL states.
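As a rough illustration of how options change what a plugin reports, here is a hypothetical POSIX shell plugin function, `check_value`, that takes `-w` and `-c` thresholds and compares an integer against them (higher is worse). It is a sketch, not one of the Nagios Plugins; in a real setup, the command definition is where the thresholds for a given check would be fixed.

```shell
#!/bin/sh
# Hypothetical plugin logic: compare an integer VALUE against -w and -c
# thresholds (higher is worse) and return the matching Nagios exit code.
# Usage: check_value -w <int> -c <int> <value>
check_value() {
    warn=""
    crit=""
    OPTIND=1                      # reset so the function can be called repeatedly
    while getopts "w:c:" opt; do
        case "$opt" in
            w) warn=$OPTARG ;;
            c) crit=$OPTARG ;;
            *) echo "UNKNOWN - usage: check_value -w <int> -c <int> <value>"; return 3 ;;
        esac
    done
    shift $((OPTIND - 1))
    value=${1-}
    if [ -z "$warn" ] || [ -z "$crit" ] || [ -z "$value" ]; then
        echo "UNKNOWN - usage: check_value -w <int> -c <int> <value>"
        return 3
    fi
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is $value"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is $value"
        return 1
    fi
    echo "OK - value is $value"
    return 0
}
```

For instance, `check_value -w 80 -c 90 85` prints `WARNING - value is 85` and returns 1, while the same value checked with `-w 90 -c 95` is OK. Two command definitions calling the same plugin with different thresholds are how Nagios Core makes such a distinction reusable.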

Nagios Core is usually downloaded and installed alongside a set of plugins called Nagios Plugins, available at https://nagios-plugins.org/, which this article assumes you have installed. These plugins were chosen because, as a set, they cover the most common needs of a monitoring infrastructure quite well, including checks for common services, such as web services, mail services, and DNS services, as well as more generic checks, such as whether a TCP or UDP port is accessible and open on a server. It's possible that for most, if not all, of our monitoring needs, we won't need any other plugins; but if we do, Nagios Core makes it possible to use existing plugins in novel ways with custom command definitions, to add third-party plugins written by contributors on the Nagios Exchange website, or, in some special cases, to write custom plugins ourselves from scratch.

Installing a plugin

In this recipe, we'll install a custom plugin that we retrieved from Nagios Exchange onto a Nagios Core server so that we can use it in a Nagios Core command, and hence check a service with it.

Getting ready

You should have a Nagios Core 4.0 or newer server running with a few hosts and services configured already, and you should have found an appropriate plugin to install to meet a particular monitoring need. Your Nagios Core server should have Internet connectivity to allow you to download the plugin directly from the website.

In this example, we'll use check_rsync, which is available on the Web at https://exchange.nagios.org/directory/Plugins/Network-Protocols/Rsync/check_rsync/details.

This particular plugin is quite simple, consisting of a single Perl script with only very basic dependencies. If you want to install this script as an example, the server will also need to have a Perl interpreter installed, for example, at /usr/bin/perl.

This example will also include directly testing a server called troy.example.net that is running an rsync(1) daemon.

How to do it...

We can download and install a new plugin using the following steps:

Copy the URL for the download link for the most recent version of the check_rsync plugin.
Navigate to the plugins directory for the Nagios Core server. The default location is /usr/local/nagios/libexec:
# cd /usr/local/nagios/libexec
Download the plugin using the wget command into a file called check_rsync. It's important to enclose the URL in quotes:
# wget 'https://exchange.nagios.org/components/com_mtree/attachment.php?link_id=307&cf_id=29' -O check_rsync
Set the plugin's ownership with chown(1) and make it executable with chmod(1):
# chown root:nagios check_rsync
# chmod 0770 check_rsync
Run the plugin directly with no arguments to check that it runs and to get usage instructions. It's a good idea to test it as the nagios user using the su(1) or sudo(8) commands:
# sudo -s -u nagios
$ ./check_rsync
Usage: check_rsync -H <host> [-p <port>] [-m <module>[,<user>,<password>] [-m <module>[,<user>,<password>]...]]
Try running the plugin directly against a host running rsync(1) to check whether it works and reports a status:
$ ./check_rsync -H troy.example.net
The output normally starts with the status determined, with any extra information after a colon:

OK: Rsync is up
If all of this works, then the plugin is now installed and working correctly.

How it works...

Because Nagios Core plugins are programs in themselves, all that installing a plugin really amounts to is saving a program or script into an appropriate directory, in this case, /usr/local/nagios/libexec, where all the other plugins live. It's then available to be used the same way as any other plugin.

The next step, once the plugin is working, is to define a command for it in the Nagios Core configuration so that it can be used to monitor hosts and/or services. This can be done with the Creating a new command recipe in this article.

There's more...

If we inspect the Perl script, we can see a little of how it works. It works like any other Perl script, except perhaps for the fact that its return values are defined in a hash called %ERRORS, and the return values it chooses depend on what happens when it tries to check the rsync(1) process. This is the most important part of implementing a plugin for Nagios Core.
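The %ERRORS idea carries over to other languages. Below is a hypothetical POSIX shell sketch: `error_code` maps state names to the exit codes such a hash typically holds (OK 0, WARNING 1, CRITICAL 2, UNKNOWN 3), and `classify_rsync_greeting` picks a state from one line of a server's response. This is not check_rsync's actual logic; it assumes only that an rsync daemon opens a connection with a greeting line beginning `@RSYNCD:`.

```shell
#!/bin/sh
# Shell analogue of a Perl %ERRORS hash: map a state name to its exit code.
error_code() {
    case "$1" in
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        *)        echo 3 ;;   # UNKNOWN, and any unrecognised name
    esac
}

# Hypothetical decision logic: treat an "@RSYNCD:" greeting as evidence that
# the daemon is up, no response as CRITICAL, and anything else as UNKNOWN.
classify_rsync_greeting() {
    case "$1" in
        "@RSYNCD:"*) state=OK;       message="Rsync is up" ;;
        "")          state=CRITICAL; message="No response from rsync daemon" ;;
        *)           state=UNKNOWN;  message="Unexpected greeting: $1" ;;
    esac
    echo "$state: $message"
    return "$(error_code "$state")"
}
```

Keeping the name-to-code mapping in one place, as %ERRORS does, means the decision logic can talk in state names while the exit code still follows the convention Nagios Core expects.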

Installation procedures for different plugins vary. In particular, many plugins are written in languages like C, and hence, they need to be compiled. One such plugin is the popular check_nrpe plugin. Rather than simply being saved into a directory and made executable, these sorts of plugins often follow the usual pattern of configuration, compilation, and installation:

$ ./configure

$ make

# make install

For many plugins built in this style, the final make install step will often install the compiled plugin into the appropriate directory for us. In general, if instructions are included with the plugin, it pays to read them to see how best to install it.

Removing a plugin

In this recipe, we'll remove a plugin that we no longer need as part of our Nagios Core installation. Perhaps it's not working correctly, the service it monitors is no longer available, or there are security or licensing concerns with its usage.

Getting ready

You should have a Nagios Core 4.0 or newer server running with a few hosts and services configured already and have a plugin that you would like to remove from the server. In this instance, we'll remove the now unneeded check_rsync plugin from our Nagios Core server.

How to do it...

We can remove a plugin from our Nagios Core instance using the following steps:

Remove any part of the configuration that uses the plugin, including the hosts or services that use it for check_command and command definitions that refer to the program. As an example, the following definition for a command would no longer work after we remove the check_rsync plugin:
```
define command {
    command_name  check_rsync
    command_line  $USER1$/check_rsync -H $HOSTADDRESS$
}
```
Using a tool, such as grep(1), can be a good way to find mentions of the command and plugin:
# grep -R check_rsync /usr/local/nagios/etc
Change the directory on the Nagios Core server to wherever the plugins are kept. The default location is /usr/local/nagios/libexec:
# cd /usr/local/nagios/libexec
Delete the plugin with the rm(1) command:
# rm check_rsync
Validate the configuration and restart the Nagios Core server:
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
# /etc/init.d/nagios reload
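The first two steps can be combined into a guard so that a plugin is never deleted while the configuration still refers to it. This is a minimal POSIX shell sketch; the function name `remove_plugin_if_unused` is hypothetical, and it assumes a `grep` that supports `-R` (as GNU and BSD grep do).

```shell
#!/bin/sh
# Hypothetical helper: delete PLUGIN_NAME from PLUGIN_DIR only if no file
# under CONFIG_DIR still mentions it.
# Usage: remove_plugin_if_unused PLUGIN_DIR CONFIG_DIR PLUGIN_NAME
remove_plugin_if_unused() {
    plugin_dir=$1
    config_dir=$2
    name=$3
    # -R recurses, -l lists matching files, -F treats the name as a fixed
    # string; "|| true" keeps a "no matches" status from aborting under set -e
    refs=$(grep -R -l -F -- "$name" "$config_dir" 2>/dev/null || true)
    if [ -n "$refs" ]; then
        echo "Not removing $name; still referenced in:" >&2
        printf '%s\n' "$refs" >&2
        return 1
    fi
    rm -f -- "$plugin_dir/$name"
    echo "Removed $plugin_dir/$name"
}
```

For example, `remove_plugin_if_unused /usr/local/nagios/libexec /usr/local/nagios/etc check_rsync` would refuse to delete check_rsync while a check_rsync command definition still exists. Because the match is a plain substring search, it also errs on the side of caution for similarly named commands such as check_rsync_backup. The configuration still needs to be validated and reloaded afterwards.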

How it works...

Nagios Core plugins are simply external programs that the server uses to perform checks of hosts and services. If a plugin is no longer needed, all that we need to do is remove references to it in our configuration, if any, and delete it from /usr/local/nagios/libexec.

There's more...

There's not usually any harm in leaving the plugin's program on the server even if Nagios Core isn't using it. It doesn't slow anything down or cause any other problems, and it may be needed later. Nagios Core plugins are generally quite small programs and should not really cause disk space concerns on a modern server.

Creating a new command

In this recipe, we'll create a new command for a plugin that was just installed into the /usr/local/nagios/libexec directory on the Nagios Core server. This will define the way in which Nagios Core should use the plugin, and thereby allow it to be used as part of a service definition.

Getting ready

You should have a Nagios Core 4.0 or newer server running with a few hosts and services configured already and have a plugin installed for which you'd like to define a new command so that you can use it as part of a service definition. In this instance, we'll define a command for an installed check_rsync plugin.

How to do it...

We can define a new command in our configuration as follows:

Change to the directory containing the objects configuration for Nagios Core. The default location is /usr/local/nagios/etc/objects:
# cd /usr/local/nagios/etc/objects
Edit the commands.cfg file:

# vi commands.cfg
At the bottom of the file, add the following command definition:

```
define command {
    command_name  check_rsync
    command_line  $USER1$/check_rsync -H $HOSTADDRESS$
}
```

Validate the configuration and restart the Nagios Core server:
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
# /etc/init.d/nagios reload

If the validation passed and the server restarted successfully, we should now be able to use the check_rsync command in a service definition.

How it works...

The configuration we added to the commands.cfg file in the preceding steps defines a new command called check_rsync, which specifies a method for using the plugin of the same name to monitor a service. This enables us to use check_rsync as a value for the check_command directive in a service declaration, which might look like this:

```
define service {
    use                  generic-service
    host_name            troy.example.net
    service_description  RSYNC
    check_command        check_rsync
}
```

Only two directives are required for command definitions, and we've defined both:

command_name: This defines the unique name with which we can reference the command when we use it in host or service definitions
command_line: This defines the command line that should be executed by Nagios Core to make the appropriate check

This particular command line also uses the following two macros:

$USER1$: This expands to /usr/local/nagios/libexec, the location of the plugin binaries, including check_rsync. This is defined in the sample configuration in the /usr/local/nagios/etc/resource.cfg file.
$HOSTADDRESS$: This expands to the address of the host being checked, whether the command is used in a host definition or a service definition.

So, if we used the command in a service, checking the rsync(1) server on troy.example.net, the completed command might look like this:

$ /usr/local/nagios/libexec/check_rsync -H troy.example.net

We can run this straight from the command line ourselves as the nagios user to see what kind of results it returns:

$ /usr/local/nagios/libexec/check_rsync -H troy.example.net

OK: Rsync is up
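To see the substitution step in isolation, here is a rough POSIX shell sketch that expands the two macros in a command_line template. It is only an illustration: Nagios Core performs macro expansion itself and supports many more macros, and the helper name `expand_command_line` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: substitute $USER1$ and $HOSTADDRESS$ in a
# command_line template. Values containing "|", "&" or "\" would need
# escaping first, because they are special in this sed expression.
expand_command_line() {
    template=$1
    user1=$2
    hostaddress=$3
    printf '%s\n' "$template" |
        sed -e 's|\$USER1\$|'"$user1"'|g' \
            -e 's|\$HOSTADDRESS\$|'"$hostaddress"'|g'
}
```

Running `expand_command_line '$USER1$/check_rsync -H $HOSTADDRESS$' /usr/local/nagios/libexec troy.example.net` prints `/usr/local/nagios/libexec/check_rsync -H troy.example.net`, the same completed command line as in the example for troy.example.net.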

There's more...

A plugin can be used for more than one command. If we had a particular rsync(1) module named backup that we wanted to check, we could write another command called check_rsync_backup as follows:

```
define command {
    command_name  check_rsync_backup
    command_line  $USER1$/check_rsync -H $HOSTADDRESS$ -m backup
}
```

Alternatively, if one or more of our rsync(1) servers were running on an alternate port, say, port 5873, we could define a separate command check_rsync_altport for that:

```
define command {
    command_name  check_rsync_altport
    command_line  $USER1$/check_rsync -H $HOSTADDRESS$ -p 5873
}
```

Commands can thus be defined as precisely as we need them to be. We explore this in more detail in the Customizing an existing command recipe in this article.

Customizing an existing command

In this recipe, we'll customize an existing command definition. There are a number of reasons why you might want to do this, but a common one is if a check is overzealous, sending notifications for WARNING or CRITICAL states that aren't actually terribly worrisome, or, on the other hand, if a check is too "forgiving" and doesn't flag hosts or services as having problems when it would actually be appropriate to do so.

Another reason is to account for peculiarities in your own network. For example, if you run HTTP daemons on the alternative port 8080 on a large number of your hosts, and you need to check them, it would be convenient to have a check_http_altport command available. We can do this by copying and altering the definition for the vanilla check_http command.

Getting ready

You should have a Nagios Core 4.0 or newer server running with a few hosts and services configured already. You should also already be familiar with the relationship between services, commands, and plugins.

How to do it...

We can customize an existing command definition as follows:

Change to the directory containing the objects configuration for Nagios Core. The default location is /usr/local/nagios/etc/objects:
# cd /usr/local/nagios/etc/objects
Edit the commands.cfg or whichever file is an appropriate location for the check_http command:
# vi commands.cfg
Find the definition for the check_http command. In a default Nagios Core configuration, it should look something like this:
```
# 'check_http' command definition
define command {
    command_name  check_http
    command_line  $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
}
```
Copy this definition into a new definition directly under it and alter it to look like the following, renaming the command and adding a new option to its command line:
```
# 'check_http_altport' command definition
define command {
    command_name  check_http_altport
    command_line  $USER1$/check_http -I $HOSTADDRESS$ -p 8080 $ARG1$
}
```
Validate the configuration and restart the Nagios Core server:
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
# /etc/init.d/nagios reload

If the validation passed and the server restarted successfully, we should now be able to use the check_http_altport command, which is based on the original check_http command, in a service definition.

How it works...

The configuration we added to the commands.cfg file in the preceding steps reproduces the command definition for check_http, but changes it in two ways:

It renames the command from check_http to check_http_altport, which is necessary to distinguish the commands from one another. Command names in Nagios Core, like host names, must be unique.
It adds the -p 8080 option to the command line, specifying that when check_http is called, the check will be made against TCP port 8080 rather than the default, TCP port 80.

The check_http_altport command can now be used as a check command in the same way as the check_http command. For example, a service definition that checks whether the sparta.example.net host is running an HTTP daemon on port 8080 might look something like this:

```
define service {
    use                  generic-service
    host_name            sparta.example.net
    service_description  HTTP_8080
    check_command        check_http_altport
}
```
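The check_http and check_http_altport definitions both end with `$ARG1$`. In a service's check_command, arguments follow the command name separated by `!` characters, so `check_http_altport!-u /admin` would fill `$ARG1$` with `-u /admin`. The following rough POSIX shell sketch imitates that splitting for `$ARG1$` and `$ARG2$` only; `expand_args` is a hypothetical helper, and Nagios Core does this work itself.

```shell
#!/bin/sh
# Hypothetical helper: given a command_line template and a service's
# check_command value ("command!arg1!arg2"), substitute $ARG1$ and $ARG2$.
# Other macros such as $USER1$ are left untouched.
expand_args() {
    template=$1
    check_command=$2
    # Split check_command on "!" with globbing disabled, so arguments such
    # as "*" are not expanded against the current directory.
    old_ifs=$IFS
    IFS='!'
    set -f
    set -- $check_command
    set +f
    IFS=$old_ifs
    # The first field is the command name, which selects the template in
    # Nagios Core; here the template is passed in directly, so drop it.
    if [ "$#" -gt 0 ]; then shift; fi
    arg1=${1-}
    arg2=${2-}
    printf '%s\n' "$template" |
        sed -e 's|\$ARG1\$|'"$arg1"'|g' -e 's|\$ARG2\$|'"$arg2"'|g'
}
```

As with any sed-based substitution, argument values containing `|`, `&` or backslashes would need escaping before use.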

There's more...

This recipe's title implies that we should customize the existing commands by editing them in-place, and indeed, this works fine if we really do want to do things this way. Instead of copying the command definition, we can just add -p 8080 or any other customization to the command line and change the original command.

However, this is bad practice in most cases, mostly because it can break existing monitoring and can be potentially confusing to other administrators of the Nagios Core server. If we have a special case for monitoring, in this case, checking a nonstandard port for HTTP, then it's wise to create a whole new command based on the existing one with the customisations we need.

Particularly if you share monitoring configuration duties with someone else on your team, changing the command can break the monitoring for anyone who had set up services using the check_http command before you changed it, meaning that their checks would all start failing because port 8080 would be checked instead.

There is no limit to the number of commands you can define, so you can be very liberal in defining as many alternative commands as you need. It's a good idea to give them instructive names that say something about what they do as well as to add explanatory comments to the configuration file. You can add a comment to the file by prefixing it with a # character:

```
#
# 'check_http_altport' command definition. Used to monitor the servers
# that run administrative panels on an alternative port, served by a
# separate instance of Apache HTTPD with special privileges that we don't
# want to grant to the instance serving public-facing websites.
#
define command {
    command_name  check_http_altport
    command_line  $USER1$/check_http -I $HOSTADDRESS$ -p 8080 $ARG1$
}
```

Writing a new plugin from scratch

Even given the very useful standard plugins in the Nagios Plugins set and the large number of custom plugins available on Nagios Exchange, occasionally, as our monitoring setup becomes more refined, we may well find that there is some service or property of a host that we would like to check, but for which there doesn't seem to be any suitable plugin available. Every network is different, and sometimes, the plugins that others have generously donated their time to make for the community don't quite cover all your bases. Generally, the more specific your monitoring requirements get, the less likely it is for there to be a plugin available that does exactly what you need.

In this example, we'll deal with a very particular problem that we'll assume can't be dealt with effectively by any known Nagios Core plugins, and we'll write one ourselves using Perl. Here's the example problem.

Our Linux security team wants to be able to automatically check whether any of our servers are running kernels that have known exploits. However, they're not worried about every vulnerable kernel, only certain ones. They have provided us with the version numbers of three kernels that have small vulnerabilities that they're not particularly worried about but that do need patching, and one they're extremely worried about.

Let's say the minor vulnerabilities are in the kernels with version numbers 2.6.19, 2.6.24, and 3.0.1. The serious vulnerability is in the kernel with version number 2.6.39. Note that these version numbers in this case are arbitrary and don't necessarily reflect any real kernel vulnerabilities!

The team could log in to all of the servers individually to check them, but the servers are of varying ages and access methods, and they are managed by different people. The team would also have to check manually more than once, because a naive administrator could install a kernel from an older release that's known to be vulnerable, and they might also want to add other vulnerable kernel versions for checking later on.

So, the team have asked us to solve the problem with Nagios Core monitoring, and we've decided that the best way to do it is to write our own plugin, check_vuln_kernel, that checks the output of uname(1) for a kernel version string and then does the following:

If it's one of the slightly vulnerable kernels, it will return a WARNING state so that we can let the security team know that they should address it when they're next able to.
If it's the highly vulnerable kernel version, it will return a CRITICAL state so that the security team knows that a patched kernel needs to be installed immediately.
If uname(1) gives an error or output we don't understand, it will return an UNKNOWN state, alerting the team to a bug in the plugin or possibly more serious problems with the server.
Otherwise, it returns an OK state, confirming that the kernel is not known to be a vulnerable one.
Finally, in the Nagios Core monitoring, they want to be able to see at a glance what the kernel version is and whether it's vulnerable or not.

For the purposes of this example, we'll only monitor the Nagios Core server; however, we could install this plugin on the other servers that require this monitoring and run it via NRPE, where it would work just as well.

While this problem is very specific, we'll approach it in a very general way, which you'll be able to adapt to any situation that requires a Nagios plugin to:

Run a command and pull its output into a variable.
Check the output for the presence or absence of certain patterns.
Return an appropriate status based on those tests.

In other words, if you can do these three things, you can monitor almost anything effectively with Nagios Core!

Getting ready

You should have Perl installed, at least version 5.10. This will include the required POSIX module. You should also have the Perl modules Nagios::Plugin (or Monitoring::Plugin) and Readonly installed. On Debian-like systems, you can install these with the following:

# apt-get install libnagios-plugin-perl libreadonly-perl

On RPM-based systems, such as CentOS or Fedora Core, the following command should work:

# yum install perl-Nagios-Plugin perl-Readonly

This will be a rather long recipe that ties in a lot of Nagios Core concepts. You should be familiar with all the following concepts:

Defining new hosts and services and how they relate to one another
Defining new commands and how they relate to the plugins they call
Installing, testing, and using Nagios Core plugins

Some familiarity with Perl would also be helpful, but it is not required. We'll include comments to explain what each block of code is doing in the plugin.

How to do it...

We can write, test, and implement our example plugin as follows:

Change to the directory containing the plugin binaries for Nagios Core. The default location is /usr/local/nagios/libexec:
# cd /usr/local/nagios/libexec
Start editing a new file called check_vuln_kernel:

# vi check_vuln_kernel
Include the following code in it. Take note of the comments, which explain what each block of code is doing.

```
#!/usr/bin/env perl

# Use strict Perl style
use strict;
use warnings;
use utf8;

# Require at least Perl v5.10
use 5.010;

# Require a few modules, including Nagios::Plugin
use Nagios::Plugin;
use POSIX;
use Readonly;

# Declare some constants with patterns that match bad kernels; the (?!\d)
# lookahead stops, for example, 2.6.39 from also matching 2.6.391
Readonly::Scalar my $CRITICAL_REGEX => qr/^2[.]6[.]39(?!\d)/msx;
Readonly::Scalar my $WARNING_REGEX =>
  qr/^(?:2[.]6[.](?:19|24)|3[.]0[.]1)(?!\d)/msx;

# Run POSIX::uname() to get the kernel version string
my @uname   = uname();
my $version = $uname[2];

# Create a new Nagios::Plugin object
my $np = Nagios::Plugin->new();

# If we couldn't get the version, bail out with UNKNOWN
if ( !$version ) {
    $np->nagios_die('Could not read kernel version string');
}

# Exit with CRITICAL if the version string matches the critical pattern
if ( $version =~ $CRITICAL_REGEX ) {
    $np->nagios_exit( CRITICAL, $version );
}

# Exit with WARNING if the version string matches the warning pattern
if ( $version =~ $WARNING_REGEX ) {
    $np->nagios_exit( WARNING, $version );
}

# Exit with OK if neither of the patterns matched
$np->nagios_exit( OK, $version );
```

Make the plugin owned by the nagios group with chown(1) and executable with chmod(1):
# chown root:nagios check_vuln_kernel
# chmod 0770 check_vuln_kernel
Run the plugin directly to test it:
# sudo -s -u nagios
$ ./check_vuln_kernel

VULN_KERNEL OK: 3.16.0-4-amd64
We should now be able to use the plugin in a command, and hence in a service check just like any other command.

How it works...

The code we added earlier to the new plugin file, check_vuln_kernel, is actually quite simple:

It runs Perl's POSIX uname implementation to get the version number of the kernel
If that didn't work, it exits with the UNKNOWN status
If the version number matches anything in a pattern containing critical version numbers, it exits with the CRITICAL status
If the version number matches anything in a pattern containing warning version numbers, it exits with the WARNING status
Otherwise, it exits with the OK status

It also prints the status as a string along with the kernel version number, if it was able to retrieve one.

We might set up a command definition for this plugin, as follows:

```
define command {
    command_name  check_vuln_kernel
    command_line  $USER1$/check_vuln_kernel
}
```

In turn, we might set up a service definition for that command, as follows:

```
define service {
    use                  local-service
    host_name            localhost
    service_description  VULN_KERNEL
    check_command        check_vuln_kernel
}
```

If the kernel was not vulnerable, the service would appear in the web interface in the OK state, with the kernel version string as its status information.

However, if the monitoring server itself happened to be running a vulnerable kernel, the service would appear in the WARNING or CRITICAL state instead (and send consequent notifications, if configured to do so).

There's more...

This may be a simple plugin, but its structure can be generalised to all sorts of monitoring tasks. If we can figure out the correct logic to return the status we want in an appropriate programming language, then we can write a plugin to do basically anything.

A plugin like this could be written in C for improved performance, but since we can assume for simplicity's sake that high performance is not required here, we can instead use a language that's better suited to quick ad hoc scripts like this one, in this case, Perl. The utils.sh file, also in /usr/local/nagios/libexec, allows us to write plugins in shell script if we'd prefer that.
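As a sketch of that shell alternative, here is check_vuln_kernel's decision logic in POSIX shell. It defines its own STATE_ variables so that it runs standalone (utils.sh provides similarly named ones), and it uses case patterns rather than regular expressions. The function name `classify_kernel` is hypothetical, and the version lists are the arbitrary ones from this recipe, not real vulnerabilities.

```shell
#!/bin/sh
# Exit codes following the standard plugin convention.
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# Hypothetical function: classify a kernel version string, print a status
# line, and return the matching exit code.
classify_kernel() {
    version=$1
    if [ -z "$version" ]; then
        echo "VULN_KERNEL UNKNOWN: Could not read kernel version string"
        return "$STATE_UNKNOWN"
    fi
    # Each version matches exactly or when followed by a non-digit, so
    # 2.6.39 and 2.6.39-1-generic match, but 2.6.391 does not.
    case "$version" in
        2.6.39|2.6.39[!0-9]*)
            echo "VULN_KERNEL CRITICAL: $version"
            return "$STATE_CRITICAL" ;;
        2.6.19|2.6.19[!0-9]*|2.6.24|2.6.24[!0-9]*|3.0.1|3.0.1[!0-9]*)
            echo "VULN_KERNEL WARNING: $version"
            return "$STATE_WARNING" ;;
    esac
    echo "VULN_KERNEL OK: $version"
    return "$STATE_OK"
}
```

A complete plugin script would end by passing the running kernel's version to the function and exiting with its status, for example `classify_kernel "$(uname -r)"; exit $?`.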

If you prefer Python, the nagiosplugin library should meet your needs for both Python 2 and Python 3. Ruby users may like the nagiosplugin gem.

If you write a plugin that you think could be generally useful for the Nagios community at large, consider putting it under a free software license and submitting it to the Nagios Exchange so that others can benefit from your work. Community contribution and support are what have made Nagios Core such a great and widely used monitoring platform.

Any plugin you publish in this way should conform to the Nagios Plugin Development Guidelines. At the time of writing, these are available at https://nagios-plugins.org/doc/guidelines.html.

You may find older Nagios Core plugins written in Perl using the utils.pm file instead of Nagios::Plugin or Monitoring::Plugin. This will work fine, but Nagios::Plugin is recommended, as it includes more functionality out of the box and tends to be easier to use.

Summary

In this article, we learned how to install a custom plugin retrieved from Nagios Exchange onto a Nagios Core server so that it can be used in a Nagios Core command, how to remove a plugin that we no longer need, how to create a new command and customize an existing one, and how to write a new plugin from scratch.