You're reading from Linux Networking Cookbook
While a bit less common for home networks, monitoring is one of the key responsibilities of a systems administrator in the business world. A good sysadmin should be aware of failures in the systems that they're responsible for before the end user notices the problems. In fact, they are often aware of issues before they occur, because monitoring resource usage lets them detect bottlenecks before they trigger any service degradation.
Monitoring can fall into a number of categories, including graphing, alerting, and in some cases, automated fixes.
Nagios is an industry standard for open source monitoring and reporting. It is incredibly flexible and extensible, for better or worse. Getting it set up and running is not too difficult, but creating a configuration that is easy to understand and maintain takes additional thought and a good understanding of both Nagios and the systems you would like to monitor.
Install nagios:
sudo apt-get install nagios3
Select a password when prompted.
Visit the web UI at http://YOURSERVER/nagios3/. You can log in using nagiosadmin as the username and the password you selected in the previous step. Since this system requires you to log in, you'll want to follow the instructions in the Apache chapter to configure and require SSL/TLS for the system.
Nagios automatically creates the nagiosadmin user with full access rights to the system, but if you're operating in a larger environment, you will likely want to provide additional accounts for other users to log in with. This allows finer-grained access control and makes your life easier as employees come and go in the company.
Create the user account:
sudo htpasswd /etc/nagios3/htpasswd.users user
Alternatively, you can reconfigure Apache to use system authentication for Nagios by editing /etc/apache2/conf-available/nagios3 to read:
<IfModule mod_authnz_external.c>
    AddExternalAuth pwauth /usr/sbin/pwauth
    SetExternalAuthMethod pwauth pipe
</IfModule>
<DirectoryMatch (/usr/share/nagios3/htdocs|/usr/lib/cgi-bin/nagios3|/etc/nagios3/stylesheets)>
    Options FollowSymLinks
    DirectoryIndex index.php index.html
    AllowOverride AuthConfig
    Order Allow,Deny
    Allow From All
    ...
Monitoring the local system is different from monitoring remote systems. A big part of this is that while monitoring your local system, you have full access to information such as the number of processes, the amount of memory, CPU usage, and so on. When you're looking at remote systems, you're limited to remotely accessible information, such as whether a remote port is listening or whether the host responds to ping. If you need to collect more in-depth information, you'll need to configure something on the remote system to make it available.
You can configure additional hosts to be monitored by Nagios by creating an additional host entry in a .cfg file within /etc/nagios3/conf.d/.
The content should be:
define host {
    use generic-host
    host_name testbox
    hostgroups http-servers,ssh-servers
}
While multiple machines may be defined within the same .cfg file, separate files per machine may make more...
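One way to picture the file-per-machine layout is a small script that writes one .cfg file per host. This is only a sketch: it writes into a scratch directory rather than /etc/nagios3/conf.d/, the host name testbox2 is hypothetical, and the directives simply mirror the example above, so a real host may need more of them.

```shell
#!/bin/sh
# Scratch directory standing in for /etc/nagios3/conf.d/ (assumption:
# nothing here touches a real Nagios install).
conf_dir=$(mktemp -d)

# write_host NAME: emit one .cfg file holding a single host definition.
# The directives mirror the testbox example; nothing else is added.
write_host() {
    cat > "$conf_dir/$1.cfg" <<EOF
define host {
    use         generic-host
    host_name   $1
    hostgroups  http-servers,ssh-servers
}
EOF
}

# "testbox2" is a hypothetical second machine.
for name in testbox testbox2; do
    write_host "$name"
done

ls "$conf_dir"
cat "$conf_dir/testbox.cfg"
# Remove the scratch directory when done: rm -r "$conf_dir"
```

With this layout, retiring a machine comes down to deleting its file instead of editing a shared one.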
A service in nagios defines a particular test which should be run. At a minimum, you need to define a name for the service and the command to run in order to monitor it.
Similar to hosts, services are defined within .cfg files in /etc/nagios3/conf.d or a subdirectory. At a technical level, there is no difference between a .cfg file that defines a host and one that defines a service. They are split in Ubuntu's default configuration just for ease of management. If you wanted to, you could have a single flat .cfg file that defines all hosts, services, and users.
Again, I like to split my services into a subdirectory, so let's look at defining a service to monitor HP Jetdirect printers by creating /etc/nagios3/conf.d/services/printer.cfg containing:
define hostgroup {
    hostgroup_name printers
}
define service {
    hostgroup_name printers
    service_description jetdirect
    check_command check_hpjd
    ...
The commands that you may use for a given service need to be defined as well. The commands are defined within /etc/nagios-plugins/config, which is also included by /etc/nagios3/nagios.cfg.
This is a useful place to look if you want to see how an existing command is defined, or if you want to define your own custom command.
Let's create a custom command that uses an existing plugin to monitor a new service. Plex media servers are configured by default to run a web server on port 32400, so let's define a check_plex command that uses check_http on port 32400.
To do this, we're going to create /etc/nagios-plugins/config/plex.cfg:
define command{
    command_name check_plex
    command_line /usr/lib/nagios/plugins/check_http -H '$HOSTADDRESS$' -I '$HOSTADDRESS$' -p 32400 '$ARG1$'
}
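A command does nothing until a service refers to it. The sketch below writes a service definition that calls check_plex, following the shape of the printer example. The hostgroup name plex-servers and the service description are assumptions made for illustration, and the file goes to a scratch directory rather than /etc/nagios3/conf.d/services/.

```shell
#!/bin/sh
# Scratch directory standing in for /etc/nagios3/conf.d/services/.
svc_dir=$(mktemp -d)

# "plex-servers" is a hypothetical hostgroup; hosts would join it through
# their own "hostgroups" directive. The quoted 'EOF' keeps the shell from
# expanding anything inside the here-document.
cat > "$svc_dir/plex.cfg" <<'EOF'
define hostgroup {
    hostgroup_name       plex-servers
}

define service {
    hostgroup_name       plex-servers
    service_description  plex
    check_command        check_plex
    use                  generic-service
}
EOF

cat "$svc_dir/plex.cfg"
# Remove the scratch directory when done: rm -r "$svc_dir"
```

No argument is passed after check_plex here, so the $ARG1$ macro in the command definition would expand to an empty string.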
As I mentioned earlier, a number of plugins, such as check_memory, collect information from the system itself, which means that they cannot be directly used for monitoring remote systems. As these are often critical things to monitor, there are ways to collect that information from remote systems indirectly using the Nagios Remote Plugin Executor (NRPE).
NRPE runs on the machine that you'd like to monitor and executes the same commands/plugins which Nagios itself would have run. Nagios is then configured to collect data from NRPE rather than collecting data directly.
Install nrpe on your monitoring target:
sudo apt-get install nagios-nrpe-server
Restrict access to the NRPE service:
sudo sed -i 's|allowed_hosts=.*|allowed_hosts=192.168.1.0/24|g' /etc/nagios/nrpe.cfg
Define any additional checks to run by adding them into /etc/nagios/nrpe.cfg:
command[check_raid]=/usr/lib/nagios/plugins/check_raid
Configure your nagios server to collect data via nrpe by creating...
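Before letting sed rewrite nrpe.cfg in place, it can be reassuring to preview what the allowed_hosts substitution from the steps above would change. The sketch below runs the same expression without -i against a scratch file. The two sample lines are stand-ins, not the contents of a real nrpe.cfg.

```shell
#!/bin/sh
# Work on a scratch file so the real /etc/nagios/nrpe.cfg is never touched.
scratch=$(mktemp)
cat > "$scratch" <<'EOF'
server_port=5666
allowed_hosts=127.0.0.1
EOF

# Same expression as the recipe step, minus -i: the result is captured
# instead of being written back to the file.
preview=$(sed 's|allowed_hosts=.*|allowed_hosts=192.168.1.0/24|g' "$scratch")
printf '%s\n' "$preview"

# Show only the lines that would change; diff exits non-zero when the
# inputs differ, which is expected here.
printf '%s\n' "$preview" | diff "$scratch" - || true

rm -f "$scratch"
```

Only the allowed_hosts line should show up in the diff. If other lines appear, the pattern is matching more than intended.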
In addition to using NRPE to collect data, Nagios can also collect data via SNMP (Simple Network Management Protocol). This is especially useful for monitoring network equipment like routers and switches, which often have SNMP agents built into them.
Install the Nagios SNMP plugins:
sudo apt-get install nagios-snmp-plugins
Define some SNMP checks using SNMPv2:
define hostgroup {
    hostgroup_name snmp-hosts
}
define service {
    hostgroup_name snmp-hosts
    service_description Load Average
    check_command \
        check_snmp_load_v2!netsc!30!40!!public
    use generic-service
    notification_interval 0
}
define service {
    hostgroup_name snmp-hosts
    service_description Interface Status
    check_command \
        check_snmp_int_v2!!!public
    use generic-service
    ...
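The check_command values above pack a command name and its arguments into one string separated by exclamation marks. Nagios hands those arguments to the command definition as $ARG1$, $ARG2$, and so on. The helper below, split_check_command, is a hypothetical illustration of that positional mapping and not part of Nagios. What each position means is decided by the command definition, which this sketch does not try to interpret.

```shell
#!/bin/sh
# split_check_command STRING: print the command name and each "!"-separated
# argument next to the $ARGn$ macro it would fill.
split_check_command() {
    old_ifs=$IFS
    IFS='!'
    set -f          # no globbing while the unquoted expansion is split
    set -- $1       # the fields become the positional parameters
    set +f
    IFS=$old_ifs

    printf 'command_name = %s\n' "$1"
    shift
    n=1
    for arg in "$@"; do
        printf '$ARG%d$ = [%s]\n' "$n" "$arg"
        n=$((n + 1))
    done
}

split_check_command 'check_snmp_load_v2!netsc!30!40!!public'
```

The two adjacent exclamation marks produce an empty $ARG4$, so public lands in $ARG5$ and not in $ARG4$.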