Mastering Zabbix - Second Edition

By Andrea Dalle Vacche

About this book

Nowadays monitoring systems play a crucial role in any IT environment. They are extensively used to not only measure your system’s performance, but also to forecast capacity issues. This is where Zabbix, one of the most popular monitoring solutions for networks and applications, comes into the picture. With an efficient monitoring system in place you’ll be able to foresee when your infrastructure runs under capacity and react accordingly. Due to the critical role a monitoring system plays, it is fundamental to implement it in the best way from its initial setup. This avoids misleading, confusing, or, even worse, false alarms which can disrupt an efficient and healthy IT department.

This new edition will provide you with all the knowledge you need to make strategic and practical decisions about the Zabbix monitoring system. The setup you’ll do with this book will fit your environment and monitoring needs like a glove. You will be guided through the initial steps of choosing the correct size and configuration for your system, to what to monitor and how to implement your own custom monitoring component. Exporting and integrating your data with other systems is also covered.

By the end of this book, you will have a tailor-made and well configured monitoring system and will understand with absolute clarity how crucial it is to your IT environment.

Publication date:
September 2015
Publisher
Packt
Pages
412
ISBN
9781785289262

 

Chapter 1. Deploying Zabbix

If you are reading this book, you have, most probably, already used and installed Zabbix. Most likely, you did so on a small/medium environment, but now things have changed, and your environment today is a large one with new challenges coming in regularly. Nowadays, environments are rapidly growing or changing, and it is a difficult task to be ready to support and provide a reliable monitoring solution.

Normally, the initial deployment of a monitoring system is done by following a tutorial or how-to, and this is a common error. This kind of approach is valid for smaller environments, where downtime is not critical, where there are no disaster recovery sites to handle, or, in short, where things are easy.

Most likely, these setups were not designed with an eye on the number of new items, triggers, and events that the server will have to process. If you have already installed Zabbix and need to plan and expand your monitoring solution, or if you need to plan and design a new monitoring infrastructure, this chapter will help you.

This chapter will also help you to perform the difficult task of setting up or upgrading Zabbix in large and very large environments. It covers every aspect of this task, from the definition of a large environment to the use of Zabbix as a capacity-planning resource. The chapter introduces all the possible Zabbix solutions, includes a practical example of an installation ready to handle a large environment, and goes on to discuss possible improvements.

At the end of this chapter, you will understand how Zabbix works, which tables should be kept under special surveillance, and how to improve the housekeeping on a large environment, which, with a few years of trends to handle, is a really heavy task.

This chapter will cover the following topics:

  • Knowing when you are facing a large environment and defining when an environment can be considered large

  • Setting up/upgrading Zabbix on a large environment and a very large environment

  • Installing Zabbix on a three-tier system and having a readymade solution to handle a large environment

  • Sizing the database and finally knowing the total amount of space consumed by the data we acquire

  • Knowing the database's heavy tables and tasks

  • Improving the housekeeping to reduce the RDBMS load and improving the efficiency of the whole system

  • Learning fundamental concepts about capacity planning bearing in mind that Zabbix is a capacity-planning tool

 

Defining the environment size


Since this book is focused on a large environment, we need to define or at least provide basic fixed points to identify a large environment. There are various things to consider in this definition; basically, we can identify an environment as large when:

  • There is more than one physical location

  • The number of monitored devices is high (hundreds or thousands)

  • The number of checks and items retrieved per second is high (more than 500)

  • There are lots of items, triggers, and data to handle (the database is larger than 100 GB)

  • The availability and performance are both critical

All of the preceding points define a large environment; in this kind of environment, the installation and maintenance of Zabbix infrastructure play a critical role.
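As a rough illustration, the measurable criteria above can be turned into a small shell check. This is only a sketch: the function name is hypothetical, "hundreds" of devices is assumed to mean 200 or more, and the fifth criterion (critical availability and performance) is a business judgment that cannot be computed.

```shell
#!/bin/sh
# Count how many of the measurable "large environment" criteria are met.
# Thresholds come from the list above; the 200-device cutoff is an assumption.
# Usage: large_env_score LOCATIONS DEVICES CHECKS_PER_SEC DB_SIZE_GB
large_env_score() (
    score=0
    [ "$1" -gt 1 ]   && score=$((score + 1))   # more than one physical location
    [ "$2" -ge 200 ] && score=$((score + 1))   # hundreds or thousands of devices
    [ "$3" -gt 500 ] && score=$((score + 1))   # more than 500 checks per second
    [ "$4" -gt 100 ] && score=$((score + 1))   # database larger than 100 GB
    echo "$score"
)

# Hypothetical figures: 3 sites, 1,200 devices, 800 checks/s, 250 GB database.
large_env_score 3 1200 800 250
```

A score of 4 means that every measurable criterion is met; lower scores suggest that a simpler setup may still be adequate.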

The installation, of course, is a task that is well defined and limited in time and, probably, is one of the most critical tasks; it is really important to go live with a strong and reliable monitoring infrastructure. Also, once we go live with the monitoring in place, it will not be so easy to move/migrate pieces without any loss of data. There are certain other things to consider: we will have a lot of tasks related to our monitoring system, most of which are daily tasks, but in a large environment, they require particular attention.

In a small environment with a small database, a backup will keep you busy for a few minutes, but if the database is large, this task will consume a considerable amount of time to be completed.

The restore procedure and the related restore plans should be considered and tested periodically to be aware of the time needed to complete this task in the event of a disaster or critical hardware failure.

Among the maintenance tasks, we need to consider testing upgrades and putting them into production with minimal impact, along with the daily tasks and daily checks.

 

Zabbix architectures


Zabbix can be defined as a distributed monitoring system with a centralized web interface (on which we can manage almost everything). Among its main features, we will highlight the following ones:

  • Zabbix has a centralized web interface

  • The server can be run on most Unix-like operating systems

  • This monitoring system has native agents for most Unix, Unix-like, and Microsoft Windows operating systems

  • The system is easy to integrate with other systems, thanks to the API available in many different programming languages and the option that Zabbix itself provides

  • Zabbix can monitor via SNMP (v1, v2, and v3), IPMI, JMX, ODBC, SSH, HTTP(s), TCP/UDP, and Telnet

  • This monitoring system gives us the possibility of creating custom items and graphs and interpolating data

  • The system is easy to customize

The following diagram shows the three-tier system of a Zabbix architecture:

The Zabbix architecture for a large environment is composed of three different servers/components (that should be configured on HA as well). These three components are as follows:

  • A web server

  • A database server

  • A Zabbix server

The whole Zabbix infrastructure in large environments allows us to have two other actors that play a fundamental role. These actors are the Zabbix agents and the Zabbix proxies. An example is represented in the following figure:

On this infrastructure, we have a centralized Zabbix server that is connected to different proxies, usually one for each server farm or a subnetwork.

The Zabbix server will acquire data from Zabbix proxies, the proxies will acquire data from all the Zabbix agents connected to it, all the data will be stored on a dedicated RDBMS, and the frontend will be exposed with a web interface to the users. Looking at the technologies used, we see that the web interface is written in PHP and that the server, proxies, and agents are written in C.

Note

The server, proxies, and agents are written in C to give the best performance and least resource usage possible. All the components are deeply optimized to achieve the best performance.

We can implement different kinds of architecture using proxies. There are several types of architectures and, in the order of complexity, we find the following ones:

  • The single-server installation

  • One server and many proxies

  • Distributed installation (available only until 2.3.0)

The single-server installation is not suggested in a large environment. It is the basic installation, where a single server does all the monitoring, and it can be considered a good starting point.

Most likely, in our infrastructure, we might already have a Zabbix installation. Zabbix is quite flexible, and this permits us to upgrade this installation to the next step: proxy-based monitoring.

Proxy-based monitoring is implemented with one Zabbix server and several proxies, that is, one proxy per branch or data center. This configuration is easy to maintain and offers the advantage of having a centralized monitoring solution. This kind of configuration strikes the right balance between large environment monitoring and complexity. From this point, we can (with a lot of effort) expand our installation to a complete and distributed monitoring architecture. The installation consisting of one server and many proxies is the one shown in the previous diagram.

Starting from the 2.4.0 version of Zabbix, the distributed scenarios that include nodes are no longer a possible setup. Indeed, if you download the source code of the Zabbix distribution discussed in this book, that is, Zabbix 2.4.4, you'll see that the branch of code that was managing the nodes has been removed.

All the possible Zabbix architectures will be discussed in detail in Chapter 2, Distributed Monitoring.

Installing Zabbix

The installation that will be covered in this chapter is the one consisting of a server for each of the following base components:

  • A web frontend

  • A Zabbix server

  • A Zabbix database

We will start describing this installation because:

  • It is a basic installation that is ready to be expanded with proxies

  • Each component is on a dedicated server

  • This kind of configuration is the starting point to monitor large environments

  • It is widely used

  • Most probably, it will be the starting point of your upgrade and expansion of the monitoring infrastructure.

Actually, this first setup for a large environment, as explained here, can be useful if you are looking to improve an existing monitoring infrastructure. If your current monitoring solution is not implemented in this way, the first thing to do is plan the migration on three different dedicated servers.

Once the environment is set up on three tiers but is still giving poor performance, you can plan and think which kind of large environment setup will be a perfect fit for your infrastructure.

When you monitor your large environment, there are some points to consider:

  • Use a dedicated server to keep things easy to extend

  • Keep things easy to extend and implement a high-availability setup

  • Keep things easy to extend and implement a fault-tolerant architecture

On this three-layer installation, the CPU usage will not be really critical, at least for the Zabbix server. The CPU consumption is directly related to the number of items to store and the refresh rate (number of samples per minute) rather than the memory.

Indeed, the Zabbix server will not consume excessive CPU but is a bit greedier for memory. We can consider that four CPU cores with 8 GB of RAM can be used for more than 1,000 hosts without any issues.
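To get a feeling for the load that the number of items and the refresh rate put on the server, a back-of-the-envelope figure can be computed as the total number of items divided by the polling interval. The following sketch assumes that every host has the same number of items and that all items share one interval; the function name and the sample numbers are purely illustrative.

```shell
#!/bin/sh
# Approximate number of new values the server must process per second,
# rounded down. Assumes a uniform item count and interval across all hosts.
# Usage: values_per_second HOSTS ITEMS_PER_HOST INTERVAL_SECONDS
values_per_second() {
    echo $(( $1 * $2 / $3 ))
}

# Hypothetical case: 1,000 hosts, 100 items each, refreshed every 60 seconds.
values_per_second 1000 100 60
```

Halving the interval doubles the result, which is why the refresh rate deserves as much attention as the number of monitored hosts.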

Basically, there are two ways to install Zabbix:

  • Downloading the latest source code and compiling it

  • Installing it from packages

There is also another way to have a Zabbix server up and running, that is, by downloading the virtual appliance, but we don't consider this case as it is better to have full control of our installation and be aware of all the steps. Also, the major concern about the virtual appliance is that Zabbix itself describes it as not production ready directly on the download page http://www.zabbix.com/download.php.

The installation from packages gives us the following benefits:

  • It makes the process of upgrading and updating easier

  • Dependencies are automatically sorted

The source code compilation also gives us benefits:

  • We can compile only the required features

  • We can statically build the agent and deploy it on different Linux flavors

  • We can have complete control over the update

It is quite usual to have different versions of Linux, Unix, and Microsoft Windows in a large environment. These kinds of scenarios are quite common on a heterogeneous infrastructure, and if we use the agent distribution package of Zabbix on each Linux server, we will, for sure, have different versions of the agent and different locations for the configuration files.

The more standardized we are across the server, the easier it will be to maintain and upgrade the infrastructure. --enable-static gives us a way to standardize the agent across different Linux versions and releases, and this is a strong benefit. The agent, if statically compiled, can be easily deployed everywhere, and, for sure, we will have the same location (and we can use the same configuration file apart from the node name) for the agent and their configuration file. The deployment will be standardized; however, the only thing that may vary is the start/stop script and how to register it on the right init runlevel.

The same concept can be applied to commercial Unix, bearing in mind that the agent must be compiled for each vendor, so the same agent can be deployed on different versions of Unix released by the same vendor.

Prerequisites

Before compiling Zabbix, we need to take a look at the prerequisites. The web frontend will need at least the following versions:

  • Apache (1.3.12 or later)

  • PHP (5.3.0 or later)

Instead, the Zabbix server will need:

  • An RDBMS: The open source alternatives are PostgreSQL and MySQL

  • zlib-devel

  • mysql-devel: This is used to support MySQL (not needed on our setup)

  • postgresql-devel: This is used to support PostgreSQL

  • glibc-devel

  • curl-devel: This is used in web monitoring

  • libidn-devel: The curl-devel depends on it

  • openssl-devel: The curl-devel depends on it

  • net-snmp-devel: This is used for SNMP support

  • popt-devel: net-snmp-devel might depend on it

  • rpm-devel: net-snmp-devel might depend on it

  • OpenIPMI-devel: This is used to support IPMI

  • iksemel-devel: This is used for the Jabber protocol

  • libssh2-devel: This is used to support SSH checks

  • sqlite3: This is required if SQLite is used as the Zabbix backend database (usually on proxies)

To install all the dependencies on a Red Hat Enterprise Linux distribution, we can use yum (from root), but first of all, we need to include the EPEL repository with the following command:

# yum install epel-release

Using yum install, install the following packages:

# yum install zlib-devel postgresql-devel glibc-devel curl-devel gcc automake postgresql libidn-devel openssl-devel net-snmp-devel rpm-devel OpenIPMI-devel iksemel-devel libssh2-devel openldap-devel
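Before running configure, it can be handy to verify which of these packages are still missing. The following is a minimal sketch: the package list is copied from the yum command above, the helper name missing_packages is hypothetical, and the check relies on rpm -q returning a non-zero exit status for a package that is not installed.

```shell
#!/bin/sh
# Packages taken from the yum command above.
REQUIRED="zlib-devel postgresql-devel glibc-devel curl-devel gcc automake postgresql libidn-devel openssl-devel net-snmp-devel rpm-devel OpenIPMI-devel iksemel-devel libssh2-devel openldap-devel"

# Print every package for which the given check command fails.
# Usage: missing_packages CHECK_COMMAND PACKAGE...
missing_packages() (
    check=$1; shift
    for pkg in "$@"; do
        $check "$pkg" >/dev/null 2>&1 || printf '%s\n' "$pkg"
    done
)

# Only meaningful on an RPM-based system; elsewhere this prints nothing.
if command -v rpm >/dev/null 2>&1; then
    # $REQUIRED is left unquoted on purpose so that it splits into words.
    missing_packages "rpm -q" $REQUIRED
fi
```

An empty output on the build host means that all the listed dependencies are already in place.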

Tip

The iksemel-devel package is used to send Jabber messages. This is a really useful feature as it enables Zabbix to send chat messages. Furthermore, Jabber is managed as a media type on Zabbix, and you can also set your working time, which helps to avoid messages being sent when you are not in the office.

Setting up the server

Zabbix needs an unprivileged user account to run. Anyway, if the daemon is started from root, it will automatically switch to the zabbix account if this one is present:

# groupadd zabbix
# useradd -m -s /bin/bash -g zabbix zabbix
# useradd -m -s /bin/bash -g zabbix zabbixsvr

Note

The server should never run as root because this will expose the server to a security risk.

The preceding lines permit you to enforce the security of your installation. The server and agent should run with two different accounts; otherwise, the agent can access the Zabbix server's configuration. Now, using the Zabbix user account, we can download and extract the sources from the tar.gz file:

# wget  http://sourceforge.net/projects/zabbix/files/ZABBIX%20Latest%20Stable/2.4.4/zabbix-2.4.4.tar.gz/download -O zabbix-2.4.4.tar.gz 
# tar -zxvf zabbix-2.4.4.tar.gz

Now, we will configure the sources; the list of available options can be displayed as follows:

# cd zabbix-2.4.4
# ./configure --help

To configure the source for our server, we can use the following options:

# ./configure --enable-server --enable-agent --with-postgresql --with-libcurl --with-jabber --with-net-snmp --enable-ipv6 --with-openipmi --with-ssh2 --with-ldap

Note

The zabbix_get and zabbix_sender commands are generated only if --enable-agent is specified during server compilation.

If the configuration is complete without errors, we should see something similar to this:

config.status: executing depfiles commands


Configuration:

  Detected OS:           linux-gnu
  Install path:          /usr/local
  Compilation arch:      linux

  Compiler:              gcc
  Compiler flags:        -g -O2    -I/usr/include      -I/usr/include/rpm -I/usr/local/include -I/usr/lib64/perl5/CORE -I. -I/usr/include -I/usr/include -I/usr/include -I/usr/include

  Enable server:         yes
  Server details:
    With database:         PostgreSQL
    WEB Monitoring:        cURL
    Native Jabber:         yes
    SNMP:                  yes
    IPMI:                  yes
    SSH:                   yes
    ODBC:                  no
    Linker flags:          -rdynamic       -L/usr/lib64      -L/usr/lib64 -L/usr/lib -L/usr/lib -L/usr/lib
    Libraries:             -lm -ldl -lrt  -lresolv      -lpq  -liksemel    -lnetsnmp -lssh2 -lOpenIPMI -lOpenIPMIposix -lldap -llber   -lcurl

  Enable proxy:          no

  Enable agent:          yes
  Agent details:
    Linker flags:          -rdynamic    -L/usr/lib
    Libraries:             -lm -ldl -lrt  -lresolv   -lldap -llber   -lcurl

  Enable Java gateway:   no

  LDAP support:          yes
  IPv6 support:          yes

***********************************************************
*            Now run 'make install'                       *
*                                                         *
*            Thank you for using Zabbix!                  *
*              <http://www.zabbix.com>                    *
***********************************************************

We will not run make install but only the compilation with # make. To specify a different location for the Zabbix server, we need to use the --prefix option on the configure command, for example, --prefix=/opt/zabbix. Now, follow the instructions as explained in the Installing and creating the package section.

Setting up the agent

To configure the sources to create the agent, we need to run the following command:

# ./configure --enable-agent
# make

Tip

With the configure command followed by the --enable-static option, you can statically link the libraries, and the compiled binary will not require any external library; this is very useful to distribute the agent across different flavors of Linux.

Installing and creating the package

In both the previous sections, the command line ends right before the installation; indeed, we didn't run the following command:

# make install

I advise you not to run the make install command but use the checkinstall software instead. This software will create the package and install the Zabbix software.

You can download the software from ftp://ftp.pbone.net/mirror/ftp5.gwdg.de/pub/opensuse/repositories/home:/ikoinoba/CentOS_CentOS-6/x86_64/checkinstall-1.6.2-3.el6.1.x86_64.rpm.

Note that checkinstall is only one of the possible alternatives that you have to create a distributable system package.

Note

We can also use a prebuilt checkinstall. The current release is checkinstall-1.6.2-20.4.i686.rpm (on Red Hat/CentOS); the package will also need rpm-build, so, from root, we need to execute the following command:

# yum install rpm-build rpmdevtools

We also need to create the necessary directories:

# mkdir -p ~/rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS}

Packages make things easy: they simplify the distribution and upgrade of the software, and we can create a package for different package managers: RPM, deb, and tgz.

Tip

checkinstall can produce a package for Debian (option -D), Slackware (option -S), and Red Hat (option -R). This is particularly useful to produce the Zabbix agent package (statically linked) and to distribute it across our servers.

Now, we need to switch to root or use the sudo checkinstall command followed by its options:

# checkinstall --nodoc -R --install=no -y 

If you don't face any issue, you should get the following message:

******************************************************************
 Done. The new package has been saved to
 /root/rpmbuild/RPMS/x86_64/zabbix-2.4.4-1.x86_64.rpm
 You can install it in your system anytime using:
      rpm -i zabbix-2.4.4-1.x86_64.rpm
******************************************************************

Now, to install the package from root, you need to run the following command:

# rpm -i zabbix-2.4.4-1.x86_64.rpm

Finally, Zabbix is installed. The server binaries will be installed in <prefix>/sbin, utilities will be in <prefix>/bin, and the man pages will be under the <prefix>/share location.

Installing from packages

To provide a complete picture of all the possible install methods, you need to be aware of the steps required to install Zabbix using the prebuilt rpm packages.

The first thing to do is install the repository:

# rpm -ivh http://repo.zabbix.com/zabbix/2.4/rhel/6/x86_64/zabbix-release-2.4-1.el6.noarch.rpm

This will create the yum repo file, /etc/yum.repos.d/zabbix.repo, and will enable the repository.

Tip

If you take a look at the Zabbix repository, you can see that inside the "non-supported" tree, http://repo.zabbix.com/non-supported/rhel/6/x86_64/, the following packages are available: iksemel, fping, libssh2, and snmptt.

Now, it is easy to install our Zabbix server and web interface; you can simply run this command on the server:

# yum install zabbix-server-pgsql

On the web server, after first adding the same yum repository, run the following command:

# yum install zabbix-web-pgsql

To install the agent, you only need to run the following command:

# yum install zabbix-agent

Note

If you have decided to use the RPM packages, please bear in mind that the configuration files are located under /etc/zabbix/. The book anyway will continue to refer to the standard configuration: /usr/local/etc/.

Also, if you have a local firewall active where you're deploying your Zabbix agent, you need to properly configure iptables to allow traffic to the Zabbix agent port. Run the following commands as root; the second one saves the rule so that it survives a restart (iptables-save alone only prints the rules to standard output):

# iptables -I INPUT 1 -p tcp --dport 10050 -j ACCEPT
# service iptables save

Configuring the server

For the server configuration, we only have one file to check and edit:

/usr/local/etc/zabbix_server.conf

The configuration files are located inside the following directory:

/usr/local/etc

We need to change the /usr/local/etc/zabbix_server.conf file and write the username, the related password, and the database name there; note that the database configuration will be done later on in this chapter and that, for now, you can write the planned username/password/database name. Then, using the zabbix account, you need to edit the file:

# vi /usr/local/etc/zabbix_server.conf

Change the following parameters:

DBHost=<DB-HOST-IP-ADDRESS>
DBName=zabbix_db
DBUser=zabbix
DBPassword=<write-here-your-password>
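If the same parameters have to be set on several servers, editing the file by hand quickly becomes tedious. The following sketch shows one possible way to script the change; the helper name set_conf_param is hypothetical, and the example works on a scratch file rather than on the live configuration. The database name zabbix_db is the one created later in this chapter.

```shell
#!/bin/sh
# Set KEY=VALUE in a Zabbix-style configuration file, replacing any active
# assignment of KEY. Commented-out defaults are left untouched.
# Usage: set_conf_param FILE KEY VALUE
set_conf_param() (
    file=$1 key=$2 value=$3
    [ -f "$file" ] || exit 1
    tmp=$(mktemp) || exit 1
    # Keep every line except the current assignment of KEY (if any).
    grep -v "^${key}=" "$file" > "$tmp" || true
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Overwrite in place so that the owner and mode of FILE are preserved.
    cat "$tmp" > "$file"
    rm -f "$tmp"
)

# Example on a scratch file, not on /usr/local/etc/zabbix_server.conf:
conf=$(mktemp)
printf 'DBName=zabbix\n# DBUser=\n' > "$conf"
set_conf_param "$conf" DBName zabbix_db
set_conf_param "$conf" DBUser zabbix
cat "$conf"
```

Writing the result back with cat rather than mv is a deliberate choice here: it keeps the permissions that protect the password stored in the file.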

Note

Now, our Zabbix server is configured and almost ready to go. The zabbix_server.conf location depends on the sysconfdir compile-time installation variable. Don't forget to take appropriate measures to protect access to the configuration file with the following command:

# chmod 600 /usr/local/etc/zabbix_server.conf
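Since this file contains the database password, it can be worth verifying its mode periodically. A minimal sketch follows, assuming GNU stat as found on Linux; the function name conf_is_private is hypothetical, and the demonstration uses a temporary file.

```shell
#!/bin/sh
# Succeeds only if FILE has mode 600 (read/write for the owner, nothing else).
# Relies on the GNU stat -c option.
conf_is_private() {
    [ "$(stat -c '%a' "$1")" = "600" ]
}

# Demonstration on a temporary file with deliberately loose permissions.
f=$(mktemp)
chmod 644 "$f"
conf_is_private "$f" || echo "warning: $f is readable by other users"
```

On a real server, the function would be called with the path of zabbix_server.conf.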

The location of the default external scripts will be as follows:

/usr/local/share/zabbix/externalscripts 

This depends on the datadir compile-time installation variable. The alertscripts directory will be in the following location:

/usr/local/share/zabbix/alertscripts 

Tip

This can be changed during compilation, and it depends on the datadir installation variable.

Now, we need to configure the agent. The configuration file is where we need to write the IP address of our Zabbix server. Once done, it is important to add the new services to the right runlevel to be sure that they will start when the server enters that runlevel.

To complete this task, we need to install the start/stop scripts in the following locations:

  • /etc/init.d/zabbix-agent

  • /etc/init.d/zabbix-proxy

  • /etc/init.d/zabbix-server

There are several scripts prebuilt inside the misc folder located at the following location:

zabbix-2.4.4/misc/init.d

This folder contains different startup scripts for different Linux variants, but this tree is not actively maintained and tested, and may not be up to date with the most recent versions of Linux distributions, so it is better to take care and test it before going live.

Once the start/stop scripts are added inside the /etc/init.d folder, we need to add them to the service list, using the same names as the scripts:

# chkconfig --add zabbix-server
# chkconfig --add zabbix-agent

Now, all that is left is to tell the system which runlevel it should start them on; we are going to use runlevels 3 and 5:

# chkconfig --level 35 zabbix-server on
# chkconfig --level 35 zabbix-agent on

Also, in case you have a local firewall active on your Zabbix server, you need to properly configure iptables to allow traffic to the Zabbix server port. Run the following commands as root; the second one saves the rule so that it survives a restart:

# iptables -I INPUT 1 -p tcp --dport 10051 -j ACCEPT
# service iptables save

Currently, we can't start the server; before starting up our server, we need to configure the database.

Installing the database

Once we complete the previous step, we can walk through the database server installation. All these steps will be done on the dedicated database server. The first thing to do is install the PostgreSQL server. This can be easily done with the package offered by the distribution, but it is recommended that you use the latest 9.x stable version.

Red Hat is still distributing the 8.x on RHEL6.4. Also, its clones, such as CentOS and ScientificLinux, are doing the same. PostgreSQL 9.x has many useful features; the release used in this chapter is Version 9.4.

To install PostgreSQL 9.4, there are some easy steps to follow:

  1. Locate the .repo files:

    • Red Hat: This is present at /etc/yum/pluginconf.d/rhnplugin.conf [main]

    • CentOS: This is present at /etc/yum.repos.d/CentOS-Base.repo, [base] and [updates]

  2. Append the following line on the section(s) identified in the preceding step:

    exclude=postgresql*

  3. Browse to http://yum.postgresql.org and find your correct RPM. For example, to install PostgreSQL 9.4 on RHEL 6, go to http://yum.postgresql.org/9.4/redhat/rhel-6-x86_64/pgdg-redhat94-9.4-1.noarch.rpm.

  4. Install the repo with yum localinstall followed by the URL of the RPM for your distribution; for example, on CentOS 6, use yum localinstall http://yum.postgresql.org/9.4/redhat/rhel-6-x86_64/pgdg-centos94-9.4-1.noarch.rpm.

  5. Now, to list all the postgresql packages, use the following command:

    # yum list postgres*
    
  6. Once you find the packages in the list, install them using the following command:

    # yum install postgresql94 postgresql94-server postgresql94-contrib
    
  7. Once the packages are installed, we need to initialize the database:

    # service postgresql-9.4 initdb
    

    Alternatively, we can initialize the database by calling the init script directly:

    # /etc/init.d/postgresql-9.4 initdb
    
  8. Now, we need to change a few things in the configuration file /var/lib/pgsql/9.4/data/postgresql.conf. We need to change the listen address and the related port:

    listen_addresses = '*'
    port = 5432

    We also need to add a couple of entries for zabbix_db in /var/lib/pgsql/9.4/data/pg_hba.conf, right after the following lines:

    # TYPE  DATABASE        USER            ADDRESS                 METHOD
    # "local" is for Unix domain socket connections only
    local   all             all                                     trust

    The entries to add are as follows:

    # configuration for Zabbix
    local   zabbix_db   zabbix                        md5
    host    zabbix_db   zabbix      <CIDR-address>    md5
    

    The local keyword matches all the connections made over Unix-domain sockets. It is followed by the database name (zabbix_db), the username (zabbix), and the authentication method (in our case, md5).

    The host keyword matches all the connections that are coming from TCP/IP (this includes SSL and non-SSL connections) followed by the database name (zabbix_db), username (zabbix), network, and mask of all the hosts that are allowed and the authentication method (in our case md5).

  9. The address of the allowed hosts in our case should be a network with its mask because we need to allow both the web interface (hosted on our web server) and the Zabbix server, which is on a different dedicated server, for example, 10.6.0.0/24 (a small subnet) or even a larger network. Most likely, the web interface and the Zabbix server will be in different networks, so make sure that you list all the networks and related masks here.

  10. Finally, we can start our PostgreSQL server using the following command:

    # service postgresql-9.4  start
    

    Alternatively, we can use this command:

    # /etc/init.d/postgresql-9.4  start
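When the web server and the Zabbix server live in different networks, as discussed in step 9, one host entry per network is needed in pg_hba.conf. The following sketch generates those lines; the helper name pg_hba_host_lines is hypothetical, and the two networks are placeholders that you need to replace with your own.

```shell
#!/bin/sh
# Print one "host" line for pg_hba.conf per allowed network, using md5
# authentication as in step 8.
# Usage: pg_hba_host_lines DATABASE USER CIDR...
pg_hba_host_lines() (
    db=$1 user=$2; shift 2
    for cidr in "$@"; do
        printf 'host    %s   %s      %s    md5\n' "$db" "$user" "$cidr"
    done
)

# Hypothetical networks: one for the web server, one for the Zabbix server.
pg_hba_host_lines zabbix_db zabbix 10.6.0.0/24 10.7.0.0/24
```

The output can be reviewed and then appended to pg_hba.conf; PostgreSQL needs to reload its configuration before the new entries take effect.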
    

To create a database, we need to be the postgres user (or the user that runs PostgreSQL in your distribution). Create a user for the database (our Zabbix user) and log in as that user to import the schema with the related data.

First, switch to the postgres user:

# su - postgres

Once we have become the postgres user, we can create the database (in our example, it is zabbix_db):

-bash-4.1$ psql 
postgres=#  CREATE USER zabbix WITH PASSWORD '<YOUR-ZABBIX-PASSWORD-HERE>';
CREATE ROLE
postgres=# CREATE DATABASE zabbix_db WITH OWNER zabbix ENCODING='UTF8';
CREATE DATABASE
postgres=# \q

The database creation scripts are located in the /database/postgresql folder of the extracted source files. They need to be installed exactly in this order:

# cat schema.sql | psql -h <DB-HOST-IP-ADDRESS> -W -U zabbix zabbix_db
# cat images.sql | psql -h <DB-HOST-IP-ADDRESS> -W -U zabbix zabbix_db
# cat data.sql | psql -h <DB-HOST-IP-ADDRESS> -W -U zabbix zabbix_db
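Because the order of the three files matters, the import can also be wrapped in a loop that stops at the first failure. The following is only a sketch: the host name is a placeholder, the RUN variable is a hypothetical dry-run switch that makes the script print the commands by default, and psql is called with -f and ON_ERROR_STOP instead of the pipe used above.

```shell
#!/bin/sh
DB_HOST=${DB_HOST:-db.example.com}   # placeholder, use your database server
RUN=${RUN-echo}                      # dry run by default; export RUN= to execute

# Import the three SQL files in the required order, stopping on failure.
# Run it from the database/postgresql folder of the extracted sources.
import_zabbix_schema() {
    for f in schema.sql images.sql data.sql; do
        # ON_ERROR_STOP makes psql exit with an error at the first failed statement.
        $RUN psql -h "$DB_HOST" -U zabbix -v ON_ERROR_STOP=1 -f "$f" zabbix_db || return 1
    done
}

import_zabbix_schema
```

In the default dry-run mode, the three psql commands are only printed, which makes it easy to review them before running the real import.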

Tip

The -h <DB-HOST-IP-ADDRESS> option used on the psql command forces a TCP/IP connection, so the local entry contained in the standard configuration file /var/lib/pgsql/9.4/data/pg_hba.conf is not used.

Now, finally, it is the time to start our Zabbix server and test the whole setup for our Zabbix server/database:

# /etc/init.d/zabbix-server start
Starting Zabbix server:                                   [  OK  ]

A quick check of the log file can give us more information about what is currently happening in our server. We should be able to get the following lines from the log file (the default location is /tmp/zabbix_server.log):

  26284:20150114:034537.722 Starting Zabbix Server. Zabbix 2.4.4 (revision 51175).
 26284:20150114:034537.722 ****** Enabled features ******
 26284:20150114:034537.722 SNMP monitoring:           YES
 26284:20150114:034537.722 IPMI monitoring:           YES
 26284:20150114:034537.722 WEB monitoring:            YES
 26284:20150114:034537.722 VMware monitoring:         YES
 26284:20150114:034537.722 Jabber notifications:      YES
 26284:20150114:034537.722 Ez Texting notifications:  YES
 26284:20150114:034537.722 ODBC:                      YES
 26284:20150114:034537.722 SSH2 support:              YES
 26284:20150114:034537.722 IPv6 support:              YES
 26284:20150114:034537.725 ******************************
 26284:20150114:034537.725 using configuration file: /usr/local/etc/zabbix/zabbix_server.conf
 26284:20150114:034537.745 current database version (mandatory/optional): 02040000/02040000
 26284:20150114:034537.745 required mandatory version: 02040000
 26284:20150114:034537.763 server #0 started [main process]
 26289:20150114:034537.763 server #1 started [configuration syncer #1]
 26290:20150114:034537.764 server #2 started [db watchdog #1]
 26291:20150114:034537.764 server #3 started [poller #1]
 26293:20150114:034537.765 server #4 started [poller #2]
 26294:20150114:034537.766 server #5 started [poller #3]
 26296:20150114:034537.770 server #7 started [poller #5]
 26295:20150114:034537.773 server #6 started [poller #4]
 26297:20150114:034537.773 server #8 started [unreachable poller #1]
 26298:20150114:034537.779 server #9 started [trapper #1]
 26300:20150114:034537.782 server #11 started [trapper #3]
 26302:20150114:034537.784 server #13 started [trapper #5]
 26301:20150114:034537.786 server #12 started [trapper #4]
 26299:20150114:034537.786 server #10 started [trapper #2]
 26303:20150114:034537.794 server #14 started [icmp pinger #1]
 26305:20150114:034537.790 server #15 started [alerter #1]
 26312:20150114:034537.822 server #18 started [http poller #1]
 26311:20150114:034537.811 server #17 started [timer #1]
 26310:20150114:034537.812 server #16 started [housekeeper #1]
 26315:20150114:034537.829 server #20 started [history syncer #1]
 26316:20150114:034537.844 server #21 started [history syncer #2]
 26319:20150114:034537.847 server #22 started [history syncer #3]
 26321:20150114:034537.852 server #24 started [escalator #1]
 26320:20150114:034537.849 server #23 started [history syncer #4]
 26326:20150114:034537.868 server #26 started [self-monitoring #1]
 26325:20150114:034537.866 server #25 started [proxy poller #1]
 26314:20150114:034537.997 server #19 started [discoverer #1]

The default log location is not ideal: /tmp is cleaned up on reboot, and the logs there are not rotated or managed properly.

We can change the default location by simply changing an entry in /etc/zabbix_server.conf. You can change the file as follows:

### Option: LogFile
LogFile=/var/log/zabbix/zabbix_server.log

Create the directory structure with the following command from root:

# mkdir -p /var/log/zabbix
# chown zabbixsvr:zabbixsvr /var/log/zabbix

Another important thing to set up is logrotate, as it is better to have an automated rotation of our log file. This can be quickly implemented by adding the relevant configuration in the logrotate directory /etc/logrotate.d/.

To do that, create the following file by running the command from the root account:

# vim  /etc/logrotate.d/zabbix-server

Use the following content:

/var/log/zabbix/zabbix_server.log {
        missingok
        monthly
        notifempty
        compress
        create 0664 zabbix zabbix
}

Once those changes have been done, you need to restart your Zabbix server with the following command (run it using root):

# /etc/init.d/zabbix-server restart
Shutting down Zabbix server:                              [  OK  ]
Starting Zabbix server:                                   [  OK  ]

Another thing to check is whether our server is running with our user:

# ps aux | grep "[z]abbix_server"
502 28742 1  0 13:39 ?    00:00:00 /usr/local/sbin/zabbix_server
502 28744 28742 0 13:39 ? 00:00:00 /usr/local/sbin/zabbix_server
502 28745 28742 0 13:39 ? 00:00:00 /usr/local/sbin/zabbix_server
...

The preceding lines show that zabbix_server is running with the user 502. We will go ahead and verify that 502 is the user we previously created:

# getent passwd 502
zabbixsvr:x:502:501::/home/zabbixsvr:/bin/bash

The preceding lines show that all is fine. The most common issue normally is the following error:

28487:20130609:133341.529 Database is down. Reconnecting in 10 seconds.

Several factors can cause this issue:

  • Firewall (local on our servers or an infrastructure firewall)

  • The postgres configuration

  • Wrong data in zabbix_server.conf

Tip

We can try to isolate the problem by running the following command on the database server:

psql -h <DB-HOST-IP> -U zabbix zabbix_db
Password for user zabbix:
psql (9.4)
Type "help" for help

If we have a connection, we can try the same command from the Zabbix server; if it fails, it is better to check the firewall configuration. If we get the fatal identification-authentication failed error, it is better to check the pg_hba.conf file.

Now, the second thing to check is the local firewall and then iptables. You need to verify that the PostgreSQL port is open on the database server. If the port is not open, you need to add a firewall rule using the root account:

# iptables -I INPUT 1 -p tcp --dport 5432 -j ACCEPT
# service iptables save
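Once the rule is in place, it can help to confirm from the Zabbix server that the PostgreSQL port is reachable before looking at pg_hba.conf or zabbix_server.conf. The following is a minimal sketch of such a check; the function name port_is_open and the placeholder address are assumptions made for illustration, and the check only proves that a TCP connection can be opened, not that authentication will succeed:

```python
import socket


def port_is_open(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered (timeout), or unresolvable host.
        return False


if __name__ == "__main__":
    # "127.0.0.1" is a placeholder; use your database server's address.
    print(port_is_open("127.0.0.1", 5432))
```

If this returns False from the Zabbix server but True on the database server itself, a firewall between the two hosts is the most likely cause.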

Now, it is time to check how to start and stop your Zabbix installation. The scripts that follow are a bit customized to manage the different users for the server and the agent.

Note

The following startup script works fine with a standard compilation that does not use --prefix and with the zabbixsvr user. If you are running on a different setup, make sure that you customize the executable location and the user:

exec=/usr/local/sbin/zabbix_server
zabbixsrv=zabbixsvr

For zabbix-server, create the zabbix-server file at /etc/init.d with the following content:

#!/bin/sh
#
# chkconfig: - 85 15
# description: Zabbix server daemon
# config: /usr/local/etc/zabbix_server.conf
#

### BEGIN INIT INFO
# Provides: zabbix
# Required-Start: $local_fs $network
# Required-Stop: $local_fs $network
# Default-Start:
# Default-Stop: 0 1 2 3 4 5 6
# Short-Description: Start and stop Zabbix server
# Description: Zabbix server
### END INIT INFO

# Source function library.
. /etc/rc.d/init.d/functions

exec=/usr/local/sbin/zabbix_server
prog=${exec##*/}
lockfile=/var/lock/subsys/zabbix
syscf=zabbix-server

The next parameter, zabbixsrv, is used inside the start() function, and it determines which user will run our Zabbix server:

zabbixsrv=zabbixsvr
[ -e /etc/sysconfig/$syscf ] && . /etc/sysconfig/$syscf

start()
{
    echo -n $"Starting Zabbix server: "

In the following line, inside the start function, the daemon command uses that variable to set the user who will own our Zabbix server process:

    daemon --user $zabbixsrv $exec

Remember to change the ownership of the Zabbix server log file and configuration file. This prevents a normal user from accessing sensitive data that can be acquired with Zabbix. The log file location is set by the LogFile property in /usr/local/etc/zabbix_server.conf. The start() function then continues as follows:

    rv=$?
    echo
    [ $rv -eq 0 ] && touch $lockfile
    return $rv
}

stop()
{
    echo -n $"Shutting down Zabbix server: "

Here, inside the stop function, we don't need to specify the user as the start/stop script runs from root, so we can simply use killproc $prog as follows:

    killproc $prog
    rv=$?
    echo
    [ $rv -eq 0 ] && rm -f $lockfile
    return $rv
}

restart()
{
    stop
    start
}

case "$1" in
    start|stop|restart)
        $1
        ;;
    force-reload)
        restart
        ;;
    status)
        status $prog
        ;;
    try-restart|condrestart)
        if status $prog >/dev/null ; then
            restart
        fi
        ;;
    reload)
        action $"Service ${0##*/} does not support the reload action: " /bin/false
        exit 3
        ;;
    *)
        echo $"Usage: $0 {start|stop|status|restart|try-restart|force-reload}"
        exit 2
        ;;
esac

Note

The following startup script works fine with a standard compilation that does not use --prefix and with the user set in the zabbix_usr variable. If you are running on a different setup, make sure that you customize the executable location and the user:

exec=/usr/local/sbin/zabbix_agentd
zabbix_usr=zabbix

For zabbix_agent, create the following zabbix-agent file at /etc/init.d/zabbix-agent:

#!/bin/sh
#
# chkconfig: - 86 14
# description: Zabbix agent daemon
# processname: zabbix_agentd
# config: /usr/local/etc/zabbix_agentd.conf
#

### BEGIN INIT INFO
# Provides: zabbix-agent
# Required-Start: $local_fs $network
# Required-Stop: $local_fs $network
# Should-Start: zabbix zabbix-proxy
# Should-Stop: zabbix zabbix-proxy
# Default-Start:
# Default-Stop: 0 1 2 3 4 5 6
# Short-Description: Start and stop Zabbix agent
# Description: Zabbix agent
### END INIT INFO

# Source function library.
. /etc/rc.d/init.d/functions

exec=/usr/local/sbin/zabbix_agentd
prog=${exec##*/}
syscf=zabbix-agent
lockfile=/var/lock/subsys/zabbix-agent

The following zabbix_usr parameter specifies the account that will be used to run Zabbix's agent:

zabbix_usr=zabbix
[ -e /etc/sysconfig/$syscf ] && . /etc/sysconfig/$syscf

start()
{
    echo -n $"Starting Zabbix agent: "

The next command uses the value of the zabbix_usr variable and permits us to have two different users, one for the server and one for the agent, preventing the Zabbix agent from accessing the zabbix_server.conf file that contains our database password:

    daemon --user $zabbix_usr $exec
    rv=$?
    echo
    [ $rv -eq 0 ] && touch $lockfile
    return $rv
}

stop()
{
    echo -n $"Shutting down Zabbix agent: "
    killproc $prog
    rv=$?
    echo
    [ $rv -eq 0 ] && rm -f $lockfile
    return $rv
}

restart()
{
    stop
    start
}

case "$1" in
    start|stop|restart)
        $1
        ;;
    force-reload)
        restart
        ;;
    status)
        status $prog
        ;;
    try-restart|condrestart)
        if status $prog >/dev/null ; then
            restart
        fi
        ;;
    reload)
        action $"Service ${0##*/} does not support the reload action: " /bin/false
        exit 3
        ;;
    *)
        echo $"Usage: $0 {start|stop|status|restart|try-restart|force-reload}"
        exit 2
        ;;
esac

With that setup, the agent runs under the zabbix_usr account and the server under the zabbixsvr Unix account:

zabbix_usr 4653 1 0 15:42 ?        00:00:00 /usr/local/sbin/zabbix_agentd
zabbix_usr 4655 4653  0 15:42 ?    00:00:00 /usr/local/sbin/zabbix_agentd 
zabbixsvr 4443 1  0 15:32 ?    00:00:00 /usr/local/sbin/zabbix_server
zabbixsvr 4445 4443  0 15:32 ? 00:00:00 /usr/local/sbin/zabbix_server

Some considerations about the database

Zabbix uses an interesting way to keep the database the same size at all times. The database size indeed depends upon:

  • The number of processed values per second

  • The housekeeper settings

Zabbix uses two ways to store the collected data:

  • History

  • Trends

History holds all the collected data, whatever its type, while trends hold only numerical data: the minimum, maximum, and average values are consolidated by hour (to keep the trend a lightweight process).

Tip

String items, such as character, log, and text, have no corresponding trends, since trends store only numeric values.

There is a process called the housekeeper that is responsible for handling the retention against our database. It is strongly advised that you keep the data in history as small as possible so that you do not overload the database with a huge amount of data, and store the trends for as long as you want.

Now, since Zabbix will also be used for capacity planning purposes, we need to consider using a baseline and keeping at least a whole business period. Normally, the minimum period is one year, but it is strongly advised that you keep the trend history on for at least 2 years. These historical trends will be used during the business opening and closure to have a baseline and quantify the overhead for a specified period.

Note

If we indicate 0 as the value for trends, the server will not calculate or store trends at all. If history is set to 0, Zabbix will be able to calculate only triggers based on the last value of the item itself as it does not store historical values at all.

The most common issue that we face when aggregating data is the presence of values influenced by positive spikes or fast drops in our hourly trends, which means that huge spikes can produce a mean value per hour that is not right.
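The effect of a spike on an hourly mean is easy to see with a small illustration. The sketch below is not Zabbix's actual code; it simply groups (clock, value) samples by hour and keeps the count, minimum, average, and maximum for each hour. The function name hourly_trends and the sample values are assumptions made for the example:

```python
from collections import defaultdict


def hourly_trends(samples):
    """Consolidate (clock, value) samples into per-hour num/min/avg/max.

    `samples` is an iterable of (unix_timestamp, numeric_value) pairs.
    Returns {hour_start_clock: (num, value_min, value_avg, value_max)}.
    """
    buckets = defaultdict(list)
    for clock, value in samples:
        buckets[clock - clock % 3600].append(value)  # start of the hour
    return {
        hour: (len(vals), min(vals), sum(vals) / len(vals), max(vals))
        for hour, vals in sorted(buckets.items())
    }


# Four one-minute samples in the same hour; the last one is a spike.
samples = [(3600, 10.0), (3660, 10.0), (3720, 10.0), (3780, 90.0)]
print(hourly_trends(samples))
# {3600: (4, 10.0, 30.0, 90.0)}
```

Here a single spike triples the hourly average, but the minimum and maximum kept next to it show where that average comes from.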

Trends in Zabbix are implemented in a smart way. The script creation for the trend table is as follows:

CREATE TABLE trends (
    itemid    bigint NOT NULL,
    clock     integer DEFAULT '0' NOT NULL,
    num       integer DEFAULT '0' NOT NULL,
    value_min numeric(16, 4) DEFAULT '0.0000' NOT NULL,
    value_avg numeric(16, 4) DEFAULT '0.0000' NOT NULL,
    value_max numeric(16, 4) DEFAULT '0.0000' NOT NULL,
    PRIMARY KEY (itemid, clock)
);

CREATE TABLE trends_uint (
    itemid    bigint NOT NULL,
    clock     integer DEFAULT '0' NOT NULL,
    num       integer DEFAULT '0' NOT NULL,
    value_min numeric(20) DEFAULT '0' NOT NULL,
    value_avg numeric(20) DEFAULT '0' NOT NULL,
    value_max numeric(20) DEFAULT '0' NOT NULL,
    PRIMARY KEY (itemid, clock)
);

As you can see, there are two tables storing trends inside the Zabbix database:

  • trends

  • trends_uint

The first table, trends, is used to store float values. The second table, trends_uint, is used to store unsigned integers. Both tables keep the following for each hour:

  • Minimum value (value_min)

  • Maximum value (value_max)

  • Average value (value_avg)

Keeping the minimum and maximum alongside the average lets us display the trends graphically and see how, and how much, spikes and fast drops have influenced the average value. The other tables used for historical purposes are as follows:

  • history: This is used to store numeric data (float)

  • history_log: This is used to store logs (the value is a PostgreSQL text field, so it has unlimited length)

  • history_str: This is used to store strings (up to 255 characters)

  • history_text: This is used to store the text value (again, this is a text field, so it has unlimited length)

  • history_uint: This is used to store numeric values (unsigned integers)

Sizing the database

Calculating the definitive database size is not an easy task because it is hard to predict how many items and the relative rate per second we will have on our infrastructure and how many events will be generated. To simplify this, we will consider the worst-case scenario, where we have an event generated every second.

In summary, the database size is influenced by:

  • Items: The number of items in particular

  • Refresh rate: The average refresh rate of our items

  • Space to store values: This value depends on RDBMS

The space used to store the data may vary from database to database, but we can simplify our work by considering mean values that quantify the maximum space consumed by the database. We can also consider the space used to store values on history to be around 50 bytes per value, the space used by a value on the trend table to be around 128 bytes, and the space used for a single event to be normally around 130 bytes.

The total amount of used space can be calculated with the following formula:

Configuration + History + Trends + Events

Now, let's look into each of the components:

  • Configuration: This refers to Zabbix's configuration for the server, the web interface, and all the configuration parameters that are stored in the database; this is normally around 10 MB

  • History: The history component is calculated using the following formula:

    History retention days* (items/refresh rate)*24*3600* 50 bytes (History bytes usage average) 
  • Trends: The trends component is calculated using the following formula:

    days*(items/3600)*24*3600*128 bytes (Trend bytes usage average)
  • Events: The event component is calculated using the following formula:

    days*events*24*3600*130 bytes (Event bytes usage average)

Now, coming back to our practical example, we can consider 5,000 items to be refreshed every minute, and we want to have 30 days of retention; the used space will be calculated as follows:

History: retention (in days) * (items/refresh rate)*24*3600* 50 bytes

Note

50 bytes is the mean value of the space consumed by a value stored on history.

Considering a history of 30 days, the result is the following:

  • History will be calculated as:

    30 * 5000/60 * 24*3600 * 50 = 10.8 GB

  • As we said earlier, to simplify, we will consider the worst-case scenario (one event per second) and will also consider keeping 5 years of events

  • Events will be calculated using the following formula:

    retention days*events per second*24*3600* Event bytes usage (average)

    When we calculate events, we have:

    5*365*1*24*3600* 130 = 20.5 GB

    Note

    130 bytes is the mean value for the space consumed by a value stored on events.

  • Trends will be calculated using the following formula:

    retention in days*(items/3600)*24*3600*Trend bytes usage (average)

    When we calculate trends, we have:

    5000*24*365* 128 = 5.6 GB per year or 28.0 GB for 5 years.

    Note

    128 bytes is the mean value of the space consumed by a value stored on trends.

The following table shows the retention in days and the space required for the measure:

Type of measure    Retention in days    Space required
History            30                   10.8 GB
Events             1825 (5 years)       20.5 GB
Trends             1825 (5 years)       28.0 GB
Total              N.A.                 59.3 GB

The calculated size is not the initial size of our database; it is the target size that we should expect after 5 years. We are also considering a history of 30 days, so keep in mind that this retention can be reduced if there are issues, since the trends will still store our baseline and hourly values.
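The sizing formulas above are simple enough to script, which makes it easy to try different retention periods and item counts. The following is a minimal sketch under the assumptions stated in the text (50, 128, and 130 bytes per history value, trend row, and event); the function names are chosen for this example, sizes are reported in decimal gigabytes, and the roughly 10 MB of configuration data is left out:

```python
SECONDS_PER_DAY = 24 * 3600

# Average bytes per stored value, as quoted in the text.
HISTORY_BYTES = 50
TREND_BYTES = 128
EVENT_BYTES = 130


def history_bytes(days, items, refresh_seconds):
    """Bytes for history: one value per item every `refresh_seconds`."""
    return days * (items / refresh_seconds) * SECONDS_PER_DAY * HISTORY_BYTES


def trends_bytes(days, items):
    """Bytes for trends: one consolidated row per item per hour."""
    return days * (items / 3600) * SECONDS_PER_DAY * TREND_BYTES


def events_bytes(days, events_per_second):
    """Bytes for events at a constant event rate."""
    return days * events_per_second * SECONDS_PER_DAY * EVENT_BYTES


def to_gb(n_bytes):
    """Convert bytes to decimal gigabytes (10**9 bytes)."""
    return n_bytes / 1e9


if __name__ == "__main__":
    five_years = 5 * 365
    parts = {
        "history": history_bytes(30, 5000, 60),
        "events": events_bytes(five_years, 1),
        "trends": trends_bytes(five_years, 5000),
    }
    for name, size in parts.items():
        print(f"{name}: {to_gb(size):.1f} GB")
    print(f"total: {to_gb(sum(parts.values())):.1f} GB")
```

Changing the first argument of history_bytes from 30 to 7 shows immediately how much space a shorter history retention saves.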

The history and trend retention policy can be changed easily for every item. This means that we can create a template with items that have a different history retention by default. Normally, the history is set to 7 days, but for some kinds of measures, such as web scenarios, we may need to keep all the values for more than a week. Per-item retention lets us change the value for just those items.

In our example, we considered a worst-case scenario with 30 days of retention, but it is good advice to keep the history for only 7 days or even less in large environments. If we perform a basic calculation for an item that is updated every 60 seconds and has its history preserved for 7 days, it will generate (values per hour) * (hours in a day) * (number of days in history) = 60*24*7 = 10,080 values.

This means that, for each item, we will have 10,080 rows in a week, and that gives us an idea of the number of rows that we will produce in our database.

The following screenshot represents the details of a single item:

Some considerations about housekeeping

Housekeeping can be quite a heavy process. As the database grows, housekeeping will require more and more time to complete its work. This issue can be addressed using the delete_history() database function.

There is a way to deeply improve the housekeeping performance and fix this performance drop. The heaviest tables are: history, history_uint, trends, and trends_uint.

A solution is PostgreSQL table partitioning and the partitioning of the entire table on a monthly basis. The following figure displays the standard and nonpartitioned history table on the database:

The following figure shows how a partitioned history table will be stored in the database:

Partitioning is basically the splitting of a large logical table into smaller physical pieces. This feature can provide several benefits:

  • The performance of queries can be improved dramatically in situations where there is heavy access of the table's rows in a single partition.

  • The partitioning will reduce the index size, making it more likely to fit in the memory of the parts that are being used heavily.

  • Massive deletes can be accomplished by removing partitions, instantly reducing the space allocated for the database without introducing fragmentation and a heavy load on index rebuilding. The delete partition command also entirely avoids the vacuum overhead caused by a bulk delete.

  • When a query updates or requires access to a large percentage of the partition, using a sequential scan is often more efficient than using the index with random access or scattered reads against that index.

All these benefits are only worthwhile when a table is very large. The strong point of this kind of architecture is that the RDBMS will directly access the needed partition, and a delete will simply be the removal of a partition. Partition deletion is a fast process and requires few resources.

Unfortunately, Zabbix is not able to manage the partitions, so we need to disable the housekeeping and use an external process to accomplish the housekeeping.

The partitioning approach described here has certain benefits compared to the other partitioning solutions:

  • This does not require you to prepare the database to partition it with Zabbix

  • This does not require you to create/schedule a cron job to create the tables in advance

  • This is simpler to implement than other solutions

This method will prepare partitions under the desired partitioning schema with the following convention:

  • Daily partitions are in the form of partitions.tablename_pYYYY_MM_DD

  • Monthly partitions are in the form of partitions.tablename_pYYYY_MM
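To make the naming convention concrete, the sketch below derives the partition name for a given row timestamp, following the YYYY_MM_DD and YYYY_MM time formats used by the trigger function presented in this section. The helper name partition_name is an assumption made for the example, and the conversion is done in UTC, whereas PostgreSQL's TO_TIMESTAMP follows the session time zone:

```python
from datetime import datetime, timezone

# Same patterns as the timeformat values in trg_partition(), in strftime form.
FORMATS = {"day": "%Y_%m_%d", "month": "%Y_%m"}


def partition_name(table, clock, selector):
    """Return the partition a row with Unix time `clock` would land in."""
    stamp = datetime.fromtimestamp(clock, tz=timezone.utc)
    return f"partitions.{table}_p{stamp.strftime(FORMATS[selector])}"


print(partition_name("history", 1421207137, "day"))   # partitions.history_p2015_01_14
print(partition_name("trends", 1421207137, "month"))  # partitions.trends_p2015_01
```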

All the scripts here described are available at https://github.com/smartmarmot/Mastering_Zabbix.

To set up this feature, we need to create a schema where we can place all the partitioned tables; then, within a psql session, we need to run the following command:

CREATE SCHEMA partitions AUTHORIZATION zabbix;

Now, we need a function that will create the partitions. Connect to the Zabbix database and run the following code:

CREATE OR REPLACE FUNCTION trg_partition()
RETURNS TRIGGER AS
$BODY$
DECLARE
    prefix text:= 'partitions.';
    timeformat text;
    selector text;
    _interval INTERVAL;
    tablename text;
    startdate text;
    enddate text;
    create_table_part text;
    create_index_part text;
BEGIN
selector = TG_ARGV[0];
IF selector = 'day'
    THEN
    timeformat:= 'YYYY_MM_DD';
ELSIF selector = 'month'
    THEN
    timeformat:= 'YYYY_MM';
END IF;

_interval:= '1 ' || selector;
tablename:= TG_TABLE_NAME || '_p' || TO_CHAR(TO_TIMESTAMP(NEW.clock), timeformat);

EXECUTE 'INSERT INTO ' || prefix || quote_ident(tablename) || ' SELECT ($1).*'
USING NEW;
RETURN NULL;

EXCEPTION
    WHEN undefined_table THEN

startdate:= EXTRACT(epoch FROM date_trunc(selector, TO_TIMESTAMP(NEW.clock)));
enddate:= EXTRACT(epoch FROM date_trunc(selector, TO_TIMESTAMP(NEW.clock) + _interval));

create_table_part:= 'CREATE TABLE IF NOT EXISTS ' || prefix || quote_ident(tablename) || ' (CHECK ((clock >= ' || quote_literal(startdate) || ' AND clock < ' || quote_literal(enddate) || '))) INHERITS (' || TG_TABLE_NAME || ')';
create_index_part:= 'CREATE INDEX ' || quote_ident(tablename) || '_1 on ' || prefix || quote_ident(tablename) || '(itemid,clock)';

EXECUTE create_table_part;
EXECUTE create_index_part;

--insert it again
EXECUTE 'INSERT INTO ' || prefix || quote_ident(tablename) || ' SELECT ($1).*'
USING NEW;
RETURN NULL;

END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION trg_partition()
OWNER TO zabbix;

Note

Please ensure that your database has been set up with the user zabbix. If you're using a different role/account, please change the last line of the script accordingly:

ALTER FUNCTION trg_partition()
OWNER TO <replace with your database owner here>;
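Each partition created by trg_partition() carries a CHECK constraint of the form start <= clock < end, where both bounds are epoch seconds. The following sketch reproduces that range calculation outside the database so the bounds can be inspected; the helper name partition_bounds is an assumption, and the calculation is done in UTC, while the database uses its session time zone:

```python
from datetime import datetime, timedelta, timezone


def partition_bounds(clock, selector):
    """Return (start, end) epoch seconds of the partition containing `clock`.

    Rows satisfy start <= clock < end, matching the CHECK constraint
    generated by trg_partition().
    """
    stamp = datetime.fromtimestamp(clock, tz=timezone.utc)
    if selector == "day":
        start = stamp.replace(hour=0, minute=0, second=0, microsecond=0)
        end = start + timedelta(days=1)
    elif selector == "month":
        start = stamp.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
        # First day of the following month (December rolls over to January).
        if start.month == 12:
            end = start.replace(year=start.year + 1, month=1)
        else:
            end = start.replace(month=start.month + 1)
    else:
        raise ValueError("selector must be 'day' or 'month'")
    return int(start.timestamp()), int(end.timestamp())


print(partition_bounds(1421207137, "day"))    # (1421193600, 1421280000)
print(partition_bounds(1421207137, "month"))  # (1420070400, 1422748800)
```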

Now, we need a trigger connected to each table that we want to partition. This trigger fires on every INSERT statement, and if the partition is not ready or created yet, the function will create the partition right before inserting the row:

CREATE TRIGGER partition_trg BEFORE INSERT ON history FOR EACH ROW EXECUTE PROCEDURE trg_partition('day');
CREATE TRIGGER partition_trg BEFORE INSERT ON history_sync FOR EACH ROW EXECUTE PROCEDURE trg_partition('day');
CREATE TRIGGER partition_trg BEFORE INSERT ON history_uint FOR EACH ROW EXECUTE PROCEDURE trg_partition('day');
CREATE TRIGGER partition_trg BEFORE INSERT ON history_str_sync FOR EACH ROW EXECUTE PROCEDURE trg_partition('day');
CREATE TRIGGER partition_trg BEFORE INSERT ON history_log FOR EACH ROW EXECUTE PROCEDURE trg_partition('day');
CREATE TRIGGER partition_trg BEFORE INSERT ON trends FOR EACH ROW EXECUTE PROCEDURE trg_partition('month');
CREATE TRIGGER partition_trg BEFORE INSERT ON trends_uint FOR EACH ROW EXECUTE PROCEDURE trg_partition('month');

At this point, we are missing only the housekeeping function that will replace Zabbix's built-in one, which we will then disable. The function that will handle housekeeping for us is as follows:

CREATE OR REPLACE FUNCTION delete_partitions(intervaltodelete INTERVAL, tabletype text)
  RETURNS text AS
$BODY$
DECLARE
result RECORD ;
prefix text := 'partitions.';
table_timestamp TIMESTAMP;
delete_before_date DATE;
tablename text;
BEGIN
    FOR result IN SELECT * FROM pg_tables WHERE schemaname = 'partitions' LOOP
        table_timestamp := TO_TIMESTAMP(substring(result.tablename FROM '[0-9_]*$'), 'YYYY_MM_DD');
        delete_before_date := date_trunc('day', NOW() - intervalToDelete);
        tablename := result.tablename;
        IF tabletype != 'month' AND tabletype != 'day' THEN
      RAISE EXCEPTION 'Please specify "month" or "day" instead of %', tabletype;
        END IF;
     --Check whether the table name has a day (YYYY_MM_DD) or month (YYYY_MM) format
        IF LENGTH(substring(result.tablename FROM '[0-9_]*$')) = 10 AND tabletype = 'month' THEN
            --This is a daily partition YYYY_MM_DD
            -- RAISE NOTICE 'Skipping table % when trying to delete "%" partitions (%)', result.tablename, tabletype, length(substring(result.tablename from '[0-9_]*$'));
            CONTINUE;
        ELSIF LENGTH(substring(result.tablename FROM '[0-9_]*$')) = 7 AND tabletype = 'day' THEN
            --this is a monthly partition
            --RAISE NOTICE 'Skipping table % when trying to delete "%" partitions (%)', result.tablename, tabletype, length(substring(result.tablename from '[0-9_]*$'));
            CONTINUE;
        ELSE
            --This is the correct table type. Go ahead and check if it needs to be deleted
      --RAISE NOTICE 'Checking table %', result.tablename;
        END IF;
  IF table_timestamp <= delete_before_date THEN
         RAISE NOTICE 'Deleting table %', quote_ident(tablename);
         EXECUTE 'DROP TABLE ' || prefix || quote_ident(tablename) || ';';
  END IF;
    END LOOP;
RETURN 'OK';
 END;
 $BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;
ALTER FUNCTION delete_partitions(INTERVAL, text)
  OWNER TO zabbix;

Now you have the housekeeping ready to run. To enable housekeeping, we can use crontab by adding the following entries:

@daily psql -h <your database host here> -d zabbix_db -q -U zabbix -c "SELECT delete_partitions('7 days', 'day')"
@daily psql -h <your database host here> -d zabbix_db -q -U zabbix -c "SELECT delete_partitions('24 months', 'month')"

Those two tasks should be scheduled on the database server's crontab. In this example, we will keep the history of 7 days and trends of 24 months.
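Before scheduling these jobs, it can be reassuring to preview which partitions a given retention would remove. The sketch below mirrors the selection logic of delete_partitions() in a simplified form: it extracts the date suffix with the same regular expression, keeps only partitions of the requested type, and returns those dated on or before a cutoff. The function name, the table names, and the cutoff date are assumptions made for the example:

```python
import re
from datetime import date, datetime

SUFFIX = re.compile(r"[0-9_]*$")  # same pattern delete_partitions() uses
FORMATS = {"day": "%Y_%m_%d", "month": "%Y_%m"}


def partitions_to_drop(table_names, tabletype, cutoff):
    """Return the partitions of `tabletype` dated on or before `cutoff`."""
    if tabletype not in FORMATS:
        raise ValueError('Please specify "month" or "day"')
    wanted_len = 10 if tabletype == "day" else 7
    doomed = []
    for name in table_names:
        suffix = SUFFIX.search(name).group()
        if len(suffix) != wanted_len:
            continue  # partition of the other type, or not a partition
        table_date = datetime.strptime(suffix, FORMATS[tabletype]).date()
        if table_date <= cutoff:
            doomed.append(name)
    return doomed


tables = ["history_p2015_01_01", "history_p2015_01_10", "trends_p2014_12"]
# A hypothetical "today" of 2015-01-14 minus 7 days gives a cutoff of 2015-01-07.
print(partitions_to_drop(tables, "day", date(2015, 1, 7)))
# ['history_p2015_01_01']
```

Monthly partitions are compared using the first day of their month, which is also what the database function does when it parses a YYYY_MM suffix.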

Now, we can finally disable the Zabbix housekeeping. To disable the housekeeping on Zabbix 2.4, the best way is to use the web interface by selecting Administration | General | Housekeeper, and there, you can disable the housekeeping for the Trends and History tables, as shown in the following screenshot:

Now the built-in housekeeping is disabled, and you should see a lot of improvement in the performance. To keep your database as lightweight as possible, you can clean up the following tables:

  • acknowledges

  • alerts

  • auditlog

  • events

  • service_alarms

Once you have chosen your own retention, you need to add a retention policy; for example, in our case, it will be 2 years of retention. With the following crontab entries, you can delete all the records older than 63072000 (2 years expressed in seconds):

@daily psql -d zabbix_db -q -U zabbix -c "delete from acknowledges where clock < (SELECT (EXTRACT( epoch FROM now() ) - 63072000))"
@daily psql -d zabbix_db -q -U zabbix -c "delete from alerts where clock < (SELECT (EXTRACT( epoch FROM now() ) - 63072000))"
@daily psql -d zabbix_db -q -U zabbix -c "delete from auditlog where clock < (SELECT (EXTRACT( epoch FROM now() ) - 63072000))"
@daily psql -d zabbix_db -q -U zabbix -c "delete from events where clock < (SELECT (EXTRACT( epoch FROM now() ) - 63072000))"
@daily psql -d zabbix_db -q -U zabbix -c "delete from service_alarms where clock < (SELECT (EXTRACT( epoch FROM now() ) - 63072000))"
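The 63072000 figure is simply two 365-day years expressed in seconds. If you choose a different retention, a small helper such as the following can compute the constant to put in the crontab entries; the function names are assumptions made for this sketch, and leap days are ignored:

```python
import time

SECONDS_PER_DAY = 24 * 3600


def retention_seconds(years):
    """Length of a retention period in seconds, using 365-day years."""
    return years * 365 * SECONDS_PER_DAY


def cutoff_clock(years, now=None):
    """Unix time before which rows fall outside the retention period."""
    now = int(time.time()) if now is None else now
    return now - retention_seconds(years)


print(retention_seconds(2))  # 63072000
```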

To roll back the partitioning, we need to drop the triggers created:

DROP TRIGGER partition_trg ON history;
DROP TRIGGER partition_trg ON history_sync;
DROP TRIGGER partition_trg ON history_uint;
DROP TRIGGER partition_trg ON history_str_sync;
DROP TRIGGER partition_trg ON history_log;
DROP TRIGGER partition_trg ON trends;
DROP TRIGGER partition_trg ON trends_uint;

All those changes need to be tested and changed/modified as they fit your setup. Also, be careful and back up your database.

The web interface

The web interface installation is quite easy; there are certain basic steps to execute. The web interface is completely written in PHP, so we need a web server that supports PHP; in our case, we will use Apache with the PHP support enabled.

The entire web interface is contained in the frontends/php/ folder, which we need to copy to our htdocs folder:

/var/www/html

Use the following commands to copy the folders:

# mkdir <htdocs>/zabbix
# cd frontends/php
# cp -a . <htdocs>/zabbix

Note

Be careful: you might need proper rights and permissions, as all those files are owned by Apache and they also depend on your httpd configuration.

The web wizard – frontend configuration

Now, from your web browser, you need to open the following URL:

http://<server_ip_or_name>/zabbix

The first screen that you will meet is a welcome page; there is nothing to do there other than to click on Next. When on the first page, you may get a warning on your browser that informs you that the date / time zone is not set. This is a parameter inside the php.ini file. All the possible time zones are described on the official PHP website at http://www.php.net/manual/en/timezones.php.

The parameter to change is date.timezone inside the php.ini file. If you don't know the current PHP configuration or where your php.ini file is located, and you need detailed information about which modules are running or the current settings, then you can write a file, for example, php-info.php, inside the Zabbix directory with the following content:

<?php
phpinfo();
phpinfo(INFO_MODULES);
?>

Now point your browser to http://your-zabbix-web-frontend/zabbix/php-info.php.

You will have your full configuration printed out on a web page. The following screenshot is more important; it displays a prerequisite check, and, as you can see, there is at least one prerequisite that is not met:

On standard Red-Hat/CentOS 6.6, you only need to set the time zone; otherwise, if you're using an older version, you might have to change the following prerequisites that most likely are not fulfilled:

PHP option post_max_size           8M    16M    Fail
PHP option max_execution_time      30    300    Fail
PHP option max_input_time          60    300    Fail
PHP bcmath                         no           Fail
PHP mbstring                       no           Fail
PHP gd                             unknown  2.0  Fail
PHP gd PNG support                 no           Fail
PHP gd JPEG support                no           Fail
PHP gd FreeType support            no           Fail
PHP xmlwriter                      no           Fail
PHP xmlreader                      no           Fail

Most of these parameters are contained inside the php.ini file. To fix them, simply change the following options inside the /etc/php.ini file:

[Date]
; Defines the default timezone used by the date functions
; http://www.php.net/manual/en/datetime.configuration.php#ini.date.timezone
date.timezone = Europe/Rome

; Maximum size of POST data that PHP will accept.
; http://www.php.net/manual/en/ini.core.php#ini.post-max-size
post_max_size = 16M

; Maximum execution time of each script, in seconds
; http://www.php.net/manual/en/info.configuration.php#ini.max-execution-time
max_execution_time = 300

; Maximum amount of time each script may spend parsing request data. It's a good
; idea to limit this time on productions servers in order to eliminate unexpectedly
; long running scripts.
; Default Value: -1 (Unlimited)
; Development Value: 60 (60 seconds)
; Production Value: 60 (60 seconds)
; http://www.php.net/manual/en/info.configuration.php#ini.max-input-time
max_input_time = 300

To solve the issue of the missing libraries, we need to install the following packages:

  • php-xml

  • php-bcmath

  • php-mbstring

  • php-gd

We will use the following command to install these packages:

# yum install php-xml php-bcmath php-mbstring php-gd

The complete list of prerequisites is given in the following table:

Prerequisite                        Min value           Solution
PHP Version                         5.3.0
PHP memory_limit                    128M                In php.ini, change memory_limit=128M.
PHP post_max_size                   16M                 In php.ini, change post_max_size=16M.
PHP upload_max_filesize             2M                  In php.ini, change upload_max_filesize=2M.
PHP max_execution_time option       300 seconds         In php.ini, change max_execution_time=300.
PHP max_input_time option           300 seconds         In php.ini, change max_input_time=300.
PHP session.auto_start              Disabled            In php.ini, change session.auto_start=0.
bcmath                                                  Use php-bcmath extension
mbstring                                                Use php-mbstring extension
PHP mbstring.func_overload          Must be disabled    In php.ini, change mbstring.func_overload = 0.
PHP always_populate_raw_post_data   Must be set to -1   In php.ini, change always_populate_raw_post_data = -1.
sockets                                                 This extension is required for user script support: php-net-socket module
gd                                                      The PHP GD extension must support PNG images (--with-png-dir), JPEG (--with-jpeg-dir) images, and FreeType 2 (--with-freetype-dir)
libxml                              2.6.15              Use php-xml or php5-dom
xmlwriter                                               Use php-xmlwriter
xmlreader                                               Use php-xmlreader
ctype                                                   Use php-ctype
session                                                 Use php-session
gettext                                                 Use php-gettext. Since Zabbix 2.2.1, this is not a mandatory requirement; without it, however, you can have issues with the GUI translations
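The numeric checks in this table can also be verified before opening the wizard. The following Python sketch is only an illustration and is not part of Zabbix: it parses php.ini-style text with the standard configparser module and reports the options that are missing or below the minimum values listed above. The function names and the sample text are hypothetical, and treating -1 as "unlimited" is an assumption based on the php.ini comments shown earlier.

```python
import configparser

# Minimum values taken from the prerequisite table above.
REQUIRED = {
    "memory_limit": "128M",
    "post_max_size": "16M",
    "upload_max_filesize": "2M",
    "max_execution_time": "300",
    "max_input_time": "300",
}


def to_number(value):
    """Convert PHP shorthand such as 16M or 1G to a plain integer."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    value = value.strip()
    if value and value[-1].upper() in units:
        return int(value[:-1]) * units[value[-1].upper()]
    return int(value)


def failed_options(ini_text):
    """Return {option: (current, required)} for options below the minimum."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    current = {}
    for section in parser.sections():
        current.update(parser.items(section))
    failures = {}
    for option, minimum in REQUIRED.items():
        value = current.get(option)
        if value is not None and value.strip() == "-1":
            continue  # assumption: -1 means "unlimited", so the minimum is met
        if value is None or to_number(value) < to_number(minimum):
            failures[option] = (value, minimum)
    return failures


# Hypothetical configuration resembling the failed checks shown earlier.
SAMPLE = """
[PHP]
memory_limit = 128M
post_max_size = 8M
upload_max_filesize = 2M
max_execution_time = 30
max_input_time = 60
"""

for name, (found, needed) in failed_options(SAMPLE).items():
    print(f"{name}: current {found}, required {needed}")
```

To check a real installation, the content of /etc/php.ini could be read and passed to failed_options() instead of the sample text.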

Every time you change a php.ini file or install a PHP extension, the httpd service must be restarted to pick up the change. Once all the prerequisites are met, we can click on Next and go ahead. On the next screen, we need to configure the database connection. We simply need to fill out the form with the username, the password, and the IP address or hostname, and specify the kind of database server we are using, as shown in the following screenshot:

If the connection is fine (this can be checked with a test connection), we can proceed to the next step. Here, you only need to set the proper database parameters to enable the web GUI to create a valid connection, as shown in the following screenshot:

There is no check for the connection available on this page, so it is better to verify that it is possible to reach the Zabbix server from the network. In this form, it is necessary to fill in the Host (or IP address) of our Zabbix server. Since we are installing the infrastructure on three different servers, we need to specify all the parameters and verify that the Zabbix server port is reachable from outside the server.
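Since the wizard does not test this connection, a quick check can be run from the web server. The following Python sketch, which is not part of Zabbix, simply tries to open a TCP connection; the function name is hypothetical, and 10051 is the default Zabbix server port, which may differ in your installation.

```python
import socket


def port_is_reachable(host, port=10051, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

For example, port_is_reachable("192.0.2.10") returns False if a firewall between the web server and the Zabbix server blocks the port (192.0.2.10 is a placeholder address).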

Once we fill this form, we can click on Next. After this, the installation wizard prompts us to view Pre-Installation summary, which is a complete summary of all the configuration parameters. If all is fine, just click on Next; otherwise, we can go back and change our parameters. When we go ahead, we see that the configuration file has been generated (for example, in this installation the file has been generated in /usr/share/zabbix/conf/zabbix.conf.php).

You may get an error instead of a success notification; most probably, it is about the permissions on our conf directory at /usr/share/zabbix/conf. Remember to make the directory writable by the httpd user (normally, apache) at least for the time needed to create this file. Once this step is completed, the frontend is ready and we can perform our first login.

Capacity planning with Zabbix

People quite often confuse capacity planning with performance tuning. The scope of performance tuning is to optimize the system you already have in place for better performance. Capacity planning, using your current performance as a baseline, determines what your system needs and when it is needed. Here, we will see how to organize our monitoring infrastructure to achieve this goal and provide us with a useful baseline. Unfortunately, this chapter cannot cover all the aspects of this topic; capacity planning would deserve a whole book of its own, but after this section, we will look at Zabbix with a different vision and will be aware of what to do with it.

The observer effect

Zabbix is a good monitoring system because it is really lightweight. Unfortunately, every observed system will spend a bit of its resources to run the agent that acquires and measures data and metrics from the operating system, so it is normal for the agent to introduce a small (normally very small) overhead on the monitored system. This is known as the observer effect. We can only accept this burden on our server and be aware that it introduces a slight distortion in data collection, bearing in mind that we should keep the monitoring process and our custom checks as lightweight as is feasible.

Deciding what to monitor

The Zabbix agent's job is to collect data periodically from the monitored machine and send metrics to the Zabbix server (which will aggregate and process them). Now, in this scenario, there are certain important things to consider:

  • What are we going to acquire?

  • How are we going to acquire these metrics (the way or method used)?

  • What is the frequency with which this measurement is performed?

Considering the first point, it is important to think what should be monitored on our host and the kind of work that our host will do; or, in other words, what function it will serve.

There are some basic operating system metrics that are, nowadays, more or less standardized: the CPU workload, percentage of free memory, memory usage details, usage of swap, the CPU time for a process, and so on. All of this family of measures is built into the Zabbix agent.

Having a set of items with built-in measurement means that they are optimized to produce as little workload as possible on the monitored host; the whole of Zabbix's agent code is written in this way.
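The three questions above also determine how much data the monitoring itself produces. As a rough sketch, the number of values collected per second can be estimated from the number of items and their update intervals; the figures below are hypothetical and only illustrate the arithmetic.

```python
def values_per_second(items):
    """Sum of count / interval over all (count, interval_seconds) pairs."""
    return sum(count / interval for count, interval in items)


# (number of items, update interval in seconds): hypothetical figures
host_items = [
    (20, 60),    # operating system metrics, every minute
    (10, 300),   # custom RDBMS metrics, every five minutes
    (5, 3600),   # slowly changing data, every hour
]

per_host = values_per_second(host_items)
print(f"one host:  {per_host:.3f} values per second")
print(f"500 hosts: {per_host * 500:.1f} values per second")
```

Halving an update interval doubles the contribution of those items, which is why the frequency of each measurement deserves as much thought as the choice of what to measure.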

All the other metrics can be divided by the service that our server should provide.

Note

Here, templates are really useful! (Also, it is an efficient way to aggregate our metrics by type.)

As a practical example, when monitoring an RDBMS, it is fundamental to acquire:

  • All the operating system metrics

  • Different custom RDBMS metrics

Our different custom RDBMS metrics can be: the number of users connected, the use of cache systems, the number of full table scans, and so on.

All those kinds of metrics will be really useful and can be easily correlated and compared against the same time period in a graph. Graphs have some strong points:

  • They are easy to understand (also from the business side)

  • They are easy to present and integrate into slides to reinforce our message

Coming back to our practical example, we are currently acquiring data from our RDBMS and our operating system. We can compare the workload of our RDBMS and see how it is reflected in the workload of our OS. What now?

Most probably, our core business is the revenue of a website, merchant site, or a web application. We assume that we need to keep a website in a three-tier environment under control because it is quite a common case. Our infrastructure will be composed of the following actors:

  • A web server

  • An application server

  • The RDBMS

In real life, most probably, this is the kind of environment that Zabbix will be configured in. We need to be aware that every piece and every component that can influence our service should be measured and stored inside our Zabbix monitoring system. It is quite normal for people with a strong system administration background to focus mostly on operating system-related items. Likewise, people who write Java code tend to concentrate on other, more obscure measures, such as the number of threads. The same reasoning applies when the capacity planner talks with a database administrator or a specialist from any other area.

This is quite an important point because the Zabbix implementer should have a global vision and should remember that, when buying new hardware, the counterpart will most likely be a business unit.

This business unit very often doesn't know anything about the number of threads that our system can support but will only understand customer satisfaction, customer-related issues, and how many concurrent users we can successfully serve.

Having said that, it is really important to be ready to talk in their language, and we can do that only if we have meaningful items to graph.

Defining a baseline

Now, if we look at the whole infrastructure from a client's point of view, we can think that if all our pages are served in a reasonable time, the browsing experience will be pleasant.

Our goal in this case is to make our clients happy and the whole infrastructure reliable. Now, we need to have two kinds of measures:

  • The one felt from the user's side (the response time of our web pages)

  • Infrastructure items related to it

We need to quantify the response time related to the user's navigation, and we need to know how long a user can wait in front of a web page to get a response, keeping in mind that the whole browsing experience needs to be pleasant. We can measure and categorize our metrics with these three levels of response time:

  • 0.2 seconds: This gives the feel of an instantaneous response. The user feels that the browser reaction was caused by him/her and not by a server with business logic.

  • 1-2 seconds: The user feels that the browsing is continuous, without any interruption. The user can move freely rather than waiting for the pages to load.

  • 10 seconds: The user's interest in our website will drop. The user will want better performance and can definitely be distracted by other things.

Now, we have our thresholds and we can measure the response of a web page during normal browsing, and in the meantime, we can set a trigger level to warn us when the response time is more than two seconds for a page.
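The three levels and the two-second trigger can be expressed as a small Python sketch. The level names and function names are hypothetical labels chosen for this illustration; only the numeric thresholds come from the text above.

```python
def classify_response_time(seconds):
    """Map a page response time to the levels described above."""
    if seconds <= 0.2:
        return "instantaneous"   # feels caused by the user, not by a server
    if seconds <= 2.0:
        return "continuous"      # browsing without interruption
    if seconds < 10.0:
        return "degraded"        # above the trigger level, not yet lost
    return "distracting"         # the user turns to other things


def should_warn(seconds, trigger_level=2.0):
    """True when the response time is above the trigger level."""
    return seconds > trigger_level


for sample in (0.15, 1.4, 3.2, 12.0):
    print(sample, classify_response_time(sample), should_warn(sample))
```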

Now we need to relate the response time to all our other measures: the number of users connected, the number of sessions in our application server, and the number of connections to our database. In other words, we need to measure how our system is serving pages to users during normal browsing.

This can be defined as a baseline. It is where we currently are and is a measure of how our system is performing under a normal load.
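As a minimal sketch, a baseline can be summarized with a few statistics computed over response times collected under normal load. The sample values below are hypothetical, and the choice of mean, median, and 95th percentile is an assumption made for this illustration, not something prescribed by Zabbix.

```python
import math
import statistics


def baseline(samples):
    """Summarize response times (in seconds) collected under normal load."""
    ordered = sorted(samples)
    # Nearest-rank 95th percentile: smallest value covering 95% of samples.
    rank = math.ceil(0.95 * len(ordered))
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
    }


# Hypothetical response times measured during normal browsing.
normal_load = [0.8, 0.9, 1.0, 1.1, 1.2, 0.7, 1.3, 0.9, 1.0, 3.0]
print(baseline(normal_load))
```

The same summary can be computed for the number of sessions or database connections over the same period, so that all measures share one reference point.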

Load testing

Now that we know where we are, and we have defined the thresholds for our goal of a pleasant browsing experience, let's move forward.

We need to know what our limit is and, more importantly, how the system should reply to our requests. Since we can't hire a room full of people that can click on our website like crazy, we need to use software to simulate this kind of behavior. There is interesting open source software that does exactly this. There are different alternatives to choose from; one of them is Siege (https://www.joedog.org/2013/07/siege-3-0-3-url-encoding/).

Siege permits us to simulate a stored browser history and replay it against our server. We need to keep in mind that real users will never be synchronized with each other, so it is important to introduce a delay between all the requests. Remember that if we have a login, then we need to use a database of users because application servers cache their objects, and we don't want to measure how good the server is at caching them. The basic rule is to create a real browsing scenario against our website, so users who log in can log out with just a click and without any random delay.

The stored scenarios should be repeated x times with a growing number of users while Zabbix stores our metrics; at a certain point, we will pass our first threshold (1-2 seconds per web page). We can go ahead until the response time reaches the value of our second threshold. There is no way to know in advance how much load our server can take, but it is well known that appetite comes with eating, so I will not be surprised if you go ahead and load your server until one of the components of your infrastructure crashes.
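The ramp described above can be sketched in Python. This is not how Siege works internally; it is only an illustration of repeating a scenario with a growing number of unsynchronized users. All names are hypothetical, and fetch stands for any caller-supplied function that requests one page (for example, a wrapper around urllib.request.urlopen).

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def simulate_user(fetch, pages, max_delay):
    """Visit each page once, pausing a random time before every request."""
    timings = []
    for page in pages:
        # Random think time, so that users are never synchronized.
        time.sleep(random.uniform(0, max_delay))
        start = time.perf_counter()
        fetch(page)
        timings.append(time.perf_counter() - start)
    return timings


def run_step(users, fetch, pages, max_delay=1.0):
    """Run the scenario with `users` concurrent users; return the mean time."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(simulate_user, fetch, pages, max_delay)
                   for _ in range(users)]
        timings = [t for future in futures for t in future.result()]
    return sum(timings) / len(timings)


def ramp(user_counts, fetch, pages, stop_at=10.0, max_delay=1.0):
    """Repeat the scenario with a growing number of users.

    Stops after the first step whose mean response time reaches stop_at
    seconds and returns a list of (users, mean_response_time) pairs.
    """
    results = []
    for users in user_counts:
        average = run_step(users, fetch, pages, max_delay)
        results.append((users, average))
        if average >= stop_at:
            break
    return results
```

While such a ramp runs, Zabbix keeps collecting the infrastructure items, so each step can later be matched with the state of the web server, the application server, and the RDBMS.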

Drawing graphs that relate the response time to the number of users on a server will help us to see whether our three-tier web architecture is linear or not. Most probably, it will grow in a linear pattern until a certain point. This segment is the one on which our system is performing fine. We can also see the components inside Zabbix, and from this point, we can introduce a kind of delay and draw some conclusions.
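One simple way to locate the end of the linear segment is to fit a line on the first measurements and look for the first point that rises clearly above it. The following Python sketch assumes this approach; the tolerance, the number of fitted points, and the sample data are hypothetical choices for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def end_of_linear_segment(users, times, fit_points=3, tolerance=0.25):
    """Return the last user count that still follows the initial line.

    The line is fitted on the first fit_points measurements. A later point
    leaves the segment when it exceeds the predicted value by more than
    tolerance (expressed as a fraction of the prediction).
    """
    slope, intercept = fit_line(users[:fit_points], times[:fit_points])
    last_ok = users[fit_points - 1]
    for count, measured in zip(users[fit_points:], times[fit_points:]):
        predicted = slope * count + intercept
        if measured > predicted * (1 + tolerance):
            break
        last_ok = count
    return last_ok


# Hypothetical mean response times (seconds) for each load step.
users = [10, 20, 30, 40, 50, 60]
times = [0.30, 0.40, 0.50, 0.60, 1.10, 2.40]
print("linear up to", end_of_linear_segment(users, times), "users")
```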

Now, we know exactly what to expect from our system and how the system can serve our users. We can see which component is the first to suffer under load and where we need to plan tuning.

Capacity planning can be done without digging and going deep into what to optimize. As we said earlier, there are two different tasks—performance tuning and capacity planning—that are related, of course, but different. We can simply review our performance and plan our infrastructure expansion.

Tip

A planned hardware expansion is always cheaper than an unexpected, emergency hardware improvement.

We can also perform performance tuning, but be aware that there is a relation between the time spent and the performance obtained, so we need to understand when it is time to stop our performance tuning, as shown in the following graph:

Forecasting the trends

One of the most important features of Zabbix is the ability to store historical data. This feature is of vital importance when predicting trends. Predicting our trends is not an easy task, and it must take into account the business that we are serving. When looking at historical data, we should see whether there are repetitive periods or whether there is a sort of formula that can express our trend.

For instance, it is possible that the online web store we are monitoring needs more and more resources during a particular period of the year, for example, close to public holidays if we sell travel packages. As a practical example, consider the used space on a specific server disk. Zabbix gives us the export functionality to get our historical data, so it is quite easy to import it into a spreadsheet. Excel has a curve fitting option that will help us a lot. It is quite easy to find a trend line using Excel that will tell us when we are going to exhaust all our disk space. To add a trend line in Excel, we first need to create a "scatter graph" with our data; here, it is also important to graph the disk size. After this, we can try to find the mathematical equation that is closest to our trend. There are different kinds of formulae that we can choose from; in this example, I used a linear equation because the graphs are growing with a linear relation.

Note

The trend line process is also known as the curve fitting process.

The graph that comes out from this process permits us to know, with a considerable degree of precision, when we will run out of space.

Now, it is clear how important it is to have a considerable amount of historical data, bearing in mind the business period and how it influences data.

Tip

It is important to keep track of the trend/regression line used and the relative formula with the R-squared value so that it is possible to calculate with precision when the space will be exhausted, provided that the trend does not change.

The graph obtained is shown in the following screenshot, and from this graph, it is simple to see that if the trends don't change, we are going to run out of space on June 25, 2015:
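The same linear trend line can be computed outside a spreadsheet. The following Python sketch fits a least-squares line to daily samples of used disk space, reports the R-squared value, and estimates the day on which the line reaches the disk size. The sample values, the start date, and the function names are hypothetical and are not taken from the graph above.

```python
from datetime import date, timedelta


def linear_fit(xs, ys):
    """Return slope, intercept, and R-squared of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # R-squared is undefined for constant data; report 0.0 in that case.
    r_squared = 1 - ss_res / ss_tot if ss_tot else 0.0
    return slope, intercept, r_squared


def exhaustion_date(first_day, used_gb, disk_size_gb):
    """Estimate the day on which the fitted line reaches the disk size.

    used_gb holds one sample per day, starting on first_day.
    Returns (date or None, R-squared).
    """
    days = list(range(len(used_gb)))
    slope, intercept, r_squared = linear_fit(days, used_gb)
    if slope <= 0:
        return None, r_squared  # usage is not growing: nothing to forecast
    days_until_full = (disk_size_gb - intercept) / slope
    return first_day + timedelta(days=round(days_until_full)), r_squared


# Hypothetical daily samples of used space (GB) on a 100 GB disk.
samples = [40.0, 40.5, 41.1, 41.4, 42.0, 42.6, 43.0]
when, r2 = exhaustion_date(date(2015, 1, 1), samples, 100.0)
print(f"R-squared: {r2:.3f}, disk full around {when}")
```

An R-squared value close to 1 indicates that the linear model describes the samples well; a low value suggests trying a different kind of equation or a longer history.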

 

Summary


In this chapter, we completed a Zabbix setup in a three-tier environment. This environment is a good starting point to handle all the events generated from a large or very large environment.

In the next chapter, you will go deep into nodes, proxies, and all possible infrastructure evolution, and, as you will see, all of them are an improvement on the initial setup. This does not mean that the extensions described in the next chapter are easy to implement, but all the infrastructural improvements use this three-tier setup as a starting point. Basically, in the next chapter, you will learn how to expand and evolve this setup and also see how the distributed scenarios can be integrated into our installation. The next chapter will also include an important discussion about security in a distributed environment, making you aware of the possible security risks that may arise in distributed environments.

About the Author

  • Andrea Dalle Vacche

    Andrea Dalle Vacche is a highly skilled IT professional with over 15 years of industry experience.

    He graduated from Università degli Studi di Ferrara with an information technology certification. This laid the technology foundation that Andrea has built on ever since. He has acquired various other industry-respected accreditations from big players in the IT industry, which include Cisco, Oracle, ITIL, and of course, Zabbix. He also has a Red Hat Certified Engineer certification. Throughout his career, he has worked on many large-scale environments, often in roles that have been very complex, on a consultant basis. This has further enhanced his growing skillset, adding to his practical knowledge base and concreting his appetite for theoretical technical studying.

    Andrea's love for Zabbix came from the time he spent in the Oracle world as a database administrator/developer. His time was mainly spent on reducing "ownership costs" with specialization in monitoring and automation. This is where he came across Zabbix and the technical and administrative flexibility that it offered. With this as a launch pad, Andrea was inspired to develop Orabbix, the first piece of open source software to monitor Oracle that is completely integrated with Zabbix. He has published a number of articles on Zabbix-related software, such as DBforBIX. His projects are publicly available on his website at http://www.smartmarmot.com.

    Currently, Andrea is working as a senior architect for a leading global investment bank in a very diverse and challenging environment. His involvement is very wide ranging, and he deals with many critical aspects of the Unix/Linux platforms and pays due diligence to the many different types of third-party software that are strategically aligned to the bank's technical roadmap.

    Andrea also plays a critical role within the extended management team for the security awareness of the bank, dealing with disciplines such as security, secrecy, standardization, auditing, regulator requirements, and security-oriented solutions.

    In addition to this book, he has also authored the following books:

    • Mastering Zabbix, Packt Publishing
    • Zabbix Network Monitoring Essentials, Packt Publishing
