Triggers in Zabbix 1.8

by Rihards Olups | March 2010 | Networking & Telephony Open Source

In this article by Rihards Olups, author of Zabbix 1.8 Network Monitoring, we discuss triggers in detail, including trigger dependencies, constructing trigger expressions, event details, and event generation and hysteresis.

Triggers are things that "fire". They look at item data and raise a flag when the data does not fit whatever condition is defined. As we discussed before, simply gathering data is nice, but awfully inadequate. If you want anything beyond historical data gathering, such as notifications, somebody would have to look at all the data all the time unless we define thresholds at which a condition is considered worth looking into. Triggers provide a way to define what those conditions are.

Earlier, we created a single trigger that was checking the system load on "A Test Host". It checks whether the returned value is larger than a defined threshold. Now, let's check for some other possible problems with a server—for example, when a service is down. The SMTP service going down can be significant, so we will try to look for such an event now. Navigate to Configuration | Hosts, choose Triggers in the first dropdown and click on the Create Trigger button. In the form that opens, we will fill in some values.

  1. Name: The contents of this field will be used to identify the trigger in most places, so it should be human-readable. This time, enter SMTP service is down. Notice how we are describing what the problem actually is. As opposed to an item, which is gathering statuses, a trigger has a specific condition to check, thus the name reflects it. If we had a host that should not ever have a running SMTP service, we could create a trigger named "SMTP service should not be running".
  2. Expression: This is probably the most important part of a trigger. What is being checked, and for what conditions, is specified here. Trigger expressions can vary from very simple to complex ones. This time we will create a simple one, and we will also use some help with that. Click the Select button next to the Expression field to open the expression building dialog. It has several fields to fill in as well, so let's look at what those are.
    • Item: Here, we can specify which item data should be checked. To do that, click on the Select button. Another pop up opens. Select Linux servers in the Group dropdown, then select Another Host in the Host dropdown. We are interested in the SMTP service, so click on SMTP server status in the Description column. The pop up will close, and the Item field will be populated with the chosen name.
    • Function: Here we can choose the actual check to be performed. Let's try to remember what the SMTP server status item values were: "1" was for server running, and "0" was for server down. If we want to check whether the last value is 0, the default function seems to fit quite nicely, so we won't change it.
    • N: This field allows us to set the constant used in the function above. We want to find out whenever the server goes down (that is, the status is "0"), so here the default fits as well.

      With the values set as above, click the Insert button. The Expression field is now populated with the trigger expression {Another Host:smtp.last(0)}=0.

    • Severity: There are five severity levels in Zabbix, and an additional "Not classified" severity.

      We will consider this problem to be of average severity, so choose Average from the dropdown.

Before continuing, make sure the SMTP server is running on "Another Host", then click Save. Let's find out how it looks in the overview now. Open Monitoring | Overview and make sure the Type dropdown has Triggers selected.

Great, we can see both hosts now have a trigger defined. As the triggers differ, we also have two unused cells. A newly added trigger will be flashing, thus indicating a recent change.

Let's look at the trigger expression in more detail. It starts with an opening curly brace, and the first parameter is the hostname. Separated from it by a colon is the item key, smtp here. After the dot comes the more interesting and trigger-specific part, the trigger function. Used here is one of the most common functions, last. It always returns a single value from the item history. Here it also has a parameter passed, 0, enclosed in parentheses. For this particular function, a parameter passed with such syntax is ignored. It could mean seconds, but that would not make much sense, as the function returns a single value only. Still, a parameter has to be provided even for functions that ignore it.

But that's not the only parameter syntax this function supports. If the value is prefixed with a hash, it is not ignored. In that case it works like an Nth value specifier. For example, last(#9) would retrieve the 9th most recent value. As we can see, last(#1) is equal to last(0). Another overlapping function is prev. As the name might suggest, it returns the previous value, thus prev(0) is the same as last(#2).
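The relationship between last(0), last(#N), and prev(0) can be illustrated with a small Python sketch. This is not Zabbix code; the function names and the list-based history are assumptions made only to mirror the behavior described above.

```python
# Illustrative model only: 'history' is a list of item values, oldest first.
# The names last() and prev() mirror the trigger functions described in the
# text; they are not part of any Zabbix API.

def last(history, param):
    """Return one value from the history, like the last trigger function."""
    if param.startswith("#"):
        n = int(param[1:])   # "#N" selects the Nth most recent value
    else:
        n = 1                # a plain number is ignored; use the newest value
    return history[-n]

def prev(history):
    """Return the previous value, the same as last(#2)."""
    return history[-2]

history = [1, 1, 0, 1, 0]    # hypothetical SMTP status values

print(last(history, "0"))    # newest value
print(last(history, "#1"))   # same as last(0)
print(prev(history) == last(history, "#2"))
```

With this sample history, last(0) and last(#1) both give the newest value, 0, while prev(0) and last(#2) give the value before it, 1.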

Continuing with the trigger expression, curly braces close to represent a string that retrieves some value. Then we have an operator, which in this case is a simple equal sign. Comparison is done with a constant number, zero.

Trigger dependencies

We now have one service being watched. There are more monitored services, though, so let's create a trigger for an HTTP server. Go to Configuration | Hosts, click on Triggers next to Another Host, then click on Create Trigger. Fill in the following values:

  • Name: Enter WEB service is down.
  • Expression: Click on Select, then again on Select next to the Item field. Make sure Linux servers is selected in the Group field and Another Host in the Host field, then click on WEB server status in the Description column. Both the function and its parameter are fine, so click on Insert.

    That inserts the expression {Another Host:net.tcp.service[http,,80].last(0)}=0.

  • The trigger depends on: Our host runs software that is a bit weird—the web service is a web e-mail frontend, and it goes down whenever the SMTP server is unavailable. This means the web service depends on SMTP service. To configure that, click on Add next to the New dependency. In the resulting window, make sure Linux servers is selected in the Group dropdown and Another Host is selected in the Host dropdown, then click on the only entry in the Description column—SMTP service is down.
  • Severity: Select Average.
  • Comments: Trigger expressions can get very complex. Sometimes the complexity can make it impossible to understand what a trigger is supposed to do without serious dissection. Comments provide a way to help somebody else, or yourself, to understand the thinking behind such complex triggers later. While our trigger still is very simple, we might want to explain the reason for the dependency, so enter something like Web service goes down if SMTP is inaccessible.

When you are done, click Save. Notice how, in the trigger list, trigger dependencies are listed in the Name column. This allows for a quick overview of any dependent triggers without opening the details of each trigger individually.

With the dependency set up, let's find out whether it changes anything in the frontend. Navigate to Monitoring | Overview.

Indeed, the difference is visible immediately. Triggers involved in the dependency have arrows drawn over them. So does an upwards arrow mean something depends on this trigger, or was it the other way around? Luckily, you don't have to memorize that. Move the mouse cursor over the SMTP service is down trigger for Another Host, the upper cell with the arrow.

A pop up appears, informing you that there are other triggers dependent on this one. Dependent triggers are listed in the pop up. Now move the mouse cursor one cell below, over the downwards pointing arrow.

Let's see what effect this has other than the arrows. Open Monitoring | Triggers and make sure both the Host and Group dropdowns say all, then bring down the web server on "Another Host". Wait for the trigger to fire, then look at the entry. Notice how an arrow indicating the dependency is displayed here as well. Move the mouse cursor over it again, and the dependency details are displayed in a pop up.

Hey, what's up with the Show link in the Comments column? Let's find out—click on it. As can be seen, the comment we provided when creating the trigger is displayed. This allows for easy access to comments from the trigger list both for finding out more information about the trigger and updating the comment as well. Click on Cancel to return to the trigger list. Now, stop the SMTP service on the "Another Host".

Wait for the trigger list to update and look at it again. The web server trigger has disappeared from the list and is replaced by the SMTP server one. That's because Zabbix does not show dependent triggers if the upstream trigger they depend on is active. This helps to keep the list short and to concentrate on the problems that actually cause other downtime.

Trigger dependencies are not limited to a single level. We will now add another trigger to the mix. Before we do that, we'll also create an item that will allow an easy manual condition change without affecting system services. In the frontend, navigate to Configuration | Hosts, click on Items next to Another Host, then click on Create Item. Fill in the following values:

  • Description: Enter Testfile exists
  • Key: Enter vfs.file.exists[/tmp/testfile].

When you are done, click Save. As the key might reveal, this item simply checks whether a particular file exists and returns 1 if it does, 0 if it does not.

In the bar above the item list, click on Triggers, then click on Create Trigger button. Enter these values:

  • Name: Enter Testfile is missing.
  • Expression: Click on Select, then again on Select next to the Item field. In the item list for Another Host, click on Testfile exists in the Description column, then click on Insert (again, the default condition works for us). The Expression field is filled with the following expression: {Another Host:vfs.file.exists[/tmp/testfile].last(0)}=0.
  • Severity: Select Warning.

When you are done, click Save. Let's complicate the trigger chain a bit now. Click on the SMTP service is down trigger in the Name column, then click on Add next to the New dependency entry. In the upcoming dialog, click on the Testfile is missing entry in the Name column. This creates a new dependency for the SMTP service trigger.

Click Save. Now we have created a dependency chain consisting of three triggers. "WEB service is down" depends on "SMTP service is down", which in turn depends on "Testfile is missing". Zabbix calculates chained dependencies, so all upstream dependencies are also taken into account when determining the state of a particular trigger. In this case, "WEB service is down" depends on both of the other triggers. With Zabbix versions 1.8.2 and later, this means only a single trigger is displayed in the Monitoring | Triggers section. Now we should get to fixing the problems the monitoring system has identified. Let's start with the one at the top of the dependency chain, the missing file problem. On "Another Host", execute:

$ touch /tmp/testfile

This should deal with the only trigger currently on the trigger list. Wait for the trigger list to update. You will see two triggers, with their status flashing.

Remember, by default Zabbix shows triggers that have recently changed state as flashing, and that also includes triggers in the "OK" state.

Looking at the list, we see one large difference this time—the SMTP trigger now has two arrows, one pointing up, and the other pointing down. Moving your mouse cursor over them you will discover that they denote the same thing as before—the triggers that this particular trigger depends on or that depend on this trigger. If a trigger is in the middle of the dependency chain, two arrows will appear.
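The way chained dependencies hide downstream triggers can be sketched in Python. This is only a model of the behavior described above, not how Zabbix implements it; the data structures and function names are assumptions made for illustration.

```python
# Illustrative model only: which PROBLEM triggers remain visible when
# chained dependencies are taken into account. Not actual Zabbix code.

def upstream(name, depends_on, seen=None):
    """Collect every trigger that 'name' depends on, directly or indirectly."""
    seen = set() if seen is None else seen
    for parent in depends_on.get(name, []):
        if parent not in seen:
            seen.add(parent)
            upstream(parent, depends_on, seen)
    return seen

def visible_problems(states, depends_on):
    """Return PROBLEM triggers that have no upstream trigger in PROBLEM."""
    shown = []
    for name, state in states.items():
        if state != "PROBLEM":
            continue
        if any(states[p] == "PROBLEM" for p in upstream(name, depends_on)):
            continue  # hidden: something it depends on is already failing
        shown.append(name)
    return shown

# The dependency chain built in this article.
depends_on = {
    "WEB service is down": ["SMTP service is down"],
    "SMTP service is down": ["Testfile is missing"],
}

all_down = {
    "WEB service is down": "PROBLEM",
    "SMTP service is down": "PROBLEM",
    "Testfile is missing": "PROBLEM",
}
print(visible_problems(all_down, depends_on))
```

With all three triggers in the PROBLEM state, only "Testfile is missing" is returned, which matches the single entry seen in the trigger list.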

Our testfile trigger worked as expected for the chained dependencies, so we can remove it now. Open Configuration | Hosts, click on Triggers next to Another Host and click on the SMTP service is down trigger in the Name column. Mark the checkbox next to the Testfile is missing entry in the dependencies list, then click the Delete selected button. Now click the Save button. Note that you always have to save your changes in the editing form of any entity. In this case, simply removing the dependency would not be enough. If we navigate to some other section without explicitly saving the changes, any modifications will be lost. Now you can also restart any stopped services on "Another Host".

Constructing trigger expressions

So far we have used only very simple trigger expressions, comparing the last value to some constant. Fortunately, that's not all trigger expressions can do. We will now try to create a slightly more complex check.

Let's say we have two servers, "A Test Host" and "Another Host", providing a redundant SFTP service. We would be interested in any one of the services going down. Navigate to Configuration | Hosts and click on Triggers next to either A Test Host or Another Host, then click on the Create Trigger button. Enter these values:

  • Name: Enter One SSH service is down.
  • Expression: Click on the Select button. In the resulting pop up, click Select next to the Item field. Make sure Another Host is selected in the Host dropdown, click on SSH server status item in the Description column, then click Insert.

    Now, position the cursor at the end of the inserted expression and enter " | " without quotes (that's space, the vertical pipe character, space). Again, click on the Select button. In the resulting pop up, click Select next to the Item field. Select A Test Host in the Host dropdown, then click on the SSH server status item in the Description column. Before inserting the expression, take a look at the N field, though.

    Notice how there's a space and vertical pipe added. Zabbix tried to fill in values as in our first expression and interpreted the previous change as the value to compare to. That's surely incorrect, so remove these additions so there is only a 0 in that field, then click Insert.

  • Severity: Select Average (remember, these are redundant services).

When you are done, click Save.

What we did with the expression (inserting two expressions) allowed us to create a more complex check than simply comparing the value of a single item. Instead, two values are checked, and the trigger fires if either one of them matches the condition. That's what the | (vertical pipe character) is for. It works as a logical OR, allowing a single matched condition to fire the trigger. Another logical operator is &, which works like a logical AND. Using the SSH server example trigger, we could create another trigger that would fire only when both SSH instances go down. Getting the expression is simple, as we just have to modify a single symbol, that is, change | to &, so that the expression looks like this:

{Another Host:net.tcp.service[ssh].last(0)}=0 & {A Test Host:net.tcp.service[ssh].last(0)}=0

Trigger expressions also support other operators. In all the triggers we created, we used the most common one, the equality operator =. We could also use a non-equality check, #. That would allow us to reverse the expression like this:

{A Test Host:net.tcp.service[ssh].last(0)}#1

While not useful in these cases, such a check is helpful when the item can have many values and we want it to fire whenever the value isn't the expected one.

Trigger expressions also support the standard mathematical operators +, -, *, /, <, and > so complex calculations and comparisons can be used between item data and constants.
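To make the difference between |, &, and # concrete, here is a small Python sketch of the same conditions. The function names are hypothetical, and Python's own operators stand in for the Zabbix ones; the values follow the SSH status convention used above (1 for running, 0 for down).

```python
# Illustrative model only: Python equivalents of the trigger conditions above.

def one_ssh_down(another_host, a_test_host):
    """Models {..}=0 | {..}=0: fire if either service is down."""
    return another_host == 0 or a_test_host == 0

def both_ssh_down(another_host, a_test_host):
    """Models {..}=0 & {..}=0: fire only if both services are down."""
    return another_host == 0 and a_test_host == 0

def ssh_not_running(a_test_host):
    """Models {..}#1: fire whenever the value is anything other than 1."""
    return a_test_host != 1

# One of the two redundant services is down.
print(one_ssh_down(0, 1))    # True
print(both_ssh_down(0, 1))   # False
print(ssh_not_running(0))    # True
```

The OR form reports degraded redundancy, while the AND form reports a complete outage, which is why the two triggers could reasonably carry different severities.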

With the service checks we wrote, triggers would fire as soon as the service is found to be down by a single check. That can be undesirable if we know some software will be down for a moment during an upgrade, or because of log rotation or backup requirements. We can use a different function to achieve a delayed reaction in such cases. Replacing the function last with max allows us to specify a time period as the parameter, and thus react only when the problem has been active for some time. For the trigger to fire only when the service has not responded for 5 minutes, we could use an expression like this:

{A Test Host:net.tcp.service[ssh].max(300)}=0

Remember, for functions that accept seconds as a parameter, we can also use the count of returned values by prefixing the number with #:

{A Test Host:net.tcp.service[ssh].max(#5)}=0

In this case the trigger would always check the last five returned values. Such an approach allows the trigger period to scale along if the item interval is changed, but it should not be used for items that can stop sending in data.
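The difference between a time-based and a count-based window can be sketched in Python. This is a simplified model with assumed names and data; in particular, how Zabbix treats a value lying exactly on the window boundary is not covered here, so the example avoids that case.

```python
# Illustrative model only: 'history' holds (timestamp, value) pairs, oldest
# first. Exact window-boundary handling in Zabbix is not modeled.

def max_seconds(history, seconds, now):
    """Models max(300): largest value received within the last 'seconds'."""
    window = [value for ts, value in history if now - ts < seconds]
    return max(window) if window else None  # None: nothing to evaluate

def max_count(history, count):
    """Models max(#5): largest of the last 'count' values, however old."""
    return max(value for _, value in history[-count:])

# Hypothetical SSH check every 60 seconds: up once, then down five times.
history = [(0, 1), (60, 0), (120, 0), (180, 0), (240, 0), (300, 0)]

print(max_seconds(history, 300, now=330))  # 0: down for the whole window
print(max_seconds(history[:2], 300, now=90))  # 1: an "up" value is in range
print(max_count(history, 5))               # 0: last five values all down
```

If the item stopped reporting, max_count would keep returning a result based on old values, which is why the count-based form is a poor fit for items that can stop sending data.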

Let's create another trigger using a different function. In the frontend section Configuration | Hosts, choose SNMP Devices in the Group dropdown, click on Triggers next to snmptraps and click on the Create Trigger button, then enter these values:

  • Name: Enter Critical error from SNMP trap
  • Expression: Enter {snmptraps:snmptraps.str(Critical Error)}=1
  • Severity: Choose High

When you are done, click Save.

This time we used another function, str. It searches for the specified string in the item data, and returns 1 if found. The match is case sensitive.

This trigger will change into the OK state whenever the last entry for the item does not contain the string specified as the parameter. If we wanted to force this trigger to the OK state manually, we could just send a trap that does not contain the string the trigger is looking for. A trick like that can also be useful when some other system is sending SNMP traps. If the enabling trap is received successfully, but the disabling trap is lost (because of network connectivity issues, or for any other reason), you might want to use such a fake trap to disable the trigger in question, though in that case you might have to use zabbix_sender so that you can fake the host.

    

Triggers that time out

There are systems that send a trap upon failure, but no recovery trap. In such a case, manually resetting isn't an option. Fortunately, we can construct a trigger expression that times out by using another function, nodata. This function kicks in when the item has received no data for the time specified as the parameter, so the expression that would time out after 10 minutes looks like this:

{snmptraps:snmptraps.str(Critical Error)}=1 & {snmptraps:snmptraps.nodata(600)}=0

For now we will want to have a more precise control over how this trigger fires, so we won't change the trigger expression.
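As a rough illustration of how the timing-out expression behaves, the following Python sketch models str and nodata. The function names, timestamps, and trap text are assumptions for the example only; real evaluation happens inside the Zabbix server.

```python
# Illustrative model only: a trigger that fires on a matching trap and
# times out when no further data arrives for 600 seconds.

def str_match(last_value, needle):
    """Models str(): 1 if the last value contains the string (case sensitive)."""
    return 1 if needle in last_value else 0

def nodata(last_timestamp, seconds, now):
    """Models nodata(): 1 if nothing was received for more than 'seconds'."""
    return 1 if now - last_timestamp > seconds else 0

def trap_trigger_in_problem(last_value, last_timestamp, now):
    """Models str(Critical Error)=1 & nodata(600)=0."""
    return (str_match(last_value, "Critical Error") == 1
            and nodata(last_timestamp, 600, now) == 0)

trap = "Critical Error: fan failure"   # hypothetical trap text
received_at = 1000                     # hypothetical timestamp in seconds

print(trap_trigger_in_problem(trap, received_at, now=1100))  # True
print(trap_trigger_in_problem(trap, received_at, now=1700))  # False, timed out
```

A trap containing "critical error" in lower case would not match at all, since the comparison is case sensitive.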

Human-readable constants

Using plain numeric constants is fine while we deal with small values. When an item collects data that is larger, like disk space or network traffic, such an approach becomes very tedious. You have to calculate the desired value, and when looking at it later it is usually not obvious how large it really is. To help here, Zabbix supports so-called suffix multipliers in expressions: the abbreviations K, M, and G are supported. That results in shorter and much easier-to-read trigger expressions. For example, checking disk space on a non-existent host goes from:


{host:vfs.fs.size[/,free].last(0)}<16106127360

to

{host:vfs.fs.size[/,free].last(0)}<15G

That's surely easier to read and modify if such a need arises.
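The arithmetic behind the two expressions can be checked with a short Python sketch. It assumes the multipliers are powers of 1024, which is consistent with 15G and 16106127360 being equivalent in the example above; the function name is hypothetical.

```python
# Illustrative sketch: expand a constant with a K, M, or G suffix, assuming
# 1024-based multipliers as implied by the 15G example above.

MULTIPLIERS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def expand_constant(constant):
    """Turn a string such as '15G' into a plain integer."""
    suffix = constant[-1]
    if suffix in MULTIPLIERS:
        return int(constant[:-1]) * MULTIPLIERS[suffix]
    return int(constant)  # no suffix: already a plain number

print(expand_constant("15G"))  # 16106127360
```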

We have now covered the basics of triggers in Zabbix. There are many more functions, allowing evaluation of various conditions, that you will want to use later on. The frontend function selector does not contain all of them, so sometimes you will have to look them up and construct the expression manually. For a full and up-to-date function list, refer to the official documentation.

Event details

After we have configured triggers, they generate events, which in turn are acted upon by actions.

But can we see more details about them somewhere? In the frontend, open Monitoring | Events and click on date and time in the Time column for the latest entry with PROBLEM status.

This opens up the event details window, which helps determine the event flow with more confidence. It includes things such as event and trigger details and action history. The event list, which includes the previous 20 events, itself acts as a control, allowing you to click any of these events and see the previous 20 events from the chosen event on. As this list only shows events for a single trigger, it is very handy if one needs to figure out the timeline of one isolated problem.

Event generation and hysteresis

Events are generated whenever a trigger changes state. That's simple, right? But not all trigger state changes can be trapped by an action. A trigger can be in one of the following states:

  • OK – Normal state when the trigger expression evaluates to false. Can be acted upon by actions.
  • PROBLEM – A problem state when the trigger expression evaluates to true. Can be acted upon by actions.
  • UNKNOWN – A state when Zabbix cannot evaluate the trigger expression, usually when there is missing data or the trigger has just been modified. Cannot be acted upon.

No matter whether the trigger goes from OK to PROBLEM, to UNKNOWN, or in any other direction, an event is generated.

We found out before that we can use some trigger functions to avoid changing the trigger state upon every change in data. By accepting a time period as a parameter, these functions allow us to react only if a problem has been going on for a while. But what if we would like to be notified as soon as possible, while still avoiding trigger flapping if values fluctuate near our threshold? Here a specific Zabbix macro helps and allows us to construct trigger expressions that have some sort of hysteresis, that is, a memory of the previous state.

A common case is measuring temperatures. For example, a very simple trigger expression would read:

{A Test Host:temp.last(0)}>20

It would fire when the temperature was 21, go to the OK state when it was 20, and so on. Sometimes the temperature fluctuates around the set threshold value, so the trigger goes on and off all the time. That is undesirable, so an improved expression would look like:


({TRIGGER.VALUE}=0&{server:temp.last(0)}>20) | ({TRIGGER.VALUE}=1&{server:temp.last(0)}>15)

A new macro, TRIGGER.VALUE, is used. If the trigger is in the OK state it is 0; if the trigger is in the PROBLEM state it is 1. Using the logical operator | (OR), we are stating that this trigger should change to (or remain in) the PROBLEM state if:

  • The trigger is currently in the OK state and the temperature exceeds 20, or
  • The trigger is currently in the PROBLEM state and the temperature exceeds 15

How does that change the situation when compared to the simple expression that only checked for temperatures over 20 degrees?

In this example case we have avoided two unnecessary PROBLEM states, and usually that means at least two notifications as well.
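The effect of hysteresis can be reproduced with a short Python simulation. The temperature series below is made up for illustration, and the functions only model the two expressions above; they are not Zabbix code.

```python
# Illustrative simulation: count how often each expression enters PROBLEM
# for the same (hypothetical) series of temperature readings.

def simple(temp):
    """Models {server:temp.last(0)}>20."""
    return 1 if temp > 20 else 0

def with_hysteresis(trigger_value, temp):
    """Models the TRIGGER.VALUE expression: fire above 20, recover at 15 or below."""
    if (trigger_value == 0 and temp > 20) or (trigger_value == 1 and temp > 15):
        return 1
    return 0

def count_problem_events(temps, use_hysteresis):
    """Count transitions from OK (0) to PROBLEM (1) over the series."""
    state = 0
    problems = 0
    for temp in temps:
        new_state = with_hysteresis(state, temp) if use_hysteresis else simple(temp)
        if new_state == 1 and state == 0:
            problems += 1
        state = new_state
    return problems

temps = [18, 21, 19, 22, 18, 21, 14, 17]  # hypothetical readings

print(count_problem_events(temps, use_hysteresis=False))  # 3
print(count_problem_events(temps, use_hysteresis=True))   # 1
```

For this series the simple expression enters the PROBLEM state three times, while the hysteresis version enters it once and stays there until the temperature drops to 15 or below.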

Summary

In this article, we have learned quite a lot about triggers and their usage in Zabbix 1.8.

About the Author :


Rihards Olups

Rihards Olups has over 10 years of experience in IT. He has had a chance to work with various systems, and most of that time has been spent with open source solutions. His exposure to Zabbix, one of the leading open source enterprise class monitoring solutions, began with the first public releases more than nine years ago, which has allowed him to gain practical knowledge on the subject.

Previously employed by a government agency, Rihards was mostly involved in open source software deployments ranging from server to desktop grade software, with a big emphasis on Zabbix. More recently the author has joined Zabbix SIA, the company behind the software that this book is about, which has allowed him to gain even more experience with the subject.
