Zabbix supports many different ways of monitoring, including agentless, SNMP and IPMI. Zabbix also provides a monitoring agent, which has a great set of built-in items for monitoring disk space, processes, memory usage and many other things.
While the list of built-in items is growing with each release, there will always be something else we will want to monitor. Luckily, Zabbix agent is very easy to extend with new items by using a feature called userparameters. Zabbix userparameters are commands that the agent runs, expecting each of them to return an item value.
The built-in agent keys allow us to, among other things:
- check whether a file exists – key vfs.file.exists
- get memory statistics – key vm.memory.size
- check whether a process is running – key proc.num
The agent supports a lot more: there are about 80 built-in keys, many of which accept several parameters to gather further detail. That won’t always suffice, though, so let’s look at userparameters, a way to add new items to the Zabbix agent.
Adding Zabbix agent userparameters
Let’s start with a trivial example and move forward from that. We will look at the following examples in this article:
- returning a static value
- returning the mail queue length
- returning file ownership information and other detail
As we progress through these, we will discover that Zabbix agent can gather nearly any information that a command can print.
Returning a static value
Let’s start with a very simple thing: returning a static value for our new item key. On a system with Zabbix agent daemon installed, edit its configuration file and add a line such as the following:
UserParameter=userparam.test,echo 1
Restart the Zabbix agent after adding this line, as any change in the agent configuration file requires a restart of the agent. Userparameter syntax is simple:
UserParameter=<key>,<command>
The UserParameter configuration parameter and an equal sign are followed by an item key. As we are making a custom item, we come up with this key ourselves. Then, separated by a comma, comes the command that is executed whenever this item key is queried.
From a system that is allowed to query this agent – usually Zabbix server – run:
$ zabbix_get -s <agent_hostname_or_IP> -k userparam.test
This should return 1. Notice that from the server side this new custom item looks like any other item. It could just as well be a new built-in item implemented by a newer agent version, and the server would have no way to tell. Let’s move on to a slightly more advanced custom item.
Gathering mail queue length
On email servers, received but not yet delivered emails pile up in a queue. A queue that grows too long usually indicates a problem. Luckily, we can easily monitor the mail queue with the mailq command. When there are messages, it looks like this:
-Queue ID- --Size-- ----Arrival Time---- -Sender/Recipient-------
9C4A615F6B*    2634 Tue Jun 13 22:00:23  firstname.lastname@example.org
                                         email@example.com

40EA515F72     3051 Tue Jun 13 22:00:24  firstname.lastname@example.org
                                         email@example.com

-- 6 Kbytes in 2 Requests.
We could get the number of the queued messages by grabbing the lines that start with a number or an uppercase letter (the -c parameter for grep tells it to return the count of matched lines):
$ mailq | grep -c '^[0-9A-Z]'
But when the queue is empty, it would look like this:
$ mailq
Mail queue is empty
Our previous command would perceive this as a single message. We could expand the command to exclude this status message:
$ mailq | grep -v "Mail queue is empty" | grep -c '^[0-9A-Z]'
If we take a close look at the mail queue IDs in the output (the alphanumeric strings for each mail), we will notice that the letters in them do not go past F. An ID is not just a plain alphanumeric string but a hex number string. The status message starts with the letter M, which is not a hex digit, so matching only hex digits stops it from being counted. Thus we can simplify our command to:
$ mailq | grep -c '^[0-9A-F]'
With this knowledge in hand, we can add a new userparameter in the agent daemon configuration file:
UserParameter=mail.queue,mailq | grep -c '^[0-9A-F]'
Restart the agent and query it with zabbix_get as before – it will return the number of the messages in the queue.
While some versions of mailq return the number of entries at the end of the output, not all of them do – and the format differs. Counting the mail queue IDs is a more portable solution.
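Before wiring the pattern into the agent, it can be reassuring to check it against captured output. The following is only a sketch: the function name count_queue_ids and the two sample variables are made up for illustration, and the sample text is modelled on the mailq listing shown earlier rather than taken from a live system.

```shell
# Count lines that begin with a hex digit, i.e. lines that start with a queue ID.
# LC_ALL=C keeps the [A-F] range limited to uppercase ASCII letters.
count_queue_ids() {
    # grep exits with status 1 when nothing matches, but still prints 0
    LC_ALL=C grep -c '^[0-9A-F]' || true
}

# Sample output with two queued messages (illustrative data).
sample_busy='-Queue ID- --Size-- ----Arrival Time---- -Sender/Recipient-------
9C4A615F6B*    2634 Tue Jun 13 22:00:23  firstname.lastname@example.org
                                         email@example.com

40EA515F72     3051 Tue Jun 13 22:00:24  firstname.lastname@example.org
                                         email@example.com

-- 6 Kbytes in 2 Requests.'

# Sample output for an empty queue.
sample_empty='Mail queue is empty'

printf '%s\n' "$sample_busy"  | count_queue_ids   # prints 2
printf '%s\n' "$sample_empty" | count_queue_ids   # prints 0
```

The header, the indented recipient lines and the summary line all start with a dash or whitespace, so only the two queue ID lines are counted.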
Obtaining file ownership information
Let’s say we’d like to be sure that a critical file is owned by a specific user and group. We could get the ownership information from ls:
$ ls -l /etc/zabbix/zabbix-agentd.conf
-rw-r----- 1 root zabbix 7516 aug 11 2016 /etc/zabbix/zabbix-agentd.conf
Parsing that is an extra effort, though. We can get just the interesting information with stat (the -c parameter defines the output format):
$ stat -c '%U' /etc/zabbix/zabbix-agentd.conf
root
$ stat -c '%G' /etc/zabbix/zabbix-agentd.conf
zabbix
We could add two userparameters, one for the user and one for the group. That would be hard to maintain if we wanted to add more values for files, though. Zabbix supports passing values to userparameters in the same way as for the built-in keys (such keys are called flexible, as opposed to the ones that do not accept any parameters of their own). For that, the userparameter definition needs two additional things:
- [*] after the key.
- $1, $2 (and so on) positional references wherever the values should be placed in the command.
UserParameter=vfs.file.stat[*],stat -c '%$1' /etc/zabbix/zabbix-agentd.conf
Even more important, we would want to make the file configurable instead of hardcoding it:
UserParameter=vfs.file.stat[*],stat -c '%$1' '$2'
Now we can query the agent for user and group information for any arbitrary file:
$ zabbix_get -s <agent_hostname_or_IP> -k vfs.file.stat[G,/etc/zabbix/zabbix-agentd.conf]
zabbix
It works the same way for directories:
$ zabbix_get -s <agent_hostname_or_IP> -k vfs.file.stat[U,/]
root
Querying additional file information
We could have added a translation for the U and G parameters so that user and group could be passed to the key, but our current approach allows passing any format code that stat supports. We can retrieve file (or directory) permissions, time since last modification, SELinux security context and a lot of other information. A few selected format codes:
- %a access rights in octal
- %A access rights in human readable form
- %g group ID of owner
- %u user ID of owner
- %Y time of last modification, seconds since Epoch
And a few examples of querying those:
$ zabbix_get -s <agent_hostname_or_IP> -k vfs.file.stat[a,/path/file]
644
$ zabbix_get -s <agent_hostname_or_IP> -k vfs.file.stat[A,/path/file]
-rw-r--r--
$ zabbix_get -s <agent_hostname_or_IP> -k vfs.file.stat[Y,/path/file]
1497387353
See man stat for a full list of supported format codes.
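If readable names are preferred over raw format codes, the translation mentioned earlier could look roughly like this. It is a sketch only: the function name file_stat and the attribute names user, group, mode and mtime are invented here, and it assumes GNU stat, which is what provides the -c option.

```shell
# file_stat ATTRIBUTE PATH
# Translate a readable attribute name into a GNU stat format code and print the value.
# The attribute names are hypothetical; extend the case statement as needed.
file_stat() {
    case "$1" in
        user)  fmt='%U' ;;   # user name of owner
        group) fmt='%G' ;;   # group name of owner
        mode)  fmt='%a' ;;   # access rights in octal
        mtime) fmt='%Y' ;;   # last modification, seconds since Epoch
        *)
            echo "unsupported attribute: $1" >&2
            return 1
            ;;
    esac
    stat -c "$fmt" -- "$2"
}
```

Called as file_stat group /etc/zabbix/zabbix-agentd.conf, it would print the owning group. The trade-off is that only the listed attributes are available, whereas passing the format code directly exposes everything stat supports.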
Important to remember
- Zabbix agent must be restarted after changing its configuration file.
- While our examples were very simple and contained in the agent daemon configuration file, in many cases the data collection will be more complicated and will be split out into a separate script. If we modify a script called by the userparameter, no restart is needed – the next time the agent runs the script, the changes are picked up.
- Same as with built-in items, access rights have to be taken into account. For example, if filesystem permissions do not allow the zabbix user to access some file, we won’t be able to get file information.
- Environment variables are not initialised. If your command or script relies on any environment variables, make sure to set those.
- Similarly, be aware of which shell is the default on your system. If your script needs a different one, include a shebang line.
- Make sure your script executes quickly. If it takes more than a second, it is probably not suited to be used as a userparameter.
- Userparameters cannot return multiple values: a single invocation yields a single value. The ability to return multiple values might or might not appear in Zabbix 3.4.
- A temporary failure (or “no value this time”) cannot be returned. At best, the ZBX_NOTSUPPORTED string can be returned, which makes the item “not supported” and disables that item for 10 minutes by default. Follow the ZBX-12050 bug report if you are interested in a feature that would allow returning an error message without disabling the item.
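Several of these points can be addressed at once by moving the command into a small script. The following is a minimal sketch under assumptions, not a definitive implementation: the script path in the comment, the function name mail_queue_length and the MAILQ_CMD override are all hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Hypothetical script, for example saved as /usr/local/bin/mail_queue.sh and referenced as:
#   UserParameter=mail.queue,/usr/local/bin/mail_queue.sh
# The shebang line above selects the shell explicitly instead of relying on the default.

# Environment variables are not initialised by the agent, so set what the script relies on.
PATH="/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin${PATH:+:$PATH}"
LC_ALL=C
export PATH LC_ALL

# MAILQ_CMD is an assumption of this sketch: it allows swapping the command when testing.
mail_queue_length() {
    # Errors from the queue command are discarded; grep still prints 0 when nothing matches.
    ${MAILQ_CMD:-mailq} 2>/dev/null | grep -c '^[0-9A-F]' || true
}

# Print exactly one value, as a userparameter expects.
mail_queue_length
```

Because the agent only calls the script, later changes to the counting logic take effect without restarting the agent. Note that this sketch also prints 0 when mailq is missing, which may or may not be the desired behaviour.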
While there are some other gotchas with userparameters, the above covers the most common issues.