Zabbix network discovery is a way to scan a network range and automatically start monitoring the discovered devices and services. Network discovery can look for basic services like HTTP, IMAP or SMTP. It can also scan the specified network range for SNMP, Zabbix agent or ICMP ping responses.
Constant network scanning is not desirable, so discovery is usually set to run infrequently, anywhere from once every few hours to once every few days. If you have added a new Zabbix action for network discovery that links a fresh template, there is no way to see when the discovery rule will run next, and no way to force it to run sooner.
Zabbix agent is easy to extend for data collection with a feature called userparameters. We figured out how they work in the article Create your own items – extend the agent with userparameters. Unfortunately, userparameters sometimes don’t work right away, and the cause is not always obvious. In this article we will explore the most common issues in more detail and learn to use simple methods to debug Zabbix agent userparameters.
Zabbix supports many different ways of monitoring, including agentless, SNMP and IPMI. Zabbix also provides a monitoring agent, which has a great set of built-in items for monitoring disk space, processes, memory usage and many other things.
While the list of built-in items is growing with each release, there will always be something else we want to monitor. Luckily, Zabbix agent is very easy to extend with new items by using a feature called userparameters. Zabbix userparameters are commands that the agent runs, expecting each command to return an item value.
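To make that a bit more concrete, here is a minimal sketch of a command that could sit behind a userparameter. It simply prints the number of entries in a directory, and the agent would take the printed line as the item value. The script name, its location and the item key are all made up for illustration. Assuming the script is saved as /usr/local/bin/dir_count.py, the agent configuration line could look something like UserParameter=custom.dir.count[*],/usr/local/bin/dir_count.py "$1".

```python
#!/usr/bin/env python3
"""Hypothetical userparameter command: print how many entries a directory has."""
import os
import sys


def count_entries(path):
    """Return the number of entries (files and subdirectories) in path."""
    return len(os.listdir(path))


def main(argv):
    # Fall back to the current directory when no path is passed in.
    path = argv[1] if len(argv) > 1 else "."
    # Whatever the command prints is what the agent uses as the item value,
    # so print exactly one number and nothing else.
    print(count_entries(path))


if __name__ == "__main__":
    main(sys.argv)
```

Running the script by hand with a directory as the only argument should print a single number, which is a quick sanity check before wiring it into the agent.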
Applications – software applications – mean a specific thing in IT. Usually, that’s a user-oriented piece of software. A web browser, word processor, game. Lately, mobile applications have somewhat lowered the bar, even down to I Am Rich applications.
And then there’s Zabbix with its own definition for applications. So what are applications in Zabbix?
Let’s talk bugs. The important Zabbix bugs. What are those? The ones that have the most votes in the Zabbix issue tracker.
There are currently 1308 open bugreports. When we looked at this number back in November last year, it was a hundred lower. That’s a pretty huge number. Is everything bad? Not really, as some might be duplicates and some might be incorrect reports. Not many, though, as there’s constant grooming going on. Most of the remaining ones are valid bugreports, but not too critical – some are even as minor as an offset of a few pixels on some page. Still a bug, but something we can mostly live with. We already looked at the top-voted bugreport; now it is time to glance at the others, the same way we did with feature requests.
A bug must still be unfixed to be important. If a new version of Zabbix comes out and the server crashes for all the users, that is the most important bug. Until it is fixed, hopefully soon.
But there are some long-standing bugs that linger around just below the “fix-it” surface – they’re not terrible enough to be fixed right away, and usually somewhat complicated. Such bugs can be around for many years, sometimes never being fixed, but going away because a feature gets dropped completely. We need a way to measure which of all those known bugs is the most important. And there is a way to find out – same as with features, users can vote on bugreports. The bugreport with the most votes is titled deadlock between server and frontend.
So you had a cluster monitored. As is common with clusters, you wanted to track some cluster-wide parameter. Average CPU load, number of nodes online – something not tied to a single cluster node, so you created a special host to denote the whole cluster. Then you went to that host, clicked “Create trigger”, specified all the items on individual cluster hosts, clicked “Add”… and the trigger was not there. Mysteriously missing.
Oh, wait. That trigger actually appeared on all the cluster hosts. Is this a bug?
Returning to the events of the Open Source Monitoring Conference 2016, Avishai Ish-Shalom discussed an engineer’s approach to monitoring. David Hustace from OpenNMS told positive stories about this true-opensource monitoring tool.
Zabbix was this reliable friend, always sending you an email, SMS or both when something went down. It sometimes sent you a lot of emails, but you never got angry at Zabbix about that – it was just eager to help you, make sure you did not miss the weekly disaster. But then… last week… Zabbix did not send you an SMS. It did not send you an email. It did not telepathically inform you. But things were DOWN. Server was not RESPONDING.
Zabbix knew about this. As you review the data, sitting in a dark room, the graphs clearly show the downtime. But there was-no-alert. How is that possible? Wait, what, this is impossible. You can see on the glowing screen that the main action, a crucial piece in getting those alerts, is disabled. That just cannot be, as nobody, NOBODY would ever disable that. How, oh how. Why, oh why.
Monitoring most often deals with IT infrastructure. Sometimes it diverges a bit and starts caring about temperature and humidity, but in most cases that’s still limited to datacentre monitoring. In a talk at the Open Source Monitoring Conference 2016, Antony Stone covered some real-world monitoring that goes a bit further than temperature monitoring. On a more classic-IT note, Shlomi Zadok covered system management with Foreman and security/compliance reporting by integrating with OpenSCAP. Let’s see what these fine gentlemen talked about.