Zabbix agent is easy to extend for data collection with a feature called userparameters. We covered how they work in the article Create your own items – extend the agent with userparameters. Unfortunately, userparameters sometimes don’t work right away, and the cause is not always obvious. In this article we will explore the most common issues in more detail and learn simple methods for debugging Zabbix agent userparameters.
If Zabbix keeps surprising you with its notifications, you might want to try the Action Simulator! The Action Simulator is a community patch that helps you figure out whether your actions really do what you intend. It first came out for Zabbix 2.0 in 2013 and has been downloaded by hundreds of users from all around the world.
The following article gives a brief introduction to the Action Simulator and explains the challenges of developing it for Zabbix 3.2.
Testing new Zabbix items, triggers, actions, etc. is always easier on a separate test instance, which is why we have a few test Zabbix servers. These test servers are usually behind our firewall, but a few weeks ago we found that one test instance wasn’t. To make things even worse, it had the default admin credentials. This wasn’t a big issue, because the server was isolated from the rest of our hosts, but what happened on it was interesting.
We found out that the server was compromised because it was using 100% CPU. The process consuming all the CPU was one we had never seen before, none of us had ever configured it, and of course it was run by the zabbix user. We killed it instantly, and after some digging around we found out that the executable was an agent for a data mining service on which you can rent computing power to do some tasks.
Zabbix was this reliable friend, always sending you an email, SMS or both when something went down. It sometimes sent you a lot of emails, but you never got angry at Zabbix about that – it was just eager to help you, to make sure you did not miss the weekly disaster. But then… last week… Zabbix did not send you an SMS. It did not send you an email. It did not telepathically inform you. But things were DOWN. The server was not RESPONDING.
Zabbix knew about this. As you review the data, sitting in a dark room, the graphs clearly show the downtime. But there was-no-alert. How is that possible? Wait, what, this is impossible. You can see on the glowing screen that the main action, a crucial piece in getting those alerts, is disabled. That just cannot be, as nobody, NOBODY would ever disable that. How, oh how. Why, oh why.