Looking back at the Zabbix Conference 2016, day 1

It’s been a few weeks since the Zabbix Conference 2016. If you are considering attending next year, you might want to know – how was it? In one word, great. But that doesn’t tell you much, so let’s briefly explore how it went.

The conference started with a talk by Alexei Vladishev, the original author of Zabbix. He shared the improvements in the soon-to-be-released Zabbix 3.2 and the usually-interesting statistics on the conference itself. This year Russia had the largest number of participants, France the second largest and the Netherlands the third. Importantly, he assured all the participants that Zabbix will always be true open source software – also commonly known as Free software.

Zabbix is a True Open Source software and will always be

Alexei reminded us that Zabbix 3.2 is a non-LTS release. This means that it is expected to be supported for 7 months only – if you start using 3.2, plan to upgrade early next year. Zabbix 3.2 changes quite a few core things, and I won’t expand on that here – we’ll return to the 3.2 changes at a later point.

Next, Mikhail Serkov from EPAM Systems talked about monitoring High Performance Computing (HPC) clusters used for biomedical research. He started with an overview of the tools used, including Java, Python and R, which help to run more than 500 various scientific solutions. I liked this line:

No magic. Linux boxes, shell scripts on a low level

The flexibility of Zabbix was mentioned as an important factor when monitoring a system with 70TB of RAM, and some of the things being monitored are not common in a “normal” environment – for example, statistics of all the individual CPU and GPU cores.

It is often useful to monitor more than just low-level system statistics. At EPAM Systems, higher-level system health is monitored by adding empty scheduled computing jobs and checking how long they stay in a pending state.

Zabbix graph showing the pending time for jobs
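
The pending-time check described above can be sketched roughly as follows. This is a minimal Python illustration, not EPAM’s actual implementation: the function name `pending_seconds` and the idea of feeding it timestamps taken from the job scheduler are assumptions made for the example. The resulting number is the kind of value one could hand to Zabbix as an item and put a trigger on.

```python
from datetime import datetime, timezone


def pending_seconds(submitted_at, started_at=None, now=None):
    """Return how many seconds a job has spent (or spent) in the pending state.

    submitted_at: when the empty "canary" job was queued.
    started_at:   when the scheduler started it, or None if it is still pending.
    now:          current time; only used while the job is still pending.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    end = started_at if started_at is not None else now
    # Clamp at zero in case of small clock differences between systems.
    return max(0.0, (end - submitted_at).total_seconds())


# Hypothetical timestamps for one canary job.
submitted = datetime(2016, 9, 9, 12, 0, 0, tzinfo=timezone.utc)
started = datetime(2016, 9, 9, 12, 0, 42, tzinfo=timezone.utc)
print(pending_seconds(submitted, started))
```

A job that never leaves the queue keeps producing a growing value, which is exactly what makes this a useful high-level health signal.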

Mikhail also stressed the importance of having one monitoring tool to keep support and maintenance expenses low.

Next up was a technical topic – Zabbix developer Gleb Ivanovsky explained Zabbix loadable modules. After covering the reasons why somebody would want to write modules (mostly performance), he dived right into showing how one would write and compile a Zabbix module.

While Zabbix has supported loadable modules for data collection since version 2.2, Glebs was in a perfect position to talk about a new feature in Zabbix 3.2 – modules that allow duplicating the collected values on the server side (and yes, on the proxy side as well). And I don’t mean the podium or upright being the perfect position: Glebs was the one who implemented this feature.

Such a feature would be mostly of use for users who not only want to have the monitoring system, but also analyse the collected data on their own. Until now, that was often done by replicating the primary database and then extracting the data from the replica. The ability to directly mirror the collected data in Cassandra or any other tool should make this job much easier.

At this time there were no available modules, though. If you would like to play with this feature, you would have to write such a module yourself from scratch. I have to wonder whether this image illustrates how easy it is supposed to be.

Zabbix module adding, illustrated by pictures of satellites with solar modules

After a longer-than-usual coffee (tea) break I talked about the challenges and solutions when using Zabbix in a corporate environment that’s rife with legacy, custom and legacy custom solutions. I’ll expand on those topics in much more detail in further posts here, so I won’t cover them right now.

The conference continued with two short talks. In the first one, Sumit Goel from Salesforce discussed cloud application monitoring. Agreeing with Mikhail, he explained the importance of high-level monitoring, which they do with Jenkins-Selenium-driven scenarios that log in and perform various user-like operations.

The top 3 things Sumit stressed as being important to a cloud/SaaS monitoring solution were:

  • security
  • flexibility
  • scalability

The most important features for SaaS: security, flexibility, scalability

There was also a fourth entry, “user experience”, but a top 3 looks more impressive. He also expressed a lot of love towards the Zabbix API and the ways it allows the configuration to be automated.

In the second short talk Wolfgang Alper from IntelliTrend explained the advantages of integrating Zabbix with Rundeck. While Zabbix can execute remote commands to automatically resolve various issues, Wolfgang demonstrated the more advanced features of Rundeck, including “rolling-restart” support (running commands on nodes one-by-one), a detailed log of actions taken and the ability to define workflows (a set of commands) once in Rundeck, then execute them manually or automatically from Zabbix – all while having a way to give limited permissions to various workflows. Besides the on-demand execution, Rundeck also supports scheduled actions, becoming sort of a centralised cron daemon.

An extra touch was Rundeck acknowledging the Zabbix events when starting a workflow, and including a link to the job on Rundeck.

Zabbix and Rundeck integration, showing executed job details in Zabbix

After a good lunch Erik Skytthe from DBC A/S took the stage and shared his experience with monitoring Docker containers and Apache Mesos. While most IT people are somewhat familiar with Docker, Mesos is a bit less popular for now. Mesos promises to be an abstraction layer and API for resources like CPU, memory and storage. It’s like Plan 9, but on a bit higher level.

Mesos architecture

Erik covered Mesos framework monitoring, where they use collectd to grab the data, pass it to Graphite and then pass it to Zabbix. The main reason for the use of collectd was the already available data collection methods from Mesos.
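
The talk summary does not say how the data gets from Graphite into Zabbix, so the following is only a sketch of one possible approach, under stated assumptions: a metric line in Graphite’s plaintext format (`path value timestamp`) is turned into a Zabbix trapper value and wrapped in the Zabbix sender protocol packet. The metric name, the host name and the choice to reuse the Graphite path as the item key are all hypothetical.

```python
import json
import struct


def graphite_line_to_item(line, host):
    """Convert one Graphite plaintext line into a Zabbix sender data entry.

    Assumption: the Graphite metric path is reused as the Zabbix item key.
    """
    path, value, timestamp = line.split()
    return {"host": host, "key": path, "value": value,
            "clock": int(float(timestamp))}


def build_sender_packet(items):
    """Wrap items in a Zabbix sender packet: header, length, JSON body."""
    body = json.dumps({"request": "sender data", "data": items}).encode("utf-8")
    # "ZBXD" + protocol flag 0x01, then the body length as 8 bytes little-endian.
    return b"ZBXD\x01" + struct.pack("<Q", len(body)) + body


# Hypothetical metric line and host name.
item = graphite_line_to_item("mesos.master.cpus_used 3.5 1473422400",
                             "mesos-master-1")
packet = build_sender_packet([item])
print(len(packet), packet[:5])
```

In practice the packet would be written to the trapper port of a Zabbix server or proxy, and the items would have to exist there as trapper items; the stock `zabbix_sender` utility does the same job without any custom code.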

He also discussed various approaches to container monitoring, and concluded that for their use-case treating them as hosts in Zabbix is perfectly fine.

Alexander Naydenko followed with a talk detailing a migration project from Nagios to Zabbix. He discussed starting with scripts, evolving through Nagios and MOM, and ending up with Centreon. He listed various issues they had, including performance problems and the lack of decent monitoring agents, but I related most to the last item on his list – having too many individual components to integrate manually. That is something most Zabbix users might not recall anymore – how unpleasant it is to have a bunch of separate components that barely work together and have compatibility issues at every upgrade. Alexander also disliked the lack of a central problem definition, where each individual script decides what’s normal and what’s a problem. His list of pleasant surprises with Zabbix covers things we Zabbix users have grown used to – we would surely miss them a lot with some other solution.

Zabbix benefits over Nagios

Oh, and I have to give extra credit for the Dr. Strangelove reference. The talk was titled “How we learned to stop worrying and love Zabbix”.

How we learned to stop worrying and love Zabbix

We then shared another quite long break… but let’s talk about why longer coffee or tea breaks are great some time later 🙂

There were two more short talks at the end of the first day. In the first one, Alain Ganuchaud from Cool Monitoring shared their experience integrating Zabbix with ticketing systems in a large environment. More specifically, he covered a SwissLife migration from IBM Tivoli to Zabbix. In this context, a large environment was one that had a thousand incidents reported daily. While a thousand incidents can be filed manually, that probably should not be the goal. Instead they provided a two-way integration solution that both creates tickets in the ticketing system in use, ServiceNow, and passes ticket status changes back to Zabbix. An additional touch was the ability to link Zabbix events to open incidents by adding an appropriate acknowledgement comment.

Zabbix acknowledgement form, showing ServiceNow integration
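
To illustrate the idea of linking an event to an incident through an acknowledgement comment, here is a minimal sketch that builds a JSON-RPC request for the Zabbix API method `event.acknowledge`. It is not the integration shown in the talk: the message format, the incident number and the token are hypothetical. The `auth` field in the request body matches the API of the Zabbix 3.x era; newer releases handle authentication and acknowledgement actions differently, so check the API documentation for your version.

```python
import json


def build_acknowledge_request(auth_token, event_id, incident_number,
                              request_id=1):
    """Build (but do not send) an event.acknowledge JSON-RPC request.

    The comment text format is an assumption made for this example.
    """
    return {
        "jsonrpc": "2.0",
        "method": "event.acknowledge",
        "params": {
            "eventids": [str(event_id)],
            "message": "ServiceNow incident: " + incident_number,
        },
        "auth": auth_token,
        "id": request_id,
    }


# Hypothetical token, event ID and incident number.
request = build_acknowledge_request("0123456789abcdef", 20427, "INC0012345")
print(json.dumps(request, indent=2))
```

The request would be sent as an HTTP POST to the frontend’s `api_jsonrpc.php` endpoint with a JSON content type.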

Among the benefits of Zabbix, Alain mentioned lower support costs and a great community.

He also mentioned that there are 10 Zabbix administrators in this environment. Does that sound like a lot? Wait, actually that’s wrong – he said there are 100 Zabbix administrators. Ouch.

The day was concluded by Volker Fröhlich, who presented his Zabbix addon, Action simulator. When Zabbix actions don’t fire, there can be quite a lot of reasons – permissions, action conditions, maintenance, user media settings… While an experienced Zabbix administrator will be able to go through them all, for less experienced users it can be very painful to find the reason why that notification was not sent. Volker’s addon allows you to “simulate” an action – see what would happen when a trigger fires without actually sending out anything.

The new Zabbix action simulator
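
The general idea of such a dry run can be sketched in a few lines of Python. To be clear, this is not Volker’s addon or its logic: the data structures and key names below are hypothetical and only model the kinds of reasons listed above (action conditions, maintenance, permissions, user media).

```python
def simulate_action(action, event):
    """Return (notified, reasons) for an event without sending anything.

    notified: names of users who would receive a notification.
    reasons:  human-readable explanations for everything that blocks one.
    All dictionary keys are hypothetical and exist only for this sketch.
    """
    reasons = []
    if not action["enabled"]:
        reasons.append("action is disabled")
    if event["host_in_maintenance"] and action["pause_in_maintenance"]:
        reasons.append("host is in maintenance")
    if event["severity"] < action["min_severity"]:
        reasons.append("trigger severity is below the action condition")
    blocked = bool(reasons)  # action-level problems stop every recipient

    notified = []
    for user in action["recipients"]:
        if event["host_group"] not in user["readable_groups"]:
            reasons.append(user["name"] + " has no permission on the host group")
        elif not user["media"]:
            reasons.append(user["name"] + " has no media configured")
        elif not blocked:
            notified.append(user["name"])
    return notified, reasons


action = {
    "enabled": True,
    "pause_in_maintenance": True,
    "min_severity": 3,
    "recipients": [
        {"name": "alice", "readable_groups": ["Linux servers"], "media": ["email"]},
        {"name": "bob", "readable_groups": [], "media": ["email"]},
    ],
}
event = {"host_group": "Linux servers", "host_in_maintenance": False, "severity": 4}
print(simulate_action(action, event))
```

Collecting every reason instead of stopping at the first one is a deliberate choice here: the less experienced user gets the whole list of things to fix in one go.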

The Action simulator was available for Zabbix 2.0 and 2.2, but not for 2.4 or 3.0 – until now. Thanks to a lot of work by Volker and very valuable help from Mikhail Okhotin, the simulator is available for Zabbix 3.0 now (but not 3.2). The simulator has also been improved over the 2.x versions. It now supports custom expressions for conditions, is easier to use for colourblind users and has become easier to use in general. Sounds interesting? Go grab the action simulator and try it out 🙂

The talks for the first day were over, and we had learned quite a lot of useful things about Zabbix. But there was still another day to follow.
