Tuesday, September 29, 2015

4 open-source monitoring tools that deserve a look


Network monitoring is a key component in making sure your network is running smoothly. However, it is important to distinguish between network monitoring and network management. For the most part, network monitoring tools report issues and findings, but as a rule provide no way to take action to solve reported issues.

We found all four products to be capable network monitoring tools that performed well in our basic monitoring tasks such as checking for host availability and measuring bandwidth usage. Beyond the basics, there were quite a few differences in terms of features, granularity and configuration options.

  • Overall we liked Zabbix, which was easy to install, has an intuitive user interface and enough granularity to perform most network monitoring tasks.
  • Cacti is great for what it does, has excellent graphing capabilities and is relatively easy to configure and use. But Cacti is somewhat limited in features. It provides neither a dashboard showing infrastructure status nor the ability to send alerts.
  • Observium is another capable product, but we did not like having to map everything to host names without the ability to use IP addresses directly. However, it has a modern interface and, like Cacti, offers graphing capabilities that provide good information at-a-glance.


All of the products offered basic network monitoring, using common protocols like ping, without requiring agents. Diving deeper required agents or SNMP, which must be installed and/or configured on the devices to be monitored. Zabbix offers both agent and agent-less configuration options. Since all of the host servers run on Linux, to keep the playing field level we used a fresh install of Ubuntu 14.04 LTS prior to installing each product. The hardware was a quad-core, 64-bit, 8GB RAM server with adequate storage. Here are the individual reviews:


Observium

Observium is a Linux-based, command-line driven product with a web-based monitoring interface. Released under the QPL Open Source license, Observium is currently in version 0.14. Observium is available in both a community edition, which we tested, and a professional edition. Observium uses the RRDTool for certain features, such as buffer storage and graphing capabilities. It provides auto-discovery of a wide variety of devices from servers and switches to printers and power devices.

Observium is installed and configured through a set of command line inputs. Prerequisites include MySQL, PHP and Apache. We found a useful step-by-step installation guide on the Observium website, which saved us time in performing the install. After installation, the server is accessible from a browser.

After completing the basic installation we loaded the Web interface, which displayed a large, blank Google map and a summary of devices, ports and sensors, all showing zero values. We decided to add a new device from the Web interface by entering the host name and the SNMP community name. This provided no results.

After some online searching we realized we needed to add our devices to the ‘hosts’ file in order for Observium to correctly resolve the host names. We were not running DNS on our test network and you cannot add devices by using IP addresses. Since Observium is set up using a configuration file, the Web interface provides essentially a read-only overview of the infrastructure. We added our first device with a simple command from the command line and then logged back into the Web interface, where we could then see our newly added Windows host. The map was populated with a location quite distant from our actual location, but we attribute this to using internal subnets.
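For readers in the same situation (no DNS on the test network), the name-to-address mapping can be sketched as follows. This is only an illustration and not part of Observium: the helper `hosts_lines`, the host names and the addresses are all hypothetical, and the output is simply the usual "address, then name" layout of a hosts file.

```python
import ipaddress

def hosts_lines(devices):
    """Return hosts-file style lines ("<ip><TAB><name>") for a name -> IP mapping."""
    lines = []
    for name, ip in sorted(devices.items()):
        # Validate the address so a typo fails here rather than at lookup time.
        ipaddress.ip_address(ip)
        lines.append(f"{ip}\t{name}")
    return lines

# Hypothetical test-network devices; replace with real names and addresses.
devices = {"win-host01": "192.168.1.20", "core-switch": "192.168.1.2"}
print("\n".join(hosts_lines(devices)))
```

The generated lines would still have to be appended to the hosts file by hand or by a script of your own; the sketch deliberately does not write to any system file.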

Observium uses several protocols such as CDP, FDP, EDP and LLDP to discover new devices. When encountering a new device, it will attempt to contact it using an SNMP community name supplied in the configuration file. Once one or more devices have been added, the information for each device needs to be populated using the discovery and polling commands from the command line.

This task can be automated by creating a Linux ‘crontab’ entry that runs these commands at set intervals. Most configuration changes are accomplished through editing the configuration file. We found this a bit cumbersome at first, but once the initial configurations and inevitable tweaks have been completed there should be no need to revisit this file on a daily basis. The configuration file content is available to view read-only from the global configuration link, which is helpful in getting a bird's eye view of the setup.

With our new devices configured and added, we reloaded the Observium Web interface. The device list displayed our three hosts with some basic information about each (platform, OS type and uptime). Mousing over each device displays previews of various performance graphs such as processor and memory use. To drill down in more detail, you can click any device, which displays a secondary screen with additional information about the device and lets you view collected data in different ways: general information, a graph view that includes a myriad of performance data, and an event and system log view.

Observium has no direct export or reporting capabilities, which would be a nice addition for documenting performance or outputting usage data to hard copy. However, the on-screen reporting is very good and numerous filters are available to customize views. Although it doesn't aid much in actual configuration tasks, the Web interface has a modern, easy-to-read display and the navigation is intuitive with a horizontal, drop-down style menu across the top. We also like the start page overview with the ability to mouse over various items to see graphs for that item.

The Observium professional edition is available for an annual subscription fee and provides users with real-time updates and various support options. The professional version also has added features such as threshold alerting and traffic accounting, which can be helpful for organizations like ISPs that need to calculate and bill client bandwidth usage.

Cacti

Like the other products, Cacti is a Web-based application that runs on PHP/Apache with a MySQL backend database. Currently in version 0.8.8, it provides a custom front-end GUI to the RRDTool, an open source round robin database tool. It collects data via SNMP and there is also a selection of data collection scripts available for download from the Cacti website.

Although the Cacti server can be installed on Windows, it does require software mostly associated with Linux, such as PHP, Apache (although you can use IIS) and MySQL. This can be accomplished using WAMP server or by configuring each component individually using the Cacti installation guide.

Regardless of OS type, there are a number of configuration requirements and Cacti assumes the installer is fairly familiar with the aforementioned components. The user manual provides general guidelines for installation, but it did not provide specifics for our particular environment. As is often the case, we found an online third-party source that had a good step-by-step guide for our OS (Ubuntu).

Once the installation and initial configuration have been completed, you access the Cacti Web GUI from a Web browser. We found the Web interface clean and fairly easy to navigate once we became familiar with the overall layout. Cacti’s bread and butter is its graphing capabilities and it provides users with the tools to create custom graphs for various devices and their performance using SNMP. Devices can range from servers and routers to printers, essentially any networked devices with an IP address.

To set up a new device and indicate which values to monitor, you follow a short wizard-like, step-by-step process where you first specify the basics, such as the IP address and type of device. To determine whether the device is available, Cacti can use a simple Ping command or a combination of Ping and SNMP.

Once a device has been created, it is time to create the graphs you want to monitor for this particular device. The graph setup uses a simple one-page form with a set of options based on the type of device being configured. You can select items ranging from interface traffic and memory usage to CPU utilization or number of users logged in. We created a number of graphs for a couple of devices. Once a graph has been saved, it can take a while before data starts showing, but we found that graphs generally began displaying data within a few minutes. What we found helpful when creating graphs is that Cacti will inform you right away whether a data query returns any data before proceeding. That way you don’t end up with a bunch of empty graphs.

Cacti uses three types of XML templates for configuration purposes: data, graph and host templates. These allow administrators to create custom configurations that can be reused across multiple devices. The templates can be applied as you create a new device, graph or host. Settings may include values such as display settings for a graph or information on how data is to be collected for a certain host type.

Although Cacti does not require an agent to be installed on a device, SNMP needs to be installed and configured in order to take advantage of all features available in Cacti. As is often the case with open source software, Cacti provides more options for Linux/UNIX without the need to install additional templates. In order to better monitor Windows servers we needed to install additional templates. Some of the online third-party tutorials are very good, but it should be noted that these are not one-click operations and require a steady hand to get everything configured properly.

From the graph console you can call up any graph by filtering by device or by custom date and time range, or you can even do a search. We found this interface to be very flexible as you can essentially display anything from one custom graph to literally thousands; however, displaying too many graphs per page will slow down the load time. The time/date range is very flexible with a drop-down that allows for granular selections from ‘last 30 minutes’ to ‘this year’. You can zoom in on any graph as well as export the graph values to a CSV file.

One feature commonly used by ISPs is the bandwidth measurement, especially the usage at the 95th percentile, which is often how bandwidth is measured and billed.
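To make the 95th-percentile idea concrete, here is a minimal sketch assuming the common "sort the samples, ignore the top 5 percent, bill at the highest remaining sample" convention (the nearest-rank method). It is not Cacti's code, and Cacti's own calculation may differ in details such as how inbound and outbound traffic are combined. The function name and the sample values are invented for illustration.

```python
def percentile_95(samples_mbps):
    """Return the 95th-percentile sample using the nearest-rank method."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    # Nearest rank = ceil(0.95 * n), computed with integers to avoid
    # floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]

# 20 hypothetical five-minute samples in Mbit/s: steady traffic plus one burst.
samples = [10] * 19 + [100]
print(percentile_95(samples))
```

With these invented samples the single burst falls in the top 5 percent and is ignored, which is the reason this measure is popular for billing: short spikes do not raise the billed rate.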

Cacti provides custom user management that allows administrators to determine what information users can view and also what actions they can take from the console. These items include ability to import/export data, change templates and various graph settings. We found the granularity to be flexible enough without providing so many settings it becomes cluttered.

Compared to the other products we tested, Cacti is somewhat limited in features. It provides neither a dashboard showing infrastructure status nor the ability to send alerts. However, that should not preclude you from considering Cacti, because what it does, it does well. The interface is efficient and quick to navigate, with no need to sit around for minutes while pages load. Also, with no agents to be deployed to hosts, it is an unobtrusive monitoring product that gives administrators a good overview of network topology with little overhead.

Zabbix

Zabbix is an open-source network management solution released under the GPL2 license and is currently in version 2.4. It provides a Web interface for monitoring and stores collected data in one of several common databases such as MySQL, Oracle, SQLite or PostgreSQL. The Zabbix server itself runs only on Linux/UNIX and not on Windows; however, Zabbix agents are available for most Windows and Linux/UNIX server and desktop operating systems.

We installed Zabbix using one of the many available installation packages. The product can also be installed by compiling the source code or downloading a virtual appliance in formats such as VirtualBox, Hyper-V, ISO and VMware. In addition to the regular install, we also took a quick look at the available VM, a good option for those looking to evaluate Zabbix. The install was simple and straightforward using instructions available from the Zabbix website. We especially liked the condensed installation package, which requires just a few command line inputs and includes the Apache/PHP/MySQL setup in the main install, with no need for separate configuration unless there are special circumstances to consider.

When loading the Web interface for the first time, there is a short wizard that confirms that the pre-requisites and database connection are properly configured before loading the main dashboard. The first screen is the personal dashboard, which provides a general overview of the IT infrastructure with a list of hosts, system and host status. On a new installation this screen is largely blank with the exception of information related to the Zabbix server itself. This dashboard is customizable; you can add preferred graphs, maps and screens as you configure them.

Zabbix collects data in three different ways: through agents installed on a Linux/Windows host; through a variety of protocols such as SNMP, ICMP, TCP and SSH; and, for basic network health information, over HTTP and SMTP. Zabbix can auto-discover network devices and also has the capability to perform low-level discovery. We started by configuring a discovery rule to map out our test network. The granularity of this configuration is very good and you can specify IP ranges, protocols and other criteria to determine how a network is mapped out. After a few minutes we started to see a list of devices, ranging from routers and printers to servers and desktops. The discovery provides a general network overview, but does not provide any in-depth information until you add the individual host/device to the Web interface. We added several hosts using the Zabbix agent for both Linux and Windows, together with a few utilizing SNMP.

We installed the agents using a single command on our Linux hosts. There are some configuration options that can be set in the ‘zabbix_agentd.conf’ configuration file, such as the server IP and server name along with other custom options. The agents can perform either passive checks, where certain data (memory use, disk space, bandwidth) is essentially polled by the Zabbix server, or active checks, where the agent retrieves a ‘to-do’ list from the server and sends update data to the server periodically. Installing as a Windows service is also fairly straightforward using an executable and making a few tweaks to a configuration file to let Windows know where the Zabbix server resides.
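The difference between passive and active checks is mainly about who initiates the exchange. The toy model below illustrates that idea only. It does not use the Zabbix protocol or any Zabbix API, and the class names, method names and item keys ("mem.used", "disk.free") are all invented for this sketch.

```python
class Agent:
    def __init__(self, metrics):
        self.metrics = metrics  # item key -> callable returning the current value

    def answer(self, key):
        """Passive check: the server asks for one item and the agent replies."""
        return self.metrics[key]()

    def run_active(self, server, host):
        """Active check: fetch the to-do list, then push the collected values."""
        keys = server.todo_list(host)
        server.receive(host, {key: self.metrics[key]() for key in keys})


class Server:
    def __init__(self, active_items):
        self.active_items = active_items  # host -> item keys the agent should send
        self.history = {}                 # (host, key) -> list of collected values

    def poll(self, host, agent, key):
        """Passive check, server side: the server initiates the request."""
        self.history.setdefault((host, key), []).append(agent.answer(key))

    def todo_list(self, host):
        return self.active_items.get(host, [])

    def receive(self, host, values):
        for key, value in values.items():
            self.history.setdefault((host, key), []).append(value)


agent = Agent({"mem.used": lambda: 42, "disk.free": lambda: 7})
server = Server({"linux01": ["disk.free"]})
server.poll("linux01", agent, "mem.used")  # passive: server pulls
agent.run_active(server, "linux01")        # active: agent pushes
print(server.history)
```

In the passive case the server needs to be able to reach the agent, while in the active case the agent opens the connection. That distinction can matter when monitored hosts sit behind a firewall.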

The Web interface is a bit complex and looks intimidating at first, but once we became more familiar with the various screens and terminology we found it easy to navigate. We also wish the fonts and graphics were a bit more prominent as some of the information can be difficult to read. One of the features we liked is the dynamic link history that shows which sections you recently used, allowing you to quickly navigate back. The online user manual is comprehensive and up to date, and the Zabbix website has plenty of information on features, installation and configuration options.

Administrators can either use built-in templates or create their own triggers to build rules that send messages and/or perform commands when certain conditions are met. For instance, we created a rule that sent us a message when there was a general problem with one of the hosts and also restarted the agent on that host. Rules provide a lot of granularity and this was one of the few areas where we wish the online manual had a bit more detail on configuration options.
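The rule we describe pairs a condition with one or more actions. The sketch below models that pairing in plain Python. It is not Zabbix's trigger expression syntax, and `Rule`, `evaluate`, `send_message` and `restart_agent` are names made up for this example. The actions only return strings, so nothing is actually sent or restarted.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # returns True when the rule should fire
    actions: List[Callable[[dict], str]] = field(default_factory=list)


def send_message(state):
    # Stand-in for an email or SMS notification.
    return f"message: problem on {state['host']}"


def restart_agent(state):
    # Stand-in for a remote command; nothing is executed here.
    return f"restart agent on {state['host']}"


def evaluate(rules, state):
    """Run the actions of every rule whose condition matches the host state."""
    results = []
    for rule in rules:
        if rule.condition(state):
            results.extend(action(state) for action in rule.actions)
    return results


rules = [
    Rule("agent unreachable", lambda s: not s["agent_reachable"],
         [send_message, restart_agent]),
]
print(evaluate(rules, {"host": "linux01", "agent_reachable": False}))
```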

Most of the reporting is to the screen with the ability to print. The print option essentially displays what is on the screen minus the navigation header and other extraneous information. This is not necessarily a bad configuration, but it does not make for the most elegant printouts. We would have liked to have seen some ‘save-as-PDF’ and export capabilities. That being said, the online reporting and graphs are excellent, with multiple customization options. As mentioned earlier, the custom graphs and screens can be added to the main dashboard and called up with a simple click.

Zabbix is all open source. There is no separate paid Enterprise version. This means all of the source code is freely available, which should be attractive to both small and large enterprises. Although Zabbix does not offer a separate commercial version, commercial support contracts are available in five different levels ranging from ‘Bronze’ to ‘Enterprise’. Zabbix also offers other paid services such as integration, turnkey solutions and general consulting.

Icinga

Icinga was initially created as a fork of Nagios in 2009; its latest version has been developed, per the vendor, "free from the constraints of a fork." Version 2 sports a new rules-driven, object-based configuration format. Icinga is still open source under GPL and the current releases include Core and Web 1.11.x versions along with a 2.x version. Icinga can monitor devices in both Linux and Windows, but the server itself runs only on Linux. Since the 2.x Web GUI was still in beta, we installed the core server version 2.x and used the latest 1.11.x version as the Web GUI.

Icinga has a modular design where you select the core server, your preferred GUI and add any desired plug-ins such as reporting and graphing tools. We installed the basic server using only two commands. Overall, we found the Icinga online documentation to be good; however, a quick start guide would have been helpful as there is no guidance from the get-go on which of the many configuration files needs to be tweaked, even for a basic installation.

We determined that we needed either a MySQL or PostgreSQL database in order to run the Web interface; in addition, Apache or Nginx plus PHP is required. There are a few steps involved in this install and configuration process, depending on how many incremental upgrade files are available for the Icinga 2 DB IDO module and also how the Web server is configured. We went through the list of commands and after a fair bit of trial and error we were able to access the Web GUI from a browser.

After logging in, a dashboard-type overview is displayed with navigation organized into groups along the left (Icinga calls them ‘cronks’) and the main part of the screen used to display information. Along the top there is a section that provides an overview of the infrastructure using color-coded counts, with healthy hosts shown in green, warnings in yellow, critical problems in orange and hosts that are unavailable in red.

Icinga maximizes the use of most screens and even if the first impression is a bit cluttered, overall we found the navigation and organization of data to be intuitive. Many of the screens are list based, displaying information about hosts and host issues sorted by various criteria such as availability, length of down time or severity of the issue. Icinga provides the ability to sort each list ascending or descending on each column, something we found very helpful. Furthermore, you can select which columns to display for each list on the fly, which provides a nice level of customization.

Icinga takes advantage of several common protocols for monitoring the network infrastructure, from simple PING commands to SNMP, POP3, HTTP and NNTP. Information can also be gathered from other devices such as temperature sensors and other network-accessible devices. Configuring which hosts to monitor and what to monitor is accomplished using the configuration files, and the granularity to which you can customize this is overwhelming: the ‘Basic Monitoring’ section of the user manual runs 50 pages. Luckily, you can use templates and reusable definitions to streamline this process.

For our environment we defined a few hosts by linking a name to each IP address and then added what are known as ‘check commands’. These are essentially protocol definitions such as PING and HTTP that instruct Icinga what to monitor for each host. You can then expand these configurations to include how often to query a host, when to escalate warnings and where to send email notifications of pending issues.
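The relationship between hosts, check commands and check intervals can be modeled in a few lines. Icinga uses its own configuration file syntax, so the Python below is only a conceptual sketch of that relationship and not something Icinga can read. The class names, the host names, the addresses, the email address and the interval values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Check:
    command: str               # illustrative label such as "ping" or "http"
    interval_minutes: int = 5  # how often the check should run


@dataclass
class Host:
    name: str
    address: str
    checks: list = field(default_factory=list)
    notify: str = ""           # email address for notifications, if any


def due_checks(hosts, minute):
    """Return (host name, command) pairs whose interval divides the given minute."""
    return [
        (host.name, check.command)
        for host in hosts
        for check in host.checks
        if minute % check.interval_minutes == 0
    ]


hosts = [
    Host("web01", "192.168.1.30", [Check("ping", 1), Check("http", 5)], "admin@example.com"),
    Host("printer01", "192.168.1.40", [Check("ping", 15)]),
]
print(due_checks(hosts, 15))
```

Keeping the check definitions separate from the hosts is what makes templates and reusable definitions practical: the same `Check` can be attached to many hosts.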

Configuration files are the core of Icinga; we counted 14 main files and some of these include additional files for more specific configurations. Some configuration files can be left with default values, but others must be configured specifically for the environment, such as hosts, email addresses for notifications and enabling/disabling services used for host monitoring. The configuration files can be modified/created using an editor like vi or nano, but there are also configuration add-ons available plus third-party tools such as Chef and Puppet. In future releases Icinga will be adding the ability to configure via GUI, API and CLI, something that would be helpful for items that may require ongoing changes, such as changing host configurations.

Icinga provides native support for Graphite for graphing purposes and an Icinga Reporting module is available; it is based on the open source Jasper Reports.

There is no paid version of Icinga available, but there are several organizations worldwide that offer paid support at different levels. Icinga also hosts several camps every year where users, developers and sponsors get together to discuss best practices and further development goals of the product.

Conclusion

When selecting monitoring tools it is important to have a clear goal from the outset. Are you looking to just send a ping to each device every 15 minutes to make sure the device responds? Or, do you need more comprehensive information such as CPU, RAM, disk and bandwidth usage? Installing agents and configuring SNMP to access more advanced features should be a consideration as this can be a time-consuming task that may not be practical in larger organizations. A workable hybrid approach could be to install agents on critical devices that need deep-dive monitoring, while monitoring other devices in agent-less mode.
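The hybrid approach amounts to a simple classification step done before any tool is installed. The sketch below shows one way to write that plan down, under our own assumptions: the `critical` flag, the device names and the metric labels are hypothetical, and the 15-minute ping interval is just the example figure used above.

```python
def monitoring_plan(devices):
    """Assign agent-based monitoring to critical devices and ping checks to the rest."""
    plan = {}
    for device in devices:
        if device.get("critical", False):
            # Deep-dive monitoring through an installed agent.
            plan[device["name"]] = {
                "mode": "agent",
                "metrics": ["cpu", "ram", "disk", "bandwidth"],
            }
        else:
            # Availability only, with nothing installed on the device.
            plan[device["name"]] = {
                "mode": "agentless",
                "metrics": ["ping"],
                "interval_minutes": 15,
            }
    return plan


devices = [
    {"name": "db01", "critical": True},
    {"name": "printer01"},
]
print(monitoring_plan(devices))
```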

