« May 2013 | Main | March 2013 »

Monday, April 29, 2013

Migrating Nagios Configuration from Nagmin to Check_MK's WATO


When I first set up a Nagios server, many years ago, in the days of Nagios 1.x, the best configuration tool I could find was Fred Reimers' Nagmin. That has since turned into abandonware, but there is a fork, NagminV, under development.

I'd patched Nagmin to support Nagios 2.x and 3.x, and added a few fields to its database, but it was still buggy and quirky.

So, after I'd installed Mathias Kettner's Check_MK for its livestatus broker module for use with PNP4Nagios, I started investigating Check_MK's broader features.

What soon caught my eye was Check_MK's use of rulesets based on host tags. The temptation of editing text files (python scripts in disguise) was too great for me, so I started converting my Nagios service checks into Check_MK format.

These are very rough notes, proceed with caution. Back up everything first!!!

First thing to do was get Check_MK's agent on all our hosts.

Then, to list all of them in /etc/check_mk/main.mk:

# don't generate host config yet
# comment this out when Nagmin is decommissioned
generate_hostconf = False
all_hosts = [
host1,
host2,
]

Then I added the obvious tags, win, linux, etc, and created config files for legacy checks in /etc/check_mk/conf.d, until eventually all service checks were defined in Check_MK and not in Nagmin.

After each check was migrated to Check_MK, I'd run

check_mk -U
nagios -v /etc/nagios/nagios.cfg

The first command generates a new Nagios config in /etc/nagios/check_mk.d/check_mk_objects.cfg

The second validates the resulting config. Here I'd find duplicate definitions, reminding me to delete them from Nagmin.

So, after a while plodding away at this - in my case this meant over a year of coexistence - all that was left in Nagmin was hosts, host groups, contacts, contact groups, and timeperiods.

Time to abandon Nagmin and get serious.

Do not, under any circumstances, try to generate a Nagios config using Nagmin again. It will always fail!

Comment out the Command.cfg include in nagios.cfg

#cfg_file=/etc/nagios/Command.cfg

Try validating the nagios config again, with nagios -v. If it fails, you've forgotten to define some commands as legacy checks

Rinse, repeat until you've got it right.

Check the contents of /etc/nagios/Services.cfg - if it contains any service definitions, you've forgotten something.

Do the same with all the service-related .cfg files; if things fail, change your config to use Check_MK's default service templates.

Get your custom Time Periods into WATO - check_mk has a default, hidden timeperiod 24X7 (Nagmin's is 24x7, case matters), and comment out TimePeriods.cfg in nagios.cfg. Revalidate.

And so on till you're left only with hosts, host groups, contacts, and contact groups from the original Nagmin.

So now was the time to take the leap and use Check_MK's WATO to configure nagios.

I created contacts and contact groups manually in WATO. There's no conflict with your existing Nagios config until you associate a contact group with hosts and/or services in WATO. That was the last thing I did.

Now comes the fun bit.

We need to import our hosts into WATO.

Run the attached hosts.py which will generate a list of hosts in the format hostname;alias;parents;ipaddress

python hosts.py >wato.csv

Our import script requires a file containing

wato folder;hostname;alias;parents;ipaddress;tags

where tags is a list of check_mk tags separated by |

So, we'll have to manually edit it. I didn't use WATO folders so just preceded each line with a ;

Tags are the fun bits. By now you may have been using them in your check_mk config. WATO comes with some predefined ones, which we need to list in our import file (wato.csv).

You'll need one each of the 'agent', 'criticality', and 'networking' tags. As well as your custom tags, which should also be entered into WATO.

Download and edit my_wato_import.py adding your tag definitions into tagz, run:

python my_wato_import.py wato.csv

and watch the output scroll by. If there are any errors reported, then you've forgotten to add a tag definition to tagz, or misspelt a tag in wato.csv. Rinse and repeat until all's well.

Set testing = False near the top of the script, and run again.

Comment out generate_hostconf = False in your main.mk, and check_mk -U

Comment out Host.cfg in /etc/nagios/nagios.cfg

Validate your nagios configuration. It should be OK.

A look in WATO will show all your hosts, with appropriate tags.

Hidden away in /usr/share/doc/check_mk/treasures is a script wato_host_svc_groups.py

I ran this against my original Nagmin Hosts.cfg file, which produced output which was easily massaged into the form needed for WATO. I could have amended the script to produce output of the following form, but a few regular-expression search and replaces got me there quickly enough.

host_groups = [
( 'group1', []. ['server1', 'server2']),
( 'group2', []. ['server3', 'server3']),
]

Place that code into /etc/check_mk/conf.d/wato/rules.mk and create the host groups (with descriptions) in WATO. Before applying changes in WATO, run check_mk -U

Comment out the Hostgroup.cfg include in nagios.cfg, and validate config once more.

#cfg_file=/etc/nagios/HostGroup.cfg

Now, if you've done everything properly, the Nagios config validation will succeed, and on a restart of Nagios your host groups will be there as before.

That just leaves Contacts, Contact Groups and notifications.

I'll leave that as an exercise for the reader. Hint: don't try any of the above until you've figured out how to apply contact groups to hosts and services.

And you'll also need to adjust host and service check intervals and retries in WATO too, otherwise everything gets polled every minute, which probably isn't what you want.


Posted by Phil at 9:59 PM
Edited on: Wednesday, April 01, 2015 9:24 PM
Categories: IT, Software