
Monitoring Bacula with Nagios

1. Introduction

This document, "Monitoring Bacula with Nagios", is copyrighted © 2009 by Kevin Keane. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is available at http://www.gnu.org/copyleft/fdl.html .

1.2 Disclaimer

No liability for the contents of this document can be accepted. Use the concepts, examples and information at your own risk. There may be errors and inaccuracies that could damage your system. Though this is highly unlikely, proceed with caution. The author(s) do not accept responsibility for your actions.

All copyrights are held by their respective owners, unless specifically noted otherwise. Use of a term in this document should not be regarded as affecting the validity of any trademark or service mark. Naming of particular products or brands should not be seen as endorsements.

1.3 Credits / Contributors

This solution was loosely inspired by R.I.Pienaar's monitoring script at http://www.devco.net/archives/2006/07/19/monitoring_bacula_jobs_using_nagios.php

1.4 Overview

The goal is to see the status of each Bacula job in Nagios, including alerts and notifications. Each backup job is listed as a service under a host named BACKUPS, and the service name is the Bacula job name.

Any questions? Please post to the bacula-users mailing list, or visit http://www.4nettech.com and use the Contact Us form.

This guide uses Nagios passive checks. If you wish to use Nagios active checks instead, please see Monitoring Bacula with Nagios active checks.

2. What you need

  • Nagios (obviously). Test to make sure that the nagios server is working properly.
  • Bacula (obviously). We will only touch the director configuration.
  • Nagios NSCA. Make sure that send_nsca is working properly from the machine that the bacula director is running on.

3. Modifications to Nagios

Create a configuration file backups.cfg in the appropriate location, and include it in your nagios.cfg file (or put it in a directory that is already included in nagios.cfg). All the configuration will go into this file.
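
Including the file comes down to a single cfg_file line in nagios.cfg. The following is a minimal sketch that appends that line only if it is not already present, so running it twice is harmless. add_cfg_file is a hypothetical helper name, and both paths in the example call are assumptions (many packages use /etc/nagios/nagios.cfg, but yours may differ).

```shell
#!/bin/bash
# Hypothetical helper: make nagios.cfg load backups.cfg through a cfg_file line.
# Usage: add_cfg_file <path to nagios.cfg> <path to backups.cfg>
add_cfg_file() {
    local nagios_cfg=$1
    local backups_cfg=$2
    local line="cfg_file=${backups_cfg}"

    # Nothing to do if the exact line is already there
    # (-x: whole-line match, -F: fixed string rather than a regex).
    if grep -qxF -- "$line" "$nagios_cfg" 2>/dev/null; then
        return 0
    fi

    # Make sure the new directive starts on its own line even if the file
    # does not end with a newline.
    if [ -s "$nagios_cfg" ] && [ -n "$(tail -c 1 "$nagios_cfg")" ]; then
        printf '\n' >> "$nagios_cfg"
    fi
    printf '%s\n' "$line" >> "$nagios_cfg"
}

# Example (both paths are assumptions; adjust them to your installation):
# add_cfg_file /etc/nagios/nagios.cfg /etc/nagios/objects/backups.cfg
```

Because the helper checks for the line first, it can be kept in a provisioning script without piling up duplicate cfg_file entries.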

Create a service template for the backup jobs in backups.cfg. Since the backup checks will be passive service checks, the template derives from the passive_service template. The backups run only once a day, and some of them may take a couple of hours to complete. I also skip the day after a full backup, in case the full backup takes more than 24 hours. As a result, we have to wait 72 hours before we can be sure that a backup didn't run. 72 hours is 259200 seconds; we will use that as the freshness_threshold, and later use the same 72-hour period as the notification_interval.
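
The conversion is easy to get wrong because the two directives use different units: freshness_threshold is given in seconds, while notification_interval is counted in units of interval_length, which defaults to 60 seconds. A small sketch of the arithmetic, assuming interval_length has not been changed in nagios.cfg:

```shell
#!/bin/bash
# Express the 72-hour "a backup was definitely missed" window in the units
# each Nagios directive expects.
window_hours=72

# freshness_threshold is given in seconds.
freshness_threshold=$(( window_hours * 60 * 60 ))

# notification_interval is given in "time units" of interval_length.
# Assumption: interval_length is left at its default of 60 seconds in nagios.cfg.
interval_length=60
notification_interval=$(( window_hours * 60 * 60 / interval_length ))

echo "freshness_threshold   ${freshness_threshold}"    # 259200
echo "notification_interval ${notification_interval}"  # 4320
```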

If a backup didn't run, we should treat it as a critical error. Thus, the check_command is check_dummy with the argument 2 (CRITICAL). With passive checks, Nagios runs the check_command only when no result has been received for the service within the freshness_threshold.
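
The number passed to check_dummy follows the standard Nagios plugin convention, in which the exit code determines the service state. The shell sketch below spells that convention out. nagios_state and check_dummy_sketch are hypothetical names, and the sketch only imitates the behaviour of check_dummy; it does not reproduce the real plugin's exact output.

```shell
#!/bin/bash
# Standard Nagios plugin exit codes and the states they stand for.
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;   # 3 (and, in this sketch, anything else)
    esac
}

# Hypothetical imitation of check_dummy: report the requested state,
# optionally followed by a message, and exit with that same code.
check_dummy_sketch() {
    local code=$1
    shift
    if [ $# -gt 0 ]; then
        echo "$(nagios_state "$code"): $*"
    else
        echo "$(nagios_state "$code")"
    fi
    return "$code"
}

# Roughly what happens when a backup result is overdue (check_dummy!2):
check_dummy_sketch 2 "no backup result received"
echo "exit code: $?"
```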

Assuming that you already have the standard nagios passive_service template working:

define service {
        name                            backup_service
        use                             passive_service
        freshness_threshold             259200
        check_command                   check_dummy!2
        register                        0
}

Next, we need to define the host. This is fairly standard. Since this is not an actual host, the address does not really matter, and the host check command can simply always return OK. Note that notification_interval is not measured in seconds: Nagios counts it in units of interval_length (60 seconds by default), so the 72-hour interval is 4320, not 259200.

define host {
      host_name                       BACKUPS
      alias                           Our backup
      address                         <bacula-dir host name>
      use                             generic-host
      check_command                   check_dummy!0
      max_check_attempts              10
      notification_interval           4320
      notification_period             24x7
      notification_options            d,u,r
      contact_groups                  mainoffice
}

Finally, we need to add the bacula jobs. For each job, add a service definition (substituting the correct bacula job name, of course):

define service {
      service_description             <bacula job name>
      use                             backup_service
      host_name                       BACKUPS
}
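
With many jobs, writing these definitions by hand gets tedious. The following is a sketch of a generator, assuming you keep a plain list of Bacula job names, one per line; gen_backup_services and the file names in the example are hypothetical. It emits exactly the service definition shown above for each name.

```shell
#!/bin/bash
# Hypothetical generator: read Bacula job names (one per line) on stdin and
# print a Nagios service definition for each, using the backup_service template.
gen_backup_services() {
    local job
    # "|| [ -n "$job" ]" also handles a last line without a trailing newline
    while IFS= read -r job || [ -n "$job" ]; do
        # Skip blank lines and comment lines in the job list
        case "$job" in
            ''|'#'*) continue ;;
        esac
        printf 'define service {\n'
        printf '      service_description             %s\n' "$job"
        printf '      use                             backup_service\n'
        printf '      host_name                       BACKUPS\n'
        printf '}\n\n'
    done
}

# Example (file names are assumptions):
# gen_backup_services < bacula-jobs.txt >> /etc/nagios/objects/backups.cfg
```

The job names must match the Bacula job names exactly, since send_nsca reports results under those names.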

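Run the Nagios pre-flight check. Depending on your Linux distribution, the command is probably similar to one of these:

rcnagios check

or

service nagios check

On any distribution, you can also run the Nagios binary directly with -v and the path to your main configuration file (adjust the path to match your installation):

nagios -v /etc/nagios/nagios.cfg

If it reports any errors, fix them before continuing.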

4. Modifications to Bacula

Put the following script, called bacula2nagios, into the /usr/local/sbin directory on the machine where your bacula-dir is running. Substitute the correct name of your Nagios server, of course.

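PITFALL WARNING: send_nsca expects the host name, service description, return code and plugin output to be separated by TAB characters. If you use spaces, it *will not* work. The script below therefore builds that line with printf and \t instead of relying on literal tabs, which are easily turned into spaces when the script is copied and pasted.

#!/bin/bash
# Inform nagios about the success (or lack thereof) of the most recent
# attempt of each backup job
#
# args:
# $1: job name
# $2: status (0 for success, anything else for failure)
# $3: whatever you want to appear as the plugin output

# Map the Bacula result to a Nagios return code: 0 = OK, 2 = CRITICAL.
# Quoting "$2" keeps the test from failing with a syntax error if it is empty.
if [ "$2" = "0" ]
then
    status=0
else
    status=2
fi

# send_nsca reads one check result per line:
# host<TAB>service<TAB>return code<TAB>plugin output
printf '%s\t%s\t%s\t%s\n' "BACKUPS" "$1" "$status" "$3" |
    send_nsca -H <FQDN of your nagios server> -c /etc/nagios/send_nsca.cfg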
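
Because the separators are invisible, it helps to look at the exact line that send_nsca receives. The sketch below formats a result in the same host, service, return code, output layout and prints it with od -c, which shows each tab as \t. format_nsca_line is a hypothetical helper, and the job name and message are made-up sample values.

```shell
#!/bin/bash
# Hypothetical helper that builds the line passed to send_nsca:
# host<TAB>service<TAB>return code<TAB>plugin output
format_nsca_line() {
    local job=$1 result=$2 output=$3 code=2
    # Same mapping as bacula2nagios: 0 means OK, anything else CRITICAL
    if [ "$result" = "0" ]; then
        code=0
    fi
    printf '%s\t%s\t%s\t%s\n' "BACKUPS" "$job" "$code" "$output"
}

# Show the separators explicitly: od -c prints each tab as \t.
format_nsca_line "Client1-Backup" 0 "OK Incremental Vol0001" | od -c
```

If the od output shows spaces between the fields instead of \t, send_nsca will not be able to parse the result.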

Make this script executable by the bacula user, for example:

chmod 755 /usr/local/sbin/bacula2nagios

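Now edit the JobDefs resource in your bacula-dir.conf file (or each Job resource, if your jobs do not use a JobDefs) and add the following two lines. The director runs the first command when a job ends successfully and the second when it fails, which is why they pass 0 and 1 respectively as the status: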

Run After Job = "/usr/local/sbin/bacula2nagios \"%n\" 0 \"%e %l %v\""
Run After Failed Job = "/usr/local/sbin/bacula2nagios \"%n\" 1 \"%e %l %v\""
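
Before running the command, the director replaces %n with the job name, %e with the job exit status, %l with the job level and %v with the volume name(s). The sketch below imitates that substitution for one made-up job so you can see the three arguments bacula2nagios ends up with. expand_bacula_vars and the sample values are hypothetical; the real director performs this expansion itself. The template leaves out the backslash escapes, which only protect the quotes inside the directive's own quoted string.

```shell
#!/bin/bash
# Hypothetical imitation of the director's substitution of %n, %e, %l and %v
# in a Run After Job command. The sample values below are made up.
expand_bacula_vars() {
    local cmd=$1 name=$2 exit_status=$3 level=$4 volumes=$5
    # \% keeps bash from reading % as an "anchor at end" pattern operator
    cmd=${cmd//\%n/"$name"}
    cmd=${cmd//\%e/"$exit_status"}
    cmd=${cmd//\%l/"$level"}
    cmd=${cmd//\%v/"$volumes"}
    printf '%s\n' "$cmd"
}

template='/usr/local/sbin/bacula2nagios "%n" 0 "%e %l %v"'
expand_bacula_vars "$template" "Client1-Backup" "OK" "Incremental" "Vol0001"
```

The job name becomes $1, the fixed 0 or 1 becomes $2, and the exit status, level and volumes together form the plugin output in $3.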
nagios.txt · Last modified: 2010/01/12 22:28 by mator