SURVIVOR: check.cf
About check.cf
syntax
  • Whitespace is generally ignored, unless otherwise noted.
  • boldface denotes keywords.
  • [brackets] denote optional keywords.
  • Separators (|) denote mutually exclusive keywords.
  • Comments begin with a # symbol, and may appear anywhere except on the same line as a named argument. Comments must be terminated with newlines.
     <argname> : [a-zA-Z0-9\._-]+
     <argvalue> : [^ \n\t][^\n]*
     <interval> : "second[s]"|"minute[s]"|"hour[s]"|"day[s]"|"week[s]"
     <name> : [a-zA-Z0-9\.+%@_-]+
     <number> : [0-9]+
  
For argvalue, trailing whitespace may be silently removed.
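
As an illustrative sketch of the comment rule (not taken from the original examples; it reuses the process module that appears later in this document, and the name syslogd2 is hypothetical), a comment is safe on a line of its own, but text that follows an argument value on the same line would be passed to the module as part of the value:

     # This whole line is a comment.
     check syslogd {
       module process {
         # Also a comment, because it is on its own line.
         name  .*syslogd
       }
     }

     # Pitfall: below, the module would receive ".*syslogd  # not a comment"
     # as the value of name, because <argvalue> runs to the end of the line.
     check syslogd2 {
       module process {
         name  .*syslogd  # not a comment
       }
     }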

check.cf defines Checks, which are used to monitor services that run on hosts, Composite Checks, which are Checks defined in terms of other Checks, Fixes, which are used to attempt to correct problems on hosts reported by Checks, and Transports, which are used to execute Checks and Fixes directly on remotely monitored hosts.

Checks
syntax
     check <name> {
         module <name> {
	     [<argname> <argvalue>]
	     [...]
	 }
	 [via scheduler | via <name>]
	 [check on <name> schedule]
	 [alert on <name> alertplan]
	 [fix with <name>]
	 [check all hosts]
	 [helpfile <name>]
	 [timeout <number> <interval>]
	 [result text significant]
     }

     default timeout <number> <interval>

     default check via (<name> | scheduler)
 

About Checks

A check stanza defines the module to be executed for that Check, the arguments (if any) to be passed to that module, and some optional information related to it. A Check is executed for the hosts defined in a group with the same name.

Dependencies

  • The Schedule specified by the check keyword must be defined in schedule.cf.
  • The Alert Plan specified by the alert keyword must be defined in schedule.cf.
  • The Transport specified by the via keyword must be defined first in a transport stanza.
  • The Fix specified by the fix with keyword must be defined first in a fix stanza.

Check Stanza Keywords

check <name> Define a new Check. When a host Group is defined with the same name as a Check, then the Check will be run for those hosts. (But see check all hosts, below.)
module <name> Specify the name of the module to run. Check modules are expected to be found in $MODDIR/check/modulename (where $MODDIR is by default $INSTDIR/mod).

Arguments passed to the module are specified as name/value pairs, with one pair per line. The specific arguments for each module are described in the module's documentation. # symbols are treated as part of the argument, not as a comment.

via scheduler

via <name>

Indicate where the Check runs. Checks run via scheduler are executed on the scheduler host. This is useful for Checks that follow a client-server model, such as testing an HTTP server or querying an SNMP daemon. This is the default.

Checks may also be run remotely, directly on the host being monitored. This is useful for Checks that need to examine specific files on a host, among other reasons. To run a Check remotely, use the via keyword to specify the name of a Transport to use. The Transport must be defined before the Check is defined. Transports are described below.

check on <name> schedule Specify a Schedule to use for this Check. By default, Checks are executed according to the Check Schedules supplied for each HostClass. However, a Check can override the default Schedule for all hosts to be monitored for that Check.

The Schedule named must be defined in schedule.cf.

alert on <name> alertplan Specify an Alert Plan to use for this Check. By default, alert notifications are transmitted according to the Alert Plan supplied for each HostClass. However, a Check can override the default Alert Plan for all hosts to be monitored for that Check.

The Alert Plan named must be defined in schedule.cf.

fix with <name> Specify a Fix to use with this Check. If enabled in the appropriate Alert Plan, a predefined Fix may be attempted to correct an error state reported by a Check. The Fix must be defined before the Check is defined. Fixes are described below.
check all hosts Run the Check on all defined hosts. The Check will run on every host defined in every HostClass, regardless of group memberships.

(If multiple Instances are in use, the Check will only be run on the hosts in the same Instance as the Check.)

helpfile <name> Associate a helpfile with this Check. When provided, the contents of this file may be transmitted along with the alert. This is useful to provide hints or reminders along with an error message. Not all format modules support helpfiles. If not begun with /, filenames are relative to helpdir, as defined in instance.cf.
timeout <number> <interval> Override the default Check timeout. The timeout indicates how long the scheduler should wait for the Check to complete. A timeout of 0 seconds is exactly that: an instantaneous timeout.
result text significant Ordinarily, the text description portion of a check result (as defined in the check module specification) is treated as a comment, and simply recorded. If this flag is set, the scheduler will look for changes to the text description in addition to changes in the check return code when determining whether to clear alert state, including acknowledgements.

default timeout The default module timeout is 45 seconds. The default may be changed at any point within the configuration file. The new default will apply to any Checks (or Fixes) defined after it, unless overridden within a stanza or until the default is changed again. For example:
  # Change module timeout
  default timeout 3 minutes
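  
A slightly longer sketch of how the default applies in file order. The Check names and the choice of modules are illustrative only:

  # Defined before any change, so the built-in 45 second default applies
  check http {
    module protocol {
      service http
    }
  }

  default timeout 3 minutes

  # Defined after the change, so this Check gets 3 minutes
  check imap {
    module protocol {
      service imap
    }
  }

  # An explicit timeout overrides whatever default is in effect
  check ping {
    module ping {}
    timeout 10 seconds
  }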
  
default check transport module The default Transport is via scheduler, which is not actually a Transport but rather an indication to execute the module directly on the scheduler host. The default may be changed at any point within the configuration file. The new default will apply to any Checks defined after it (unless overridden within the check stanza) until the end of the file or until the default is changed again. For example:
  # Run all subsequent check modules via the plaintext transport stanza
  default check via remote

  # Except in this module
  check imap {
    module protocol {
      service imap
    }
    via scheduler
  }
  
Changing the default value should be done with care, as some modules will not run correctly remotely, and others will not run correctly on the scheduler. Relying on default check via instead of explicit via statements within each check stanza may also make the configuration file harder to understand.

Examples

  1. A simple Check, using the protocol module. The module is passed the service argument http to instruct it to verify that protocol. This Check will be run for any host found in the corresponding host Group http, as defined in host.cf.
         check http {
           module protocol {
             service http
           }
         }
         
  2. A Check using the ping module to monitor every host defined in every HostClass.
         check ping {
           module ping {}
           check all hosts
         }
         
  3. An IMAP Check (using the protocol module) that sends out the contents of $helpdir/imap-problems if the relevant format module supports it.
         check imap {
           module protocol {
             service imap
           }
           helpfile imap-problems
         }
         
  4. A ping test with a very quick timeout.
         check fastping {
           module ping {}
           timeout 5 seconds
         }
         
  5. To demonstrate how overriding Schedules and Alert Plans works, here is an excerpt from a sample host.cf.
         # From host.cf
         class class1 {
           hosts {
             server1
           }
    
           check on hourly schedule
           alert on standard alertplan
         }
    
         class class2 {
           hosts {
             server2
           }
    
           check on halfhourly schedule
           alert on standard alertplan
         }
    
         group imap {
           server1
           server2
         }
         
    Back in check.cf, server1 will be checked for imap hourly and server2 will be checked every 30 minutes.
         # From check.cf
         check imap {
           module protocol {
             service imap
           }
         }
         
    However, if this next stanza were defined instead, both server1 and server2 would be checked on the halfhourly schedule and would alert on the critical alertplan.
         check imap {
           module protocol {
             service imap
           }
           
           check on halfhourly schedule
           alert on critical alertplan
         }
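         
  6. An illustrative sketch (not from the original examples) of a Check for which a change in the result text, and not only a change in the return code, is taken into account when deciding whether to clear alert state. The Check name httptext is hypothetical.
         check httptext {
           module protocol {
             service http
           }
           result text significant
         }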
         

Composite Checks
syntax
     check <name> {
         [required checks {
	     <name>
	     [...]
	 }]
         [optional checks {
	     <name>
	     [...]
	 }]
	 [check on <name> schedule]
	 [alert on <name> alertplan]
	 [fix with <name>]
	 [check all hosts]
	 [helpfile <name>]
	 [timeout <number> <interval>]
     }

     default timeout <number> <interval>
 

About Composite Checks

A Composite Check is a Check whose results are determined by the execution of its member or component Checks. When a Composite Check is scheduled, all of its component Checks are simultaneously run for all hosts to be examined. The results are then accumulated and sorted to determine the status of the Composite Check.

Composite Checks cannot be composed of other Composite Checks. This restriction may be lifted in a future release.

Composite Checks are more resource intensive than standard Checks, and so should not be used when a standard Check can accomplish the same task.

Composite Check Result Accumulation

There are two types of components: required and optional. Required components behave similarly to boolean and, while optional components behave similarly to boolean or. At least one required or one optional component must be specified. The same component cannot be both required and optional.

The return value of the Composite Check is determined as follows:

  • If any component, whether required or optional, does not complete, then the composite exits with MODEXEC_TIMEDOUT.
  • If required components were specified and any failed, then the Composite Check exits with the highest exit code in the order
    • MODEXEC_MISCONFIG
    • MODEXEC_PROBLEM
    • MODEXEC_WARNING
    • MODEXEC_OK
    • For any other return value, larger takes priority over smaller
  • If required components were specified and none failed, then the Composite Check exits with MODEXEC_OK regardless of the status of any optional components.
  • If no required components were specified and any optional components succeeded, then the Composite Check exits with MODEXEC_OK.
  • If no required components were specified and no optional component succeeded, then the Composite Check exits with the lowest exit code in the order
    • MODEXEC_OK
    • MODEXEC_WARNING
    • MODEXEC_PROBLEM
    • MODEXEC_MISCONFIG
    • For any other return value, smaller takes priority over larger
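
As a sketch of how these rules play out, assume that Checks named http, imap, and ping have already been defined as in the earlier examples. The composite names below are hypothetical:

     # Required only ("and"): OK when neither http nor imap fails.
     # Otherwise the result is the highest-priority code reported,
     # with MODEXEC_MISCONFIG ranked first.
     check mailweb {
       required checks {
         http
         imap
       }
     }

     # Optional only ("or"): OK when either http or imap succeeds.
     # If neither does, the result is the lowest code in the second
     # ordering, so MODEXEC_WARNING is preferred over MODEXEC_PROBLEM.
     check anyfrontend {
       optional checks {
         http
         imap
       }
     }

     # Mixed: if ping succeeds, the composite is OK whatever http
     # returns, provided both components complete. If either one does
     # not complete, the composite exits with MODEXEC_TIMEDOUT.
     check pingplus {
       required checks {
         ping
       }
       optional checks {
         http
       }
     }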

Comments generated by each component are concatenated together in the order they are received, delimited by a semi-colon (;).

Scalar values, as defined in the check module specification and recorded in the history files as specified in instance.cf, consist of the number of completed component Checks. The scalar values returned by each component Check are ignored.

Composite Checks vs Check Dependencies

All components of a Composite Check are executed simultaneously. This is useful when a test cannot properly be executed by one module alone, or when multiple Checks can trigger execution of the same fix (see below).

Check Dependencies prevent other Checks from executing. This is useful when a host becomes unavailable. Only one alert notification (say, for ping) will be transmitted, instead of one alert notification for every service configured for that host.

Dependencies

  • The component Checks specified by the required and optional keywords must be defined first in check stanzas.
  • All dependencies that apply to Checks also apply here.

Composite Check Stanza Keywords

Except where noted below, these keywords have the same behavior as for simple Checks.

check <name> Define a new Composite Check.
required Specify the names of the required components. Each name corresponds to a previously defined check stanza.
optional Specify the names of the optional components. Each name corresponds to a previously defined check stanza.
check on <name> schedule Specify a Schedule to use for this Check.

Whether or not this keyword is provided, any Schedules that would otherwise apply to the component Checks are ignored.

alert on <name> alertplan Specify an Alert Plan to use for this Check.

Whether or not this keyword is provided, any Alert Plans that would otherwise apply to the component Checks are ignored.

fix with <name> Specify a Fix to use with this Check.

Whether or not this keyword is provided, any Fixes that would otherwise apply to the component Checks are ignored.

check all hosts Run the Check on all defined hosts.
helpfile <name> Associate a helpfile with this Check.
timeout <number> <interval> Override the default Check timeout.

Whether or not this keyword is provided, any timeouts that would otherwise apply to the component Checks are ignored.

Examples

  1. A Composite Check to monitor an application that has both a web interface and a directory interface. Both Checks must complete successfully for the composite Check to be successful. Only the gizmo group would be defined in host.cf.
         check gizmoldap {
           module ldap {
             port     6389
             filter   uid=testuser
             response objectclass=candidate
           }
         }
    
         check gizmohttp {
           module httpurl {
             path     /cgi-bin/welcome
             query    newuser=true
           }
         }
    
         check gizmo {
           required checks {
             gizmoldap
             gizmohttp
           }
    
           check on critical schedule
           alert on critical alertplan
         }
         

Fixes
syntax
     fix <name> {
         module <name> {
	     [<argname> <argvalue>]
	     [...]
	 }
	 [via scheduler | via <name>]
	 [require host locking | require service locking]
	 [expire lock after <number> <interval>]
	 [timeout <number> <interval>]
     }

     default expire fix lock after <number> <interval>
     
     default fix via (<name> | scheduler)

     default timeout <number> <interval>
 

About Fixes

A fix stanza defines the module to be executed for a Fix. Fixes are run in an attempt to automatically restore a service. Generally, fix modules must execute via transport modules. If multiple Checks can trip a Fix, it may make sense to combine them into a Composite Check.

A Fix will be attempted automatically when the following criteria are met:

  1. A Fix has been defined in check.cf.
  2. An Alert Plan has been defined in schedule.cf that includes an attempt fix try.
  3. A Check has been defined which uses the Fix and Alert Plan described above.
  4. The Check failure status meets the requirements of the Alert Plan.
  5. There is no outstanding acknowledgement or inhibition for the service@host that has failed.

Fixes may also be manually attempted, regardless of outstanding failures, acknowledgements, or inhibitions.

When a Fix executes, a lock is established for the service@host to prevent the Fix from being run more than once concurrently for the same service@host. It is possible to establish wider scoped Fix locking, to service@ or @host, to prevent multiple Fixes for a given service from running simultaneously on multiple hosts or to prevent multiple Fixes for a given host from running simultaneously, regardless of service. Fix modules are subject to timeouts in the same fashion as check modules. In the event a module times out, its lock may not be correctly removed. As such, Fix locks may be expired when they are found to be stale.

Dependencies

  • The Transport specified by the via keyword must be defined first in a transport stanza.

Fix Stanza Keywords

fix <name> Define a new Fix. This name can then be used in the fix with keyword of a Check definition.
module <name> Specify the name of the module to run. Fix modules are expected to be found in $MODDIR/fix/modulename (where $MODDIR is by default $INSTDIR/mod).

Arguments passed to the module are specified as name/value pairs, with one pair per line. The specific arguments for each module are described in the module's documentation. # symbols are treated as part of the argument, not as a comment.

via scheduler

via <name>

Indicate where the Fix runs. Fixes run via scheduler are executed on the scheduler host, which is usually not a very useful scenario. It is more useful to run a Fix remotely, using the via keyword to specify the name of a Transport to use. The Transport must be defined before the Fix is defined. Transports are described below.
require host locking Require host level locking to execute the Fix. Only one Fix may be run on a given host at one time, regardless of what the Fixes are for.
require service locking Require service level locking to execute the Fix. Only one Fix may be run on a given service at one time, regardless of how many hosts require fixing.
expire lock after <number> <interval> Override the default Fix lock expiry. When a Fix lock is found that is at least this old, the lock is considered stale and will be removed.
timeout <number> <interval> Override the default Fix timeout. The timeout indicates how long the scheduler should wait for the Fix to complete. A timeout of 0 seconds is exactly that: an instantaneous timeout.

default fix transport module As for check module Transports, the default fix module Transport is via scheduler, which is of limited utility. The default may be changed at any point within the configuration file. The new default will apply to any fixes defined after it (unless overridden within the fix stanza) until the end of the file or until the default is changed again. For example:
  # Run all subsequent fix modules via the plaintext transport stanza
  default fix via remote
  
default fix lock expiry The default time for the expiration of stale fix locks is 120 seconds. The default may be changed at any point within the configuration file. The new default will apply to any Fixes defined after it (unless overridden within the fix stanza) until the end of the file or until the default is changed again. For example:
  # Expire fix locks after 5 minutes
  default expire fix lock after 5 minutes
  
default fix timeout The default Fix timeout is the same as the default Check timeout.

Examples

  1. In this example, a Fix is defined for syslog using the init.d module to restart the syslog daemon and the process module to detect the failure. Both use a remote Transport, described below.
         fix syslog {
           module init.d {
             service syslog
           }
           via remote
         }
    
         check syslogd {
           module process {
             name  .*syslogd
           }
           via remote
           fix with syslog
         }
         
  2. Here, a Fix is defined that can only be run by resetting a central control, regardless of which host is broken. As such, service level locking is required. Because the Fix takes a while to run, the timeout is also extended.
         fix gizmo {
           module gizmofix {
             # A locally written module to fix gizmo
             restartkey foobar
           }
           # Use a locally written transport that knows where the fix runs
           via gizmotransport
    
           require service locking
           timeout 5 minutes
           expire lock after 5 minutes
         }
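         
  3. A sketch (not from the original examples) of a Fix that requires host level locking, so that it does not run while another Fix is running on the same host. The Fix and service name httpd is hypothetical, and the remote Transport is assumed to be defined as described below.
         fix httpd {
           module init.d {
             service httpd
           }
           via remote
    
           require host locking
           expire lock after 10 minutes
         }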
         

Transports
syntax
     transport <name> {
         module <name> {
	     [<argname> <argvalue>]
	     [...]
	 }
     }
 

About Transports

A transport stanza defines a mechanism by which a Check or Fix can be executed on a remote host. More details about how to set up the infrastructure for remote monitoring can be found in the documentation for sr.

By default, check and fix modules are run directly on the scheduler host. However, some check modules and most fix modules run directly on the host to be monitored. (See the check module documentation for more information.) Checks and Fixes that are to be run remotely must be configured to run via a transport module. The appropriate transport stanza must be defined before the Check or Fix that uses it.

Dependencies

  • None.

Transport Stanza Keywords

transport <name> Define a new Transport. This name can then be used in the via keyword of a Check or Fix definition.
module <name> Specify the name of the module to run. Transport modules are expected to be found in $MODDIR/transport/modulename (where $MODDIR is by default $INSTDIR/mod).

Arguments passed to the module are specified as name/value pairs, with one pair per line. The specific arguments for each module are described in the module's documentation. # symbols are treated as part of the argument, not as a comment.

Examples

  1. In this example, the mailq module is configured to run via the plaintext transport module.
         transport remote {
           module plaintext {}
         }
    
         check mailq {
           module mailq {
             warn 1000
             prob 2000
           }
           via remote
         }
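         
  2. A sketch (assembled from the earlier examples rather than taken from the original) of a file laid out so that every definition precedes its first use: the Transport, then the Fix, then the Checks, then the Composite Check. The composite name mailhost is hypothetical.
         transport remote {
           module plaintext {}
         }
    
         fix syslog {
           module init.d {
             service syslog
           }
           via remote
         }
    
         check syslogd {
           module process {
             name  .*syslogd
           }
           via remote
           fix with syslog
         }
    
         check mailq {
           module mailq {
             warn 1000
             prob 2000
           }
           via remote
         }
    
         check mailhost {
           required checks {
             syslogd
             mailq
           }
         }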
         



$Date: 2006/11/19 16:34:05 $
$Revision: 0.10 $