About check.cf
|
syntax
|
- Whitespace is generally ignored, unless otherwise noted.
- Boldface denotes keywords.
- [brackets] denote optional keywords.
- Separators (|) denote mutually exclusive keywords.
- Comments begin with a # symbol, and may appear anywhere
except on the same line as a named
argument. Comments must be terminated with newlines.
<argname> : [a-zA-Z0-9\._-]+
<argvalue> : [^ \n\t][^\n]*
<interval> : "second[s]"|"minute[s]"|"hour[s]"|"day[s]"|"week[s]"
<name> : [a-zA-Z0-9\.+%@_-]+
<number> : [0-9]+
For argvalue, trailing whitespace may be silently removed.
|
check.cf defines Checks, which monitor services that run on hosts;
Composite Checks, which are Checks defined in terms of other Checks;
Fixes, which attempt to correct problems on hosts reported by Checks;
and Transports, which execute Checks and Fixes directly on remotely
monitored hosts.
Checks
|
syntax
|
check <name> {
module <name> {
[<argname> <argvalue>]
[...]
}
[via scheduler | via <name>]
[check on <name> schedule]
[alert on <name> alertplan]
[fix with <name>]
[check all hosts]
[helpfile <name>]
[timeout <number> <interval>]
[result text significant]
}
default timeout <number> <interval>
default check via (<name> | scheduler)
|
About Checks
A check stanza defines the module to be executed for that
Check, the arguments (if any) to be passed to that module, and some
optional information related to it. A Check is executed for the
hosts defined in a group with the same name.
Dependencies
- The Schedule specified by the check keyword must be defined
in schedule.cf.
- The Alert Plan specified by the alert keyword must be defined
in schedule.cf.
- The Transport specified by the via keyword must be defined
first in a transport stanza.
- The Fix specified by the fix with keyword must be defined
first in a fix stanza.
Check Stanza Keywords
check <name>
|
Define a new Check. When a host Group is
defined with the same name as a Check, then the Check will be
run for those hosts. (But see check
all hosts, below.)
|
module <name>
|
Specify the name of the module to run. Check modules are expected
to be found in $MODDIR/check/modulename (where
$MODDIR is by default $INSTDIR/mod).
Arguments passed to the module are specified as name/value pairs,
with one pair per line. The specific arguments for each module are
described in the module's documentation. # symbols are treated
as part of the argument, not as a comment.
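As a minimal sketch of the comment rule, assuming the process module and its name argument behave as in the syslogd example later in this document: a # on its own line is a comment, even inside a module block, while a # that follows an argument value is passed to the module as part of that value.

```
check syslogd {
  module process {
    # This line is a comment and is not passed to the module
    name .*syslogd
  }
}
```

Writing name .*syslogd # match any path on one line would instead hand the module the whole text after name, including the # and everything following it.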
|
via scheduler
via <name>
|
Indicate where the Check runs. Checks run via scheduler are
executed on the scheduler host. This is useful for Checks that
follow a client-server model, such as testing an HTTP server or
querying an SNMP daemon. This is the default.
Checks may also be run remotely, directly on the host being
monitored. This is useful for Checks that need to examine specific
files on a host, among other reasons. To run a Check remotely, use
the via keyword to specify the name of a Transport to use.
The Transport must be defined before the Check is defined.
Transports are described below.
|
check on <name> schedule
|
Specify a Schedule to use for this Check. By default, Checks are
executed according to the Check Schedules supplied for each
HostClass. However, a Check can override
the default Schedule for all hosts to be monitored for that Check.
The Schedule named must be defined in schedule.cf.
|
alert on <name> alertplan
|
Specify an Alert Plan to use for this Check. By default, alert
notifications are transmitted according to the Alert Plan supplied
for each HostClass. However, a Check can override
the default Alert Plan for all hosts to be monitored for that Check.
The Alert Plan named must be defined in schedule.cf.
|
fix with <name>
|
Specify a Fix to use with this Check. If enabled in the appropriate
Alert Plan, a predefined Fix may be attempted to correct an error
state reported by a Check. The Fix must be defined before
the Check is defined. Fixes are described below.
|
check all hosts
|
Run the Check on all defined hosts. The Check will run on every
host defined in every HostClass, regardless of group memberships.
(If multiple Instances are in use, the Check will only be run on
the hosts in the same Instance as the Check.)
|
helpfile <name>
|
Associate a helpfile with this Check. When provided, the contents
of this file may be transmitted along with the alert. This is
useful to provide hints or reminders along with an error message.
Not all format modules support helpfiles. If not begun with /,
filenames are relative to helpdir, as defined in instance.cf.
|
timeout <number> <interval>
|
Override the default Check timeout. The timeout indicates how long
the scheduler should wait for the Check to complete. A timeout of 0
seconds is exactly that: an instantaneous timeout.
|
result text significant
|
Ordinarily, the text description portion of a check result
(as defined in the check module specification)
is treated as a comment, and simply recorded. If this flag is set,
the scheduler will look for changes to the text description in addition
to changes in the check return code when determining whether to clear
alert state, including acknowledgements.
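For illustration, a hedged sketch that sets this flag on the http Check used in the examples in this document. How often the text description changes depends on the module, so whether the flag suits a given Check is an assumption to verify against the module's documentation.

```
check http {
  module protocol {
    service http
  }
  # Also compare the text description, not only the return code,
  # when deciding whether to clear alert state
  result text significant
}
```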
|
default timeout
|
The default module timeout is 45 seconds. The default may be
changed at any point within the configuration file. The new
default will apply to any Checks (or Fixes) defined after it, unless
overridden within a stanza or until the default is changed again.
For example:
# Change module timeout
default timeout 3 minutes
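The scoping rule can be traced on a longer fragment. This is a sketch only; it reuses the http, imap, and fastping Checks from the examples in this document, and the comments restate the rule described above rather than observed behavior.

```
# Defined before any change: uses the built-in default of 45 seconds
check http {
  module protocol {
    service http
  }
}

default timeout 3 minutes

# Defined after the change: uses 3 minutes
check imap {
  module protocol {
    service imap
  }
}

# The timeout keyword within the stanza overrides the current default
check fastping {
  module ping {}
  timeout 5 seconds
}

# Checks and Fixes defined from here on use 45 seconds again
default timeout 45 seconds
```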
|
default check transport module
|
The default Transport is via scheduler, which is
not actually a Transport but rather an indication to execute
the module directly on the scheduler host. The default may be
changed at any point within the configuration file. The new
default will apply to any Checks defined after it (unless
overridden within the check stanza) until the end of the
file or until the default is changed again. For example:
# Run all subsequent check modules via the Transport named remote
default check via remote
# Except for this Check
check imap {
module protocol {
service imap
}
via scheduler
}
Change the default with care: some modules will not run correctly
remotely, and others will not run correctly on the scheduler.
Relying on default check via instead of explicit via statements
within each check stanza may also make the configuration file
harder to understand.
|
Examples
- A simple Check, using the protocol module. The module is
passed the service argument http to instruct it to verify that
protocol. This Check will be run for any host found in the
corresponding host Group http, as defined in host.cf.
check http {
module protocol {
service http
}
}
- A Check using the ping module
to monitor every host defined in every HostClass.
check ping {
module ping {}
check all hosts
}
- An IMAP Check (using the protocol
module) that sends out the contents of $helpdir/imap-problems
if the relevant format module supports it.
check imap {
module protocol {
service imap
}
helpfile imap-problems
}
- A ping test with a very quick timeout.
check fastping {
module ping {}
timeout 5 seconds
}
- To demonstrate how overriding Schedules and Alert Plans works,
here is an excerpt from a sample host.cf.
# From host.cf
class class1 {
hosts {
server1
}
check on hourly schedule
alert on standard alertplan
}
class class2 {
hosts {
server2
}
check on halfhourly schedule
alert on standard alertplan
}
group imap {
server1
server2
}
Back in check.cf, server1 will be checked for
imap hourly and server2 will be checked every
30 minutes.
# From check.cf
check imap {
module protocol {
service imap
}
}
However, if this next stanza were defined instead, both
server1 and server2 would be checked on the
halfhourly schedule and would alert on the critical alertplan.
check imap {
module protocol {
service imap
}
check on halfhourly schedule
alert on critical alertplan
}
Composite Checks
|
syntax
|
check <name> {
[required checks {
<name>
[...]
}]
[optional checks {
<name>
[...]
}]
[check on <name> schedule]
[alert on <name> alertplan]
[fix with <name>]
[check all hosts]
[helpfile <name>]
[timeout <number> <interval>]
}
default timeout <number> <interval>
|
About Composite Checks
A composite check is a Check whose results are determined by
the execution of its member or component Checks. When a
composite Check is scheduled, all of its component Checks are
simultaneously run for all hosts to be examined. The results are
then accumulated and sorted to determine the status of the
Composite Check.
Composite Checks cannot be composed of other Composite Checks.
This restriction may be lifted in a future release.
Composite Checks are more resource intensive than standard Checks,
and so should not be used when a standard Check can accomplish the
same task.
Composite Check Result Accumulation
There are two types of components: required and
optional. Required components behave similarly to boolean
and; optional components behave similarly to boolean
or. At least one required or one optional component must be
specified. The same component cannot be both required and optional.
The return value of the Composite Check is determined as follows:
- If any component, whether required or optional, does not complete,
then the composite exits with MODEXEC_TIMEDOUT.
- If required components were specified and any failed,
then the Composite Check exits with the highest exit code
in the order
- MODEXEC_MISCONFIG
- MODEXEC_PROBLEM
- MODEXEC_WARNING
- MODEXEC_OK
- For any other return value, larger takes priority over smaller
- If required components were specified and none failed,
then the Composite Check exits with MODEXEC_OK regardless of the
status of any optional components.
- If no required components were specified and any optional
components succeeded, then the Composite Check exits with
MODEXEC_OK.
- If no required components were specified and no optional
component succeeded, then the Composite Check exits with the lowest
exit code in the order
- MODEXEC_OK
- MODEXEC_WARNING
- MODEXEC_PROBLEM
- MODEXEC_MISCONFIG
- For any other return value, smaller takes priority over larger
Comments generated by each component are concatenated together in the
order they are received, delimited by a semicolon (;).
Scalar values, as defined in the check module
specification and recorded in the history files as specified in
instance.cf, consist of the
number of completed component Checks. The scalar values returned by
each component Check are ignored.
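The accumulation rules can be traced on a small sketch. The component Checks reuse the gizmoldap and gizmohttp stanzas from the example later in this section; the composite name gizmoany is hypothetical, and the outcomes listed in the comments are derived from the rules above rather than from a test run.

```
check gizmoldap {
  module ldap {
    port 6389
    filter uid=testuser
    response objectclass=candidate
  }
}
check gizmohttp {
  module httpurl {
    path /cgi-bin/welcome
    query newuser=true
  }
}
# No required components, so the optional rules apply:
#   gizmoldap MODEXEC_PROBLEM, gizmohttp MODEXEC_OK      -> MODEXEC_OK
#   gizmoldap MODEXEC_PROBLEM, gizmohttp MODEXEC_WARNING -> MODEXEC_WARNING
#   either component does not complete                   -> MODEXEC_TIMEDOUT
check gizmoany {
  optional checks {
    gizmoldap
    gizmohttp
  }
}
```

Listing both names in a required checks block instead would make the first case exit with MODEXEC_PROBLEM, since any failed required component determines the result.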
All components of a Composite Check are executed simultaneously.
This is useful when a test cannot properly be executed by one module
alone, or when multiple Checks can trigger execution of the same fix
(see below).
Check Dependencies prevent other Checks from executing. This is useful
when a host becomes unavailable. Only one alert notification (say, for
ping) will be transmitted, instead of one alert notification
for every service configured for that host.
Dependencies
- The component Checks specified by the required and
optional keywords must be defined first in
check stanzas.
- All dependencies that apply to Checks
also apply here.
Composite Check Stanza Keywords
Except where noted below, these keywords have the same behavior as
for simple Checks.
check <name>
|
Define a new Composite Check.
|
required
|
Specify the names of the required components. Each name corresponds
to a previously defined check stanza.
|
optional
|
Specify the names of the optional components. Each name corresponds
to a previously defined check stanza.
|
check on <name> schedule
|
Specify a Schedule to use for this Check.
Whether or not this keyword is provided, any Schedules that would
otherwise apply to the component Checks are ignored.
|
alert on <name> alertplan
|
Specify an Alert Plan to use for this Check.
Whether or not this keyword is provided, any Alert Plans that would
otherwise apply to the component Checks are ignored.
|
fix with <name>
|
Specify a Fix to use with this Check.
Whether or not this keyword is provided, any Fixes that would
otherwise apply to the component Checks are ignored.
|
check all hosts
|
Run the Check on all defined hosts.
|
helpfile <name>
|
Associate a helpfile with this Check.
|
timeout <number> <interval>
|
Override the default Check timeout.
Whether or not this keyword is provided, any timeouts that would
otherwise apply to the component Checks are ignored.
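A minimal sketch of how these overrides interact, assuming the gizmohttp stanza from the example below and a Schedule named critical defined in schedule.cf. The timeout values are illustrative only.

```
check gizmohttp {
  module httpurl {
    path /cgi-bin/welcome
    query newuser=true
  }
  # Ignored when gizmohttp runs as a component of gizmo
  timeout 10 seconds
}
check gizmo {
  required checks {
    gizmohttp
  }
  # These apply to the Composite Check as a whole
  check on critical schedule
  timeout 2 minutes
}
```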
|
Examples
- A Composite Check to monitor an application that has both a web
interface and a directory interface. Both Checks must complete
successfully for the composite Check to be successful. Only the
gizmo group would be defined in host.cf.
check gizmoldap {
module ldap {
port 6389
filter uid=testuser
response objectclass=candidate
}
}
check gizmohttp {
module httpurl {
path /cgi-bin/welcome
query newuser=true
}
}
check gizmo {
required checks {
gizmoldap
gizmohttp
}
check on critical schedule
alert on critical alertplan
}
Fixes
|
syntax
|
fix <name> {
module <name> {
[<argname> <argvalue>]
[...]
}
[via scheduler | via <name>]
[require host locking | require service locking]
[expire lock after <number> <interval>]
[timeout <number> <interval>]
}
default expire fix lock after <number> <interval>
default fix via (<name> | scheduler)
default timeout <number> <interval>
|
About Fixes
A fix stanza defines the module to be executed for a Fix.
Fixes are run in an attempt to automatically restore a service.
Generally, fix modules must execute via transport modules. If
multiple Checks can trip a Fix, it may make sense to combine them
into a Composite Check.
A Fix will be attempted automatically when the following criteria
are met:
- A Fix has been defined in check.cf.
- An Alert Plan has been defined in schedule.cf that
includes an attempt fix try.
- A Check has been defined which uses the Fix and Alert Plan
described above.
- The Check failure status meets the requirements of the
Alert Plan.
- There is no outstanding acknowledgement or inhibition for
the service@host that has failed.
Fixes may also be manually attempted, regardless of outstanding
failures, acknowledgements, or inhibitions.
When a Fix executes, a lock is established for the
service@host to prevent the Fix from being run more than
once concurrently for the same service@host. It is possible
to establish wider scoped Fix locking, to service@ or
@host, to prevent multiple Fixes for a given service from
running simultaneously on multiple hosts or to prevent multiple Fixes
for a given host from running simultaneously, regardless of service.
Fix modules are subject to timeouts in the same fashion as check
modules. In the event a module times out, its lock may not be
correctly removed. As such, Fix locks may be expired when they are
found to be stale.
Dependencies
- The Transport specified by the via keyword must be defined
first in a transport stanza.
Fix Stanza Keywords
fix <name>
|
Define a new Fix. This name can then be used in the fix with keyword of a Check definition.
|
module <name>
|
Specify the name of the module to run. Fix modules are expected
to be found in $MODDIR/fix/modulename (where
$MODDIR is by default $INSTDIR/mod).
Arguments passed to the module are specified as name/value pairs,
with one pair per line. The specific arguments for each module are
described in the module's documentation. # symbols are treated
as part of the argument, not as a comment.
|
via scheduler
via <name>
|
Indicate where the Fix runs. Fixes run via scheduler are
executed on the scheduler host, usually not a very useful scenario.
More useful is to run a Fix remotely, using
the via keyword to specify the name of a Transport to use.
The Transport must be defined before the Fix is defined.
Transports are described below.
|
require host locking
|
Require host level locking to execute the Fix. Only one Fix may be
run on a given host at one time, regardless of what the Fixes are for.
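As a sketch, assuming a Transport named remote has already been defined (as in the Transports example at the end of this document), the syslog Fix from the examples below could request host level locking like this:

```
fix syslog {
  module init.d {
    service syslog
  }
  via remote
  # Hold an @host lock: no other Fix runs on the same host concurrently
  require host locking
}
```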
|
require service locking
|
Require service level locking to execute the Fix. Only one Fix may be
run on a given service at one time, regardless of how many hosts
require fixing.
|
expire lock after <number> <interval>
|
Override the default Fix lock expiry. When a Fix lock is found that
is at least this old, the lock is considered stale and will be removed.
|
timeout <number> <interval>
|
Override the default Fix timeout. The timeout indicates how long
the scheduler should wait for the Fix to complete. A timeout of 0
seconds is exactly that: an instantaneous timeout.
|
default fix transport module
|
As for check module Transports, the default fix module Transport
is via scheduler, which is of limited utility. The default
may be changed at any point within the configuration file. The new
default will apply to any fixes defined after it (unless overridden
within the fix stanza) until the end of the file or until
the default is changed again. For example:
# Run all subsequent fix modules via the Transport named remote
default fix via remote
|
default fix lock expiry
|
The default time for the expiration of stale fix locks is 120
seconds. The default may be changed at any point within the
configuration file. The new default will apply to any Fixes defined
after it (unless overridden within the fix stanza) until
the end of the file or until the default is changed again. For
example:
# Expire fix locks after 5 minutes
default expire fix lock after 5 minutes
|
default fix timeout
|
The default Fix timeout is the same as the default Check timeout.
|
Examples
- In this example, a Fix is defined for syslog using the
init.d module to restart the
syslog daemon and the process module to detect the
failure. Both use a remote Transport, described below.
fix syslog {
module init.d {
service syslog
}
via remote
}
check syslogd {
module process {
name .*syslogd
}
via remote
fix with syslog
}
- Here, a Fix is defined that can only be run by resetting a central
control, regardless of which host is broken. As such, service
level locking is required. Because the Fix takes a while to run,
the timeout is also extended.
fix gizmo {
module gizmofix {
# A locally written module to fix gizmo
restartkey foobar
}
# Use a locally written transport that knows where the fix runs
via gizmotransport
require service locking
timeout 5 minutes
expire lock after 5 minutes
}
Transports
|
syntax
|
transport <name> {
module <name> {
[<argname> <argvalue>]
[...]
}
}
|
About Transports
A transport stanza defines a mechanism by which a Check or Fix
can be executed on a remote host. More details
about how to set up the infrastructure for remote monitoring can be
found in the documentation for sr.
By default, check and fix modules are run directly on the scheduler
host. However, some check modules and most fix modules must run directly
on the host to be monitored. (See the check
module documentation for more information.) Checks and Fixes
that are to be run remotely must be configured to run via
a transport module. The appropriate
transport stanza must be defined before the Check or Fix
that uses it.
Dependencies
Transport Stanza Keywords
transport <name>
|
Define a new Transport. This name can then be used in the
via keyword of a Check or Fix definition.
|
module <name>
|
Specify the name of the module to run. Transport modules are expected
to be found in $MODDIR/transport/modulename (where
$MODDIR is by default $INSTDIR/mod).
Arguments passed to the module are specified as name/value pairs,
with one pair per line. The specific arguments for each module are
described in the module's documentation. # symbols are treated
as part of the argument, not as a comment.
|
Examples
- In this example, the mailq
module is configured to run via the plaintext transport module.
transport remote {
module plaintext {}
}
check mailq {
module mailq {
warn 1000
prob 2000
}
via remote
}
$Date: 2006/11/19 16:34:05 $
$Revision: 0.10 $
|
keywords
alert on alertplan
- (check)
- (composite)
check
- (check)
- (composite)
check all hosts
- (check)
- (composite)
check on schedule
- (check)
- (composite)
default check via
default expire fix lock
default fix via
default timeout
- (check)
- (fix)
expire lock after
fix
fix with
- (check)
- (composite)
helpfile
- (check)
- (composite)
module
- (check)
- (fix)
- (transport)
optional
required
require host locking
require service locking
result text significant
timeout
- (check)
- (composite)
- (fix)
transport
via
- (check)
- (fix)
via scheduler
- (check)
- (fix)
|