All requirements apply to both scheduler and remote check modules,
unless otherwise stated.
- Check modules must be reentrant. That is, if a check
module is run more than once simultaneously, all instances
must run to completion without interference.
- Check modules must not change their process group by any
means, including via setsid(), setpgid(),
setpgrp(), or any similar function.
- Check modules should handle their own parallelization.
If a module is passed more than one host name to check, it is up
to the module to determine the best way to handle it. (This
requirement is relaxed to should from must because
scripted modules may be run under the parallel module.)
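As an illustration of a module handling its own parallelization, the following is a minimal sketch in Python. The names check_host and check_all are hypothetical and the per-host check is a placeholder. Threads are used here so that the process group is left unchanged.

```python
from concurrent.futures import ThreadPoolExecutor


def check_host(host):
    """Hypothetical per-host check; a real module would probe the host here."""
    # Placeholder result: (host, return code, comment).
    return (host, 0, "placeholder OK")


def check_all(hosts, max_workers=8):
    """Run check_host for every host concurrently, preserving input order."""
    if not hosts:
        return []
    workers = min(max_workers, len(hosts))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input hosts.
        return list(pool.map(check_host, hosts))
```

A module that is only ever passed one host name, or that is run under the parallel module, would not need this.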
- Remote check modules should be written in a scripting
language such as Perl to make changes easier and more transparent,
and to allow for easier portability. Scheduler check modules
may also be written in a scripting language. Compiled
modules are permitted when necessary, but are actively discouraged
for remote check modules.
- Each module must place its source code in a directory
underneath survivor/src/modules/check/ with the following
conventions:
- The name of the directory must be
check/modulename/.
- A Makefile.in must be present, with directives for
clean, veryclean, all,
install, and install-remote.
The install directive should, except in exceptional
circumstances, install the module into
@prefix@/mod/check, owned by @INST_USER@
and @INST_GROUP@, mode 555.
The install-remote directive should be the same
as install, except where it does not make sense for
the module to be installed as part of a remote distribution.
- Documentation describing the module should be in
doc/cm-modulename.html.
- Check modules must accept the following arguments:
- -v
A flag indicating the module should validate its configuration.
The module must test for any dependencies (executables,
libraries, modules, configuration files, etc.) required for
normal successful execution. If valid, exit with MODEXEC_OK
(using scalar value 0 and the string "Module OK" as
the comment, where Module is the name of the module),
otherwise exit with MODEXEC_PROBLEM, following the output
format specification described below.
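The validation step might be sketched as follows, in Python. This is an assumption-laden example: the numeric values given for MODEXEC_OK and MODEXEC_PROBLEM are placeholders (the real values are defined in include/survivor.H), validate is a hypothetical helper, only executables are tested, and the scalar reported on failure is an arbitrary choice.

```python
import shutil

# Placeholder values (assumption); the real ones live in include/survivor.H.
MODEXEC_OK = 0
MODEXEC_PROBLEM = 1


def validate(module_name, required_executables):
    """Return (return code, scalar, comment) for a -v validation run."""
    # shutil.which returns None when an executable is not found on PATH.
    missing = [exe for exe in required_executables if shutil.which(exe) is None]
    if missing:
        # Scalar 0 on failure is an assumption, not part of the requirement.
        return MODEXEC_PROBLEM, 0, "Missing executables: " + ", ".join(missing)
    # On success: scalar value 0 and the comment "<module name> OK".
    return MODEXEC_OK, 0, module_name + " OK"
```

A real module would extend the same pattern to libraries, language modules, and configuration files.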
- Check modules receive the rest of their data via a
SurvivorCheckData document, which contains the following elements:
- Host
Host to perform the check on. Remote check modules will still
be provided this argument, with the value localhost.
Absence of this argument should cause the check module
to exit immediately with an appropriate return code.
- Timeout
The timeout for this module. After timeout seconds,
the check module may be gracelessly terminated. The check
module may use this timeout value to exit gracefully
before time expires. If this option is not provided, the
module may act as if there is no timeout.
- ModuleOption
The names and values of the arguments provided in check.cf or dependency.cf.
This element should conform to the Module XML Argument Specification.
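Reading the Host and Timeout elements could look roughly like the sketch below, in Python. The exact layout of a SurvivorCheckData document is not reproduced here, so this assumes Host and Timeout are direct children of the root element with plain text content. ModuleOption is left out because its structure is governed by the Module XML Argument Specification. The names parse_check_data and deadline_for, and the one second margin, are hypothetical.

```python
import time
import xml.etree.ElementTree as ET


def parse_check_data(xml_text):
    """Return (list of hosts, timeout in seconds or None)."""
    root = ET.fromstring(xml_text)
    # Assumption: Host and Timeout are direct children with text content.
    hosts = [h.text.strip() for h in root.findall("Host")
             if h.text and h.text.strip()]
    timeout_text = root.findtext("Timeout")
    timeout = int(timeout_text) if timeout_text and timeout_text.strip() else None
    return hosts, timeout


def deadline_for(timeout, margin=1.0, now=None):
    """Return a monotonic-clock deadline shortly before the timeout, or None."""
    if timeout is None:
        # No Timeout provided: the module may act as if there is no timeout.
        return None
    start = time.monotonic() if now is None else now
    return start + max(timeout - margin, 0.0)
```

A module would exit immediately with an appropriate return code if the host list comes back empty, and could compare time.monotonic() against the deadline to stop gracefully before it is terminated.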
- Check modules should not write output files.
- Check modules must generate output on stdout in the form of
an XML document containing a SurvivorCheckResult element for each host
specified. These documents must not be interleaved. Each host's
element should be generated as soon as information is
available, in case the module is timed out. The elements defined
are
- Host
This must be the name of the host as provided by the
SurvivorCheckData argument.
- ReturnCode
The numeric return code (as defined in include/survivor.H).
Possible values include
- MODEXEC_OK: No problem was found.
- MODEXEC_PROBLEM: A critical problem was
found, or the check could not be completed for critical reasons.
- MODEXEC_WARNING: A non-critical problem
was found, and is in danger of becoming critical.
- MODEXEC_NOTICE: A non-critical problem was
found, or the check could not be completed for
non-critical reasons.
- MODEXEC_MISCONFIG: The module is
misconfigured and is unable to perform its check.
- MODEXEC_TIMEDOUT: The check timed out.
ReturnCode may also be a value from 20 through 1000 to
transmit custom return information.
- Scalar
The scalar value must be an integer, either positive
or negative, indicating a value that may be used for long term
monitoring. For example, the number might be a load, or a
simple '0' (no) or '1' (yes) indicating that a service is
responding or not. For disk space usage, it might be between
0 and 100 to indicate fullness, or it might be the actual
number of bytes in use.
- Comment
The comment may be an empty string, or it
may provide a human readable explanation of the return
or scalar values. The comment may be reformatted or truncated.
- Duration
The duration of the check execution, in milliseconds. If
present, this value must be an integer zero or greater.
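Building one result element per host and flushing it as soon as it is ready might be sketched as follows, in Python. The element names come from the list above, but any attributes, namespaces, or enclosing structure required by the real schema are not shown and would need to be added. The names result_xml and emit are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def result_xml(host, return_code, scalar, comment="", duration_ms=None):
    """Serialize one SurvivorCheckResult element as a string."""
    if not isinstance(scalar, int):
        raise ValueError("Scalar must be an integer")
    if duration_ms is not None and (not isinstance(duration_ms, int) or duration_ms < 0):
        raise ValueError("Duration must be an integer zero or greater")
    result = ET.Element("SurvivorCheckResult")
    ET.SubElement(result, "Host").text = host
    ET.SubElement(result, "ReturnCode").text = str(return_code)
    ET.SubElement(result, "Scalar").text = str(scalar)
    ET.SubElement(result, "Comment").text = comment
    if duration_ms is not None:
        # Duration is only included when a value is supplied.
        ET.SubElement(result, "Duration").text = str(duration_ms)
    return ET.tostring(result, encoding="unicode")


def emit(xml_text, stream=sys.stdout):
    """Write one complete element and flush, so it survives a later timeout."""
    stream.write(xml_text + "\n")
    stream.flush()
```

Writing each element in a single call and flushing immediately keeps the output for different hosts from being interleaved, provided only one thread writes at a time.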
- Check modules must exit with the highest return value
generated by any host checked, unless custom return values are in
use, in which case the check module may exit with whatever
value the custom specifications require. Check modules executing
a Type II dependency must exit with a return value
appropriate for the results obtained from the check.
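When custom return values are not in use, choosing the exit status reduces to taking the numerically highest per-host return code, as in this small Python sketch. The name module_exit_code is hypothetical, and the comparison relies on the numeric values defined in include/survivor.H rather than on any separate notion of severity.

```python
def module_exit_code(return_codes):
    """Return the highest return code generated by any host checked."""
    codes = list(return_codes)
    if not codes:
        # No host was checked; the caller must pick an appropriate code itself.
        raise ValueError("no return codes to combine")
    return max(codes)
```

A module would typically pass the result to sys.exit() after the last result element has been written.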
$Date: 2006/11/20 00:05:07 $
$Revision: 0.11 $