About schedule.cf
|
syntax
|
- Whitespace is generally ignored, unless otherwise noted.
- boldface denotes keywords.
- [brackets] denote optional keywords.
- Separators (|) denote mutually exclusive keywords.
- Comments begin with a # symbol, and may appear
anywhere. Comments must be terminated with newlines.
<day> : "sunday" | "monday" | "tuesday" | "wednesday" | "thursday" |
"friday" | "saturday"
<name> : [a-zA-Z0-9\.+%@_-]+
<number> : [0-9]+
<interval> : <number> "second[s]"|"minute[s]"|"hour[s]"|"day[s]"|"week[s]"
<time> : [<day>] [0-2][0-9]:[0-5][0-9]
|
schedule.cf defines the following:
- Schedules, which define periods of time and intervals for that
period of time.
- Alert Plans, which define behaviors for when a Check returns an error.
- Alert Plan Aliases, which allow Alert Plans to be reused with a
different Call List.
- Return Groups, which group together return codes for easier reference.
- Alert Throttle, which restricts the number of alert notifications
that can be queued for transmission at one time.
- Alert Shift, which offsets the times in a Schedule definition for
use with Alert Plans.
Schedules
|
syntax
|
schedule <name> {
([from <time> until <time>] (every <interval> | never) |
at { <time> ... })
[...]
}
|
About Schedules
A schedule stanza defines one or more time period/frequency
pairs. A time period indicates a block of time with fixed start
and end points. When no time period is indicated, all time periods
are implied (i.e., it is always "now"). A frequency indicates how
often whatever is using the Schedule should execute.
If more than one time period/frequency pair is specified in a Schedule,
the first time period matched will be used.
Schedules are used to determine when Checks are executed, form
the building blocks of Alert Plans, and serve some other purposes.
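The first-match rule can be modelled with a short Python sketch. It is an illustration only, not the scheduler's implementation: the names in_period and pick_frequency and the tuple layout are hypothetical, and only daily time periods (no day prefix) are handled. The pairs mirror the sample1 Schedule from the Examples below.

```python
from datetime import time

def in_period(now, start, end):
    """Return True if `now` falls inside the daily period start..end."""
    if start is None:
        return True  # no time period given: it is always "now"
    # Compare at minute resolution so the period runs through the
    # last second of the "until" minute.
    t = now.replace(second=0, microsecond=0)
    if start <= end:
        return start <= t <= end
    return t >= start or t <= end  # period wraps past midnight

def pick_frequency(pairs, now):
    """Return the frequency of the first matching pair, or None."""
    for start, end, frequency in pairs:
        if in_period(now, start, end):
            return frequency
    return None

# Hypothetical encoding of the sample1 Schedule.
sample1 = [
    (time(22, 0), time(7, 59), "never"),
    (time(8, 0), time(21, 59), "every hour"),
]

print(pick_frequency(sample1, time(21, 59, 59)))  # every hour
print(pick_frequency(sample1, time(23, 30)))      # never
```

Because the pairs are tested in order, an unrestricted pair placed first would shadow every pair after it.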
Dependencies
Schedule Stanza Keywords
schedule <name>
|
Define a new Schedule.
|
from <time> until <time>
|
Specify a time period during which this Schedule is in effect.
The period runs through the last second of the until minute.
|
every <interval>
|
Specify a frequency for when this time period is in effect.
Note that frequencies are generally implemented as approximations, so
a frequency of every 1 hour beginning at 08:00 when used as a Check
Schedule might cause the Check to execute at 08:00, 09:01, 10:02,
11:02, 12:03, 13:04, 14:04, 15:05, etc. Schedules that use the every
keyword may be expected to drift.
|
at
|
Define a time period of one or more exact times, with an implied
frequency of once per time specified.
Note that care should be taken when using the same at
Schedule with both a Check and an Alert Plan for the same service.
It is possible that an alert notification will transmit before the
Check runs, which may produce counterintuitive results.
See Alert Shift for more information.
|
Examples
- First, some example time periods. These would be specified within
a Schedule definition, examples of which appear below.
- from 08:30 until 16:59
The time period from 8:30am through (and including) 4:59pm
every day.
- from monday 08:00 until friday 16:59
The time period from 8:00am on Monday through 4:59pm on Friday.
Inverting the order is a useful way to specify weekends.
- from friday 17:00 until monday 07:59
The time period from 5:00pm Friday through 7:59am Monday.
The reason to end at 7:59 is that the Schedule is in effect
through the end of the time specified, in this case 7:59:59.
This allows another time period to be specified beginning
at 08:00 without conflicting.
- Next, some example frequencies. These would be specified within
a Schedule definition, examples of which follow.
- every 30 minutes
- every hour
- never
- A Schedule defining a frequency of every hour from 8:00am until
9:59pm. A Schedule like this can be used to specify when a
Check runs, or it can be used within part of an Alert Plan to
determine how often alert notifications should be transmitted.
schedule sample1 {
from 22:00 until 07:59 never
from 08:00 until 21:59 every hour
}
- A Schedule defining a frequency of every 15 minutes at all times
weekdays, and every 30 minutes on weekends.
schedule sample2 {
from saturday 00:00 until sunday 23:59 every 30 minutes
every 15 minutes
}
- A Schedule defining an explicit time (an at Schedule) with
an implied frequency of once. A Check using the following
Schedule would execute Mondays at 9:30am and every day at 5:30pm.
schedule sample3 {
at { monday 09:30 }
at { 17:30 }
}
- A Schedule to define nights and weekends.
schedule sample4 {
from friday 22:00 until monday 07:59 every hour
from 22:00 until 07:59 every hour
}
Alert Plans
|
syntax
|
alertplan <name> {
(on return value[s] <number> [...] |
on returngroup <name> |
default) {
[after <number> check failure[s]]
using <name> schedule {
try [<number> time[s]] {
[allow <number> failed host[s] [during <name> schedule]]
(alert <name>
[...]
|attempt fix [if defined]
[tell <name>
[...]])
[flag escalated]
}
[...]
}
}
[...]
[notify [using <name> schedule] on clear | do not notify on clear]
[do not clear state on return values { <number> [...] }
|always clear state]
[clear state honors returngroups { <name> [...] }
|clear state honors all returngroups
|clear state ignores returngroups]
}
|
About Alert Plans
An alert plan stanza defines how alert notifications are
transmitted and how Fixes are attempted when a Check returns an
error. An Alert Plan consists of one or more return value
stanzas, which define actions for specific return values obtained
from Checks.
Dependencies
- The Call List specified by the alert or tell
keywords must be defined in calllist.cf.
- The Schedule specified by the using keyword must be
defined by a schedule stanza before the Alert Plan
that uses it is defined.
Alert Plan Stanza Keywords
alertplan <name>
|
Define a new Alert Plan.
|
on return value[s] <number>
on returngroup <name>
default
|
Begin a new return value stanza. Return value stanzas define actions
for specific return values obtained from Checks. If more than one
return value stanza is defined for the same return value, then the
first stanza meeting the specified minimum number of check failures
(via after n check failures)
will be the one used.
Frequently used sets of return values may be defined into
Return Groups, which may be referenced by
defining a return value stanza using the on returngroup
keyword.
A return value stanza using the default keyword matches all
return values.
|
after <number> check failures
|
Specify the number of consecutive Check failures that must occur before
the return value stanza is in effect. This is useful to prevent
transient failures from generating spurious alerts.
|
using <name> schedule
|
Begin a new schedule stanza. When the named Schedule is in effect,
the try stanzas defined within the schedule stanza will be used.
If more than one schedule stanza is defined within the return value
stanza, then the first Schedule in effect will be used.
|
try <number> times
try
|
Begin a new try stanza. When a schedule stanza matches, a try stanza
defined within it will be selected to determine what actions to take.
Each try stanza must contain either an alert or an
attempt fix keyword.
The try stanza is selected according to the number of alert
notifications for the same status that have already been transmitted.
A try stanza defined without a number of times indicated
will match by default if no previous stanza matches.
Defining multiple try stanzas allows an alert notification to escalate.
Note that a status change is not necessarily the same as a problem
clearing. If a status changes from PROBLEM to WARNING,
any escalation will reset, the same as for PROBLEM to OK.
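The selection of a try stanza can be sketched in Python as follows. This is a model under stated assumptions, not the scheduler's code: select_try is a hypothetical name, each stanza is reduced to its number of times (None for a try without a number), and the counts are assumed to accumulate from one stanza to the next, as the comments in the sample4 example suggest.

```python
def select_try(tries, sent):
    """Return the index of the try stanza to use, or None if none matches.

    tries: list of "times" values, one per try stanza, None for a bare try.
    sent:  notifications already transmitted for the current status.
    """
    used = 0
    for index, times in enumerate(tries):
        if times is None:
            return index  # a bare try matches when nothing earlier did
        used += times
        if sent < used:
            return index
    return None

# try 2 times, try 2 times, try (as in sample4)
stanzas = [2, 2, None]
for sent in range(6):
    print(sent, select_try(stanzas, sent))
```

A status change resets the escalation, which in this model corresponds to setting sent back to 0.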
|
allow <number> failed host[s]
|
Specify degraded mode, allowing the specified number of hosts in the
same group to fail without triggering an alert notification. This
is useful if a redundant host fails when there is plenty of capacity
on the other available hosts.
Hosts are considered redundant when
- The hosts are in the same Group
- The hosts use the same Alert Plan (whether defined via the Check
or via the HostClass)
When degraded mode is in effect, alert actions for redundant hosts
are executed when
- Non-zero return codes are found for at least the specified number
of hosts (the return codes need not match)
- Each host meets the specified minimum number of Check failures
(via after n Check failures)
|
during <name> schedule
|
Specify when degraded mode is in effect.
|
alert <name>
|
Specify which Call List(s) to notify.
|
flag escalated
|
Specify at which point the problem is considered escalated. This is
for the web interface to know when to call
additional attention to a problem. By default, the second try
stanza is the point of escalation for the web interface.
This flag has no bearing on the normal escalation mechanism by
which subsequent try stanzas are attempted.
|
attempt fix
|
Specify that the predefined Fix for the problem should be attempted.
For more information on Fixes, see check.cf.
|
if defined
|
Specify that the predefined Fix for the problem should be attempted
if one exists, otherwise move to the next alert action and do not
generate a misconfiguration error. This is useful to define
Alert Plans that can be used with different Checks regardless of
whether or not each Check has a Fix defined. However, use of this
keyword will make it more difficult to detect errors in configuration
(i.e., when a Fix was intended but not actually defined).
|
tell
|
When a Fix is attempted, specify which Call
List(s) to notify.
|
notify on clear
notify using <name> schedule on clear
do not notify on clear
|
Specify when notification is to be sent on a problem clearing (returning
to state OK). Whoever was last notified about the problem
will receive the clear notification, even if they are no longer on call.
If a Schedule is specified, then notification will only be sent when
that Schedule is in effect. While Schedules require a frequency,
frequencies do not make sense for clear notifications, which are
transmitted only once. The scheduler will ignore any frequency
specifications in the Schedule, including never.
If a problem is acknowledged or alerts for the service or host are
inhibited, then no notification will be transmitted when the problem
clears. If the Check is manually rescheduled and the problem no
longer exists, no notification will be sent.
If a clear notification is not transmitted successfully for any
reason, it will not be requeued because the no longer valid alert
state must be cleared by the scheduler.
|
do not clear state on return values
always clear state
|
Specify when return-value dependent state should be cleared.
Ordinarily, alert and fix state (including acknowledgements) is
cleared when the return value of a Check changes, for example from
WARNING to PROBLEM. This is equivalent to
always clear state.
When return values are specified, state will not be cleared when the
return value of a Check changes to or from a return value listed.
The exception is if the return value is changing from a specified
value to OK, in which case the state is always cleared.
Note that these keywords do not affect the rules for
try stanzas. However, this is subject
to change in a future release.
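The rule for do not clear state on return values can be written as a small Python predicate. This is a sketch of the rule as worded above, not the actual implementation: clears_state is a hypothetical name, and treating 0 as the OK return value is an assumption.

```python
OK = 0  # assumption: 0 is the OK return value

def clears_state(old, new, kept_values=()):
    """Return True if a change from `old` to `new` clears alert/fix state."""
    if old == new:
        return False  # the return value did not change
    if new == OK:
        return True   # a change to OK always clears state
    if old in kept_values or new in kept_values:
        return False  # change to or from a listed return value
    return True

# do not clear state on return values { 4 }
print(clears_state(1, 4, kept_values={4}))  # False
print(clears_state(4, 0, kept_values={4}))  # True
```

With an empty kept_values the function behaves like always clear state.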
|
clear state honors returngroups
clear state honors all returngroups
clear state ignores returngroups
|
Specify when return-value dependent state should be cleared.
Ordinarily, alert and fix state (including acknowledgements) is
cleared when the return value of a check changes, for example from
WARNING to PROBLEM. This is equivalent to
clear state ignores returngroups.
When Return Groups are specified, state will not be cleared when the
return value that a Check changes from and to are both members
of a specified Return Group. The exception is if the return value is
changing to OK, in which case the state is always cleared.
Note that these keywords do not affect the rules for
try stanzas. However, this is subject
to change in a future release.
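The Return Group variant of the rule can be sketched the same way in Python. The function name is hypothetical, Return Groups are modelled as plain sets of return values, and 0 is assumed to be the OK return value.

```python
OK = 0  # assumption: 0 is the OK return value

def clears_state_with_groups(old, new, honored_groups=()):
    """Return True if a change from `old` to `new` clears alert/fix state."""
    if old == new:
        return False  # the return value did not change
    if new == OK:
        return True   # a change to OK always clears state
    for group in honored_groups:
        if old in group and new in group:
            return False  # both values are in the same honored group
    return True

problem = {1, 5}  # returngroup problem { 1 5 }
print(clears_state_with_groups(1, 5, [problem]))  # False
print(clears_state_with_groups(1, 4, [problem]))  # True
```

Passing every Return Group defined so far corresponds to clear state honors all returngroups, and passing none corresponds to clear state ignores returngroups.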
|
default check failures
|
The default number of Check failures is 1. In order to increase
the readability of the configuration file, the default number of
Check failures may be changed outside of an Alert Plan. The new
default will apply to any Alert Plans defined after it, until the
end of the file or until the default is changed again. For example:
# Require all Checks to fail twice before any Alert Plan does anything.
default after 2 check failures
alertplan sample1c {
on return values 1 5 {
using sample1 schedule {
...
}
}
default {
using sample1 schedule {
...
}
}
}
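The position-dependent scope of such a default can be illustrated with a short Python sketch. It is a simplified model, not the parser: the statement tuples and the name effective_failures are hypothetical, and an after clause inside an individual Alert Plan is not modelled.

```python
def effective_failures(statements):
    """Map each Alert Plan name to the default in force where it is defined."""
    current = 1  # documented default number of Check failures
    result = {}
    for kind, value in statements:
        if kind == "default":
            current = value          # applies to plans defined after it
        elif kind == "alertplan":
            result[value] = current
    return result

config = [
    ("alertplan", "early"),      # defined before any default statement
    ("default", 2),              # default after 2 check failures
    ("alertplan", "sample1c"),
]
print(effective_failures(config))  # {'early': 1, 'sample1c': 2}
```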
|
default notify on clear
|
The default notify on clear state is to not notify on clear.
To avoid having to specify notify on clear for each
Alert Plan, the default state can be changed outside of an
alert plan stanza. The new default will apply to Alert Plans
defined after it, until the end of the file or until the default is
changed again. For example:
# Notify on clear for all subsequent alertplans
default notify using sample1 schedule on clear
To restore the default back to no notification, or to override
notification within an Alert Plan, use do not notify
instead. For example:
alertplan sample1d {
default {
...
}
do not notify on clear
}
# Or turn it off for all subsequent Alert Plans
default do not notify on clear
|
default do not clear state
|
The default is to clear state whenever the return value of a Check
changes. To avoid having to specify do not clear state for
each Alert Plan, the default can be changed outside of an Alert Plan
stanza. The new default will apply to Alert Plans defined after it,
until the end of the file or until the default is changed again. For
example:
# Don't clear state if a module is misconfigured
default do not clear state on return values { 4 }
To restore the default back to always clearing state, or to override
clearing state within an Alert Plan, use always clear state
instead. For example:
alertplan sample1e {
default {
...
}
always clear state
}
# Or always clear state for all subsequent Alert Plans
default always clear state
|
default clear state honors returngroups
|
The default is to clear state whenever the return value of a Check
changes. To avoid having to specify clear state honors
returngroups for each Alert Plan, the default can be changed
outside of an Alert Plan stanza. The new default will apply to Alert
Plans defined after it, until the end of the file or until the
default is changed again. For example:
# define a returngroup
returngroup problem { 1 5 }
# PROBLEM and TIMEDOUT are basically the same, so don't clear state
# if the return code swaps between them
default clear state honors returngroups { problem }
To not clear state when a return code swaps within any Return Group
defined so far:
default clear state honors all returngroups
To restore the default back to always clearing state, or to override
clearing state within an Alert Plan, use
clear state ignores returngroups instead. For example:
alertplan sample1f {
default {
...
}
clear state ignores returngroups
}
# Or always clear state for all subsequent alertplans
default clear state ignores returngroups
|
Examples
- First, the basic structure of an Alert Plan consists of one or
more return value stanzas, which define actions according to
the return values obtained from the Check modules. In the
following example, three different return value stanzas are
specified: one for return values 1 and 5 (generally interpreted
as PROBLEM and TIMED OUT), one for return
value 4 (MISCONFIGURATION), and one for any other
return value:
alertplan sample1 {
on return values 1 5 {
...
}
on return value 4 {
...
}
default {
...
}
}
- Notification on clear is also set at the top level of an Alert Plan.
In this example, when this Alert Plan is used and when the
sample1 Schedule is in effect, notification will be sent
when a problem clears:
alertplan sample2 {
on return values 1 5 {
...
}
default {
...
}
notify using sample1 schedule on clear
}
- Each return value stanza holds instructions on when to transmit
alert notifications. In this example, two consecutive Check
failures are required before any requested alert actions are
taken, and then only if the sample1 Schedule is in
effect.
alertplan sample3 {
on return values 1 5 {
after 2 check failures
using sample1 schedule {
...
}
}
default {
after 2 check failures
using sample1 schedule {
...
}
}
}
- This next example illustrates the use of multiple try stanzas
to handle escalations:
alertplan sample4 {
default {
using sample1 schedule {
try 2 times {
# Alert the first level response list not more than two times
# after the check has failed twice during the sample1 schedule.
alert oncall-list
}
try 2 times {
# Consider the alert escalated and alert the second level
# response list not more than two times since the first level
# response list has not cleared the problem.
alert backup-list
alert oncall-list
}
try {
# Consider the alert escalated and alert the oncall managers
# until the problem clears.
alert oncall-list
alert backup-list
alert managers
flag escalated
}
}
}
}
- This example tries the predefined Fix before escalating:
alertplan sample5 {
default {
using sample1 schedule {
try 1 time {
attempt fix
tell oncall-list
}
try {
alert oncall-list
}
}
}
}
Alert Plan Aliases
|
syntax
|
alias <name> to <name> (replacing <name> with <name>
|adding <name>)
|
About Alert Plan Aliases
An Alert Plan Alias duplicates an Alert Plan, but replaces references
to one of its Call Lists with another, or adds a Call List to the
source's set of Call Lists. Currently, only one Call List may be
replaced or added in an Alert Plan Alias, although an Alias can be
aliased. Future releases may permit multiple substitutions or
additions within the same Alias.
Dependencies
- An Alert Plan must be defined before it can be aliased.
Alert Plan Alias Stanza Keywords
alias <name> to <name>
|
Define a new Alert Plan Alias. The alias keyword specifies the
Alert Plan to copy; the to keyword specifies the name of the new
Alert Plan.
|
replacing <name> with <name>
|
Change the occurrences of the Call List specified by replacing
to the Call List specified by with in the new Alert Plan.
|
adding <name>
|
Add the Call List specified to the new Alert Plan.
|
Examples
- In this example, the sample4 Alert Plan defined above
is reused, but net-oncall becomes the first Call List,
instead of oncall-list.
alias sample4 to sample6 replacing oncall-list with net-oncall
- In this example, net-oncall is added in addition to
oncall-list (and all the others) rather than replacing it.
alias sample4 to sample7 adding net-oncall
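The effect of replacing can be modelled in Python as a copy with one Call List substituted. This is a sketch only: alias_replacing is a hypothetical name, and the Alert Plan is reduced to the Call Lists named in each try stanza of sample4. The adding form is not modelled, since the text does not state where the extra Call List is placed.

```python
def alias_replacing(plan, old, new):
    """Return a copy of `plan` with Call List `old` replaced by `new`."""
    return [[new if call_list == old else call_list for call_list in stanza]
            for stanza in plan]

# Call Lists alerted by the three try stanzas of sample4.
sample4 = [
    ["oncall-list"],
    ["backup-list", "oncall-list"],
    ["oncall-list", "backup-list", "managers"],
]

sample6 = alias_replacing(sample4, "oncall-list", "net-oncall")
print(sample6[0])  # ['net-oncall']
```

The source plan is left untouched, which matches the description of an Alias as a duplicate of the original Alert Plan.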
Return Groups
|
syntax
|
returngroup <name> { <number> ... }
|
About Return Groups
To avoid having to specify similar sets of return codes over and
over again, Return Groups can be defined and then used within
Alert Plans.
Dependencies
Return Group Keywords
returngroup <name>
|
Define a new Return Group, consisting of the return codes specified.
|
Examples
- In this example, return codes 1 and 5 are defined into one
Return Group, which is then used in a simple Alert Plan.
returngroup problem { 1 5 }
alertplan simple {
on returngroup problem {
using halfhourly schedule {
try { alert rotating }
}
}
}
Alert Throttle
|
syntax
|
throttle <number>
|
About Alert Throttle
Alert Throttle restricts the number of alert notifications that may
be queued per minute. Any alert notifications that have not been
queued when the throttle value is reached will be queued (up to the
throttle value) at the next scheduler run.
If the throttle keyword is not specified in
schedule.cf, the default value of 10 will be used.
If an Alert Plan specifies that two or more Call Lists are to be
notified, that is considered one "logical" alert notification for
purposes of calculating the Alert Throttle.
If a clear alert notification is not queued because the Alert Throttle
has been reached, that alert will not be requeued because the
no longer valid alert state must be cleared by the scheduler.
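The throttling behavior described above can be sketched in Python. This is a model under assumptions, not the scheduler's code: queue_notifications and the (kind, name) tuples are hypothetical, and each call stands for one scheduler run, assumed to happen once per minute.

```python
def queue_notifications(pending, throttle=10):
    """Split `pending` into (queued now, deferred to the next run).

    pending: list of (kind, name) tuples, kind being "alert" or "clear".
    throttle: maximum notifications per run; 0 disables throttling.
    """
    if throttle == 0:
        return list(pending), []
    queued = pending[:throttle]
    # Clear notifications that miss the cut are dropped, not requeued.
    deferred = [n for n in pending[throttle:] if n[0] != "clear"]
    return queued, deferred

pending = [("alert", "a"), ("alert", "b"), ("clear", "c"), ("alert", "d")]
queued, deferred = queue_notifications(pending, throttle=2)
print(queued)    # [('alert', 'a'), ('alert', 'b')]
print(deferred)  # [('alert', 'd')]
```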
Dependencies
Alert Throttle Keywords
throttle <number>
|
Set the Alert Throttle to the specified number of messages per
minute. If a value of 0 is specified, no throttling is performed.
|
Examples
- In this example, no more than two alert notifications per minute
are permitted:
throttle 2
Alert Shift
|
syntax
|
alert shift <number> <interval>
|
About Alert Shift
Alert Shift offsets all Schedules used in any Alert Plan by the
amount of time specified. (The Schedules are unmodified for Check
purposes.) This is useful to avoid situations where an alert
notification is executed before the Check status it examines
has been updated, and is especially useful for at Schedules.
Dependencies
Alert Shift Keywords
alert shift <number> <interval>
|
Set the Alert Shift to the interval specified.
|
Examples
- In this example, Checks using the sample1 Schedule will
execute at 9:30am, while Alert Plans using the same Schedule will
execute at 9:32am.
alert shift 2 minutes
schedule sample1 {
at { 09:30 }
}
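The arithmetic of the shift can be checked with a few lines of Python. The helper name shifted is hypothetical, and only the time of day is modelled (no day prefix).

```python
from datetime import datetime, timedelta

def shifted(at_time, shift):
    """Return the HH:MM time `at_time` offset by the timedelta `shift`."""
    base = datetime.strptime(at_time, "%H:%M")
    return (base + shift).strftime("%H:%M")

# alert shift 2 minutes, applied to at { 09:30 }
print(shifted("09:30", timedelta(minutes=2)))  # 09:32
```

Checks would still use the unshifted 09:30, while Alert Plans would see 09:32.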
$Date: 2006/11/19 14:08:11 $
$Revision: 0.15 $