survivor: schedule.cf

About schedule.cf
Schedules
Alert Plans
Alert Plan Aliases
Return Groups
Alert Throttle
Alert Shift

About schedule.cf

syntax

Whitespace is generally ignored, unless otherwise noted.
Boldface denotes keywords.
[brackets] denote optional keywords.
Separators (|) denote mutually exclusive keywords.
Comments begin with a # symbol, and may appear anywhere. Comments must be terminated with newlines.

     <day> : "sunday" | "monday" | "tuesday" | "wednesday" | "thursday" |
                "friday" | "saturday"
     <name> : [a-zA-Z0-9\.+%@_-]+
     <number> : [0-9]+
     <interval> : <number> "second[s]"|"minute[s]"|"hour[s]"|"day[s]"|"week[s]"
     <time> : [<day>] [0-2][0-9]:[0-5][0-9]
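
As a sketch of how these definitions combine, the following hypothetical Schedule uses a comment, a `<name>`, two `<time>` values with the optional `<day>`, and two `<interval>` values. The name `example_syntax` is an illustration only and is not defined elsewhere in this document.

```
     # A comment runs from the # symbol to the end of the line.
     schedule example_syntax {
       # <time> with the optional <day>, followed by an <interval>
       from monday 08:00 until friday 17:59 every 30 minutes
       # No time period given, so this applies at all other times
       every 2 hours
     }
```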

schedule.cf defines the following:
- Schedules, which define periods of time and intervals for those periods of time.
- Alert Plans, which define behaviors for when a Check returns an error.
- Alert Plan Aliases, which allow Alert Plans to be reused with a different Call List.
- Return Groups, which group together return codes for easier reference.
- Alert Throttle, which restricts the number of alert notifications that can be queued for transmission at one time.
- Alert Shift, which offsets the times in a Schedule definition for use with Alert Plans.

Schedules
syntax	schedule <name> { ([from <time> until <time>] (every <interval> | never) | at { <time> ... }) [...] }

About Schedules

A schedule stanza defines one or more time period/frequency pairs. A time period indicates a block of time with fixed start and end points. When no time period is indicated, all time periods are implied (i.e., it is always "now"). A frequency indicates how often whatever is using the Schedule should execute.

If more than one time period/frequency pair is specified in a Schedule, the first time period matched will be used.
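
Because the first matching time period wins, the order of the pairs matters. The sketch below assumes pairs are tested in the order they are written, which is what "first time period matched" suggests. The Schedule names `example_ordered` and `example_shadowed` are hypothetical.

```
     # The weekend period is tested first; every other time falls
     # through to the pair with no time period.
     schedule example_ordered {
       from saturday 00:00 until sunday 23:59 every 30 minutes
       every 15 minutes
     }

     # Probably not what was intended: the first pair has no time
     # period, so it always matches and the weekend pair is never reached.
     schedule example_shadowed {
       every 15 minutes
       from saturday 00:00 until sunday 23:59 every 30 minutes
     }
```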

Schedules are used to determine when Checks are executed, form the building blocks of Alert Plans, and serve some other purposes.

Dependencies

None.

Schedule Stanza Keywords

schedule <name>	Define a new Schedule.
from <time> until <time>	Specify a time period during which this Schedule is in effect. The period runs through the last second of the until minute.
every <interval>	Specify a frequency for when this time period is in effect. Note that frequencies are generally implemented as approximations, so a frequency of every 1 hour beginning at 08:00 when used as a Check Schedule might cause the Check to execute at 08:00, 09:01, 10:02, 11:02, 12:03, 13:04, 14:04, 15:05, etc. `every` Schedules may be expected to drift.
at	Define a time period of one or more exact times, with an implied frequency of once per time specified. Note that care should be taken when using the same `at` Schedule with both a Check and an Alert Plan for the same service. It is possible that an alert notification will transmit before the Check runs, which may produce counterintuitive results. See Alert Shift for more information.

Examples

First, some example time periods. These would be specified within a Schedule definition, examples of which appear below.
- from 08:30 until 16:59
  The time period from 8:30am through (and including) 4:59pm every day.
- from monday 08:00 until friday 16:59
  The time period from 8:00am on Monday through 4:59pm on Friday. Inverting the order is a useful way to specify weekends.
- from friday 17:00 until monday 07:59
  The time period from 5:00pm Friday through 7:59am Monday. The reason to end at 7:59 is that the Schedule is in effect through the end of the time specified, in this case 7:59:59. This allows another time period to be specified beginning at 08:00 without conflicting.
Next, some example frequencies. These would be specified within a Schedule definition, examples of which follow.
- every 30 minutes
- every hour
- never
A Schedule defining a frequency of every hour from 8:00am until 9:59pm. A Schedule like this can be used to specify when a Check runs, or it can be used within part of an Alert Plan to determine how often alert notifications should be transmitted.
```
     schedule sample1 {
       from 22:00 until 07:59 never
       from 08:00 until 21:59 every hour
     }
     
```

A Schedule defining a frequency of every 15 minutes at all times weekdays, and every 30 minutes on weekends.

     schedule sample2 {
       from saturday 00:00 until sunday 23:59 every 30 minutes
       every 15 minutes
     }

A Schedule defining an explicit time (an at Schedule) with an implied frequency of once. A Check using the following Schedule would execute Mondays at 9:30am and every day at 5:30pm.
```
     schedule sample3 {
       at { monday 09:30 }
       at { 17:30 }
     }
     
```
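The syntax `at { <time> ... }` also permits several times inside one pair of braces. A minimal sketch, using the hypothetical Schedule name `example_at`; a Check using it would be expected to execute three times each day and once more on Sundays at 11:30pm.

```
     schedule example_at {
       # Three exact times, every day
       at { 06:00 12:00 18:00 }
       # One exact time, Sundays only
       at { sunday 23:30 }
     }
```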

A Schedule to define nights and weekends.

     schedule sample4 {
       from friday 22:00 until monday 07:59 every hour
       from 22:00 until 07:59 every hour
     }

Alert Plans

syntax

     alertplan <name> {
         (on return value[s] <number> [...] |
	  on returngroup <name> |
	  default) {
	     [after <number> check failure[s]]
	     using <name> schedule {
	         try [<number> time[s]] {
		     [allow <number> failed host[s] [during <name> schedule]]
		     (alert <name>
		      [...]
		     |attempt fix [if defined]
		      [tell <name>
		       [...]])
		     [flag escalated]
		 }
		 [...]
	     }
	 }
         [...]
	 [notify [using <name> schedule] on clear | do not notify on clear]
	 [do not clear state on return values { <number> [...] }
	 |always clear state]
	 [clear state honors returngroups { <name> [...] }
	 |clear state honors all returngroups
	 |clear state ignores returngroups]
     }

About Alert Plans

An alert plan stanza defines how alert notifications are transmitted and how Fixes are attempted when a Check returns an error. An Alert Plan consists of one or more return value stanzas, which define actions for specific return values obtained from Checks.

Dependencies

The Call List specified by the alert or tell keywords must be defined in calllist.cf.
The Schedule specified by the using keyword must be defined by a schedule stanza before the Alert Plan that uses it is defined.

Alert Plan Stanza Keywords

alertplan <name>	Define a new Alert Plan.
on return value[s] <number> | on returngroup <name> | default	Begin a new return value stanza. Return value stanzas define actions for specific return values obtained from Checks. If more than one return value stanza is defined for the same return value, then the first stanza meeting the specified minimum number of check failures (via `after n check failures`) will be the one used. Frequently used sets of return values may be defined into Return Groups, which may be referenced by defining a return value stanza using the `on returngroup` keyword. A return value stanza using the `default` keyword matches all return values.
after <number> check failures	Specify the number of consecutive Check failures that must occur before the return value stanza is in effect. This is useful to prevent transient failures from generating spurious alerts.
using <name> schedule	Begin a new schedule stanza. When the named Schedule is in effect, the try stanzas defined within the schedule stanza will be used. If more than one schedule stanza is defined within the return value stanza, then the first Schedule in effect will be used.
try <number> times | try	Begin a new try stanza. When a schedule stanza matches, a try stanza defined within it will be selected to determine what actions to take. Each try stanza must contain either an `alert` or an `attempt fix` keyword. The try stanza is selected according to the number of alert notifications for the same status that have already been transmitted. A try stanza defined without a number of `times` indicated will match by default if no previous stanza matches. Defining multiple try stanzas allows an alert notification to escalate. Note that a status change is not necessarily the same as a problem clearing. If a status changes from `PROBLEM` to `WARNING`, any escalation will reset, the same as for `PROBLEM` to `OK`.
allow <number> failed host[s]	Specify degraded mode, allowing the specified number of hosts in the same group to fail without triggering an alert notification. This is useful if a redundant host fails when there is plenty of capacity on the other available hosts.
Hosts are considered redundant when:
- The hosts are in the same Group
- The hosts use the same Alert Plan (whether defined via the Check or via the HostClass)
When degraded mode is in effect, alert actions for redundant hosts are executed when:
- Non-zero return codes are found for at least the specified number of hosts (the return codes need not match)
- Each host meets the specified minimum number of Check failures (via `after n check failures`)
during <name> schedule	Specify when degraded mode is in effect.
alert <name>	Specify which Call List(s) to notify.
flag escalated	Specify at which point the problem is considered escalated. This is for the web interface to know when to call additional attention to a problem. By default, the second try stanza is the point of escalation for the web interface. This flag has no bearing on the normal escalation mechanism by which subsequent try stanzas are attempted.
attempt fix	Specify that the predefined Fix for the problem should be attempted. For more information on Fixes, see `check.cf`.
if defined	Specify that the predefined Fix for the problem should be attempted if one exists; otherwise, move to the next alert action and do not generate a misconfiguration error. This is useful to define Alert Plans that can be used with different Checks regardless of whether or not each Check has a Fix defined. However, use of this keyword will make it more difficult to detect errors in configuration (i.e., when a Fix was intended but not actually defined).
tell	When a Fix is attempted, specify which Call List(s) to notify.
notify on clear | notify using <schedule> on clear | do not notify on clear	Specify when notification is to be sent on a problem clearing (returning to state `OK`). Whoever was last notified about the problem will receive the clear notification, even if they are no longer on call. If a Schedule is specified, then notification will only be sent when that Schedule is in effect. While Schedules require a frequency, frequencies do not make sense for clear notifications, which are transmitted only once. The scheduler will ignore any frequency specifications in the Schedule, including `never`. If a problem is acknowledged or alerts for the service or host are inhibited, then no notification will be transmitted when the problem clears. If the Check is manually rescheduled and the problem no longer exists, no notification will be sent. If a clear notification is not transmitted successfully for any reason, it will not be requeued, because the no-longer-valid alert state must be cleared by the scheduler.
do not clear state on return values | always clear state	Specify when return-value dependent state should be cleared. Ordinarily, alert and fix state (including acknowledgements) is cleared when the return value of a Check changes, for example from `WARNING` to `PROBLEM`. This is equivalent to `always clear state`. When return values are specified, state will not be cleared when the return value of a Check changes to or from a return value listed. The exception is if the return value is changing from a specified value to `OK`, in which case the state is always cleared. Note that these keywords do not affect the rules for `try` stanzas. However, this is subject to change in a future release.
clear state honors returngroups | clear state honors all returngroups | clear state ignores returngroups	Specify when return-value dependent state should be cleared. Ordinarily, alert and fix state (including acknowledgements) is cleared when the return value of a Check changes, for example from `WARNING` to `PROBLEM`. This is equivalent to `clear state ignores returngroups`. When Return Groups are specified, state will not be cleared when the return values that a Check changes from and to are both members of a specified Return Group. The exception is if the return value is changing to `OK`, in which case the state is always cleared. Note that these keywords do not affect the rules for `try` stanzas. However, this is subject to change in a future release.

default check failures	The default number of Check failures is 1. In order to increase the readability of the configuration file, the default number of Check failures may be changed outside of an Alert Plan. The new default will apply to any Alert Plans defined after it, until the end of the file or until the default is changed again. For example:

     # Require all Checks to fail twice before any Alert Plan does anything.
     default after 2 check failures

     alertplan sample1c {
       on return values 1 5 {
         using sample1 schedule {
           ...
         }
       }
       default {
         using sample1 schedule {
           ...
         }
       }
     }

default notify on clear	The default notify on clear state is to not notify on clear. To avoid having to specify `notify on clear` for each Alert Plan, the default state can be changed outside of an alert plan stanza. The new default will apply to Alert Plans defined after it, until the end of the file or until the default is changed again. For example:

     # Notify on clear for all subsequent alertplans
     default notify using sample1 schedule on clear

To restore the default back to no notification, or to override notification within an Alert Plan, use `do not notify` instead. For example:

     alertplan sample1d {
       default {
         ...
       }
       do not notify on clear
     }

     # Or turn it off for all subsequent Alert Plans
     default do not notify on clear

default do not clear state	The default is to clear state whenever the return value of a Check changes. To avoid having to specify `do not clear state` for each Alert Plan, the default can be changed outside of an Alert Plan stanza. The new default will apply to Alert Plans defined after it, until the end of the file or until the default is changed again. For example:

     # Don't clear state if a module is misconfigured
     default do not clear state on return values { 4 }

To restore the default back to always clearing state, or to override clearing state within an Alert Plan, use `always clear state` instead. For example:

     alertplan sample1e {
       default {
         ...
       }
       always clear state
     }

     # Or always clear state for all subsequent Alert Plans
     default always clear state

default clear state honors returngroups	The default is to clear state whenever the return value of a Check changes. To avoid having to specify `clear state honors returngroups` for each Alert Plan, the default can be changed outside of an Alert Plan stanza. The new default will apply to Alert Plans defined after it, until the end of the file or until the default is changed again. For example:

     # define a returngroup
     returngroup problem { 1 5 }

     # PROBLEM and TIMEDOUT are basically the same, so don't clear state
     # if the return code swaps between them
     default clear state honors returngroups { problem }

To not clear state when a return code swaps within any Return Group defined so far:

     default clear state honors all returngroups

To restore the default back to always clearing state, or to override clearing state within an Alert Plan, use `clear state ignores returngroups` instead. For example:

     alertplan sample1f {
       default {
         ...
       }
       clear state ignores returngroups
     }

     # Or always clear state for all subsequent alertplans
     default clear state ignores returngroups

Examples

First, the basic structure of an Alert Plan consists of one or more return value stanzas, which define actions according to the return values obtained from the Check modules. In the following example, three different return value stanzas are specified: one for return values 1 and 5 (generally interpreted as PROBLEM and TIMED OUT), one for return value 4 (MISCONFIGURATION), and one for any other return value:
```
     alertplan sample1 {
       on return values 1 5 {
         ...
       }
       on return value 4 {
         ...
       }
       default {
         ...
       }
     }
     
```
Notification on clear is also set at the top level of an Alert Plan. In this example, when this Alert Plan is used and when the sample1 Schedule is in effect, notification will be sent when a problem clears:
```
     alertplan sample2 {
       on return values 1 5 {
         ...
       }
       default {
         ...
       }
       notify using sample1 schedule on clear
     }
     
```
Each return value stanza holds instructions on when to transmit alert notifications. In this example, two consecutive Check failures are required before any requested alert actions are taken, and then only if the sample1 Schedule is in effect.
```
     alertplan sample3 {
       on return values 1 5 {
         after 2 check failures

	 using sample1 schedule {
	   ...
	 }
       }
       default {
         after 2 check failures

	 using sample1 schedule {
	   ...
	 }
       }
     }
     
```
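The keyword descriptions note that when several return value stanzas cover the same return value, the first one whose `after` threshold is met is used. A minimal sketch of that rule follows, assuming stanzas are tested in the order written. The Alert Plan name `example_thresholds` is hypothetical, and `oncall-list` and `managers` are assumed to be Call Lists defined in calllist.cf.

```
     alertplan example_thresholds {
       on return value 1 {
         # Tested first: applies once the Check has failed
         # 5 or more times in a row.
         after 5 check failures
         using sample1 schedule {
           try { alert managers }
         }
       }
       on return value 1 {
         # Applies for 2 to 4 consecutive failures, when the
         # stanza above does not yet match.
         after 2 check failures
         using sample1 schedule {
           try { alert oncall-list }
         }
       }
     }
```
Under the same assumption, writing the stanzas in the opposite order would let the `after 2 check failures` stanza match first every time, so the higher threshold would never be reached.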

This next example illustrates the use of multiple try stanzas to handle escalations:

     alertplan sample4 {
       default {
         using sample1 schedule {
	   try 2 times {
	     # Alert the first level response list not more than two times
	     # after the check has failed twice during the sample1 schedule.
	     alert oncall-list
	   }
	   try 2 times {
	     # Consider the alert escalated and alert the second level
	     # response list not more than two times since the first level
	     # response list has not cleared the problem.
	     alert backup-list
	     alert oncall-list
	   }
	   try {
	     # Consider the alert escalated and alert the oncall managers
	     # until the problem clears.
	     alert oncall-list
	     alert backup-list
	     alert managers
	     flag escalated
	   }
	 }
       }
     }
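
Degraded mode, described under the `allow` keyword, is configured inside a try stanza ahead of the alert action. The following is a sketch only: the Alert Plan name `example_degraded` is hypothetical, and it assumes the hosts concerned share a Group and this Alert Plan so that they count as redundant.

```
     alertplan example_degraded {
       default {
         using sample1 schedule {
           try {
             # Tolerate one failed host among the redundant hosts,
             # but only while the sample1 Schedule is in effect.
             allow 1 failed host during sample1 schedule
             alert oncall-list
           }
         }
       }
     }
```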

This example tries the predefined Fix before escalating:

     alertplan sample5 {
       default {
         using sample1 schedule {
	   try 1 time {
	     attempt fix
	     tell oncall-list
	   }
	   try {
	     alert oncall-list
	   }
	 }
       }
     }
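
A variation on the previous example, sketched here under the hypothetical name `example_optional_fix`, uses `if defined` so that the same Alert Plan can be shared by Checks with and without a Fix. As the keyword description warns, this makes a forgotten Fix definition harder to notice.

```
     alertplan example_optional_fix {
       default {
         using sample1 schedule {
           try 1 time {
             # No misconfiguration error is generated when the
             # Check has no Fix defined.
             attempt fix if defined
             tell oncall-list
           }
           try {
             alert oncall-list
           }
         }
       }
     }
```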

Alert Plan Aliases
syntax	alias <name> to <name> (replacing <name> with <name> | adding <name>)

About Alert Plan Aliases

An Alert Plan Alias duplicates an Alert Plan, but replaces references to one of its Call Lists with another, or adds a Call List to the source's set of Call Lists. Currently, only one Call List may be replaced or added in an Alert Plan Alias, although an Alias can be aliased. Future releases may permit multiple substitutions or additions within the same Alias.

Dependencies

An Alert Plan must be defined before it can be aliased.

Alert Plan Alias Stanza Keywords

alias <name> to <name>	Define a new Alert Plan Alias. `alias` specifies the Alert Plan to copy, `to` specifies the name of the new Alert Plan.
replacing <name> with <name>	Change the occurrences of the Call List specified by `replacing` to the Call List specified by `with` in the new Alert Plan.
adding <name>	Add the Call List specified to the new Alert Plan.

Examples

In this example, the sample4 Alert Plan defined above is reused, but net-oncall becomes the first Call List, instead of oncall-list.
```
     alias sample4 to sample6 replacing oncall-list with net-oncall
     
```
In this example, net-oncall is added in addition to oncall-list (and all the others) rather than replacing it.
```
     alias sample4 to sample7 adding net-oncall
     
```
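Since only one Call List may be replaced or added per Alias, but an Alias can itself be aliased, two substitutions can be achieved by chaining. This is a sketch: the Alias names `example_net` and `example_net_full` are hypothetical, and `net-backup` is assumed to be a Call List defined in calllist.cf.

```
     # First Alias: swap oncall-list for net-oncall
     alias sample4 to example_net replacing oncall-list with net-oncall

     # Second Alias, built on the first: also swap backup-list
     alias example_net to example_net_full replacing backup-list with net-backup
```
The intermediate Alias must be defined before the one that builds on it, in line with the dependency rule for Alert Plans.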

Return Groups
syntax	returngroup <name> { <number> ... }

About Return Groups

To avoid having to specify similar sets of return codes over and over again, Return Groups can be defined and then used within Alert Plans.

Dependencies

None.

Return Group Keywords

returngroup <name>

Define a new Return Group, consisting of the return codes specified.

Examples

In this example, return codes 1 and 5 are defined into one Return Group, which is then used in a simple Alert Plan.

     returngroup problem { 1 5 }

     alertplan simple {
       on returngroup problem {
         using halfhourly schedule {
           try { alert rotating }
         }
       }
     }
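
A Return Group can also be named in the `clear state honors returngroups` keyword described under Alert Plans. The sketch below uses the hypothetical Alert Plan name `example_honor`; with it, a change between return codes 1 and 5 should not clear alert and fix state, while a change to `OK` still does.

```
     returngroup problem { 1 5 }

     alertplan example_honor {
       on returngroup problem {
         using sample1 schedule {
           try { alert oncall-list }
         }
       }
       # Codes 1 and 5 are both in "problem", so swapping between
       # them leaves acknowledgements and escalation state alone.
       clear state honors returngroups { problem }
     }
```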

Alert Throttle
syntax	throttle <number>

About Alert Throttle

Alert Throttle restricts the number of alert notifications that may be queued per minute. Any alert notifications that have not been queued when the throttle value is reached will be queued (up to the throttle value) at the next scheduler run.

If the throttle keyword is not specified in schedule.cf, the default value of 10 will be used.

If an Alert Plan specifies that two or more Call Lists are to be notified, that is considered one "logical" alert notification for purposes of calculating the Alert Throttle.

If a clear alert notification is not queued because the Alert Throttle has been reached, that alert will not be requeued, because the no-longer-valid alert state must be cleared by the scheduler.

Dependencies

None.

Alert Throttle Keywords

throttle <number>

Set the Alert Throttle to the specified number of messages per minute. If a value of 0 is specified, no throttling is performed.

Examples

In this example, no more than two alert notifications per minute are permitted:
```
     throttle 2
     
```
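To make the carry-over behavior concrete, the comments below walk through a hypothetical burst. The figure of 25 pending notifications is invented for illustration, and the count assumes none of them are clear notifications, which are not requeued.

```
     throttle 10

     # Hypothetical burst: 25 alert notifications become due together.
     #   scheduler run 1: 10 queued, 15 held back
     #   scheduler run 2: 10 queued,  5 held back
     #   scheduler run 3:  5 queued,  0 held back
```
Setting `throttle 0` instead would disable throttling, so all 25 would be queued in the first run.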

Alert Shift
syntax	alert shift <number> <interval>

About Alert Shift

Alert Shift offsets all Schedules used in any Alert Plan by the amount of time specified. (The Schedules are unmodified for Check purposes.) This is useful to avoid situations where an alert notification is executed before the Check status it examines has been updated, and is especially useful for `at` Schedules.

Dependencies

None.

Alert Shift Keywords

alert shift <number> <interval>

Set the Alert Shift to the interval specified.

Examples

In this example, Checks using the sample1 Schedule will execute at 9:30am, while Alert Plans using the same Schedule will execute at 9:32am.
```
     alert shift 2 minutes
     
     schedule sample1 {
       at { 09:30 }
     }
     
```

$Date: 2006/11/19 14:08:11 $
$Revision: 0.15 $

keywords
after
alert
alert shift
alertplan
alias
allow
always clear state
at
attempt fix
clear state honors
clear state ignores
default
default check failures
default do not clear state
default notify on clear
do not clear state
do not notify on clear
during
every
flag escalated
from
if defined
notify on clear
returngroup
on return value
on returngroup
schedule
tell
throttle
try
until
using