Although every effort is made to minimize incompatibilities between
versions, occasionally changes are required to facilitate future
enhancements and make the system more flexible.
Upgrading from 0.9.7 to 1.0
- v1.0 removes support for the notwarncount and notprobcount
arguments to the snmp check module.
There is no replacement for this functionality.
- v1.0 adds an snmpversion argument to the snmp, ups, storedge-t3, nadisk, and hplj check modules. The
default SNMP version used is '1'. For installations where these
modules were previously used with a different SNMP version, add
the appropriate version as an argument to all relevant checks.
- v1.0 displays the first time a check returned the current result,
via both the command line and web interfaces. When upgrading to
v1.0, the first check time is not available, so the time the
check first runs after the upgrade is used instead. As a result,
the displayed time may be inconsistent with the number of instances
reported by the same interfaces.
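To find the checks affected by the new snmpversion argument, check.cf can be scanned for the modules named above. The following is a minimal sketch, not part of the package. It assumes each module is declared as `module <name> {` on a single line, as in the check.cf examples elsewhere in this document, and the function name is hypothetical.

```python
import re

# Modules that gained the snmpversion argument in v1.0 (from the list above).
SNMP_MODULES = {"snmp", "ups", "storedge-t3", "nadisk", "hplj"}

# Assumption: each module is introduced as "module <name> {" on one line.
MODULE_RE = re.compile(r"^\s*module\s+([\w-]+)\s*\{")


def find_snmp_module_lines(check_cf_text):
    """Return (line_number, module_name) for each use of an affected module."""
    hits = []
    for number, line in enumerate(check_cf_text.splitlines(), start=1):
        match = MODULE_RE.match(line)
        if match and match.group(1) in SNMP_MODULES:
            hits.append((number, match.group(1)))
    return hits
```

Each reported line still needs to be reviewed by hand to decide whether a version other than '1' must be given.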
Upgrading from 0.9.6 to 0.9.7
- v0.9.7 changes some flags to the web interface (sw) and
may require existing bookmarks referencing actions and custom
pagesets to be updated.
- The flags h and s are no longer used to
specify service@host for an action to be performed. Instead,
sh should be used.
- The flag ret should now be used when submitting an
action (a flag) to provide a return path after the
action is processed.
- v0.9.7 changes the specification for callliststatus files
in order to allow substitutions to properly end. Rotating
call lists may lose their place following the upgrade. Before
upgrading, determine who is on call for each rotating call list.
% sc clstat list1
[list1]
-> abc was last notified at abc@site.org via this list
-> List last rotated at Thu Aug 25 09:40:32 2005
-> abc is now on call
After the upgrade, use the new clset command to reset
the call list.
% sc -o person=abc clset list1
[list1] OK: Set
- v0.9.7 changes the specification for report modules slightly.
Report modules may no longer assume a TmpDir will be provided.
Additionally, a new check style has been defined.
For more information, see the updated specification.
Any custom report modules should be updated.
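When several rotating call lists are in use, the clset step described above can be prepared from the on-call names recorded before the upgrade. The sketch below only builds the command lines in the `sc -o person=... clset ...` form shown above and does not run anything. The input mapping and the function name are hypothetical.

```python
def build_clset_commands(on_call):
    """Build one 'sc -o person=<name> clset <list>' command per call list.

    on_call maps a call list name to the person noted from 'sc clstat'
    before the upgrade (hypothetical input, gathered by hand).
    """
    commands = []
    for list_name, person in sorted(on_call.items()):
        commands.append(["sc", "-o", "person=" + person, "clset", list_name])
    return commands
```

For the example above, `build_clset_commands({"list1": "abc"})` yields the single argument list for `sc -o person=abc clset list1`.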
Upgrading from 0.9.5 to 0.9.6
- v0.9.6 removes the parallel check module. This module
did not perform any checks, but ran other checks in parallel,
reducing the time required to run them.
Modules run under the parallel module can simply be
run serially instead.
Other methods are available for
module parallelization. Most modules included with the package
use these methods by default.
- v0.9.6 converts check, fix, and transport modules to accept
arguments via XML documents. Custom check, fix, and transport modules that do not use
libcm or Survivor.pm must be modified. Custom
modules that use libcm may need to be modified.
- v0.9.6 generalizes the apc check to support Liebert UPSs
as well. To reflect this, the module has been renamed to
ups. References to apc in check.cf
must be changed to ups.
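The apc to ups rename in check.cf is mechanical and can be sketched as a text substitution. This is an illustration only: it assumes modules are declared as `module <name>` at the start of a line, the function name is hypothetical, and it does not touch check names or comments that happen to mention apc.

```python
import re

# Matches "module apc" at the start of a line, but not longer names
# such as "module apcfoo" or "module apc-x".
APC_MODULE_RE = re.compile(r"^(\s*module\s+)apc(?![\w-])", re.MULTILINE)


def rename_apc_to_ups(check_cf_text):
    """Return check.cf text with 'module apc' declarations renamed to 'module ups'."""
    return APC_MODULE_RE.sub(r"\1ups", check_cf_text)
```

Review the result before installing it, since any other references to apc must still be changed by hand.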
Upgrading from 0.9.4 to 0.9.5
- v0.9.5 converts most state files to an XML based format in order
to facilitate the addition of new features both in this release
and in future releases.
In order to convert the existing state files, as root stop the
v0.9.4 scheduler and then run the convert-state.pl script
in src/util/upgrading. This must be done while the
scheduler is stopped. Run the script once for each
statedir directory defined in
/etc/survivor/instance.cf. Then, start the 0.9.5
scheduler.
If this conversion is not performed, all existing state (but not
history) will be lost when the 0.9.5 scheduler is started.
- v0.9.5 adds a tmpdir keyword to instance.cf, used
for components of the system that require the ability to write
temporary files. The default value is /tmp, which is not
suitable if the system is installed on a host accessible by
non-trusted users. See instance.cf for more information.
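Because convert-state.pl has to be run once per statedir, it can help to list the state directories first. The sketch below makes two assumptions that are not confirmed by this document: that instance.cf gives each state directory as `statedir <path>` on its own line, and that convert-state.pl takes the directory as its only argument, by analogy with the convert-history.pl example in the 0.9.4 notes. Function names are hypothetical, and nothing is executed.

```python
import re

# Assumption: each state directory appears as "statedir <path>" on its own line.
STATEDIR_RE = re.compile(r"^\s*statedir\s+(\S+)", re.MULTILINE)


def statedirs(instance_cf_text):
    """Return the statedir paths found in instance.cf text, in file order."""
    return STATEDIR_RE.findall(instance_cf_text)


def conversion_commands(instance_cf_text, script="./convert-state.pl"):
    """Build one conversion command per statedir (nothing is executed)."""
    return [[script, path] for path in statedirs(instance_cf_text)]
```

The resulting commands are meant to be run as root from src/util/upgrading while the scheduler is stopped, as described above.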
Upgrading from 0.9.3 to 0.9.4
- v0.9.4 overhauls the internal management of history records in
order to facilitate several changes now and to prepare for
additional changes later. While most of these changes are not
visible, the format of history records has changed.
In order to convert existing history records, as root run the
convert-history.pl script in src/util/upgrading.
This must be done while the scheduler is stopped. It can be done
after the 0.9.4 scheduler has been installed as long as it is not
currently running. Run the script once for each
historydir directory defined in
/etc/survivor/instance.cf.
scheduler# /etc/init.d/survivor stop
scheduler# cd src/util/upgrading
scheduler# ./convert-history.pl /survivor/sample/history
scheduler# /etc/init.d/survivor start
If necessary, it is safe to run convert-history.pl more than once on
the same directory, as long as the scheduler is not running.
Strictly speaking, converting existing history records is not
necessary. However, having unconverted history records may
prevent the 0.9.4 utilities (including the history retrieval and
rotation functions of sc and the reporting function of
sw) from completing successfully.
- v0.9.4, by default, disables the command line interface for the
root user. This is to increase accountability, as at larger sites
many administrators may have access to the root account, making it
difficult to determine who, for example, inhibited alerts for a
host. Since it is not necessary to be root to run sc, no
functionality is lost. To allow the root user to run the command
line interface anyway, add the following line to each instance
defined in instance.cf:
allow root
For more information, see the instance
configuration file documentation.
- The 0.9.4 web interface has been overhauled, with two notable
changes. First, cookies are now required for authenticated
sessions. Second, the format of several keywords in
cgi.cf has changed.
For full details of how authentication and authorization now
work, see the documentation for cgi.cf
and the sample configuration file in the source config
directory.
Upgrading from 0.9.2 to 0.9.3
0.9.3 removes a dependency on sendmail. In order to ensure
alerts transmitted with the mail transmit module are
successfully sent out, the Perl module Mail::Mailer must
be installed on the scheduler host.
Additionally, 0.9.3 overhauls the configuration of dependencies.
Type II dependencies were improperly implemented in prior releases.
For information on Type II dependencies, see the documentation. Any Type II
dependencies in dependency.cf must be converted.
Type I dependencies have a new syntax, as defined in the documentation. Any Type I
dependencies in dependency.cf must be converted to the new
syntax. For example,
depend foo on bar status
would become
depend {
checks { foo }
for all hosts
on bar status
}
and
depend all except { foo bar } on baz status
would become
depend {
all checks except { foo bar }
for all hosts
on baz status
}
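The two Type I conversions shown above follow a fixed pattern, so they can be sketched as a small helper. This is an illustration, not a tool shipped with the package. It handles only the two old forms given in the examples, returns None for anything else (including `depend all on ...`, whose new form is not shown here), and the function name is hypothetical.

```python
import re

# Old form: depend <check> on <dependency> status
SINGLE_RE = re.compile(r"^depend\s+(\S+)\s+on\s+(\S+)\s+status$")
# Old form: depend all except { <checks> } on <dependency> status
EXCEPT_RE = re.compile(
    r"^depend\s+all\s+except\s*\{([^}]*)\}\s*on\s+(\S+)\s+status$"
)


def convert_type1(line):
    """Convert one old-style Type I dependency line to the 0.9.3 block syntax.

    Returns None for lines that match neither form shown above.
    """
    line = line.strip()
    match = EXCEPT_RE.match(line)
    if match:
        checks = " ".join(match.group(1).split())
        head = "all checks except { %s }" % checks
        target = match.group(2)
    else:
        match = SINGLE_RE.match(line)
        # "depend all on ..." is not covered by the examples; convert by hand.
        if not match or match.group(1) == "all":
            return None
        head = "checks { %s }" % match.group(1)
        target = match.group(2)
    return "depend {\n%s\nfor all hosts\non %s status\n}" % (head, target)
```

Lines for which the helper returns None, and all Type II dependencies, still need to be converted manually using the documentation.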
Upgrading from 0.9 or 0.9.1 to 0.9.2
0.9.2 introduces a small change to increase the flexibility of the
alerting infrastructure, splitting alert modules into format and transmit
modules. The changes required to use the standard modules are
simple: add the following to the beginning of calllist.cf (assuming no local
alert modules are in use):
alert via mail {
format as full
transmit with mail
}
alert via mailtopager {
format as compact
transmit with mail
}
alert via mailtonextel {
format as nextel
transmit with mail
}
alert via mailtosms {
format as sms
transmit with mail
}
Any custom alert modules written will need to be rewritten into
format and/or transmit
modules.
Upgrading from 0.8.x to 0.9
0.9 introduces many changes that are incompatible with 0.8.x. Many of
these changes will facilitate future enhancements and make the system
more flexible. These instructions identify the steps needed to
upgrade. We hope that future upgrades will not require nearly so many
changes (and preferably none at all).
- Build the new version.
- Stop the 0.8.x scheduler.
- Install the 0.9 remote package on all remotely monitored hosts.
- Install the 0.9 package on the scheduler host, but do not try
to start the scheduler.
- In calllist.cf, change any
rotating call lists to rotate using a schedule instead of an
explicit time. For example, a call list originally defined as
call list foo {
notifies {
jane@foo
joe@foo
}
via mail
rotates monday 12:00
}
would become
call list foo {
notifies {
jane@foo
joe@foo
}
via mail
rotates using mondayNoon schedule
}
with mondayNoon defined in schedule.cf as
schedule mondayNoon {
at {
monday 12:00
}
}
- Also in calllist.cf, switch to Person-based call lists. Note that
because the state file format for call lists has changed, conversion
of call lists can be slightly complicated. There are three options.
- The easiest procedure is to simply change the definitions.
See the documentation for full
details, but as an example
call list foo {
notifies {
jane@foo
joe@foo
}
via mail
broadcasts to all
}
would become
person jane {
notify jane@foo via mail
}
person joe {
notify joe@foo via mail
}
call list foo {
notifies {
jane
joe
}
via mail
broadcasts to all
}
Note that this procedure will reset simple and rotating call
lists, and may confuse any existing substitutions.
- To clear out all previous state, including any existing
substitutions, first remove all the directories under the
directory calllist in each instance's state directory.
To determine the state directory, see the instance configuration
file, which is usually /etc/survivor/instance.cf.
Then, follow the first option, above. This will still
reset simple and rotating call lists, but there will be no
confused entries for substitutions.
- If it is necessary to maintain previous state, it is possible
to manually convert the state files. For assistance, please
file a bug report.
- The state file formats for check state and alert state have changed
to support new features introduced in v0.9 and to improve performance.
In order to preserve 0.8.x state, run the movestate.sh script
found in src/util/upgrading once for each statedir
directory defined in /etc/survivor/instance.cf. If the script
is not run, old check and alert state will not carry forward to the v0.9
scheduler and superfluous files will be left lying around.
Run this script as $INSTUSER, or add an appropriate
chown line after the chmod line in the script.
scheduler% su - survivor
% cd src/util/upgrading
% ./movestate.sh /survivor/sample/state
This script should not be run more than once per statedir.
- In schedule.cf, redefine the
alertplans. Alertplans are now defined in terms of tries
rather than the number of check failures. In v0.8.x, an alert
action was selected by the number of times a check failed; v0.9
selects alert actions by the number of times an alert is
transmitted. (The check failure count is still available in v0.9
alertplans, but less prominently.) See the documentation
for full details, but as an example
alertplan standard {
default {
after 2 warnings {
alert unix-mailer
using standard schedule
}
}
}
would become
alertplan standard {
default {
after 2 check failures
using standard schedule {
try { alert unix-mailer }
}
}
}
while in the following example, where multiple returngroup problem
stanzas were required to allow 1 failure overnight,
alertplan replicated {
  on returngroup problem {
    after 2 warnings {
      alert unix-pager
      using dayevening schedule
    }
    after 4 warnings {
      alert unix-pager
      using dayevening schedule
    }
  }
  on returngroup problem {
    after 2 warnings {
      alert unix-pager
      using overnight schedule
      allow 1 failure
    }
    after 4 warnings {
      alert unix-pager
      using overnight schedule
      allow 1 failure
    }
  }
  default {
    after 2 warnings {
      alert unix-mailer
      using extended schedule
    }
  }
}
would become the less redundant
# Putting this here makes it the default for all alertplans through
# the end of the file, or until redefined.
after 2 check failures
alertplan replicated {
  on returngroup problem {
    using extended schedule {
      try 2 times {
        allow 1 failed host during overnight schedule
        alert unix-pager
      }
      try {
        allow 1 failed host during overnight schedule
        alert unix-pager
        flag escalated # This is actually optional, since the second try
                       # is considered escalated by default
      }
    }
  }
  default {
    using extended schedule {
      try { alert unix-mailer }
    }
  }
}
- Also in schedule.cf, the
semantics of global notify on clear have changed. Instead of
applying to all alertplans, a global notify on clear is now a
default value, and applies to all alertplans defined after it
(but not before it), until redefined or until the end of the
file. To replicate the v0.8 behavior, make sure the notify on
clear statement is before the first alertplan definition, and add
the keyword default. For example:
alertplan foo {
...
}
# enable global notify on clear
notify using bar schedule on clear
becomes
# this applies to all subsequently defined alertplans, unless overridden
# or redefined
default notify using bar schedule on clear
alertplan foo {
...
}
- In check.cf, all check modules
must now be converted to named argument style. Unfortunately,
the only way to do this is to read the documentation for each
module and convert each entry appropriately. The reason this is
so hard is the exact reason named arguments have been introduced:
the old format was inconsistent, hard to read, and hard to use.
- Also in check.cf, the remote
module is no longer a special type of check module, but is now
one of a new class of modules called transport modules. In order for a
module to run remotely, a transport module must be defined for it
to use. A simple example to port a typical v0.8 entry (including
the conversion to named arguments, described above) would change
check mailq {
module remote(mailq,/var/spool/mqueue,0,1000:2000)
}
check swap {
module remote(swap,80,90)
}
to
transport remote {
module plaintext {}
}
check mailq {
module mailq {
queuedir /var/spool/mqueue # This could be omitted, it is the default
age 0 # This could also be omitted, same reason
warn 1000
prob 2000
}
via remote
}
check swap {
module swap {
warn 80
prob 90
}
via remote
}
- Also in check.cf, the
semantics of the global check timeout have changed. Instead of
applying to all checks that do not define their own timeout, the
global check timeout is now a default value, and applies to all
checks defined after it (but not before it), until redefined or
until the end of the file. Until it is defined, the initial
default timeout of 45 seconds will apply. To replicate the v0.8
behavior, make sure the timeout statement is before the first
check definition, and add the keyword default. For
example:
check foo {
...
}
# set global timeout
timeout 3 minutes
becomes
# this applies to all subsequently defined checks, unless overridden
# or redefined
default timeout 3 minutes
check foo {
...
}
- In dependency.cf, changes
are required for both Type I and Type II dependencies. Type I
dependencies simply need the keyword status appended.
For example:
depend all on ping
becomes
depend all on ping status
Type II dependencies need to be converted to named arguments, the
same as for check modules, described earlier. Note that Type II
dependencies do not currently support transport modules. This
will be addressed with a forthcoming revision of the dependency
mechanisms.
- Test the new configuration. One way to do this is by using the 0.9
sc to manually run some or all checks.
- Start the 0.9 scheduler.
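The Type I dependency change in dependency.cf described in the steps above (appending the keyword status) can be sketched as a line filter. This is a minimal illustration with a hypothetical function name. It assumes a Type I line has the form `depend <checks> on <check>` with a plain check name at the end, and it leaves every other line, including Type II dependencies, untouched for manual conversion.

```python
import re

# Old v0.8 Type I form: "depend <checks> on <check>" with a plain check
# name at the end of the line and no trailing keyword.
TYPE1_RE = re.compile(r"^(\s*depend\s+.+\s+on\s+[\w.-]+)\s*$")


def append_status(line):
    """Append 'status' to a v0.8 Type I dependency line.

    Lines that do not match the assumed form are returned unchanged.
    """
    match = TYPE1_RE.match(line)
    return match.group(1) + " status" if match else line
```

Applied to the example above, `depend all on ping` becomes `depend all on ping status`, while a line that already ends in status is returned as is.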
$Date: 2007/03/29 12:17:29 $
$Revision: 0.12 $