survivor: Confusing Error Messages Explained
Although error messages are generally written to be transparent and understandable in context, some, by virtue of the underlying code design, may seem a little confusing.

Error
When parsing the configuration files, *lex* Unexpected token at line 79 (state 7): "#commented"

Sample
    1024 *lex* Unexpected token at line 79 (state 7): "#commented"
    1024 *lex* Unexpected token at line 79 (state 7): "out"
    1024 *lex* Unexpected token at line 79 (state 7): "block"
    1024 ||
    3 errors encountered while parsing /home/symon/sample/config/check.cf
    sc: WARNING: Configuration parse failed

Explanation
The configuration parser, due to the complexities of the underlying parsing mechanism, requires all comments to be terminated with a newline. A comment at the end of the file might not have a newline after it, causing the parser to fail.

Simply add a newline after the comment to fix this problem.
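
The trailing-newline fix can also be applied mechanically. The following is a minimal sketch, not part of survivor: the helper name `ensure_trailing_newline` is hypothetical, and it assumes the configuration file is small enough to read into memory.

```python
from pathlib import Path


def ensure_trailing_newline(path):
    """Append a newline to the file at `path` if its last byte is not one.

    Hypothetical helper for illustration; returns True if the file was
    modified and False if it was empty or already newline-terminated.
    """
    data = Path(path).read_bytes()
    if not data or data.endswith(b"\n"):
        return False
    # Append in binary mode so no other bytes in the file are touched.
    with open(path, "ab") as handle:
        handle.write(b"\n")
    return True
```

Running it against a file such as the check.cf in the sample leaves a correctly terminated file unchanged, so it should be safe to run repeatedly.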

Error
State::lock_type_state failed to open .../lock

Sample
    % sc -i instance clstate oncall
    [oncall]
    sc: WARNING: State::lock_calllist_state failed to open /var/instance/state/calllist/oncall/lock
    sc: WARNING: State::lock_calllist_state failed to open /var/instance/state/calllist/oncall/lock
    sc: WARNING: State::lock_calllist_state failed to open /var/instance/state/calllist/oncall/lock
    -> somebody@site.org is now on call

Explanation
When configuration files are updated, it is the responsibility of the scheduler to update the state directories in accordance with the new configuration. If other utilities, such as the command interface, are run with the new configuration before the scheduler is told of the update, they may try to access state files or directories that do not yet exist.

In the case of the sample above, the oncall call list was added to calllist.cf. Before the scheduler was told of the update, the command interface was run to get the state of the new call list. Since the scheduler had not yet made the call list state directories consistent, the directory /var/instance/state/calllist/oncall did not yet exist, and so its lock file could not be opened.
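
One way to tell this situation apart from a real permissions problem is to check whether the state directory exists at all. The sketch below is an illustration only: `calllist_state_ready` is a hypothetical helper, and the layout `<state_root>/calllist/<name>` is inferred from the path in the sample warning.

```python
import os


def calllist_state_ready(state_root, calllist):
    """Return True if the state directory for `calllist` already exists.

    Hypothetical helper, not part of survivor. The directory layout
    <state_root>/calllist/<calllist> is an assumption taken from the
    sample warning above.
    """
    return os.path.isdir(os.path.join(state_root, "calllist", calllist))
```

For the sample above, `calllist_state_ready("/var/instance/state", "oncall")` would return False until the scheduler has processed the updated configuration.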

Error
Lots and lots of spewage in /var/log/survivor-instance.log.

Sample
    ss: WARNING: state_consistency failed to create service state directory
    ss: WARNING: Unable to create host history directory
    ss: WARNING: CheckState::lastcheck failed to open
    ss: WARNING: CheckState::write_results failed to reset permissions on

Explanation
Most likely, the user running the scheduler is not INSTUSER or is not a member of INSTGROUP, and so cannot properly access and update the files. See the building instructions for more details.

Error
On certain Linux platforms, SIGHUP sometimes causes the scheduler to exit, complaining that the scheduler is restarting too frequently.

Sample
    ss: WARNING: Keepalive process is restarting the scheduler too frequently.
    ss: WARNING: There may be a configuration file error or some other problem.
    ss: WARNING: Keepalive process is exiting as a precaution.

Explanation
This is a bug in the system's implementation of sigwait(3). Instead of returning the signal that was actually sent to the thread (in this case SIGHUP, or 1), sigwait(3) returns the nonexistent signal 0. Since the scheduler cannot tell what signal 0 really means, it exits. When run under keepalive (ss -k), the keepalive daemon will restart the scheduler, effectively simulating a SIGHUP. However, if this happens too often (more than once per minute), the keepalive exits, assuming there is a bigger problem.

Update: This appears to have been fixed. libc version 2.3.2 is known to work properly.

Error
Failed to queue check 'service' (check may already be scheduled)

Sample
    ss: WARNING: Failed to queue check 'syslogd' (check may already be scheduled)

Explanation
Every minute, the check scheduler attempts to schedule any checks that are due to be executed. If a particular check takes more than a minute to execute, the check scheduler will attempt to schedule that check again.

To prevent the same process from being queued multiple times (and thus causing backlogs or concurrency problems), the scheduler will produce the above error message if a previously scheduled check with the same name has not yet completed.

Error
CheckState::lastcheck failed to open .../lastcheck

Sample
    ss: WARNING: CheckState::lastcheck failed to open /var/instance/state/host/hostname/service/lastcheck (No such file or directory)

Explanation
When a new check or host is added, there is no way to guarantee that the check scheduler will notice it before the alert scheduler does. As a result, this warning may be generated even when both hostname and service are known to be valid: the alert scheduler has noticed the new check or host before the check scheduler, and so there is no lastcheck state to examine.

This warning should not continue after a minute or two, by which time the new check or host will have been queued by the check scheduler. Note that setting an alert shift time in schedule.cf will not eliminate this message, as the alert shift time only adjusts the period during which alerts are generated, and not the relative times compared to the check scheduler within that period.

Update: Due to changes within the scheduler, this error should no longer occur.
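
The duplicate-queue guard described under "Failed to queue check" can be pictured with a small model. This is a sketch under stated assumptions, not the scheduler's actual code: the class `CheckQueue` and its methods are hypothetical, and a plain set stands in for whatever bookkeeping the scheduler really uses.

```python
class CheckQueue:
    """Toy model of a queue that refuses a check that is still pending.

    Hypothetical illustration only; it does not reflect how survivor
    tracks scheduled checks internally.
    """

    def __init__(self):
        # Names of checks that were queued and have not yet completed.
        self._pending = set()

    def queue(self, name):
        """Queue `name`; return False and warn if it is already pending."""
        if name in self._pending:
            print(f"WARNING: Failed to queue check '{name}' "
                  "(check may already be scheduled)")
            return False
        self._pending.add(name)
        return True

    def complete(self, name):
        """Mark `name` as finished so that it can be queued again."""
        self._pending.discard(name)
```

In this model, queueing 'syslogd' twice without an intervening `complete("syslogd")` produces the warning on the second attempt, which corresponds to a check that runs for longer than the one-minute scheduling interval.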

Error
(blank output) or HTTP/1.1 500 Internal Server Error

Sample
    HTTP/1.1 500 Internal Server Error

Explanation
Some dynamic libraries, including OpenSSL libraries such as libssl or libcrypto, are not in the runtime LD_LIBRARY_PATH of the web interface.

Ordinarily, the paths to these libraries are encoded into the executable when the package is built. If, however, these libraries are moved or removed, the runtime linker will fail to resolve symbol definitions when the program is executed, and the program will not run. It may be possible to replicate this failure by manually running the program:

    % unsetenv LD_LIBRARY_PATH
    % ./sw
    ld.so.1: ./sw: fatal: libssl.so.0.9.6: open failed: No such file or directory
    Killed

To fix this problem, replace the libraries or rebuild the package with the new locations included in Makefile.inc.
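
As a rough first check before rebuilding, it can help to see whether the system's default library search finds the OpenSSL libraries at all. The sketch below is an approximation only: `missing_libraries` is a hypothetical helper, and Python's `ctypes.util.find_library` uses its own lookup, which is not identical to what the runtime linker does for the sw executable.

```python
from ctypes.util import find_library


def missing_libraries(names=("ssl", "crypto")):
    """Return the names in `names` that the default search cannot locate.

    Hypothetical helper for illustration. find_library() returns None
    when it cannot find a library; its search only approximates the
    runtime linker's behavior for a specific executable.
    """
    return [name for name in names if find_library(name) is None]
```

An empty list suggests the libraries are installed somewhere the system knows about, in which case the paths encoded in the executable are the more likely culprit.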

$Date: 2006/11/20 02:54:21 $ $Revision: 0.13 $