Sawmill Daemon Processes Web Server Logs

Sawmill is a daemon that converts IP addresses to host names in the web server logs and cleans up badly formatted records. It replaces logres, the program written in 1994 to process the web server logs. Interrupted jobs are restarted automatically, and a CGI script can be used to examine the current status of the web logs.

Sawmill currently runs on zivijo (3 copies) and neb (3 copies). It is started by /etc/init.d/sawmill. It can run on any host which has /www/data/httpd/log mounted. Sawmill messages go to /var/adm/messages on those hosts.

To understand how the system works and what can go wrong, it is necessary to understand each step of the process.

dayrot and sixhrot

Web server logs are rotated by the dayrot or sixhrot cron job. The job rotates the log file, sends a HUP to the web server (or restarts the web server), and then compresses the rotated log file. On the regular (non-secure) web servers the log has this naming convention:

/var/log/cnet/httpd_access_log.1.gz

On the secure web servers the log has this naming convention:

/var/log/cnet/httpsd_access_log.1.gz

copy-weblogs.pl

dayrot or sixhrot calls copy-weblogs.pl to copy the log to the sawmill directory using this naming convention for non-secure logs:

/www/data/httpd/log/httpd_access/2003051916_httpd_yasu.in.gz

The secure web logs are copied to the same directory but with a different file name (httpsd instead of httpd):

/www/data/httpd/log/httpd_access/2003051904_httpsd_yasu.in.gz

After the copy, copy-weblogs.pl waits 10 minutes (to give cricketbat time to read the log) and then removes the log from the web server's local disk.
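The destination names above can be sketched in a few lines of Python. This is only an illustration of the naming convention, not the code in copy-weblogs.pl. The function name sawmill_input_path is hypothetical, and the reading of the leading digits as year, month, day and hour (YYYYMMDDHH) is inferred from the two example file names.

```python
from datetime import datetime

# Sawmill directory as given in this document.
SAWMILL_DIR = "/www/data/httpd/log/httpd_access"


def sawmill_input_path(rotated_at, server, host):
    """Build the sawmill input file name for one rotated log.

    rotated_at: datetime of the log rotation (assumed to supply YYYYMMDDHH)
    server:     "httpd" for non-secure logs, "httpsd" for secure logs
    host:       web server host name, for example "yasu"
    """
    if server not in ("httpd", "httpsd"):
        raise ValueError("server must be 'httpd' or 'httpsd'")
    stamp = rotated_at.strftime("%Y%m%d%H")
    return f"{SAWMILL_DIR}/{stamp}_{server}_{host}.in.gz"


if __name__ == "__main__":
    print(sawmill_input_path(datetime(2003, 5, 19, 16), "httpd", "yasu"))
    # prints /www/data/httpd/log/httpd_access/2003051916_httpd_yasu.in.gz
```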

sawmill

The sawmill daemon periodically scans the sawmill directory, selects the oldest queued file, and processes it. If the job is interrupted for any reason (zivijo reboot, filer maintenance, network outage), sawmill requeues the job automatically. All the records that the interrupted job wrote to the output file are kept, and processing continues with the remaining records. So if a job is killed after processing 90% of the log file, the requeued job will complete in just a few minutes instead of 3 hours or more. A zivijo reboot therefore has relatively little impact under the new sawmill system.

We can run sawmill on multiple hosts simultaneously, and we can move sawmill to another host if we anticipate a zivijo downtime. You can also increase the number of sawmill daemons on a particular host. To start an additional sawmill, type this command as radish:

radish@zivijo: /opt/ACISweb/bin/getstats/sawmill &

How Can The System Fail?

If the web server is down at log rotation time, the next log will be twice as large. For example, if yasu is down at 10am there will be no 10am file for yasu, but the 4pm file will be about twice the normal size. This problem still exists in the new sawmill system. One possible solution would be to create an empty file in place of the missing 10am file so that all the log files are present. This is a minor problem since we have all the data. At the end of the week the weekly-getstats job concatenates the daily logs into weekly logs, and it will print an error message that can be ignored. Carol Kassel will ask about the missing file, but you can tell her to ignore the problem since all the data is in the subsequent file.

If the web server host is up but the filer is down at log rotation time, the log will be rotated but not copied. In that case (or if anything else prevents the log from being copied) the log will sit on the web server until the next log rotation time, which is 6 hours later (httpd) or 24 hours later (httpsd). At the next log rotation time the old log (for example, one rotated at 10am) will be copied to the sawmill directory using the correct naming convention. copy-weblogs.pl has been improved so that it knows the correct file name for old logs.

If there is a problem with dayrot, sixhrot or copy-weblogs.pl, the log file won't appear in the sawmill directory. In this case the CGI will show an empty box for that date/time/host. If the file was copied to the sawmill directory but sawmill is not running, the CGI will display status "queued" for that date/time/host. In this case you need to stop and start sawmill on both hosts (zivijo and neb) as follows.

Restart Sawmill

Use the init script as root (shown here for zivijo; repeat on neb):

root@zivijo: /etc/init.d/sawmill stop
root@zivijo: /etc/init.d/sawmill start

Sawmill Naming Convention

  2003051916_httpd_yasu.in.gz       input file waiting to be processed
  2003051916_httpd_yasu.status      status file
  2003051916_httpd_yasu.tmp.gz      output file (not yet complete)
  2003051916_httpd_yasu.log.gz      output file (complete)
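
The table above can be turned into a small parser, for example when writing a script that lists what is in the sawmill directory. The sketch below is an illustration only and is not taken from sawmill or the CGI. The names parse_name, NAME_RE and SUFFIX_ROLES are hypothetical, and the pattern assumes a 10-digit timestamp and a host name without underscores or dots, as in the yasu examples.

```python
import re

# Role of each file, keyed by suffix, following the naming table.
SUFFIX_ROLES = {
    ".in.gz": "input file waiting to be processed",
    ".status": "status file",
    ".tmp.gz": "output file (not yet complete)",
    ".log.gz": "output file (complete)",
}

# Assumed layout: <10 digits>_<httpd|httpsd>_<host><suffix>
NAME_RE = re.compile(
    r"^(\d{10})_(httpd|httpsd)_([^_.]+)(\.in\.gz|\.status|\.tmp\.gz|\.log\.gz)$"
)


def parse_name(filename):
    """Split a sawmill file name into its parts, or return None if it
    does not follow the naming convention."""
    match = NAME_RE.match(filename)
    if match is None:
        return None
    stamp, server, host, suffix = match.groups()
    return {
        "stamp": stamp,
        "server": server,
        "host": host,
        "role": SUFFIX_ROLES[suffix],
    }


if __name__ == "__main__":
    print(parse_name("2003051916_httpd_yasu.tmp.gz"))
```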

Sawmill Algorithm For Choosing The Next Queued File

Sawmill scans the directory to find the oldest queued file and begins processing it:

step 1: make three lists: input files, output files, and status files
step 2: delete any stale status files: those for which no output file
        exists and which are older than STALE_STATUS secs
step 3: go down the sorted list of input files and pick the first one
        without a corresponding status file (oldest queued file)
step 4: process the selected file
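
Steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the sawmill source. The function name pick_next_input and the parameter stale_status_secs (standing in for STALE_STATUS, whose real value is not given here) are hypothetical. The sketch also assumes that "output file" means either a .tmp.gz or a .log.gz file, that the age of a status file is measured by its modification time, and that "oldest" means first in name order, which works because the names begin with the timestamp.

```python
import os
import time


def stems(names, suffix):
    """Return the names ending in suffix, with the suffix removed."""
    return {n[: -len(suffix)] for n in names if n.endswith(suffix)}


def pick_next_input(directory, stale_status_secs, now=None):
    """Return the oldest queued input file name, or None if nothing is queued."""
    now = time.time() if now is None else now
    names = os.listdir(directory)

    # step 1: make three lists: input files, output files, status files
    inputs = stems(names, ".in.gz")
    outputs = stems(names, ".tmp.gz") | stems(names, ".log.gz")
    statuses = stems(names, ".status")

    # step 2: delete stale status files (no output file, and too old)
    for stem in sorted(statuses - outputs):
        path = os.path.join(directory, stem + ".status")
        try:
            if now - os.path.getmtime(path) > stale_status_secs:
                os.remove(path)
                statuses.discard(stem)
        except FileNotFoundError:
            # Another daemon may have removed it already.
            statuses.discard(stem)

    # step 3: first input file, in sorted order, without a status file
    for stem in sorted(inputs):
        if stem not in statuses:
            return stem + ".in.gz"
    return None
```

Because several sawmill daemons run at once, a real implementation also has to make sure two daemons do not pick the same file; the sketch does not attempt this.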

Last modified Jun 23 2003