Webstats 3.0 Log File Format

Here is a typical log record from our web server:

cpe-067-131-138-032.ec.res.rr.com - - [25/Jun/2006:00:00:07 -0400] "(GET /2459.html HTTP/1.1)" 200 13833 "(ref http://www.goaskalice.columbia.edu/2114.html)" "(client Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1))" "vhost www.goaskalice.columbia.edu"

Here is the Apache directive that tells our web server what format to use:

LogFormat "%h %l %u %t \"(%r)\" %>s %b \"(ref %{Referer}i)\" \"(client %{User-agent}i)\" \"vhost %v\""

Notice that we've extended the Common Log Format by adding the referer, user-agent (browser type), and vhost fields at the end of each record. We surround the request, referer, and user-agent fields with quotes and parentheses, instead of just quotes, to make these fields easier to parse; the referer, user-agent, and vhost values also carry the labels "ref", "client", and "vhost". Both webstats and sawmill also accept records without the parentheses, and the extract files created by webstats do not contain them, for compatibility with other programs. The virtual host field can be omitted if your log contains requests for only one domain.
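As a rough illustration of how a record in this format could be taken apart, here is a minimal Python sketch. It is not the parser that webstats itself uses; the names LOG_RE, quoted, strip_wrapper, and parse_record are hypothetical and exist only for this example. The sketch assumes fields are separated by single spaces, accepts records with or without the parentheses, and treats the vhost field as optional.

```python
import re

SAMPLE = (
    'cpe-067-131-138-032.ec.res.rr.com - - [25/Jun/2006:00:00:07 -0400] '
    '"(GET /2459.html HTTP/1.1)" 200 13833 '
    '"(ref http://www.goaskalice.columbia.edu/2114.html)" '
    '"(client Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1))" '
    '"vhost www.goaskalice.columbia.edu"'
)


def quoted(name):
    """Pattern for a double-quoted field; backslash-escaped characters are allowed."""
    return r'"(?P<%s>(?:[^"\\]|\\.)*)"' % name


# Hypothetical pattern for the extended format described above.
LOG_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    + quoted("request") + r' (?P<status>\d{3}) (?P<bytes>\d+|-) '
    + quoted("referer") + " " + quoted("client")
    + r'(?: "vhost (?P<vhost>[^"]*)")?\s*$'
)


def strip_wrapper(value, label=None):
    """Remove an optional '(...)' or '(label ...)' wrapper from a field value."""
    prefix = "(" if label is None else "(" + label + " "
    if value.startswith(prefix) and value.endswith(")"):
        return value[len(prefix):-1]
    return value


def parse_record(line):
    """Return a dict of fields, or None if the line does not match the format."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    rec = match.groupdict()
    rec["request"] = strip_wrapper(rec["request"])
    rec["referer"] = strip_wrapper(rec["referer"], "ref")
    rec["client"] = strip_wrapper(rec["client"], "client")
    return rec


if __name__ == "__main__":
    for key, value in parse_record(SAMPLE).items():
        print(key, "=", value)
```

Because the wrapper is stripped only when it is present, the same function handles both the parenthesized records written by the directive above and plain records without parentheses. When the vhost field is absent, the "vhost" entry in the result is None.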

The client field is ignored by webstats. We have other software that we use to analyze the client information (browser type).

For details on the individual format codes (%h, %t, %r, and so on), see the Apache documentation for log file formats.


* Academic Information Systems 212 854.1919 consultant@columbia.edu *
Research and Development Group
last modified on 07/07/06