OK, I consolidated my various logging and reporting scripts and packaged
them as a single script that can be used on any Unix-based computer that is
not running a Web server (or any other process that listens on TCP port
80).  You have to be root to run it, since 80 is a privileged port.

The script is here:

  ftp://kermit.columbia.edu/kermit/cu/port80log

To run it, you'll need C-Kermit 8.0 Beta.03:

  http://www.columbia.edu/kermit/ck80.html

You'll find prebuilt binaries at the end of the page for every conceivable
Unix platform (if you have a platform that's not covered, please let me
know).  Copy the appropriate binary to your computer as:

  /usr/local/bin/wermit

or whatever ("wermit" instead of "kermit" to indicate it's a test version).

The script accepts and logs port 80 connections.  Every hour or so (it
blocks waiting for incoming connections, so the timing is not exact) it
summarizes the current log, mails the summary to the desired address,
uploads the log with FTP, and then starts a new log.
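
The heart of it is a simple listen-and-log loop.  Here is a rough sketch,
not the actual script; the file name and variables are illustrative:

  ; Simplified sketch of the accept/log cycle -- not the real script.
  while true {
      set host * 80                 ; wait for an incoming TCP connection
      if fail exit 1 Cannot listen on port 80 (need root)
      clear input
      input 10 \10                  ; collect up to one line of the request
      fopen /append \%c port80.log  ; illustrative name; see below for the real one
      fwrite /line \%c \v(date) \v(time) "\v(input)"
      ; (the real script also records the caller's host name and IP address)
      fclose \%c
      hangup                        ; drop the connection and listen again
  }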

The tricky part is FTP, because of course this requires a destination
host, a directory, and a password.  You'll need to edit the script to
specify the FTP host, directory, and user (it prompts you for the
password).  If you answer "no" at the prompt, it skips the FTP part.

By default, the summary report is mailed to security@columbia.edu.  You can
change that too if you want.  This stuff is all defined near the top,
marked with (*).
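
The (*) items are ordinary Kermit macro definitions, along these lines
(the names and values here are illustrative, not the script's actual ones):

  ; (*) Site-specific definitions -- edit to suit (illustrative names):
  define ftphost  ftp.example.com         ; where to upload the logs
  define ftpdir   incoming                ; directory on the FTP host
  define ftpuser  someuser                ; FTP username (password is prompted)
  define mailto   security@columbia.edu   ; where to mail the hourly summary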

After making the needed edits, you can run the script by starting Kermit
(as root) and telling it to "take port80log".  Alternatively, you can edit
the script to have a shebang (kerbang) line specifying the location of the
Kermit executable, but since there is no standard location you'll have to
go with the conventions used on your platform.  In this case, you also have
to give the script execute permission.  (You still have to be root to run it.)
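
For example, if you installed the binary as suggested above, the kerbang
line (which must be the first line of the script) would be:

  #!/usr/local/bin/wermit +

The trailing "+" tells Kermit that the name of the script follows and that
any remaining command-line arguments are to be passed to the script.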

The first cut of the script does not sort the emailed report in the
requested way, but that shouldn't be a big distraction since the report
won't be very long.  Instead, it's sorted in descending order of hits per
site.
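
For reference, that ordering can be produced with C-Kermit's array SORT
command.  A rough sketch, with made-up data, assuming each summary line
begins with the hit count:

  ; Hypothetical data -- one "count site" line per array element.
  declare \&s[3]
  define \&s[1] 12 bad-guy.example.net
  define \&s[2] 41 worm.example.org
  define \&s[3] 7 probe.example.com
  sort /numeric /reverse \&s      ; descending by the leading count
  for \%i 1 3 1 { echo \&s[\%i] }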

The script can be run on as many computers as desired simultaneously, but
only one copy per computer.  Conceivably, if you wanted to be sure never to
miss any requests, you could run it under inetd, but I haven't tried that.
If you do, be sure to redirect stdout and stderr to a file or /dev/null.
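
In case you want to experiment, an inetd setup would presumably amount to
an /etc/inetd.conf entry something like the following (hypothetical and
untested, as noted; the stdout/stderr redirection would have to be handled
by a small wrapper, which is not shown):

  # Hypothetical, untested /etc/inetd.conf entry:
  http  stream  tcp  nowait  root  /usr/local/bin/wermit  wermit + /usr/local/bin/port80log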

Log files contain the raw data: timestamp, IP name (if known), IP address,
and (in doublequotes) the incoming request, which identifies the attack.
Logfile names are <host>_<date>_<time>.log, where <host> is the name of
the computer running the script, <date> is the numeric yyyymmdd date, and
<time> is seconds since midnight.  The report file is created locally
before mailing;
it has the same name as the log, except .txt instead of .log.  New files
are created each hour.
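
The name is built from C-Kermit's built-in variables, roughly like this
(the variable name "logfile" is illustrative):

  ; \v(host) = local hostname, \v(ndate) = yyyymmdd,
  ; \v(ntime) = seconds since midnight.
  .logfile := \v(host)_\v(ndate)_\v(ntime).log
  echo \m(logfile)    ; e.g. myhost_20011105_43200.log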

When you first start running the script, it probably won't attract too many
hits.  But the word will spread quickly, and soon the hits will be pouring
in.