
How CUIT Addresses Growing Spam Problem

10 Aug 06

No one likes finding spam in their inbox in the morning. Why can't someone do something about it, you might ask. Here at CUIT we are doing something. You may not realize that each day CUIT's spam and virus filters refuse about 60 percent of the University's total incoming email, up from 45 percent last year. Over the past month alone, the number of rejected messages ran from 900,000 to more than a million a day. Of course, some spam still slips through the filters (as soon as we learn the spammers' tricks, they get more creative), so these numbers do not show how much spam we receive, only how much we stop before it reaches your mailbox.

There is something you can do too: help us by reporting the spam you receive. If you use CubMail, you can use the "Report as Spam" feature. In other email programs, forward the spam to spam@columbia.edu, and please include the full headers. See http://www.columbia.edu/acis/faq/full-headers.html if you need help with this step. We use these reports to see what gets past our filters. Multiple reports of similar spam are especially useful, because they show which parts of a spammer's mailing vary and which parts stay the same.

Of course, spam does not arrive labeled as spam. The only way to detect it is to look for features that are common in spam but uncommon in legitimate mail. This is what all anti-spam filters do. The underlying idea is that future spam can be identified by its similarity to past spam, and spammers try to defeat this by varying their techniques and content.
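As a rough illustration of "similarity to past spam," the Python sketch below compares a new message with previously reported spam by the overlap of their word sets (Jaccard similarity). This is one simple, generic technique chosen for illustration, not a description of CUIT's filters; the tokenizer, the function names, and the 0.5 threshold are all hypothetical.

```python
import re

# Hypothetical cutoff: messages at least this similar to a known spam are flagged.
SIMILARITY_THRESHOLD = 0.5

def tokens(text: str) -> set[str]:
    """Lowercased word tokens; a deliberately crude stand-in for real spam features."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def max_similarity(message: str, known_spam: list[str]) -> float:
    """Highest Jaccard similarity between the message and any previously reported spam."""
    msg_tokens = tokens(message)
    return max((jaccard(msg_tokens, tokens(s)) for s in known_spam), default=0.0)

def resembles_known_spam(message: str, known_spam: list[str]) -> bool:
    return max_similarity(message, known_spam) >= SIMILARITY_THRESHOLD

reported = ["Cheap watches for sale, buy now!"]
print(max_similarity("Buy cheap watches now", reported))       # about 0.667 (4 shared words of 6)
print(resembles_known_spam("Meeting moved to 3pm", reported))  # False
```

A spammer who changes enough words drops below any fixed threshold, which is exactly the game of varying content described above; real filters therefore combine many different features rather than relying on one measure.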

To address the spam problem, CUIT uses MIMEDefang, a software tool for writing email filters. MIMEDefang analyzes each message, but it is up to the email system administrator to write the rules it acts on. We check incoming messages for signs of forgery, and we look up the sending mail server in three blocklist services that list servers known to be involved with spam. Selected messages are also checked by the SpamAssassin software. Some of our rules are based on spam reported at Columbia, and the rules change frequently in response to what spammers are currently doing. Some rules are sufficient on their own to reject a message. Other rules detect features that are common in spam but not limited to spam; these do not cause rejection by themselves, but they add to the message's spam score. The higher the score, the more likely it is that a message really is spam. We reject email with a score of 8 or more. Whenever we do not accept a message, the sender is notified that it has been rejected.
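The scoring scheme described above can be sketched in a few lines of Python. This is not CUIT's configuration: the real rules run inside MIMEDefang and SpamAssassin, and the rule names, patterns, and weights below are hypothetical. Only the threshold of 8 comes from this article. "Hard" rules reject a message outright; "soft" rules add to a score that is compared with the threshold.

```python
import re
from dataclasses import dataclass
from typing import Callable

# From the article: messages scoring 8 or more are rejected.
REJECT_THRESHOLD = 8.0

@dataclass
class Rule:
    name: str
    test: Callable[[str], bool]  # True if the rule matches the raw message text
    score: float = 0.0           # added to the total when a soft rule matches
    hard: bool = False           # a hard rule rejects the message on its own

def evaluate(message: str, rules: list[Rule]) -> tuple[str, float, list[str]]:
    """Return (verdict, score, names of matched rules)."""
    total = 0.0
    hits: list[str] = []
    for rule in rules:
        if not rule.test(message):
            continue
        hits.append(rule.name)
        if rule.hard:
            return "reject", total, hits  # sufficient on its own; stop checking
        total += rule.score
    verdict = "reject" if total >= REJECT_THRESHOLD else "accept"
    return verdict, total, hits

# Hypothetical example rules; names, patterns, and weights are illustrative only.
RULES = [
    Rule("forbidden_attachment", lambda m: bool(re.search(r"\.pif\b", m, re.I)), hard=True),
    Rule("all_caps_subject", lambda m: bool(re.search(r"^Subject: [A-Z !]+$", m, re.M)), 3.0),
    Rule("money_offer", lambda m: "million dollars" in m.lower(), 4.0),
    Rule("urgent", lambda m: "urgent" in m.lower(), 2.0),
]

print(evaluate("Subject: URGENT\n\nPlease reply.", RULES))
# ('accept', 5.0, ['all_caps_subject', 'urgent'])
print(evaluate("Subject: URGENT OFFER\n\nTen million dollars await you.", RULES))
# ('reject', 9.0, ['all_caps_subject', 'money_offer', 'urgent'])
```

With these made-up weights, a message that trips only the capitalized-subject and "urgent" rules scores 5 and is accepted, while one that also mentions "million dollars" scores 9 and is rejected: no single soft feature condemns a message, but several together do.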

One of the most commonly reported and hardest-to-catch kinds of spam is the "Advance Fee Letter," usually from someone supposedly trying to bring an enormous sum of money into the United States. These messages are extremely difficult to filter because each one is worded differently. To catch any of them at all, the filter has to check for 46 groups of variant phrases, such as "you may be surprised to receive this", "urgent and confidential", "honest cooperation", and "wife of the late". It must then count how many of these groups match in combination, which makes this possibly the most complex spam filter rule there is. Even so, the rule catches only about half of these messages. If you want to know more, several web sites are devoted to this one kind of spam, such as http://www.nigerianspam.com/.
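A simplified version of this phrase-group rule might look like the Python sketch below. The four groups use the phrases quoted above; the alternate wordings in each group, the group names, and the requirement that three groups match are hypothetical, and a real rule would cover all 46 groups.

```python
import re

# Each group holds variant wordings of one stock phrase. The first pattern in each
# group is quoted in the article; the alternates are hypothetical examples.
PHRASE_GROUPS = {
    "surprise": [r"you may be surprised to receive this",
                 r"this (letter|mail) may come to you as a surprise"],
    "confidential": [r"urgent and confidential", r"strictly confidential"],
    "cooperation": [r"honest cooperation", r"sincere cooperation"],
    "widow": [r"wife of the late", r"widow of the late"],
}

# Hypothetical: how many distinct groups must match before the rule fires.
MIN_GROUPS = 3

def matched_groups(text: str) -> set[str]:
    """Names of the phrase groups with at least one variant present in the text."""
    normalized = " ".join(text.lower().split())  # fold case, collapse whitespace and line breaks
    return {name for name, variants in PHRASE_GROUPS.items()
            if any(re.search(pattern, normalized) for pattern in variants)}

def looks_like_advance_fee(text: str) -> bool:
    return len(matched_groups(text)) >= MIN_GROUPS

letter = ("You may be surprised to receive this letter. I am the wife of the late\n"
          "minister and I seek your honest cooperation.")
print(sorted(matched_groups(letter)))  # ['cooperation', 'surprise', 'widow']
print(looks_like_advance_fee(letter))  # True
print(looks_like_advance_fee("Thanks for your cooperation on the budget."))  # False
```

Requiring several groups keeps ordinary mail that happens to contain one stock phrase from being caught, but any letter worded outside the listed variants slips through, which is one reason rules like this miss so many of these messages.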

Next time you see another of those unwanted emails in your mailbox, remember that it is only a tiny fraction of the millions that CUIT prevents you from having to delete. And, if possible, please take the time to send us your spam. It all helps.

Joseph Brennan
Unix Systems and Email Group