Re: [Exim] Spam-scoring?

Author: Mark Morley
Date:  
To: Alan J. Flavell
CC: Exim users list
Subject: Re: [Exim] Spam-scoring?
> Consequently I was thinking it would be useful to accumulate some of
> these indicators by a series of tests, maybe even both positive and
> negative indicators of spam-likelihood (similar to what one would do
> in a usenet scorefile, for example), and was a bit surprised to see no
> mention in the FAQ.


Funny you should mention it...

I've been working on a scoring filter for some time now, and just last week
I opened it up to our customers (we're an ISP) to experiment with. We don't
delete messages ourselves, we leave that up to individual customers to do
(although we do provide a really easy filtering system that we created
called PEP: http://www.islandnet.com/pep.html).

Anyway, what I do is score each and every message that passes through our
server (about a million a week) and add an X-SPAM-Score: header. The idea
is that messages with a score of 100 or more are practically guaranteed to
be spam. Some customers may choose to ignore this header entirely, others
may want to be more aggressive and filter messages with a lower score, and
others may want to play it safe and only filter messages with a score of
150 or even higher.
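
For illustration only, here is a minimal Python sketch of how a customer-side
filter might act on that header. It assumes the header value is a plain
integer (the exact format isn't stated above), and the function name and
default threshold are hypothetical.

```python
from email import message_from_string


def is_spam(raw_message, threshold=100):
    """Return True if the X-SPAM-Score header meets the chosen threshold.

    Assumes the header carries a bare integer; a missing or unparsable
    header is treated as a score of 0.
    """
    msg = message_from_string(raw_message)
    try:
        score = int(msg.get("X-SPAM-Score", "0").strip())
    except ValueError:
        score = 0
    return score >= threshold
```

A cautious customer would call is_spam(raw, threshold=150); an aggressive
one might pick a value below 100.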

Of the tens of thousands of messages that have scored 100 or higher, I've
had fewer than a dozen false positives (and most of those were legit
messages that came from massively RBL'd servers).

Certain rules are worth an immediate 100 points (like having friend@public
in any header, or finding certain timezone values in Received: headers,
etc.).

Other rules are worth as little as 15 points (being listed on the
xbl.selwerd.cx RBL, or having certain punctuation marks in the subject,
etc.).

I also maintain a DBM file of common spam domains (worth 80 points) and
free email domains (worth 40 points) and common spammer usernames (worth
50 points).
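
The additive scheme described above could be sketched roughly like this in
Python. This is not the actual filter (which isn't shown here); the table
contents, the choice of punctuation marks, and all names are assumptions
made up for the example, and in-memory sets stand in for the DBM files.
The RBL and timezone rules are left out.

```python
import re
from email import message_from_string

# Hypothetical lookup tables standing in for the DBM files.
SPAM_DOMAINS = {"spamdomain.example"}      # worth 80 points
FREE_DOMAINS = {"freemail.example"}        # worth 40 points
SPAMMER_USERNAMES = {"friend"}             # worth 50 points


def score_message(raw_message):
    """Add up points from a few independent rules and return the total."""
    msg = message_from_string(raw_message)
    score = 0

    # 100-point rule: "friend@public" appearing in any header value.
    if any("friend@public" in str(value).lower() for value in msg.values()):
        score += 100

    # Table lookups on the sender's username and domain.
    match = re.search(r"([\w.+-]+)@([\w.-]+)", msg.get("From", ""))
    if match:
        user, domain = match.group(1).lower(), match.group(2).lower()
        if domain in SPAM_DOMAINS:
            score += 80
        if domain in FREE_DOMAINS:
            score += 40
        if user in SPAMMER_USERNAMES:
            score += 50

    # 15-point rule: repeated "!" or "$" in the subject (the specific
    # punctuation marks are an assumption).
    if re.search(r"!{2,}|\${2,}", msg.get("Subject", "")):
        score += 15

    return score
```

Because the rules only add to a running total, a message from a free email
domain with a shouty subject lands at 55 here, well under the 100 mark,
while one hard indicator is enough to cross it.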

Another thing I've done is to create dozens of spam traps. These are
email addresses that I embed within hidden comments on web pages, or
post to newsgroups, etc. Within 24 hours they are usually harvested
and abused by spammers. Any time a message arrives at one of these
spam traps, I record the From: and Return-Path: values into a DBM
file. The filter scores 100 points for any subsequent message that
is received from those addresses. This one alone catches thousands
of spams a day...
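
A rough Python sketch of the spam-trap idea, using the standard dbm module.
The trap address, file path handling, and function names are hypothetical,
and the real filter may work quite differently; this only shows the
record-then-look-up pattern.

```python
import dbm
from email import message_from_string
from email.utils import parseaddr

# Hypothetical trap addresses seeded on web pages and newsgroups.
TRAP_ADDRESSES = {"trap1@example.org"}


def _sender_keys(raw_message):
    """Yield the From: and Return-Path: addresses as lowercase bytes keys."""
    msg = message_from_string(raw_message)
    for header in ("From", "Return-Path"):
        addr = parseaddr(msg.get(header, ""))[1].lower()
        if addr:
            yield addr.encode("utf-8")


def record_if_trap(raw_message, recipient, db_path):
    """If the message was sent to a trap address, remember its senders."""
    if recipient.lower() not in TRAP_ADDRESSES:
        return
    with dbm.open(db_path, "c") as db:
        for key in _sender_keys(raw_message):
            db[key] = b"1"


def trap_score(raw_message, db_path):
    """Return 100 if any sender address was previously seen at a trap."""
    with dbm.open(db_path, "c") as db:
        for key in _sender_keys(raw_message):
            if key in db:
                return 100
    return 0
```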

Customers report any false positives back to me (none have been reported
yet, just the few I found myself) and they also send me spam that scores
less than 100 so I can tweak the filters.

If anyone is interested I could make it (and the associated lookup tables)
available via FTP on a regular basis...

Mark