Re: [Exim] Spam Filtering Language

Author: Alan J. Flavell
Date:
To: Marc Perkel
CC: Exim users list
Subject: Re: [Exim] Spam Filtering Language

On Sun, 30 Sep 2001, Marc Perkel wrote:

> What I want to do is to concatenate string StrA = StrB + StrC.
> Specifically, I want to add the subject line to the first and last 500
> characters of the body and test the resulting string for phrases to
> filter.

I see that Philip has given you the answer to that...

Are you sure that's such a good idea? My impression is that a test
which can work well on subject headers is much more likely to produce
false positives if applied to message bodies; conversely a test for
things that regularly turn up in message bodies is unlikely to match a
subject line.

Furthermore, there are key phrases that appear near the beginnings of
spams ("Dear Friend,"), and other key phrases that appear near the end
("We strongly oppose the use of spam" is a good one that's been turning
up recently). I suggest you're better off testing the items separately,
against different match lists.
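
As a rough sketch of what I mean by separate tests: the phrases and
scores below are only illustrative (they are not our actual recipes),
and I'm assuming the score is accumulated in $n9. $message_body holds
the start of the body and $message_body_end the tail of it, each
limited by message_body_visible, which defaults to 500 characters.

# Phrases that belong in a subject-only list
if $h_subject: contains "make money fast"
  then add 40 to n9
endif

# Phrases typical of the opening lines of a spam
if $message_body contains "Dear Friend"
  then add 20 to n9
endif

# Phrases typical of the closing lines
if $message_body_end contains "We strongly oppose the use of spam"
  then add 40 to n9
endif

Keeping the lists apart means each phrase can carry a score suited to
the place where it is actually expected to turn up.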

BTW, I've noticed that some spammers mess about with their texts,
scattering odd spaces here and there within words, in different places
in each item. Evidently this is an attempt to avoid being
string-matched. (They're also erratic with their spelling, but that
may be because they're semi-literate rather than any deliberate plan
on their part.)
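
One way round the scattered spaces, if you want to try it, is a
regular expression that allows optional spaces between the letters.
This is only a sketch: the word and the score are made up for
illustration, and I'm assuming the score is kept in $n9. Note that
"matches" is case-independent in filter files.

# Matches "guarantee", "gua rantee", "g u a r a n t e e", and so on
if $message_body matches "g[ ]*u[ ]*a[ ]*r[ ]*a[ ]*n[ ]*t[ ]*e[ ]*e"
  then add 15 to n9
endif

Don't overdo it, though: the looser the pattern, the more likely it is
to hit genuine mail, so a modest score seems safer for this kind of
test.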

> Or other suggestions as to what works best?

Again you need to define "best". That's why there are so many options
to turn on or off!

When I got dissatisfied with the amount of spam that was still getting
through, I raided http://colondot.net/mbm/mailfilter.shtml for some
good ideas, and then worked outwards from there.

Hmmm, if I tried to tell you comprehensively what the rules were, our
mailer would rate the item as spam ;-) Anyhow, there's no reason why
what works for us would be ideal for you. I suggest starting at that
colondot page for ideas, as we did, and working outwards on the basis
of what you're getting.

I suggest that once you've got a scoring recipe drafted, you set a
high threshold for rejection and a low one for freezing, something
like this:

if ($n9 is above 200)
  then fail "The mailer rated this as spam"
endif

if ($n9 is above 80)
  then freeze "Spam score is $n9"
endif

and adjust the thresholds towards each other (and adjust the recipes
on the basis of experience) as you go along.

You might want to add this line to the recipes too:

          headers add "X-Spam-score: $n9"

N.B. After updating the system filter, remember to test it with
exim -bF. A defective system filter will paralyse the system! (Check
the paniclog too.)

CAVEAT: YMMV, and this _can_ result in loss of genuine mail, so
proceed with care.

Another thing you may want to think about is de-rating the score for
mails that have been originated locally, otherwise any attempt by your
users to discuss spam recipes, or to report spam to ISPs etc., could
be blocked too!
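
A sketch of that de-rating, on the assumption that the score is in $n9
and that locally originated mail can be recognised by an empty
$sender_host_address (mail submitted by SMTP from your own client
machines would need a different test). The figure of 100 is arbitrary.
It would go after the scoring recipes and before the threshold tests:

# No remote host address: the message was submitted locally
if $sender_host_address is ""
  then add -100 to n9
endif

As far as I know there is no separate subtract command in the filter
language; adding a negative number does the job.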

cheers
