Re: [Exim] Filtering by subject

Author: Sherwood Botsford
Date:  
To: I.S. Manager
CC: exim-users
Subject: Re: [Exim] Filtering by subject
On Tue, 11 Jul 2000, I.S. Manager wrote:

= As part of my war on spam, I'm starting to filter by subject. Trouble is,
= I'm doing it the simplistic way (I'm pretty new to exim), like this:
= 
= if $header_subject begins "e-Qualize your business" or
=    $header_subject begins "Avoid Speeding Tickets" or
=    $header_subject begins "Proven money-maker"
= then
=         fail text "SPAM is not acceptable at this site. And you're a scumball too."
=         finish
= endif
= 
= I can see this getting out of hand pretty fast. I've managed to figure out
= how to reject by host:
= 
= 
= host_reject_recipients = "+warn_unknown:\
=     partial1-dbm;/etc/exim/host_reject_recipients"
= 
= (aside: strangely, this works on one of my mail servers, but not on another
= with an identical version of exim)
= 
= I'm wondering if there's a way to do the subject filtering in the same (or
= similar) manner, and if it will be better than the monolithic approach I'm
= taking right now (intuition says yes, but....).
= 


The problem with this is that the filter list can get out of hand.
Since the filters are applied linearly, as soon as you've collected
a few thousand spam messages, your performance will go down the
tubes. (OK, if it's only filtering a few dozen messages per day
you don't care...)

The second problem is that you need to either add each rule
manually, or create a pseudo-user that you forward the spam to,
with a script on that account that updates the spam filter.
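
To illustrate the first problem, here is a minimal Python sketch (not
Exim filter syntax, and not something Exim provides) of replacing a
linear chain of "begins" tests with hashed lookups. The function names
build_index and is_listed are made up for this example, and the match
is case-sensitive to keep it short. The cost per message depends on the
number of distinct prefix lengths rather than on the number of rules.

```python
def build_index(prefixes):
    """Group known spam subject prefixes by length for hashed lookup."""
    index = {}
    for prefix in prefixes:
        index.setdefault(len(prefix), set()).add(prefix)
    return index


def is_listed(subject, index):
    """True if the subject begins with any indexed prefix.

    One set lookup per distinct prefix length, instead of one
    string comparison per rule.
    """
    return any(subject[:n] in group for n, group in index.items())


index = build_index([
    "e-Qualize your business",
    "Avoid Speeding Tickets",
    "Proven money-maker",
])
print(is_listed("Avoid Speeding Tickets today", index))
print(is_listed("Re: [Exim] Filtering by subject", index))
```

The list itself still has to be maintained by hand or by a script, so
this only addresses the performance problem, not the second one.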

1. Use the RBL, RSS, and DUL domains. I recently added RSS and
DUL to my primary server here, and found that the amount of spam
has dropped markedly.

2. I attempted to do some subject line filtering using procmail.

Here's my procmailrc filter along with some comments:

LOGFILE=/var/spool/exim/log/procmail
COMSAT=no
UMASK=007
LOGABSTRACT=ALL
SPAMDIR=/u/vega1/misclogins/spam/mail

:0 H:
* bigfoot.com
/bigfoot

[ Bigfoot was once a large problem with spam. RBL and RSS now
take care of these.]

:0 H:
* friend@???
$SPAMDIR/friend

[ One spam package would use friend@??? as the from address.
Having sender_verify to check the validity of the putative host
reduces this problem. Now they have to forge an existing header. ]

#:0 H:
#* ^Received:.*hotmail\.com
#$SPAMDIR/hotmail

[ Hotmail used to be a major problem. They've cleaned up their
act. On the rare occasions I get hotmail spam, bouncing it
to abuse@??? gets speedy results. ]


[ The header filters are first, since they are fast.
Now come the content filters -- a lot more problematic. ]

FILTER=bulk

:0 HB:
* -500^0
#* +600^1 ^Subject:[A-Z1-9'!:=;. #$][A-Z1-9'!:=;. #$][A-Z1-9'!:=;. #$][A-Z1-9'!:=;. #$][A-Z1-9'!=:;. #$][A-Z1-9'!=:;. #$][A-Z1-9'!=:;. #$][A-Z1-9'!=:;. #$][A-Z1-9'!=:;. #$]
* +600^1 Extractor Pro Bulk E- Mail Software
* +600^1 EMAIL MARKETING WORKS
* +300^1 bulk email
* +200^1 ^[A-Z1-9'!:;. #$]{10,80}$
$SPAMDIR/bulk

[ We were getting a bunch of these. This filter looks for subject lines
containing a sequence of 10 uppercase/punctuation characters in a row.
(The version of procmail I was using then didn't support full regex.) ]
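
For readers who want to try the idea outside procmail, here is a rough
Python sketch of the same test. It assumes the intent is a
case-sensitive match on the character class from the recipe above; the
name has_shouting_run is made up for this example. A counted
repetition such as {10} is shorthand for writing the class ten times
over, which is what the commented-out rule does by hand.

```python
import re

# Same character class as the recipe: uppercase letters, digits 1-9
# and some punctuation. {10} means "ten of these in a row".
SHOUT_RUN = re.compile(r"[A-Z1-9'!:=;. #$]{10}")


def has_shouting_run(subject):
    """True if the subject has 10 consecutive characters from the class."""
    return SHOUT_RUN.search(subject) is not None


print(has_shouting_run("MAKE MONEY FAST!!!"))
print(has_shouting_run("Re: [Exim] Filtering by subject"))
```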

FILTER=porn
:0 HB:
* -600^0
* ^Subject:.*XXX
* 100^.6 sex
* 100^.6 sexy
* 150^1 live sex
* 400^1 Adults Only
* 100^.8 XXX
* 100^1 large breasts
* 100^1 boobs
* 100^1 nude
* 100^1 big boobs
* 50^.8 cum
* 100^1 erotic
* 100^1 hooters
* 100^1 big tits
* 500^0 18-21 years
* 200^1 sexually explicit
$SPAMDIR/sex

[ This rule exhibits one of procmail's strengths. You could have
a bunch of weighting rules that added points to a score. If
the score was positive, the mail was rejected. The problem was
the three-character strings: they would occur at random in uuencoded
or MIME-encoded messages. The rule would also often stop the delivery
of dirty jokes, which was not my intention. ]
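
As a rough model of how such a weighted recipe adds up, here is a short
Python sketch. It follows the rule described in the procmailsc man page
for a condition written w^x: the first match adds w, the second adds
w*x, the third w*x*x, and so on. Everything else is a simplification
and an assumption of this example: the function names are made up, the
negative starting value is passed in separately as "base", matches are
counted without overlap, and case is ignored. It is not a
reimplementation of procmail.

```python
import re


def rule_score(weight, factor, count):
    """Contribution of one w^x rule whose pattern matched `count` times."""
    return sum(weight * factor**k for k in range(count))


def message_score(text, base, rules):
    """Base offset plus the contribution of every (weight, factor, regex)."""
    total = base
    for weight, factor, pattern in rules:
        hits = len(re.findall(pattern, text, re.IGNORECASE | re.MULTILINE))
        total += rule_score(weight, factor, hits)
    return total


# Two of the conditions from the "bulk" recipe, with its -500 offset.
BULK_RULES = [
    (600, 1, r"EMAIL MARKETING WORKS"),
    (300, 1, r"bulk email"),
]

text = "Bulk email offer! Buy our bulk email list"
print(message_score(text, -500, BULK_RULES))  # 100, so the message is filed
```

A factor of 0 counts only the first match, a factor of 1 counts every
match equally, and a factor between 0 and 1 makes each repeat worth a
little less than the one before.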


FILTER="Shout \$\$\$ "
:0 B:
* -800^0
* +100^.8 !!
* +150^.8 !!!
* +100^.9 ^[^-#=a-z][^-#=a-z][^-#=a-z][^-#=a-z][^-#=a-z][^-#=a-z][^-#=a-z][^-#=a-z][^-#=a-z]+$
* +300^1 [$][$][$]
$SPAMDIR/sales

[ Anything with too many lines that had no lowercase at all, or with
strings of exclamation marks or $ signs, was a sales thing. ]

Anyway, after playing with this on and off for several months, I found
that it was too much trouble to set the rules tight enough to do
any good, but loose enough that I didn't trap legit mail. After
mistakenly blocking a bunch of Word document files, I gave it up.

Sherwood Botsford     | sherwood@???
Sorcerers Apprentice    | Math Dept, U of A, Edmonton, AB T6G 2G1
System Administrator    | Tel: 780 492 5728 
Trouble shooter            | Fax: 780 492 6826