Re: [exim] Reducing Spam Assassin Load

Author: Marc Perkel
Date:
To: exim-users
Subject: Re: [exim] Reducing Spam Assassin Load
OK - Here's my crude attempt at implementing something. The idea is
that a HAM ATTACK, a big burst of legitimate mail, can slow down spam
filtering. We all use tricks to cut down the spam that SA has to
process. This reduces the load of processing ham by blessing
(whitelisting) senders for up to 5 minutes.

What I'm doing here is creating a temporary white list that lasts 5
minutes. The idea is to reduce load when you get a lot of incoming ham
from the same source. The solution isn't perfect - but it's a start.

It works by keeping a text file of keys, each one made of the sender's
address plus the first two octets of the sending host's IP address. A
key is appended to the file whenever a message arrives that SpamAssassin
autolearns as ham. By default that means a score of -2.
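To make the key concrete, here is a small Python sketch of what the X-Fingerprint expansion in the ACL further down computes. The function name `fingerprint` is hypothetical, and the sketch assumes an IPv4 sending host; like Exim's numeric `${extract}`, it returns an empty string for a field that doesn't exist.

```python
def fingerprint(sender_address: str, sender_host_address: str) -> str:
    """Build the whitelist key: sender address, a hyphen, then the first
    two dot-separated fields of the host address (e.g. "192.0").

    Models the expansion
      $sender_address-${extract{1}{.}{$sender_host_address}}.${extract{2}{.}{$sender_host_address}}
    Assumes an IPv4 address.
    """
    fields = sender_host_address.split(".")

    def field(n: int) -> str:
        # 1-based field number; empty if out of range, like ${extract{n}{.}{...}}
        return fields[n - 1] if 0 < n <= len(fields) else ""

    return f"{sender_address}-{field(1)}.{field(2)}"


if __name__ == "__main__":
    print(fingerprint("news@example.org", "192.0.2.17"))  # news@example.org-192.0
```

Because only the first two octets are used, every host in the same /16 that sends with the same envelope sender shares one key.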

Incoming email is checked against the text file and if there's a match
then SA is bypassed and the ham is passed on.
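Here is a rough Python model of the lookup side. `lsearch_match` and `lsearch_file` are hypothetical names. The matching rules follow Exim's lsearch (a line's key ends at a colon, whitespace, or end of line, and the comparison is case-insensitive), but quoted keys and continuation lines are left out. Treating a missing file as "no match" is this sketch's own choice, not a claim about Exim's behaviour.

```python
from pathlib import Path


def lsearch_match(text: str, key: str) -> bool:
    """True if some line of `text` has `key` as its lsearch-style key."""
    wanted = key.lower()
    for line in text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        # The line's key runs up to the first colon or whitespace character
        # (a line starting with whitespace therefore has an empty key).
        end = len(line)
        for i, ch in enumerate(line):
            if ch == ":" or ch.isspace():
                end = i
                break
        # Case-insensitive comparison, as lsearch does.
        if line[:end].lower() == wanted:
            return True
    return False


def lsearch_file(path: str, key: str) -> bool:
    """Look `key` up in the whitelist file; a missing file counts as no match."""
    try:
        return lsearch_match(Path(path).read_text(), key)
    except FileNotFoundError:
        return False
```

A match means the message gets the X-Spam-Check: No header and skips SA; anything else goes through the normal scan.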

Every 5 minutes a cron job empties the file. That limits how long a
sender stays whitelisted and keeps the list small, so the lookups don't
slow things down.
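The cron job itself isn't shown. A minimal sketch, assuming a hypothetical script `clear_ham_whitelist.py` that takes the whitelist path as its only argument:

```python
import sys


def clear_whitelist(path: str) -> None:
    """Empty the whitelist file in place.

    Mode "w" truncates an existing file to zero length (keeping its owner
    and permissions) and creates the file if it is missing.
    """
    with open(path, "w"):
        pass


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 clear_ham_whitelist.py /var/spool/spam/ham-from.txt
    clear_whitelist(sys.argv[1])
```

It could be run from cron with an entry like `*/5 * * * * python3 /usr/local/sbin/clear_ham_whitelist.py /var/spool/spam/ham-from.txt` (the script location is hypothetical). Truncating in place, rather than deleting the file, keeps it present for the lookup and keeps its existing owner and permissions so Exim can go on appending to it.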

In the system filter:

# Record the fingerprint of every message SA autolearned as ham
if "$h_X-Spam-Status:" contains "autolearn=ham"
then
  logfile /var/spool/spam/ham-from.txt
  logwrite "$h_X-Fingerprint:"
endif

In the ACL:

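# These statements use $h_ header variables, so they belong in the DATA ACL.

# Add a fingerprint header: sender address plus the first two octets
# of the sending host's IP address, e.g. news@example.org-192.0
warn    message     = X-Fingerprint: $sender_address-\
                      ${extract{1}{.}{$sender_host_address}}.${extract{2}{.}{$sender_host_address}}
        !senders    = : postmaster@*
        !condition  = ${if def:h_X-Fingerprint:}


# If this fingerprint was logged as ham within the last 5 minutes,
# mark the message so that SA is skipped
warn    message     = X-Spam-Check: No
        log_message = HAM - Spam Filter Bypass - $sender_address
        !condition  = ${if def:h_X-Spam-Check:}
        !senders    = : postmaster@*
        condition   = ${lookup{$sender_address-\
                      ${extract{1}{.}{$sender_host_address}}.${extract{2}{.}{$sender_host_address}}}\
                      lsearch{/var/spool/spam/ham-from.txt}{yes}{no}}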


It's simple - seems to work - and at least demonstrates the concept of
reducing load by bypassing ham.

The idea here, for those not following the thread, is that a legitimate
mailing list is sending a newsletter to 200 people. Why check it with SA
200 times? Say 50 copies arrive in each 5-minute window: the first one
is found to be ham and the other 49 bypass SA. At the end of the 5
minutes the file is cleared, so the next copy is checked and gets back
on the list. In 20 minutes all the newsletters are received, but you
only had to run 4 of them through SA.
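The newsletter arithmetic can be checked with a toy simulation. This Python sketch (the name `count_sa_checks` is hypothetical) assumes each fingerprint is recorded before the next copy arrives and that the 5-minute windows start at time 0; `is_ham` stands in for SA's autolearn=ham verdict.

```python
from typing import Callable, Iterable


def count_sa_checks(
    arrivals: Iterable[tuple[float, str]],
    window: float = 300.0,
    is_ham: Callable[[str], bool] = lambda key: True,
) -> int:
    """Count how many messages still go through SpamAssassin.

    `arrivals` holds (arrival_time_in_seconds, fingerprint) pairs in time
    order. The whitelist is emptied whenever a new `window`-second period
    starts, as the cron job does.
    """
    whitelist: set[str] = set()
    current_window = None
    checks = 0
    for t, key in arrivals:
        w = int(t // window)
        if w != current_window:   # a cron run has emptied the file
            whitelist.clear()
            current_window = w
        if key in whitelist:
            continue              # bypass: message gets X-Spam-Check: No
        checks += 1               # scanned by SA
        if is_ham(key):
            whitelist.add(key)    # system filter logs the fingerprint
    return checks
```

With 200 copies arriving one every 6 seconds (50 per window over 20 minutes), `count_sa_checks([(i * 6.0, "news@example.org-192.0") for i in range(200)])` returns 4. In practice, copies that arrive while the first one is still being scanned also go through SA, so the real count can be somewhat higher.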

The idea is that this cuts down on ham spikes that can slow things
down, like the busy 9-10am period when everyone gets to work and starts
answering email. This isn't a perfect solution - but it's simple. We
need something better.