Re: [Exim] Using lists as OR expressions to make an effectiv…

Author: Marc Perkel
Date:  
To: exim-users
Subject: Re: [Exim] Using lists as OR expressions to make an effective spam filter
I'll try that. I would clean up my code if it works.

As for inefficiency, I do know not to use .* in body tests. But this
technique, keeping files of lists of things, which may contain regular
expressions too (my giant OR lists), has been a serious breakthrough in
spam filtering. Granted, I'm only processing 25000 messages a week, but
I'm getting 99% accuracy on spam detection, and it's because of these
lists.
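
For readers who have not seen the idea, here is a minimal sketch of an "OR list" in Python rather than in the Exim filter language. It is not the filter described in this message: the list contents, the `#` comment convention, and the function names are all assumptions made for illustration. Each non-blank line is treated as one regular expression, and a text matches the list if any single line matches.

```python
import re

def load_patterns(lines):
    """Compile one case-insensitive regex per non-blank, non-comment line."""
    patterns = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments (a convention assumed here)
        patterns.append(re.compile(line, re.IGNORECASE))
    return patterns

def matches_any(patterns, text):
    """Return True if at least one pattern is found in text (logical OR)."""
    return any(p.search(text) for p in patterns)

# Hypothetical list contents; in practice these lines would come from a file.
SAMPLE_LIST = [
    "# phrases seen in spam",
    r"v[i1]agra",
    r"click here to unsubscribe",
    "",
]

patterns = load_patterns(SAMPLE_LIST)
print(matches_any(patterns, "Buy V1AGRA now"))
print(matches_any(patterns, "Minutes of the Tuesday meeting"))
```

Adding a new rule is then a one-line change to the list, with no change to the code that does the matching.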

My best lists include the sites that spam links to and dead email
addresses on my system, so that any spam copied to a dead address is
caught. I also have white lists of hosts, and lists of technical phrases
and deliberately misspelled words. Some of these things can be done in
SpamAssassin by adding a massive number of new rules. This makes it
simple, and it is still fast and works very well.
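
A rough sketch of how several such lists might be combined, again in Python and purely for illustration. The order of the checks (white list first), the return values, and every host, address, and site name below are assumptions, not details taken from the actual filter.

```python
def classify(sender_host, recipients, body,
             whitelist_hosts, dead_addresses, spam_sites):
    """Classify one message using three plain lists (all names hypothetical)."""
    # A whitelisted sending host short-circuits every other test (assumed order).
    if sender_host.lower() in whitelist_hosts:
        return "ham"
    # A copy addressed to a known-dead local address marks the message as spam.
    if any(r.lower() in dead_addresses for r in recipients):
        return "spam"
    # A link to a site known to be advertised in spam also marks it as spam.
    lowered = body.lower()
    if any(site in lowered for site in spam_sites):
        return "spam"
    return "unknown"

# Hypothetical list contents, stored in lower case.
WHITELIST_HOSTS = {"mail.example.org"}
DEAD_ADDRESSES = {"olduser@example.com"}
SPAM_SITES = {"pills.example.net"}

print(classify("relay.example.net",
               ["me@example.com", "OldUser@example.com"],
               "hello",
               WHITELIST_HOSTS, DEAD_ADDRESSES, SPAM_SITES))
```

Each list stays a flat set of entries, so maintaining the filter means editing the lists rather than the logic.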

My spam detection accuracy is around 99% and just a few months ago I
would have bet serious money that 99% wasn't possible. I think I can
eventually do better than that because with 25000 messages, 250 mistakes
is still too many.

Philip Hazel wrote:

>On Sun, 25 May 2003, Marc Perkel wrote:
>
>>I again request that EXIM build this in.
>
>1. What exactly do you want built in? Is it the ability to say "If x
>matches any regex in this file"? That ability is already there:
>
> if ${lookup{x}wildlsearch{/some/file}{yes}{no}} is yes then ...
>
>2. Doing a large number of regular expression matches is going to be
>inefficient; it will be worth your while to study Jeff Friedl's book in
>order to tune your regexes for maximum efficiency.
>
>3. Doing this from within the Exim filter language, which is interpreted
>in a simple-minded way (it was never designed for this) is going to be
>extremely inefficient. OK, if your load is low and your hosts can handle
>the workload, then why not? But if the load increases... I am still of
>the opinion that the best place (from an efficiency point of view) to do
>this kind of work is in an external program.