Re: [exim] Timeout with spamassassin

Author: W B Hacker
Date:  
To: exim users
Subject: Re: [exim] Timeout with spamassassin
SeattleServer.com wrote:

> On Saturday 14 October 2006 09:14, W B Hacker wrote:
>
>>We don't let SA run SPAM_BAYES at all.
>>
>>Too little gain for the pain.
>
>
> Agreed. There are better bayesian engines than SA, but after a lot of months
> of pain, we decided to eliminate bayesian filtering altogether.
>
> Not only is it incredibly resource-intensive and demands you buy better disks
> and such for your servers to handle the load, but even the best bayesian
> engines can only offer 98-99% accuracy with proper training (in the real
> world, with lazy users, 80% was about tops), which isn't enough, and leaves
> the users having to dig through a pile of spam all the time anyways, which in
> my book, defeats the whole purpose of filtering it in the first place.
>
> Bayesian - neat theory, bad reality. Now, what am I supposed to do with all
> these 15kRPM SCSI disks? ;-)
>
> Cheers,


We are not *totally* down on it - Bayes works quite well when 'boxed' to the
traffic of a single user - as in an MUA. But there, one has the entire local
resources for just one account, so the resource load is not a big deal.

As Ferdinand Feghoot said at the cannibal BBQ:

"One man's meat is another man's poi, son"

Segregating as narrowly as that on a multi-user MTA would probably improve the
accuracy, but make the resource load even worse.

Note also that several folks here have mentioned ways of managing the learning
process & DB rebuild that exacerbate the resource load issue.

*However* - at the end of the day, what most turned us off is that every once in
a while, when the moon is full - or something far less predictable, really -
Bayes 'learns' a contrarian pattern, either blessing certain spam types or
condemning good traffic.

Catching this before it has done harm requires more dedication to perusing
parts of traffic - individual message content, and in 'near-real time' at that -
than we can reasonably allocate.

If you think your Bayes never does this, just archive all the rejects and peruse
them once in a while.

The OSF will eventually surface.
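
For anyone who wants to try the periodic review of archived rejects, here is a
minimal sketch of one way to pull a random handful for a human to eyeball. It
assumes the rejects are archived as one raw message per file in a single
directory; the function name sample_rejects, the directory layout, and the
sample size are all hypothetical, not anything Exim or SA sets up for you.

```python
import random
from email import policy
from email.parser import BytesParser
from pathlib import Path


def sample_rejects(archive_dir, k=20, seed=None):
    """Return (filename, sender, subject) for up to k randomly chosen rejects.

    Assumption: archive_dir holds one raw RFC 5322 message per file.
    """
    files = sorted(p for p in Path(archive_dir).iterdir() if p.is_file())
    rng = random.Random(seed)
    chosen = rng.sample(files, min(k, len(files)))
    rows = []
    for path in chosen:
        with path.open("rb") as fh:
            # Headers are enough to spot a good sender that got condemned.
            msg = BytesParser(policy=policy.default).parse(fh, headersonly=True)
        rows.append(
            (path.name, str(msg.get("From", "")), str(msg.get("Subject", "")))
        )
    return rows
```

Printing the returned rows once a week or so gives a short list to scan for
legitimate senders. A random sample is used rather than the newest files so
that a slow drift in what gets rejected is not hidden behind the latest burst.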

Bill