Author: W B Hacker
Date:
To: exim users
Subject: Re: [exim] Timeout with spamassassin
SeattleServer.com wrote:
> On Saturday 14 October 2006 09:14, W B Hacker wrote:
>
>>We don't let SA run SPAM_BAYES at all.
>>
>>Too little gain for the pain.
>
>
> Agreed. There are better bayesian engines than SA, but after a lot of months
> of pain, we decided to eliminate bayesian filtering altogether.
>
> Not only is it incredibly resource-intensive and demands you buy better disks
> and such for your servers to handle the load, but even the best bayesian
> engines can only offer 98-99% accuracy with proper training (in the real
> world, with lazy users, 80% was about tops), which isn't enough, and leaves
> the users having to dig through a pile of spam all the time anyways, which in
> my book, defeats the whole purpose of filtering it in the first place.
>
> Bayesian - neat theory, bad reality. Now, what am I supposed to do with all
> these 15kRPM SCSI disks? ;-)
>
> Cheers,
We are not *totally* down on it - Bayes works quite well when 'boxed' to the
traffic of a single user - as in an MUA. But there, one has the entire local
resources for just one account, so the resource load is not a big deal.
As Ferdinand Feghoot said at the cannibal BBQ:
"One man's meat is another man's poi, son"
Segregating as narrowly as that on a multi-user MTA would probably improve the
accuracy, but make the resource load even worse.
Note also that several folks here have mentioned ways of managing the learning
process & DB rebuild that exacerbate the resource load issue.
*However* - at the end of the day, what most turned us off is that every once in
a while, when the moon is full - or something far less predictable, really -
Bayes 'learns' a contrarian pattern, either blessing certain spam types or
condemning good traffic.
Catching this before it has done harm requires more dedication to perusing
parts of traffic - individual message content, and in 'near-real time' at that -
than we can reasonably allocate.
If you think your Bayes never does this, just archive all the rejects and peruse
them once in a while.
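
One way to make that occasional perusal less of a chore is to pull a random
handful of the archived rejects and eyeball just the sender and subject. A
minimal Python sketch, assuming the rejects are archived one message per file
in a single directory; the `sample_rejects` helper and the directory you point
it at are hypothetical, not something Exim or SpamAssassin provides:

```python
import random
from email import policy
from email.parser import BytesParser
from pathlib import Path


def sample_rejects(archive_dir, k=20, seed=None):
    """Return (filename, From, Subject) for up to k randomly chosen messages.

    Assumes archive_dir holds one raw RFC 822 message per file.
    """
    files = sorted(p for p in Path(archive_dir).iterdir() if p.is_file())
    rng = random.Random(seed)
    chosen = rng.sample(files, min(k, len(files)))

    rows = []
    for path in chosen:
        with path.open("rb") as fh:
            # Headers are enough for a first pass; skip parsing the body.
            msg = BytesParser(policy=policy.default).parse(fh, headersonly=True)
        rows.append(
            (path.name, str(msg.get("From", "")), str(msg.get("Subject", "")))
        )
    return rows


def print_sample(archive_dir, k=20):
    """Print one line per sampled reject for a quick human scan."""
    for name, sender, subject in sample_rejects(archive_dir, k):
        print(f"{name}\t{sender}\t{subject}")
```

Calling `print_sample()` on the archive directory gives a short list to scan.
Anything in it that looks like legitimate mail is worth opening in full. A
random sample will not catch every mislearned pattern, but it keeps the check
cheap enough to actually get done.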