Author: Daniel Tiefnig Date: To: exim users Subject: Re: [exim] MIME parts and sa-learn
W B Hacker wrote:
> - IMNSHO, trying to 'learn' spam/ham discrimination on a mixed-user
> server has two drawbacks:
>
> -- It uses a great deal of machine resources compared to a multitude
> of simpler and more repeatable/predictable means of filtering.
Which there are ...
> -- it can be confused by per-user differences, not only as to what
> one user considers spam and another does not
I didn't write that I'd like to share Bayes data between users ...
although I will. ;-)
But my user base is very small (i.e. only a few employees) and
close-knit. I wouldn't do that on a large-scale server with lots of
different users. (Such as the ISP systems I used to administer a lot,
but no longer do.) And finally, I'm still experimenting a bit, and will
disable Bayes classification if it doesn't work out well.
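
To make the per-user concern quoted above concrete, here is a toy sketch
of how a shared token database can blur a verdict that two per-user
databases would each give clearly. Everything in it is hypothetical: the
ToyBayes class, the users "alice" and "bob", and the sample messages are
made up for illustration, and the scoring is a plain naive Bayes with
add-one smoothing, not SpamAssassin's actual Bayes implementation.

```python
import math
from collections import Counter


class ToyBayes:
    """Tiny token-count classifier (illustrative only, not SpamAssassin)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def learn(self, label, text):
        # Record whitespace-separated, lowercased tokens under the label.
        self.counts[label].update(text.lower().split())
        self.messages[label] += 1

    def spam_probability(self, text):
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        spam_total = sum(self.counts["spam"].values())
        ham_total = sum(self.counts["ham"].values())
        # Start from the smoothed prior odds, then add per-token log odds.
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for tok in text.lower().split():
            p_spam = (self.counts["spam"][tok] + 1) / (spam_total + len(vocab) + 1)
            p_ham = (self.counts["ham"][tok] + 1) / (ham_total + len(vocab) + 1)
            log_odds += math.log(p_spam / p_ham)
        return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical training data: alice wants the newsletter, bob does not.
alice_training = [("ham", "weekly newsletter offers"), ("spam", "cheap pills")]
bob_training = [("spam", "weekly newsletter offers"), ("ham", "project meeting notes")]

alice, bob, shared = ToyBayes(), ToyBayes(), ToyBayes()
for label, text in alice_training:
    alice.learn(label, text)
    shared.learn(label, text)
for label, text in bob_training:
    bob.learn(label, text)
    shared.learn(label, text)

message = "weekly newsletter offers"
alice_p = alice.spam_probability(message)
bob_p = bob.spam_probability(message)
shared_p = shared.spam_probability(message)
print(f"alice: {alice_p:.2f}  bob: {bob_p:.2f}  shared: {shared_p:.2f}")
```

With contradictory training the shared database ends up close to
undecided for that message, while each per-user database leans clearly
one way. With a few users who mostly agree on what spam is, the
contradictions (and hence the blurring) should be rare.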
> So Spam-Bayes and friends can easily get it 'wrong' if applied
> system-wide, yet may need even greater resources if they are to be
> applied per-recipient - not easily done in the requisite DATA phase
> anyway - at least not as to rejection vs mere demerit scoring.
Well, this is a problem of content-based filtering in general.
> Conversely, Bayesian filtering seems to be at its best when applied
> in the end-user's MUA, where it is always 'per-recipient'
> specific, AND has at least 'momentary' access to a generally greater
> chunk of processing power than a server might be able to spare at
> busy times.
Sure, but client-side filtering has other drawbacks, e.g. when using
several different clients such as laptops, workstations, mobile phones ...
Generally, it runs counter to the idea of IMAP-based mail access.
> Next is the general 'need' to reinvent the classification anyway. It
> might have a better payoff to utilize SA for all EXCEPT Bayesian /
> 'learning'/ AWL, and add, for example, DSPAM, wherein a broader
> global dataset of spam vs ham 'fingerprints' can be applied with less
> total effort than developing your own on the fly.
Thanks for the hint. I tried dspam a few years ago and liked it very
much, but for the moment I don't want to maintain another spam filter
besides spamassassin. At some point I might decide to replace
spamassassin with dspam altogether, but I won't run two different
scanners that will produce contradicting results.
> As said, a 'contrarian' viewpoint, so YMMV.