Re: [exim] MIME parts and sa-learn

Author: Daniel Tiefnig
Date:
To: exim users
Subject: Re: [exim] MIME parts and sa-learn
W B Hacker wrote:
> - IMNSHO, trying to 'learn' spam/ham discrimination on a mixed-user
> server has two drawbacks:
>
> -- It uses a great deal of machine resources compared to a multitude
> of simpler and more repeatable/predictable means of filtering.


Which there are ...

> -- it can be confused by per-user differences, not only as to what
> one user considers spam and another does not


I didn't write that I'd like to share Bayes data between users ...
although I will. ;-)
But my user base is very small (i.e. only a few employees) and
close-knit. I wouldn't do that on a large-scale server with lots of
different users. (Like the ISP systems I used to administer a lot, but
no longer do.) And finally, I'm still experimenting a bit, and will
disable Bayes classification if it doesn't work out well.

> So Spam-Bayes and friends can easily get it 'wrong' if applied
> system-wide, yet may need even greater resources if they are to be
> applied per-recipient - not easily done in the requisite DATA phase
> anyway - at least not as to rejection vs mere demerit scoring.


Well, this is a problem of content based filtering in general.

> Conversely, Bayesian filtering seems to be at its best when applied
> in the end-user's MUA, where it is always 'per-recipient'
> specific, AND has at least 'momentary' access to a generally greater
> chunk of processing power than a server might be able to spare at
> busy times.


Sure, but client-side filtering has other drawbacks, e.g. when reading
mail from several clients such as laptops, workstations, mobile phones ...
Generally, it runs counter to the idea of IMAP-based mail access.

> Next is the general 'need' to reinvent the classification anyway. It
> might have a better payoff to utilize SA for all EXCEPT Bayesian /
> 'learning'/ AWL, and add, for example, DSPAM, wherein a broader
> global dataset of spam vs ham 'fingerprints' can be applied with less
> total effort than developing your own on the fly.


Thanks for the hint. I tried dspam a few years ago and liked it very
much, but for the moment I don't want to maintain another spam filter
besides spamassassin. At some point I might decide to replace
spamassassin with dspam altogether, but I won't run two different
scanners that will produce contradictory results.

> As said, a 'contrarian' viewpoint, so YMMV.


Thank you, your thoughts are much appreciated.

br,
daniel