Maybe you could generate fingerprints from the first X lines of each
email and match on those?
Mail comes in, and the first 20 lines generate 20 MD5 fingerprints. You
process the message and store its SA score alongside them.
When the next message comes in, if 80% of its fingerprints match those
of a stored message, bypass SA.
This would require integration with some sort of database. You simply
do a SELECT with an equality join, possibly keyed on queue ID, and see
whether it returns 16 or more rows.
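To make the idea concrete, here is a rough sketch in Python, not something I have running anywhere. The table names, the function names, the in-memory SQLite store and the choice to count only distinct fingerprints are all assumptions made up for illustration; a real setup would hook into Exim and a proper database.

```python
import hashlib
import sqlite3

N_LINES = 20      # lines fingerprinted per message (figure from the post)
THRESHOLD = 0.8   # fraction of fingerprints that must match (16 of 20)


def line_fingerprints(body, n=N_LINES):
    """Return the distinct MD5 hex digests of the first n lines of body."""
    lines = body.splitlines()[:n]
    return {hashlib.md5(ln.encode("utf-8")).hexdigest() for ln in lines}


def open_store():
    """Create a throwaway in-memory store (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE fingerprints (queue_id TEXT, md5 TEXT)")
    db.execute("CREATE TABLE scores (queue_id TEXT PRIMARY KEY, sa_score REAL)")
    return db


def remember(db, queue_id, body, sa_score):
    """Store a scanned message's fingerprints and its SA score."""
    db.executemany(
        "INSERT INTO fingerprints VALUES (?, ?)",
        [(queue_id, fp) for fp in line_fingerprints(body)],
    )
    db.execute("INSERT INTO scores VALUES (?, ?)", (queue_id, sa_score))


def cached_score(db, body):
    """Return a stored SA score if enough fingerprints match, else None."""
    fps = line_fingerprints(body)
    if not fps:
        return None
    placeholders = ",".join("?" * len(fps))
    # Join fingerprints to scores on queue_id and count matching rows
    # per stored message; keep only the best-matching message.
    row = db.execute(
        "SELECT s.sa_score, COUNT(*) AS hits "
        "FROM fingerprints f JOIN scores s ON s.queue_id = f.queue_id "
        f"WHERE f.md5 IN ({placeholders}) "
        "GROUP BY f.queue_id, s.sa_score "
        "ORDER BY hits DESC LIMIT 1",
        tuple(fps),
    ).fetchone()
    if row is not None and row[1] >= THRESHOLD * len(fps):
        return row[0]
    return None
```

You would call remember() after each real SA run and cached_score() before the next one, falling through to SA whenever it returns None. MD5 is used here only as a cheap fingerprint, not for any security purpose.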
This of course is prone to all kinds of problems. I like to have every
user train their own filters. This is why dropping 85% of spam prior to
statistical filtering is a bad idea, IMHO.
I use DSPAM and SA in tandem; as far as I know, I'm the only one using
them in the fashion I have set up. After a while, SA is simply not
used.
--- Marc Perkel <marc@???> wrote:
>
>
> Peter Bowyer wrote:
>
> >On 01/10/05, Marc Perkel <marc@???> wrote:
> >
> >
> >>One of the things that is creating SA load is processing good email.
> >>I'm trying to figure out a way to bless stuff that I know is ham so I
> >>can bypass spam assassin. And it has to somehow just learn it
> >>automatically.
> >>
> >>
> >
> >But that's what SA does - learns what's spam and what's ham by
> >Bayesian analysis. I'd have thought any attempt to do this up front
> >would end up duplicating what SA does?
> >
> >You could experiment with a reputation system which applies positive
> >scores when an IP sends you ham and negative scores when it sends spam
> >or fails an up-front test (DNSBL, HELO checks and so on). And set a
> >threshold for whitelisting around the SA check. But that would prevent
> >SA learning from known ham - which is an important part of the
> >Bayesian process.
> >
> >
> >
> I know SA does that but SA is very processor and resource hungry. One of
> the tricks I use to process the volume of email that I do is to avoid
> using SA whenever I can. I have eliminated about 85% of spam before it
> goes to SA and that has increased my capacity to process mail greatly.
> Now the problem is that all ham has to be processed through SA. Often
> I'm getting a lot of ham from the same users or mailing lists which is
> the same good message over and over. And it all passes - but it slows
> things down.
>
>
> --
> ## List details at http://www.exim.org/mailman/listinfo/exim-users
> ## Exim details at http://www.exim.org/
> ## Please use the Wiki with this list - http://www.exim.org/eximwiki/
>