The MD5 fingerprint is an interesting idea. Probably just concatenate
specific fields like the From: header and the host it was received from,
and hash the result. If the message is "hammy" enough, you append its
fingerprint to a text file called "blessed.txt". New messages are first
checked against the blessed file, and if blessed they bypass SpamAssassin.

The blessed file is deleted every 30 minutes by a cron job, which limits
how long a blessing lasts and keeps the list short so lookups stay fast.

Not a perfect solution - but I think it could work.
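
A rough sketch of the idea in Python, just to make it concrete. The
function names, the score threshold, and the score_with_sa callback are
all made up for illustration; the callback stands in for however the
real SpamAssassin check gets invoked, and clear_blessed() stands in for
the cron job.

```python
import hashlib
import os

BLESSED_PATH = "blessed.txt"  # file name from the description above
HAM_THRESHOLD = 1.0           # hypothetical: SA scores at or below this count as "hammy"


def fingerprint(from_header, sending_host):
    """MD5 over the From: header and the host the message was received from."""
    key = from_header.strip().lower() + "\n" + sending_host.strip().lower()
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def is_blessed(fp, path=BLESSED_PATH):
    """Return True if the fingerprint is listed in the blessed file."""
    try:
        with open(path, encoding="ascii") as f:
            return any(line.strip() == fp for line in f)
    except FileNotFoundError:
        return False  # the file was just cleared, so nothing is blessed


def bless(fp, path=BLESSED_PATH):
    """Append the fingerprint to the blessed file if it is not there yet."""
    if not is_blessed(fp, path):
        with open(path, "a", encoding="ascii") as f:
            f.write(fp + "\n")


def clear_blessed(path=BLESSED_PATH):
    """What the 30-minute cron job would do: delete the blessed file."""
    if os.path.exists(path):
        os.remove(path)


def handle_message(from_header, sending_host, score_with_sa, path=BLESSED_PATH):
    """Bypass SA for blessed senders; otherwise scan and bless if hammy enough."""
    fp = fingerprint(from_header, sending_host)
    if is_blessed(fp, path):
        return "bypassed"
    score = score_with_sa()  # placeholder for the real SA call
    if score <= HAM_THRESHOLD:
        bless(fp, path)
    return "scanned"
```

The first message from a sender/host pair still goes through SA; only
later ones inside the same 30-minute window skip it. Since the From:
header is easy to forge, tying the fingerprint to the sending host as
well is what keeps a spammer from riding on someone else's blessing.
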
Lanny Jason Godsey wrote:
>Maybe you could generate a fingerprint based on the first X lines of
>the email and match?
>
>Mail comes in, first 20 lines generate 20 md5 fingerprints. You
>process and store SA score.
>
>Next message comes in, if 80% of the fingerprints match, bypass SA.
>
>This would require integration into some sort of database. You simply
>do a select equal join based on queue id possibly and see if it returns
>16 or more rows.
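
To illustrate the scheme quoted above, here is a minimal Python sketch.
It is only a guess at the intent: an in-memory list stands in for the
database, the names are invented, and duplicate lines (blank lines, for
example) collapse into one fingerprint, so a message may yield fewer
than 20.

```python
import hashlib

LINES_TO_HASH = 20   # "first 20 lines" from the quoted message
MATCH_PERCENT = 80   # "if 80% of the fingerprints match"


def line_fingerprints(message_text, n=LINES_TO_HASH):
    """One MD5 fingerprint per line, taken over the first n lines."""
    lines = message_text.splitlines()[:n]
    return {hashlib.md5(line.encode("utf-8")).hexdigest() for line in lines}


def remember(store, message_text, sa_score):
    """Record a scanned message's fingerprints together with its SA score."""
    store.append((line_fingerprints(message_text), sa_score))


def lookup(store, message_text):
    """Return the stored SA score of the first close-enough match, else None."""
    new_fps = line_fingerprints(message_text)
    if not new_fps:
        return None
    for stored_fps, sa_score in store:
        matches = len(new_fps & stored_fps)
        # Integer arithmetic avoids float rounding at exactly 80%.
        if matches * 100 >= MATCH_PERCENT * len(new_fps):
            return sa_score
    return None
```

A None result means the message has to go through SA as usual; any
other result is the score of an earlier, nearly identical message.
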
>
>This of course is prone to all kinds of problems. I like to have every
>user train their own filters. This is why dropping 85% of spam prior to
>statistical filtering is a bad idea IMHO.
>
>I use DSPAM and SA in tandem; as far as I know I'm the only one using
>them in the fashion I have set up. After a while, SA is simply not
>used.
>
>--- Marc Perkel <marc@???> wrote:
>
>>Peter Bowyer wrote:
>>
>>>On 01/10/05, Marc Perkel <marc@???> wrote:
>>>
>>>>One of the things that is creating SA load is processing good email.
>>>>I'm trying to figure out a way to bless stuff that I know is ham so I
>>>>can bypass spam assassin. And it has to somehow just learn it
>>>>automatically.
>>>
>>>But that's what SA does - learns what's spam and what's ham by
>>>Bayesian analysis. I'd have thought any attempt to do this up front
>>>would end up duplicating what SA does?
>>>
>>>You could experiment with a reputation system which applies positive
>>>scores when an IP sends you ham and negative scores when it sends spam
>>>or fails an up-front test (DNSBL, HELO checks and so on). And set a
>>>threshold for whitelisting around the SA check. But that would prevent
>>>SA learning from known ham - which is an important part of the
>>>Bayesian process.
>>
>>I know SA does that but SA is very processor and resource hungry. One
>>of the tricks I use to process the volume of email that I do is to
>>avoid using SA whenever I can. I have eliminated about 85% of spam
>>before it goes to SA and that has increased my capacity to process
>>mail greatly. Now the problem is that all ham has to be processed
>>through SA. Often I'm getting a lot of ham from the same users or
>>mailing lists which is the same good message over and over. And it all
>>passes - but it slows things down.
>>
>>--
>>## List details at http://www.exim.org/mailman/listinfo/exim-users
>>## Exim details at http://www.exim.org/
>>## Please use the Wiki with this list - http://www.exim.org/eximwiki/
>>
--
Marc Perkel - marc@???
Spam Filter: http://www.junkemailfilter.com
My Blog: http://marc.perkel.com