On Thu, 2006-01-05 at 13:48 +0000, Tony Finch wrote:
> MailScanner has gained an interesting optimization recently, that may be
> worth adding to Exim. It takes a checksum of the message body, and if that
> matches the checksum of a message that has already been received it skips
> the full SpamAssassin and anti-virus scan and instead pulls the results
> out of a cache. This seems to be more effective than you might expect,
> even without any kind of fuzzy-matching in the checksum.
I haven't seen as many of the short random padding strings in recent
mail. A couple of years back many messages appeared to have a short
string at the beginning or end of the body, and normally in the
subject as well.
I guess the thing to do here is to checksum only the *body*, i.e. no
headers, so that Received: and other tracking headers don't pollute the
checksum.
However it would mean that the content scanning code also has to keep a
persistent database, with appropriate aging of old information. We also
need to be careful of things like empty bodies (all have the same size
and checksum). Maybe this wants to be a couple of functions: one to get
a message body signature (used as a key, maybe the size & checksum), and
the ability to store/retrieve data, ideally with a lifetime, which could
be used in ACLs.
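
To make that concrete, here is a rough Python sketch of the two pieces:
a body signature and a store/retrieve cache with a lifetime. It is not
Exim code and none of these names exist in Exim or MailScanner;
body_signature, ScanCache and cached_scan are all made up for
illustration. SHA-256 is an assumed choice of checksum, and the
in-memory dict only stands in for a real persistent database.

```python
import hashlib
import time


def body_signature(body: bytes) -> str:
    """Key built from the body size and a digest of the body only (no headers)."""
    return f"{len(body)}:{hashlib.sha256(body).hexdigest()}"


class ScanCache:
    """In-memory stand-in for a persistent store with a per-entry lifetime."""

    def __init__(self, clock=time.time):
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # key -> (expires_at, value)

    def store(self, key, value, lifetime):
        self._entries[key] = (self._clock() + lifetime, value)

    def retrieve(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]   # age out stale information
            return None
        return value


def cached_scan(body, cache, scan, lifetime=3600):
    """Run scan(body) unless a fresh result for the same body is cached."""
    if not body:
        # Empty bodies all share one size and checksum, so never cache them.
        return scan(body)
    key = body_signature(body)
    result = cache.retrieve(key)
    if result is None:
        result = scan(body)
        cache.store(key, result, lifetime)
    return result
```

Here scan would be whatever expensive call does the SpamAssassin or
anti-virus work. Note that a scan returning None is treated as "not
cached" and simply gets run again next time.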
Nigel.
--
[ Nigel Metheringham Nigel.Metheringham@??? ]
[ - Comments in this message are my own and not ITO opinion/policy - ]