On Thu, 29 Jan 1998, Martin Hamilton wrote:
> procmail could be tweaked to add a "recent MD5" cache to its existing
> "recent message ID" cache, etc etc.
I do exactly that, by making "formail" believe that the MD5 is really a
message-id.
This is a procmail recipe that remembers recent MD5 checksums, and
files duplicate messages in the "duplicate-messages" folder.
# Detect duplicate messages based on MD5 of normalised body
:0:.md5.lock
* B ?? ? (m=`$HOME/bin/normalise-body | md5`; \
          echo "Message-ID: <$m@MD5>" \
          | formail -D 8192 .body-md5.cache )
duplicate-messages
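The condition works because "formail -D" returns success when the Message-ID it is fed is already in its cache, and otherwise records it. The recipe therefore files a message as a duplicate only when the same normalised-body digest was seen recently. The Perl below is a hypothetical sketch of that bookkeeping, for illustration only. "seen_before" and "fake_message_id" are invented names. The one-ID-per-line file format is an assumption and is not formail's real idcache layout. The sketch also does no locking of its own.

```perl
use strict;
use warnings;

# Hypothetical sketch of a "recent IDs" cache in the spirit of
# "formail -D maxlen idcache". NOT formail's on-disk format:
# here the cache is simply one ID per line, oldest first.
# Returns 1 if $id was already cached (a duplicate); otherwise
# records $id and returns 0. No locking is done here.
sub seen_before {
    my ($id, $cache_file, $max_bytes) = @_;

    my @ids;
    if (open my $in, '<', $cache_file) {
        chomp(@ids = <$in>);
        close $in;
    }
    return 1 if grep { $_ eq $id } @ids;

    push @ids, $id;

    # Forget the oldest IDs until the cache fits in roughly $max_bytes,
    # always keeping the ID that was just added.
    my $size = 0;
    $size += length($_) + 1 for @ids;
    while ($size > $max_bytes && @ids > 1) {
        $size -= length(shift @ids) + 1;
    }

    open my $out, '>', $cache_file or die "cannot write $cache_file: $!";
    print {$out} "$_\n" for @ids;
    close $out or die "cannot close $cache_file: $!";
    return 0;
}

# The recipe wraps the hex digest of the body as a fake Message-ID.
sub fake_message_id {
    my ($hex_digest) = @_;
    return "<$hex_digest\@MD5>";
}
```

In these terms, the recipe's condition is roughly "seen_before(fake_message_id($digest), '.body-md5.cache', 8192) returns 1". The bounded size is what makes it a cache of *recent* bodies rather than of every body ever seen.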
The recipe relies on an "md5" command that simply spits out an MD5
checksum in hex, and on this silly little "normalise-body"
script:
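#!/usr/bin/perl
# A very weak attempt at normalising the body of a mail message.
# Removes trailing white space on all lines, and removes leading
# and trailing blank lines.
# Does not attempt to normalise any MIME content-transfer-encoding.
use strict;
use warnings;

my $total_nonblank_lines = 0;
my $consecutive_blank_lines = 0;
while (<>) {
    s/\s+$//;    # \s also matches "\n", so the newline is re-added below
    if (/^$/) {
        # Hold blank lines back; they are printed only if another
        # non-blank line follows, so trailing blank lines are dropped.
        $consecutive_blank_lines++;
    } else {
        # Blank lines before the first non-blank line are dropped.
        print "\n" x $consecutive_blank_lines if $total_nonblank_lines;
        print "$_\n";
        $consecutive_blank_lines = 0;
        $total_nonblank_lines++;
    }
}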
The above procmail recipe and perl code are in the public domain.
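The "md5" helper command itself is not shown above. Tools differ: GNU md5sum, for instance, appends "  -" after the digest when reading standard input, so its output would need trimming. As a hypothetical alternative, the normalise-then-hash step can be done in one Perl process using the core Digest::MD5 module. "normalise_body" and "body_md5" are invented names. The normalisation follows the rules stated in the script's comments, not any MIME decoding.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper following the rules in normalise-body's comments:
# strip trailing white space from every line, drop leading and trailing
# blank lines, keep interior blank lines, end each kept line with "\n".
sub normalise_body {
    my ($text) = @_;
    my @lines = map { my $line = $_; $line =~ s/\s+$//; $line }
                split /\n/, $text;
    shift @lines while @lines && $lines[0] eq '';
    pop @lines   while @lines && $lines[-1] eq '';
    return join '', map { "$_\n" } @lines;
}

# Hypothetical in-process replacement for "normalise-body | md5".
sub body_md5 {
    my ($text) = @_;
    return md5_hex(normalise_body($text));
}
```

Two copies of a message whose bodies differ only in trailing white space (including CR before LF) or in leading and trailing blank lines get the same digest. Bodies that differ in where lines break do not.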
--apb (Alan Barrett)
--
*** Exim information can be found at
http://www.exim.org/ ***