"Dave Cinege" writes:
| Using a wrapper script seemed to help a lot but hypermail is still taking a
| shit on me when the archive starts to grow (like 15-20+ posts)
If you have a plain text archive of the messages try feeding them into
hypermail manually with the progress reporting stuff turned on. Its
header processing (particularly the ^From line stuff - eek!) looks to
be seriously brain damaged. That version I hacked up should have a
work-around for one particular problem we were having, but there are
probably any number of others. I don't even want to think about mail
from people using things like cc:Mail, Exchange and Outlook.
I understand (but have just come back from a foreign trip and have
2774 more unread email messages, so I won't check just now :-) that
EIT have re-released all of Kevin Hughes' code under a different
copyright. I've an unconfirmed rumour that it's the GPL.
I'm not at all sure that hacking at Hypermail even more to make it
work properly is the right answer, though - e.g. handling MIME
properly would be a big deal. There are a few other interesting
possibilities, like MHonArc and Wilma.
Ciao!
Martin
PS While I'm here - here's an idea which came up at the joint
TERENA/RIPE anti-spamming BOF yesterday :-
Tweak mail systems like Exim (and sendmail :-) to generate a
Content-MD5: header for the message bodies, if it's not already
present - see RFC 1864 for more info.
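
To make that concrete, here is a minimal sketch in Python of what
computing the header value involves. RFC 1864 defines it as the base64
encoding of the 128-bit MD5 digest of the body in canonical form. The
function names and the dict-of-headers representation are my own
assumptions for illustration, not anything Exim or sendmail provides,
and the canonicalisation here only normalises line endings to CRLF.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Return a Content-MD5 header value for a message body (sketch)."""
    # RFC 1864 digests the canonical form of the body. For text that
    # means CRLF line endings, so normalise bare CR and LF first.
    # Assumption: this is the only canonicalisation the body needs.
    unix = body.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    canonical = unix.replace(b"\n", b"\r\n")
    digest = hashlib.md5(canonical).digest()
    return base64.b64encode(digest).decode("ascii")


def add_content_md5(headers: dict, body: bytes) -> dict:
    """Add Content-MD5 unless the message already carries one."""
    # Header names are case-insensitive, so compare in lower case.
    if any(name.lower() == "content-md5" for name in headers):
        return headers
    updated = dict(headers)
    updated["Content-MD5"] = content_md5(body)
    return updated
```

A real MTA would do this at injection time on the body as submitted,
before any content transfer encoding is applied.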
Optionally :-
Add support to the mail system itself for some level of throttling
when N messages with the same message body MD5 are sent within some
timeframe M. This could also be done externally, of course - e.g.
procmail could be tweaked to add a "recent MD5" cache to its existing
"recent message ID" cache, etc etc.
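
As a rough illustration of the throttling idea, the following Python
sketch keeps a "recent MD5" cache and refuses a message once N copies
of the same body digest have been accepted within the last M seconds.
The class name, method names and parameters are hypothetical; nothing
here reflects how procmail or any MTA actually implements its caches.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class BodyThrottle:
    """Sliding-window limit on messages sharing one body MD5 (sketch)."""

    def __init__(self, max_copies: int, window_seconds: float):
        self.max_copies = max_copies          # "N" in the text above
        self.window_seconds = window_seconds  # "M" in the text above
        self._seen = defaultdict(deque)       # digest -> accept times

    def allow(self, body_md5: str, now: Optional[float] = None) -> bool:
        """Return True if a message with this digest may go through."""
        if now is None:
            now = time.monotonic()
        stamps = self._seen[body_md5]
        # Forget acceptances that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        if len(stamps) >= self.max_copies:
            return False  # throttled; refused copies are not recorded
        stamps.append(now)
        return True
```

With BodyThrottle(2, 60), the third identical body seen inside a minute
is refused while a different body still passes. This sketch never
expires unused digests from the cache, which a real deployment would
need to do.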
The motivation is to partially defeat the case where identical spam is
sent to zillions of people through the same system (though I'm sure
that the spammers would quickly retaliate by, say, including their own
MD5 of the Date: header in the message body, but it would raise the
entry level barrier just a little :-), and also to address the
problems of broken end user mail programs which try to send multiple
copies of the same message, and broken mail hubs/gateways which
stupidly send error messages to the wrong places.
The neat thing is that if we're talking about something which is
integrated into the overall Internet mail system, a lot of this stuff
will be automatically dealt with at the point where it's initially
injected - as opposed to wasting bandwidth and machine/people time at
the receiving end. It's also good for preventing Dilbert style bosses
from spamming the whole of their staff :-) And... it doesn't require
knowledge of the message body contents, which gets around the legal/
ethical issue to an extent.
Dan will probably have this up and running in qmail by now, but I just
thought I'd run it by people here to see what they thought. The
response from the people at the BOF when I suggested it was mostly
blank looks, but then most of them didn't seem to know about things
like RBL (which only about three of the research network operators
said they were using) and relay blocking, so...
PPS Of course... MD5 might not be the best choice of checksum because
of the recent collision scares.