Author: Kjetil Torgrim Homme
Date:
To: Chris Lightfoot
CC: exim-users, Marcus Barczak
Subject: Re: [exim] slightly OT - reconstructing mbox files?
On Thu, 2006-08-24 at 08:54 +0100, Chris Lightfoot wrote:
> You need a heuristic to detect a block of message headers,
> and you need to be sure that you don't mistake lines in
> message bodies for headers (e.g. quoted headers in a
> bounce) or the headers of a MIME part for the headers of a
> message.
[snipped lots of good points]
one suggestion for a heuristic is to consider each block of candidate
headers: is the _first_ Received header one that your archiving
server would have added? if so, it's a new message. any bounces and
forwarded messages should have a different host in their first
Received line.
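
a rough sketch of that check in Python, untested against real mail. the names `ARCHIVE_HOST` and `first_received_is_ours` are made up for illustration, and it assumes your archiving server identifies itself in the "by" clause of the Received line it adds:

```python
import re

# Hypothetical value: the host name your archiving server writes
# into the "by" clause of its own Received lines.
ARCHIVE_HOST = "archive.example.org"


def first_received_is_ours(header_lines, archive_host=ARCHIVE_HOST):
    """Return True if the first Received header in the block was
    added by archive_host, judging by its "by <host>" clause."""
    # Unfold continuation lines (those starting with space or tab)
    # so each header is a single string.
    unfolded = []
    for line in header_lines:
        if line[:1] in (" ", "\t") and unfolded:
            unfolded[-1] += " " + line.strip()
        else:
            unfolded.append(line.rstrip("\r\n"))

    for header in unfolded:
        name, sep, value = header.partition(":")
        if sep and name.strip().lower() == "received":
            # Only the first Received header matters: it is the one
            # added by the last server that handled the message.
            match = re.search(r"\bby\s+(\S+)", value, re.IGNORECASE)
            if not match:
                return False
            host = match.group(1).rstrip(";").lower()
            return host == archive_host.lower()
    return False
```

a quoted header block inside a bounce would normally carry some other host in its first Received line, so this should return False for it.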
to find candidate header blocks, look for empty lines. if the first
line after the blank line matches /^[A-Za-z0-9-]+:/, start
collecting lines until you see the next blank line.
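
the scan for candidate blocks could look something like this in Python. again only a sketch: `candidate_header_blocks` is a made-up name, and treating the start of the file as if it followed a blank line is my own assumption:

```python
import re

# The pattern from the text: a header-like name followed by a colon.
HEADER_START = re.compile(r"^[A-Za-z0-9-]+:")


def candidate_header_blocks(lines):
    """Yield (start_index, block_lines) for every run of lines that
    follows a blank line, begins with a header-like line, and ends
    at the next blank line (or at end of input)."""
    # Assumption: the start of the file counts as "after a blank line".
    previous_blank = True
    block = None
    start = 0
    for index, raw in enumerate(lines):
        line = raw.rstrip("\r\n")
        if block is not None:
            if line == "":
                # A blank line closes the current candidate block.
                yield start, block
                block = None
                previous_blank = True
            else:
                block.append(line)
            continue
        if previous_blank and HEADER_START.match(line):
            block, start = [line], index
        previous_blank = (line == "")
    if block is not None:
        yield start, block
```

body lines such as "Note: ..." right after a blank line will show up as candidates too, which is why each block still needs a further check before it is accepted as the start of a message.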
I haven't tried this myself, of course :)
--
Kjetil T.