Author: Alan J. Flavell
Date:
To: Exim-Users (E-mail)
Subject: Re: [exim] Iconv
On Mon, 17 Oct 2005, David Woodhouse wrote:
> On Mon, 2005-10-03 at 13:03 +0100, Alan J. Flavell wrote:
> > Assuming that it's really impractical for exim to map everything it
> > gets from MIME-encoded headers into some Unicode format for
> > internal use,
>
> Why would that be so impractical?
We've already seen reports of utilities which are intended to parse
exim log files and apparently get thrown by non-ASCII characters in
the log.
> The log should be in UTF-8,
That's an entirely defensible point of view; but log files get written
from all kinds of places in the exim code. I'd make two points:
1) ensuring that exim always writes its logs with valid utf-8 encoding
would be a non-trivial exercise. (And check the applicable Unicode
rules relating to parsing invalid data streams claiming to be utf-8.)
2) introducing utf-8 logs "without the option" is liable to mess up
some of the useful existing log-parsing tools, especially those coming
from third parties.
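
To make point (1) concrete, here is a minimal sketch of what "always
valid utf-8" would mean for any one piece of data on its way into a
log line. It is written in Python purely for illustration (exim itself
is C), and the function names is_valid_utf8, to_valid_utf8 and
to_log_safe are hypothetical, not anything exim provides. Ill-formed
input is either replaced with U+FFFD or escaped so the original bytes
stay visible.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True only if raw is well-formed UTF-8 (strict decoding)."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def to_valid_utf8(raw: bytes) -> str:
    """Decode raw as UTF-8, substituting U+FFFD for ill-formed sequences."""
    return raw.decode("utf-8", errors="replace")


def to_log_safe(raw: bytes) -> str:
    """Decode raw as UTF-8, escaping ill-formed bytes as \\xNN.

    Unlike U+FFFD substitution, this keeps the offending byte values
    visible in the log for debugging.
    """
    return raw.decode("utf-8", errors="backslashreplace")
```

Every code path that writes to the log would need a check of this
kind, which is where the non-trivial part comes in.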
> and it isn't particularly hard to convert.
I presume you mean by applying the Iconv library. Would that mean
introducing an additional exim prerequisite? (I'm not aware of exim
currently using Iconv, but maybe it already does...).
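
For illustration only, the conversion under discussion looks roughly
like the sketch below: decode the RFC 2047 encoded-words of a header
and re-encode the result as utf-8. This uses Python's standard
email.header module and built-in codecs rather than the iconv library,
so it shows the shape of the operation and not how exim would do it in
C. The function name header_to_utf8 is hypothetical, and falling back
to ASCII with replacement for an unknown charset is an assumption of
this sketch.

```python
from email.header import decode_header


def header_to_utf8(value: str) -> bytes:
    """Decode RFC 2047 encoded-words in a header value and return UTF-8 bytes."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            try:
                # Use the charset named in the encoded-word, if any.
                text = chunk.decode(charset or "ascii", errors="replace")
            except LookupError:
                # Charset label unknown to Python: fall back to ASCII
                # (an assumption made for this sketch).
                text = chunk.decode("ascii", errors="replace")
        else:
            # Header contained no encoded-words; it is already a str.
            text = chunk
        parts.append(text)
    return "".join(parts).encode("utf-8")
```

For example, header_to_utf8("=?iso-8859-1?q?caf=E9?=") gives the
utf-8 bytes for "café", whereas the encoded-word itself is plain ASCII
and could be logged unchanged.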
Anyhow, in a technical log it *might* be more productive to retain the
original encoded form (for possible debugging purposes), rather than a
form that's derived from it by some complex conversion. I don't think
the case for utf-8 (nor for any other Unicode encoding scheme, for
that matter, though I'd agree that utf-8 looks the best choice if such
a choice has to be made) is so completely cut-and-dried as you
presented it.
Also, as an aside - I'm told that Han unification can lead to loss of
information when CJK codings are converted "by rote" into a Unicode
encoding. But this isn't my field, so I won't try to go into detail.
But if it were to be agreed that it's the right thing to do, then I
suppose that item (1) is best addressed in the course of a general
tidying-up of exim log writing, as this topic has come up before, and
Philip has conceded that there are inconsistencies in there (modulo
the usual shortage of Round Tuits ;-).
> Btw, it's interesting to note that the original poster was sending an
> autoreply to a message with a spam score of 5.1. :)
We had some problems here when we were adding to the spam score for
Asian MIME encodings - several members of staff come from over there,
and have correspondents they really /do/ want to communicate with,
despite the torrent of spam we were trying to keep out.