Author: Alan J. Flavell Date: To: Exim-Users (E-mail) Subject: Re: [exim] Iconv
On Mon, 3 Oct 2005, Ron McKeating wrote:
> > As you see, those raw bytes have been written to the log. You're then
> > pasting them into your mail to this list, that's advertised as
> > iso-8859-1, resulting in the unreadable sequence of accented Latin-1
> > characters that I (and presumably most of the rest of us) are seeing.
I've given this a bit more thought, and, really, I'd have to say that
it's illogical to write raw bytes to the log without any character
encoding indication, since (1) it could result in almost any bytes
appearing in the log, and confusing parsing routines and (2) it loses
information which might prove to be important later.
Assuming that it's really impractical for exim to map everything it
gets from MIME-encoded headers into some Unicode format for internal
use, the only alternative that comes immediately to mind would be to
log the actual MIME-encoded version instead, i.e. in this case it would
be:
=?GB2312?B?0e7PyMn6?=
which (although just as human-unreadable as the raw bytes) at least
has not lost information, and is capable of being decoded to
human-readable format for anyone who can read that writing system.
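To illustrate that nothing is lost, here is a minimal sketch (in Python, using only the standard library's email.header module, and not anything exim itself does) of how a log reader could turn the encoded-word above back into text. The variable names are mine, chosen for illustration.

```python
from email.header import decode_header

# The MIME encoded-word exactly as it would appear in the log.
encoded = "=?GB2312?B?0e7PyMn6?="

# decode_header() returns a list of (bytes, charset) pairs; this
# header consists of a single encoded-word, so there is one pair.
raw, charset = decode_header(encoded)[0]

# The charset label travels with the data, so the bytes can be
# decoded unambiguously instead of being guessed from the locale.
text = raw.decode(charset)

print(raw.hex(), charset)
print(text)
```

The point is that the charset label ("GB2312") and the payload stay together, which is exactly the information that is discarded when only the raw bytes are logged.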
I suppose it's a question to Phil as to whether it would be
practicable to do that (it ought to be done consistently in any of the
places where such data gets into the log, at least from conforming
headers, which I believe this was).
Of course this would have to be accompanied by whatever fixes were
needed in log-parsing utilities to ensure that their parsing, at
least, wasn't broken by this "new" format appearing in the log.
> Well the header I put in above is the header for the vacation reply,
> and what you see there is how it appeared in the log when viewed
> with exigrep and vi.
Yes, but that's only successful very indirectly. It just so happens
that we both work in a Latin-1 locale - so you see those raw bytes as
Latin-1 characters, but presumably anyone who worked in a different
locale would see something different in vi. Hence the importance of
also *describing* what one is seeing, rather than merely pasting raw
bytes. I recall a totally incomprehensible email conversation with
someone who was seeing Cyrillic (because he worked in koi8-r) but
refusing to say so, merely assuring me that I could see for myself -
but what I was seeing in his raw mail was accented Latin-1! (The
standoff was finally resolved, otherwise I wouldn't have been able to
explain it here...).
Doubtless a Chinese user of exim would be baffled by our inability to
see the "proper" Chinese characters in the log (but might instead
complain if we happened to log some accented French character string
there).
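A small sketch of the effect, again in Python and purely illustrative: the six bytes below are what you get by base64-decoding the GB2312 example earlier in this mail, and the three charsets stand in for three readers' locales.

```python
# The raw bytes from the GB2312 example, as they end up in the log.
raw = bytes.fromhex("d1eecfc8c9fa")

# Interpret the same bytes the way three different locales would.
views = {}
for charset in ("iso-8859-1", "koi8-r", "gb2312"):
    views[charset] = raw.decode(charset)
    print(f"{charset:12} {views[charset]}")
```

Each reader sees a different string from identical bytes, and only the GB2312 reading is the one the sender intended.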
> Tom Kistner has said he may take a look at his exilog package to
> see if there is any way to stop this from breaking exilog. Let's see
> how that goes.
I doubt that the raw byte string can contain white space or a zero
byte, but, in general, I guess it could contain pretty much anything
else. Curious that this issue doesn't come up more often...?