Re: [exim-dev] invalid byte sequence for encoding "UTF8" fro…

Author: Phil Pennock
Date:  
To: Axel Rau
CC: exim-dev
Subject: Re: [exim-dev] invalid byte sequence for encoding "UTF8" from header
On 2012-01-29 at 16:58 +0100, Axel Rau wrote:
> I'm getting
> PGSQL: query failed: ERROR: invalid byte sequence for encoding "UTF8": 0xfc
> while trying to insert a $h_subject.
> Main config contains
> headers_charset = utf-8
> and exim has been built with
> HAVE_ICONV=yes
> ldd shows
> libiconv.so.3 => /usr/local/lib/libiconv.so.3 (0x2817f000)
> Is this a bug in headers_charset?
> May illegal utf-8 in a header be forwarded to $h?


Yes. Headers in reality contain arbitrary binary data; they're supposed
to be constrained to ASCII with MIME used for encoding other data, but
that can't be relied upon.

The "headers_charset" option only affects MIME decoding of RFC 2047
constructs; if the construct is =?KOI8-RU?Q?...?= then that "..." will
be decoded as KOI8-RU data, then translated to headers_charset if
possible. If the translation fails (for example, because iconv does not
support the conversion) then the data is included verbatim.
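
To make that concrete, here is a rough Python sketch of the behaviour
described above. It is not Exim's code (Exim does this in C via iconv);
the function name translate_header and its parameters are made up for
illustration, and Python's own codecs stand in for iconv.

```python
from email.header import decode_header


def translate_header(raw: bytes, target: str = "utf-8") -> bytes:
    """Illustrative only: mimic the described headers_charset behaviour."""
    pieces = []
    # latin-1 maps every byte to one code point, so raw bytes survive the trip.
    for chunk, charset in decode_header(raw.decode("latin-1")):
        if isinstance(chunk, str):
            # No RFC 2047 construct found: the input comes back unchanged.
            chunk = chunk.encode("latin-1")
        if charset is None:
            # Raw header text outside any encoded-word is passed through as-is.
            pieces.append(chunk)
            continue
        try:
            # Decode from the declared charset, then translate to the target.
            pieces.append(chunk.decode(charset).encode(target))
        except (LookupError, UnicodeError):
            # Unknown charset or failed conversion: include the data verbatim.
            pieces.append(chunk)
    return b"".join(pieces)
```

With this sketch, =?ISO-8859-1?Q?Gr=FC=DFe?= comes out as valid UTF-8,
while a bare 0xfc byte in the header (no MIME encoding) passes through
untouched, which is the kind of input PostgreSQL then rejects.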

There's no support for coercing all raw binary data encountered into the
charset; MIME is assumed to be used for non-ASCII.
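
For illustration only, one way a consumer of the header value could
coerce it before handing it to a UTF-8 database is sketched below. This
is an assumption about what such a step might look like outside Exim,
not an existing Exim feature; the name coerce_to_utf8 is hypothetical.

```python
def coerce_to_utf8(raw: bytes) -> bytes:
    """Return raw unchanged if it is valid UTF-8, else substitute U+FFFD."""
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Each undecodable byte sequence becomes U+FFFD (replacement character).
        return raw.decode("utf-8", errors="replace").encode("utf-8")
```

This loses the original bytes, so it only suits cases where a lossy but
valid value is preferable to a failed insert.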

Proposals for a better system of handling errors are appreciated, as are
proposals for how to deal efficiently with systems that insert raw
binary data into headers.
--
https://twitter.com/syscomet