On Fri, 4 Jul 2003, Dr Andrew C Aitchison wrote:
> I have two questions I'd need answering.
>
> 1 What encoding is the match string using ?
Encoding is the nightmare. The easy part is turning the Q
(quoted-printable) or B (base64) encoding back into raw "binary" bytes.
Converting those bytes to some other encoding then raises the question:
what do we convert to?
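
To make the "easy" step concrete, here is a minimal sketch in Python (not
Exim code) that undoes only the Q/B layer and stops there. The helper name
decode_to_bytes is hypothetical, and the ISO-8859-1 form of the subject is
assumed for illustration; the UTF-8 form is the one quoted in this thread.

```python
from email.header import decode_header

def decode_to_bytes(header_value):
    """Undo Q/B encoding only; return the raw bytes and the declared charsets."""
    raw = b""
    charsets = []
    for chunk, charset in decode_header(header_value):
        # A header with no encoded-words comes back as str, not bytes.
        if isinstance(chunk, str):
            chunk = chunk.encode("ascii", "replace")
        raw += chunk
        charsets.append(charset)
    return raw, charsets

# Assumed ISO-8859-1 form of the subject, for illustration.
latin1 = decode_to_bytes("=?iso-8859-1?Q?Internet_caf=E9?=")
# UTF-8 form as quoted in the thread.
utf8 = decode_to_bytes("=?UTF-8?Q?Internet_caf=C3=A9?=")

print(latin1)
print(utf8)
```

The two results carry different byte strings for the same visible text,
which is exactly why the "what do we convert to?" question cannot be dodged
once matching is involved.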
The patch I was sent came from Japan, and it defaults to a Japanese
encoding. The default is wired into the code, but is configurable.
However, I'm not at all convinced that such a default makes sense in the
generic Exim code.
> we could invent a syntax such as:
> if $h_subject: contains $encoding{iso-8859-1}{"Internet café"}
> which would allow the encoding of the match string.
But "ordinary" users are never going to be experienced enough to realize
that they need such complexity.
> 2 The same subject line could also be encoded as (I think)
> Subject: =?UTF-8?Q?Internet_caf=C3=A9?=
>
> Do we want your test to match both of these subject lines, or just one ?
Good question, to which I do not know the answer. There is already a
${from_utf8:...} operator that converts from UTF-8 to single bytes,
assuming all the code points are < 256; in effect, from UTF-8 to
ISO-8859-1.
The most basic thing that Exim could do would be to decode to a binary
string, but no more.
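
As a rough illustration of the UTF-8 to single-byte conversion, and of why
the two subject lines differ once decoded to binary, here is a small Python
sketch. It is not Exim's implementation: the function name from_utf8_sketch
is hypothetical, and replacing code points of 256 and above with "_" is an
assumption made here so that the function is total.

```python
def from_utf8_sketch(raw):
    """Map UTF-8 bytes to one byte per character (ISO-8859-1 for code points < 256)."""
    text = raw.decode("utf-8")
    # Code points below 256 map directly onto ISO-8859-1 byte values.
    # Substituting "_" for anything larger is an assumption of this sketch.
    return bytes(ord(ch) if ord(ch) < 256 else ord("_") for ch in text)

utf8_subject = b"Internet caf\xc3\xa9"   # bytes decoded from the UTF-8 encoded-word
latin1_subject = b"Internet caf\xe9"     # bytes an ISO-8859-1 encoded-word would give

# The raw binary strings differ, so a plain byte comparison matches only one.
print(utf8_subject == latin1_subject)

# After conversion the UTF-8 form compares equal to the ISO-8859-1 form.
normalised = from_utf8_sketch(utf8_subject)
print(normalised == latin1_subject)
```

Decoding to binary and stopping there leaves the comparison at the first
print; matching both subject lines needs some conversion step like the
second.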
> Non-ASCII readers will be in a better position than me to judge;
> can the average user put a string like "Internet café" into their
> filter file with their favourite editor and have the result in an
> unambiguous encoding ?
I suspect that what happens in most places is that the local encoding is
used, though maybe UTF-8 is coming into wider use these days.
--
Philip Hazel University of Cambridge Computing Service,
ph10@??? Cambridge, England. Phone: +44 1223 334714.