On Tue, 8 Jul 2003, Philip Hazel wrote:
> End users are much more likely to have written something simple like
>
> if $h_subject: contains "....."
>
> in a filter, where the ..... happens to contain characters that need
> encoding. They don't know about MIME encoding etc. Trying to educate
> them to use (say) $mh_xxxx: instead is going to be one long, continuous
> battle for ever more.
>
> And the third point is that if we end up with all three of $h_ $rh_ and
> $mh_ it is complicated to explain, and the existence of $h_ becomes
> anomalous. What would it be useful for?
>
> I don't yet have a firm view on this. That's why I'm collecting opinions
> from the list.... Which choice is likely to cause the greater pain and
> trouble in the long run?

I doubt that many users will want to care about the encoding of the
message text, so we need to convert both the message and the comparison
string into the same encoding.

iconv appears to be the standard way of converting between any
two encodings, and ships with Solaris 7 and Red Hat 6.2 at least.
There are cases where these mappings are one-to-many, but I'm not sure
whether any of these are non-pathological.

How about the following:
$rh_ does a bytewise comparison after converting both strings from
the MIME encoding,
whilst $h_ assumes that the comparison string is in a particular encoding
(given by a server default unless overridden in the filter file in use),
and uses iconv to convert the message text into that encoding.
There will rarely be a need to use multiple encodings in a single filter,
and anyone who does need that can use $rh_.

This way I can use $h_ to test for "internet café" without caring
whether your message uses iso-8859-1, iso-8859-15 or UTF-8,
provided only that my sysadmin ensures that my editor and the exim config
agree on how the string is stored in my filter file.
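
To make the $h_ idea concrete, here is a rough sketch in Python rather
than in exim's own code. It is only an illustration of the proposal, not
how exim does or would implement it. FILTER_CHARSET, header_in_charset
and header_contains are names I have made up for the example, and
Python's own codecs stand in for iconv.

```python
from email.header import decode_header

# Hypothetical stand-in for the proposed server default, which a filter
# file could override. Not an actual exim option.
FILTER_CHARSET = "iso-8859-1"


def header_in_charset(raw_header, target=FILTER_CHARSET):
    """Decode RFC 2047 encoded words and return the text as `target` bytes."""
    parts = []
    for chunk, charset in decode_header(raw_header):
        # decode_header yields bytes for encoded words (and for plain text
        # next to them), but a str when the header has no encoded words.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    # Characters that do not exist in the target charset become "?".
    return "".join(parts).encode(target, errors="replace")


def header_contains(raw_header, needle, target=FILTER_CHARSET):
    """Bytewise 'contains' test once the header is in the filter's charset."""
    return needle in header_in_charset(raw_header, target)


# The comparison string as it would be stored in a filter file that is
# saved in FILTER_CHARSET (\u00e9 is e-acute).
needle = "internet caf\u00e9".encode(FILTER_CHARSET)

subjects = [
    "=?iso-8859-1?Q?internet_caf=E9?=",
    "=?iso-8859-15?Q?internet_caf=E9?=",
    "Re: =?UTF-8?Q?internet_caf=C3=A9?=",
]
for subject in subjects:
    print(subject, header_contains(subject, needle))
```

All three subjects are converted to the same bytes before the comparison,
so the filter author never has to know which charset the sender used. A
charset name that the converter does not recognise would need separate
handling, which this sketch leaves out.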
--
Dr. Andrew C. Aitchison Computer Officer, DPMMS, Cambridge
A.C.Aitchison@??? http://www.dpmms.cam.ac.uk/~werdna