On 2018-08-17, Phil Pennock via Exim-dev <exim-dev@???> wrote:
> Anyone have strong feelings on how Exim should handle UTF-8 with
> operators such as ${length_1:STR} ?
>
> Document that the current operators work on bytes
Yeah, stay with treating strings as NUL-terminated arrays of octets,
the same unit the RFCs use to define email and SMTP.
> and add ulength_1 for being UTF-8 aware?
You would also need UTF-8-aware versions of substr and strlen.
Is it going to count code points or glyphs?
> Look at the top-bit being set and assume UTF-8, or
> will that break too much with all the places which are still ISO-8859-1?
Just looking at that bit won't tell you enough to count code points or
glyphs. You then need to group the octets into sequences, and decide
what to do when you hit an invalid octet...
Parts of ${utf8clean can probably be reused.
"${lc", "${uc" and "${if eqi" need consideration too.
--
ت