Re: [exim-dev] UTF-8 and Exim string operations

Author: Jasen Betts
Date:
To: exim-dev
Subject: Re: [exim-dev] UTF-8 and Exim string operations

On 2018-08-17, Phil Pennock via Exim-dev <exim-dev@???> wrote:
> Anyone have strong feelings on how Exim should handle UTF-8 with
> operators such as ${length_1:STR} ?
>
> Document that the current operators work on bytes

Yeah, stay with treating strings as NUL-terminated arrays of octets.
The same unit the RFCs use to define email and SMTP.

> and add ulength_1 for being UTF-8 aware?

Would also need UTF-8-aware substr and strlen.
Is it going to count code-points or glyphs?

> Look at the top-bit being set and assume UTF-8, or
> will that break too much with all the places which are still ISO-8859-1?

Just looking at that bit won't tell you enough to count code-points or
glyphs. You need to then group the octets together, and you need to do
something when you hit an invalid octet...
Parts of ${utf8clean can probably be re-used.
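
A minimal sketch of that grouping, assuming the "do something" policy is
simply to give up and return -1 at the first invalid sequence. The name
utf8_codepoints is hypothetical; this is not taken from Exim's source and
does not claim to match what ${utf8clean does internally.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Exim: count the code points in a
   NUL-terminated octet string. Returns -1 at the first octet sequence
   that is not valid UTF-8 (assumed policy, chosen for illustration). */
long utf8_codepoints(const unsigned char *s)
{
    long count = 0;

    while (*s) {
        unsigned char c = *s;
        int extra;          /* continuation octets expected after the lead */
        unsigned long cp;   /* code point being assembled */
        unsigned long min;  /* smallest code point allowed at this length */

        if (c < 0x80)                { extra = 0; cp = c;        min = 0; }
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return -1;     /* stray continuation octet or invalid lead */

        s++;
        for (int i = 0; i < extra; i++, s++) {
            /* A NUL here fails the test too, so truncation is caught. */
            if ((*s & 0xC0) != 0x80) return -1;
            cp = (cp << 6) | (unsigned long)(*s & 0x3F);
        }

        if (cp < min) return -1;                      /* overlong encoding */
        if (cp > 0x10FFFF) return -1;                 /* beyond Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF) return -1;  /* UTF-16 surrogate */

        count++;
    }
    return count;
}
```

With this, the two octets C3 A9 (precomposed e-acute) count as one
code-point, while "e" followed by the combining accent CC 81 counts as two
even though it displays as one glyph. A lone ISO-8859-1 E9 is rejected,
because it looks like the lead octet of a three-octet sequence with nothing
valid after it.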

"${lc", "${uc" and "${if eqi" need consideration too.

-- 
     ت
