[exim-dev] Re: [Bug 2998] utf8clean should mask surrogate c…

Author: Jasen Betts
Date:  
To: exim-dev
Subject: [exim-dev] Re: [Bug 2998] utf8clean should mask surrogate code points (U00D800 to U00DFFFF)
On 2023-07-22, Exim Bugzilla via Exim-dev <exim-dev@???> wrote:
> https://bugs.exim.org/show_bug.cgi?id=2998
>
> --- Comment #1 from Jeremy Harris <jgh146exb@???> ---
> The patch looks simple, but I can't pretend to understand that bit of
> RFC 2279. It seems to be taking about UCS-2 rather than UTF-8.
> Is a better description possible?


Interestingly, that RFC seems to use UCS-2 interchangeably with UTF-16.


There was an excellent discussion of WTF-8 (like UTF-8 but with
surrogates) somewhere on the internet (I thought Wikipedia, but I
can't find it now).


https://unicodebook.readthedocs.io/unicode_encodings.html
section 7.5. UTF-16 surrogate pairs
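
For anyone who does not want to follow the link, the surrogate-pair
arithmetic can be sketched as below. This is only an illustration of how
UTF-16 represents code points above U+FFFF; the function name
to_surrogate_pair is made up for this example and is not from Exim or
from the page above.

```python
def to_surrogate_pair(cp: int) -> tuple:
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF use surrogate pairs")
    v = cp - 0x10000                 # 20-bit value
    high = 0xD800 + (v >> 10)        # top 10 bits  -> U+D800..U+DBFF
    low = 0xDC00 + (v & 0x3FF)       # low 10 bits  -> U+DC00..U+DFFF
    return high, low


high, low = to_surrogate_pair(0x1F600)
print(f"U+1F600 -> {high:04X} {low:04X}")
```

The values U+D800 to U+DFFF exist only to form such pairs in UTF-16, so
they are not meant to appear as code points of their own in UTF-8.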

This bug is mainly motivated by PostgreSQL only accepting well-formed
UTF-8, so a UTF-8-style byte sequence that encodes a surrogate code
point (U+D800 to U+DFFF) is rejected and leads to misbehaviour.
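
As a rough sketch of the masking being asked for (in Python rather than
Exim's C, and not the actual patch): a surrogate encoded UTF-8-style is
always three bytes, 0xED followed by a byte in 0xA0..0xBF and then a
continuation byte. The function name mask_surrogates and the "?" mask
character are assumptions made for this example, and the sketch leaves
every other kind of malformed input untouched.

```python
def mask_surrogates(data: bytes, mask: bytes = b"?") -> bytes:
    """Replace each UTF-8-style encoded surrogate (U+D800..U+DFFF) with mask."""
    out = bytearray()
    i = 0
    n = len(data)
    while i < n:
        if (
            data[i] == 0xED                      # lead byte for U+D000..U+DFFF
            and i + 2 < n
            and 0xA0 <= data[i + 1] <= 0xBF      # narrows it to U+D800..U+DFFF
            and 0x80 <= data[i + 2] <= 0xBF      # ordinary continuation byte
        ):
            out += mask
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


raw = b"a\xed\xa0\x80b"              # "a", encoded U+D800, "b"
try:
    raw.decode("utf-8")              # a strict decoder rejects this input
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
print(mask_surrogates(raw))
```

Python's strict UTF-8 decoder is used here only as a stand-in for a
consumer that insists on well-formed UTF-8.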


--
Jasen.
🇺🇦 Слава Україні
