https://bugs.exim.org/show_bug.cgi?id=2120
--- Comment #4 from Philip Hazel <ph10@???> ---
Thanks for the comments. I was thinking about this overnight, and had second
thoughts about it, along the lines of what Christian says. I think we need to
know exactly what the problem is here. A valid UTF-8 string can never contain
characters in the surrogate range 0xd800-0xdfff, and the UTF check in
pcre2_match() will pick this up.
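
To make the point about UTF-8 concrete, here is a minimal sketch in Python (not PCRE2 code). It uses Python's strict UTF-8 decoder as a stand-in for a UTF validity check; the helper names is_surrogate and utf8_accepts are made up for illustration. The bytes ED A0 80 are what a naive encoder would produce for 0xd800, and a strict decoder rejects them, while the neighbouring code points 0xd7ff and 0xe000 decode without complaint.

```python
# Illustrative only: Python's strict UTF-8 decoder stands in for a UTF check.
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF


def is_surrogate(cp: int) -> bool:
    """True if the code point lies in the surrogate range 0xd800-0xdfff."""
    return SURROGATE_FIRST <= cp <= SURROGATE_LAST


def utf8_accepts(data: bytes) -> bool:
    """True if the bytes form a valid UTF-8 string under strict decoding."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Three-byte sequences just below, inside, and just above the surrogate range.
below = bytes([0xED, 0x9F, 0xBF])   # U+D7FF
inside = bytes([0xED, 0xA0, 0x80])  # naive encoding of 0xd800 (invalid)
above = bytes([0xEE, 0x80, 0x80])   # U+E000

for label, raw in (("d7ff", below), ("d800", inside), ("e000", above)):
    print(label, raw.hex(), "valid" if utf8_accepts(raw) else "rejected")
```

Only the middle sequence is rejected, which is why a subject string containing such a value fails the UTF check before any matching takes place.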
I *think* he is saying that some websites have explicit checks for character
values in the surrogate range, using patterns containing explicit values such
as \x{d800}.
I have just realized that there is in any case an oddity in PCRE2. The range
[\x{d7ff}-\x{e000}] is accepted, but [\x{d800}-\x{dfff}] is not.
I will await further input from Rob. Allowing these values in UTF-8 or UTF-32
mode would be possible (under some option), but not in UTF-16 mode because they
cannot be represented in that mode.
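
One way to see why UTF-16 is the odd one out is the sketch below, again in Python and not taken from PCRE2. The function utf16_units is a hypothetical helper that follows the standard UTF-16 encoding rule: code points above 0xffff are split into a high unit in 0xd800-0xdbff and a low unit in 0xdc00-0xdfff, so those 16-bit values are already taken as the two halves of a pair and cannot also stand for a character of their own. A 32-bit code unit has no such conflict and can simply hold the number.

```python
import struct
from typing import List


def utf16_units(cp: int) -> List[int]:
    """Encode one code point as a list of 16-bit UTF-16 code units."""
    if 0xD800 <= cp <= 0xDFFF:
        # These values are reserved as halves of a surrogate pair.
        raise ValueError("surrogate value has no UTF-16 encoding: %#x" % cp)
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000
    # High (leading) unit, then low (trailing) unit.
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


# A code point above 0xffff needs two units, both in the surrogate range.
print([hex(u) for u in utf16_units(0x1F600)])

# As a plain 32-bit number, 0xd800 fits in one unit without ambiguity.
print(struct.pack("<I", 0xD800).hex())

try:
    utf16_units(0xD800)
except ValueError as exc:
    print(exc)
```

Whether allowing such values in UTF-8 or UTF-32 mode is desirable is a separate question; the sketch only shows why the same escape hatch is not available for 16-bit code units.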