[pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable al…

Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable all checks
https://bugs.exim.org/show_bug.cgi?id=2120

--- Comment #4 from Philip Hazel <ph10@???> ---
Thanks for the comments. I was thinking about this overnight, and had second
thoughts about it, along the lines of what Christian says. I think we need to
know exactly what the problem is here. A valid UTF-8 string can never contain
code points in the surrogate range 0xd800-0xdfff, and the UTF check in
pcre2_match() will pick this up.
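
To make the surrogate point concrete, here is a small Python sketch. It is
not PCRE2 code and says nothing about how pcre2_match() implements its check;
it only uses Python's strict UTF-8 decoder as a stand-in validator. The helper
names (naive_three_byte, is_valid_utf8) are made up for this illustration.

```python
def naive_three_byte(code_point: int) -> bytes:
    """Apply the 3-byte UTF-8 bit layout without any validity checks.

    Hypothetical helper: only meaningful for 0x0800 <= code_point <= 0xFFFF.
    """
    return bytes([
        0xE0 | (code_point >> 12),
        0x80 | ((code_point >> 6) & 0x3F),
        0x80 | (code_point & 0x3F),
    ])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # The two neighbours of the surrogate range, then its two end points.
    for cp in (0xD7FF, 0xD800, 0xDFFF, 0xE000):
        raw = naive_three_byte(cp)
        print(f"U+{cp:04X} -> {raw.hex()} valid={is_valid_utf8(raw)}")
```

The byte sequences produced for 0xd800 and 0xdfff follow the usual 3-byte
layout, yet a strict decoder rejects them, while the neighbouring values
0xd7ff and 0xe000 are accepted.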

I *think* he is saying that some websites have explicit checks for character
values in the surrogate range, using patterns containing explicit values such
as \x{d800}.

I have just realized that there is in any case an oddity in PCRE2. The range
[\x{d7ff}-\x{e000}] is accepted, but [\x{d800}-\x{dfff}] is not.

I will await further input from Rob. Allowing these values in UTF-8 or UTF-32
mode would be possible (under some option), but not in UTF-16 mode because they
cannot be represented in that mode.
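
The UTF-16 limitation can be sketched in a few lines of Python. Again, this
is only an illustration of the encoding rule and not anything taken from
PCRE2; the function name utf16_code_units is hypothetical. It assumes its
input is a valid code point outside the surrogate range.

```python
def utf16_code_units(code_point: int) -> list[int]:
    """Encode one non-surrogate code point as a list of UTF-16 code units."""
    if code_point < 0x10000:
        # BMP code points are stored as a single 16-bit unit.
        return [code_point]
    # Supplementary code points are split into a high and a low surrogate.
    offset = code_point - 0x10000
    return [0xD800 | (offset >> 10), 0xDC00 | (offset & 0x3FF)]


if __name__ == "__main__":
    for cp in (0xD7FF, 0xE000, 0x10000, 0x10FFFF):
        units = " ".join(f"{u:04X}" for u in utf16_code_units(cp))
        print(f"U+{cp:04X} -> {units}")
```

Every 16-bit value from 0xd800 to 0xdfff is already used as one half of a
pair, so a lone unit in that range cannot also stand for the code point of
the same value. UTF-8 and UTF-32 have no such overlap: the values are merely
declared invalid there, which is why an option to allow them is conceivable
in those modes.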
