[pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable al…

Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable all checks
https://bugs.exim.org/show_bug.cgi?id=2120

--- Comment #16 from Philip Hazel <ph10@???> ---
The good news: I have added (but not yet documented or committed) a new option
field with a bit that allows surrogate code points to be used via escape
sequences in UTF-8 and UTF-32 modes.
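
For context on why such a bit is needed at all: code points U+D800 to U+DFFF are reserved for UTF-16 surrogate pairs and are not valid in UTF-8 or UTF-32 text, so an escape naming one is normally rejected in those modes. The sketch below only illustrates that range check, using Python's own UTF-8 codec as a stand-in. It is not PCRE2 code, and the helper names `is_surrogate` and `can_encode_utf8` are made up for this example.

```python
# Illustration only: Python's codec stands in for a strict UTF validator.
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF


def is_surrogate(code_point):
    """Return True if code_point lies in the UTF-16 surrogate range."""
    return SURROGATE_FIRST <= code_point <= SURROGATE_LAST


def can_encode_utf8(code_point):
    """Return True if a strict UTF-8 encoder accepts this code point."""
    try:
        chr(code_point).encode("utf-8")
        return True
    except UnicodeEncodeError:
        # Python's strict UTF-8 codec refuses lone surrogates.
        return False


if __name__ == "__main__":
    for cp in (0xD7FF, 0xD800, 0xDFFF, 0xE000):
        print(hex(cp), is_surrogate(cp), can_encode_utf8(cp))
```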

The bad news: I am having second thoughts about unknown escape sequences. There
isn't really a problem for ones like \j that have no meaning in Perl, though of
course that does leave one open to disaster when Perl adds something new.
However, PCRE2 warns for some like \L that *do* mean something in Perl but are
not supported by PCRE2. And there are complications such as \o which in Perl
must be followed by octal digits enclosed in braces. If you are in "permissive"
mode, then it seems OK just to treat \opqr as opqr, but what about malformed
octal numbers such as \o{999}, for which PCRE2 currently gives an error? I feel
this is getting too complicated, so I'm going to back off for now.
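
To make the \o difficulty concrete, here is a toy sketch of one possible policy. It is not how PCRE2 parses escapes, and `read_o_escape`, `EscapeError`, and the `permissive` flag are hypothetical names. In this sketch a bare \o degrades to a literal "o" in permissive mode, while a braced but malformed number such as \o{999} still raises an error. That second choice is only an assumption made for the example; whether it is the right behaviour is exactly the open question.

```python
class EscapeError(ValueError):
    """Raised for an escape sequence the toy parser refuses to accept."""


def read_o_escape(pattern, pos, permissive):
    # pos is the index just past the "o" of a backslash-o escape.
    # Returns (literal_text, new_pos).
    if pos >= len(pattern) or pattern[pos] != "{":
        if permissive:
            # No braces: treat the escape as a plain letter "o".
            return "o", pos
        raise EscapeError("\\o must be followed by {octal digits}")

    end = pattern.find("}", pos)
    if end == -1:
        raise EscapeError("missing closing brace after \\o{")

    digits = pattern[pos + 1:end]
    if not digits or any(d not in "01234567" for d in digits):
        # Assumed policy: malformed octal is an error even when permissive.
        raise EscapeError("non-octal digit in \\o{...}")

    value = int(digits, 8)
    if value > 0x10FFFF:
        raise EscapeError("code point too large in \\o{...}")
    return chr(value), end + 1


if __name__ == "__main__":
    print(read_o_escape(r"\opqr", 2, permissive=True))
    print(read_o_escape(r"\o{101}", 2, permissive=True))
```

The awkward part is visible in the two error branches: once the opening brace has been seen, the input clearly intends an octal escape, so silently treating it as literal text is harder to justify than for \opqr.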

I will document and then commit the surrogate patch, and think about other
issues. Zoltan may be right in suggesting the best way forward would be to add
a JS translator.
