[pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable al…

Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable all checks
https://bugs.exim.org/show_bug.cgi?id=2120

--- Comment #5 from Rob <rob@???> ---
Thanks for replying to my issue. I'll try to clarify ...

I'm using PCRE2 in a JavaScript interpreter for a web browser. Viewing some
pages on the New York Times website caused the JavaScript interpreter to throw
a syntax error at the following line ...

var f =
/[\x00-\x1f\ud800-\udfff\ufffe\uffff\u0300-\u0333\u033d-\u0346\u034a-\u034c\u0350-\u0352\u0357-\u0358\u035c-\u0362\u0374\u037e\u0387\u0591-\u05af\u05c4\u0610-\u0617\u0653-\u0654\u0657-\u065b\u065d-\u065e\u06df-\u06e2\u06eb-\u06ec\u0730\u0732-\u0733\u0735-\u0736\u073a\u073d\u073f-\u0741\u0743\u0745\u0747\u07eb-\u07f1\u0951\u0958-\u095f\u09dc-\u09dd\u09df\u0a33\u0a36\u0a59-\u0a5b\u0a5e\u0b5c-\u0b5d\u0e38-\u0e39\u0f43\u0f4d\u0f52\u0f57\u0f5c\u0f69\u0f72-\u0f76\u0f78\u0f80-\u0f83\u0f93\u0f9d\u0fa2\u0fa7\u0fac\u0fb9\u1939-\u193a\u1a17\u1b6b\u1cda-\u1cdb\u1dc0-\u1dcf\u1dfc\u1dfe\u1f71\u1f73\u1f75\u1f77\u1f79\u1f7b\u1f7d\u1fbb\u1fbe\u1fc9\u1fcb\u1fd3\u1fdb\u1fe3\u1feb\u1fee-\u1fef\u1ff9\u1ffb\u1ffd\u2000-\u2001\u20d0-\u20d1\u20d4-\u20d7\u20e7-\u20e9\u2126\u212a-\u212b\u2329-\u232a\u2adc\u302b-\u302c\uaab2-\uaab3\uf900-\ufa0d\ufa10\ufa12\ufa15-\ufa1e\ufa20\ufa22\ufa25-\ufa26\ufa2a-\ufa2d\ufa30-\ufa6d\ufa70-\ufad9\ufb1d\ufb1f\ufb2a-\ufb36\ufb38-\ufb3c\ufb3e\ufb40-\ufb41\ufb43-\ufb44\ufb46-\ufb4e\ufff0-\uffff]/g;

pcre2_compile fails at the range \ud800-\udfff, so the JS interpreter has to
respond with a syntax error, and the script is not executed.

I can't control what JS developers put in their regex patterns, other than
ensuring that the script being parsed is valid UTF-8 (or gets converted to
UTF-8 by the browser), and this pattern works in both Chrome and Firefox. A
pattern that merely tests for the surrogate range seems too benign to be
error-worthy.
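
For reference, here is a minimal sketch of how a standard JavaScript engine
treats that range. The reduced pattern and the sample strings are made up for
illustration and are not taken from the NYT script; it says nothing about
PCRE2's own behaviour.

```javascript
// Hypothetical reduced version of the character class from the script.
const surrogateClass = /[\ud800-\udfff]/;

const loneSurrogate = "\ud83d";   // unpaired high surrogate
const emoji = "\ud83d\ude00";     // U+1F600 written as a surrogate pair

const results = {
  // Without the "u" flag the engine matches individual UTF-16 code units,
  // so both a lone surrogate and either half of a pair match.
  loneSurrogate: surrogateClass.test(loneSurrogate),
  emojiCodeUnits: surrogateClass.test(emoji),
  plainAscii: surrogateClass.test("abc"),
  // With the "u" flag a well-formed pair is read as one code point
  // (U+1F600), which lies outside the range D800-DFFF.
  emojiUnicodeMode: /[\ud800-\udfff]/u.test(emoji),
};

console.log(results);
```

So in the browsers the class is legal syntax, and it is typically used to
detect unpaired surrogates rather than to match real characters.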

I could be partly to blame for using PCRE in UTF-8 mode instead of UTF-16 as
per the JavaScript specification, and I'm unsure whether it would throw the
same error in UTF-16 mode.
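
To illustrate why the surrogates are awkward in UTF-8 in the first place, here
is a small sketch that assumes a runtime exposing the standard TextEncoder
(browsers, recent Node.js). The helper name toHex is made up for this example.
A lone surrogate has no well-formed UTF-8 encoding, so the encoder substitutes
U+FFFD, while a proper pair encodes as a single four-byte sequence.

```javascript
// TextEncoder always produces UTF-8.
const encoder = new TextEncoder();

// Hypothetical helper: format a byte array as space-separated hex.
const toHex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");

// An unpaired surrogate cannot be encoded, so U+FFFD is emitted instead.
const loneSurrogateBytes = toHex(encoder.encode("\ud800"));
const replacementBytes = toHex(encoder.encode("\ufffd"));

// A well-formed pair (U+1F600) becomes one four-byte UTF-8 sequence.
const pairedBytes = toHex(encoder.encode("\ud83d\ude00"));

console.log(loneSurrogateBytes, "|", replacementBytes, "|", pairedBytes);
```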

I understand PCRE2_NO_UTF_CHECK might not be the right solution. I'm quite
happy to continue hand-patching new PCRE releases. I mostly submitted this
because a Google search for "disallowed Unicode code point (>= 0xd800 && <=
0xdfff)" generated quite a few hits, so it doesn't seem to be an
inconsequential issue.
