Revision: 1612
http://vcs.pcre.org/viewvc?view=rev&revision=1612
Author: ph10
Date: 2015-11-27 17:13:13 +0000 (Fri, 27 Nov 2015)
Log Message:
-----------
Fix negated POSIX class within negated overall class UCP bug.
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/pcre_compile.c
code/trunk/testdata/testinput6
code/trunk/testdata/testoutput6
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2015-11-26 20:29:13 UTC (rev 1611)
+++ code/trunk/ChangeLog 2015-11-27 17:13:13 UTC (rev 1612)
@@ -10,6 +10,9 @@
1. If PCRE_AUTO_CALLOUT was set on a pattern that had a (?# comment between
an item and its qualifier (for example, A(?#comment)?B) pcre_compile()
misbehaved. This bug was found by the LLVM fuzzer.
+
+2. Further to 8.38/46, negated classes such as [^[:^ascii:]\d] were also not
+ working correctly in UCP mode.
Version 8.38 23-November-2015
Modified: code/trunk/pcre_compile.c
===================================================================
--- code/trunk/pcre_compile.c 2015-11-26 20:29:13 UTC (rev 1611)
+++ code/trunk/pcre_compile.c 2015-11-27 17:13:13 UTC (rev 1612)
@@ -5063,20 +5063,22 @@
ptr = tempptr + 1;
continue;
- /* For the other POSIX classes (ascii, xdigit) we are going to fall
- through to the non-UCP case and build a bit map for characters with
- code points less than 256. If we are in a negated POSIX class
- within a non-negated overall class, characters with code points
- greater than 255 must all match. In the special case where we have
- not yet generated any xclass data, and this is the final item in
- the overall class, we need do nothing: later on, the opcode
+ /* For the other POSIX classes (ascii, cntrl, xdigit) we are going
+ to fall through to the non-UCP case and build a bit map for
+ characters with code points less than 256. If we are in a negated
+ POSIX class, characters with code points greater than 255 must
+ either all match or all not match. In the special case where we
+ have not yet generated any xclass data, and this is the final item
+ in the overall class, we need do nothing: later on, the opcode
OP_NCLASS will be used to indicate that characters greater than 255
are acceptable. If we have already seen an xclass item or one may
follow (we have to assume that it might if this is not the end of
- the class), explicitly match all wide codepoints. */
+ the class), explicitly list all wide codepoints, which will then
+ either not match or match, depending on whether the class is or is
+ not negated. */
default:
- if (!negate_class && local_negate &&
+ if (local_negate &&
(xclass || tempptr[2] != CHAR_RIGHT_SQUARE_BRACKET))
{
*class_uchardata++ = XCL_RANGE;
Modified: code/trunk/testdata/testinput6
===================================================================
--- code/trunk/testdata/testinput6 2015-11-26 20:29:13 UTC (rev 1611)
+++ code/trunk/testdata/testinput6 2015-11-27 17:13:13 UTC (rev 1612)
@@ -1553,4 +1553,13 @@
\x{200}
\x{37e}
+/[^[:^ascii:]\d]/8W
+ a
+ ~
+ 0
+ \a
+ \x{7f}
+ \x{389}
+ \x{20ac}
+
/-- End of testinput6 --/
Modified: code/trunk/testdata/testoutput6
===================================================================
--- code/trunk/testdata/testoutput6 2015-11-26 20:29:13 UTC (rev 1611)
+++ code/trunk/testdata/testoutput6 2015-11-27 17:13:13 UTC (rev 1612)
@@ -2557,4 +2557,20 @@
\x{37e}
0: \x{37e}
+/[^[:^ascii:]\d]/8W
+ a
+ 0: a
+ ~
+ 0: ~
+ 0
+No match
+ \a
+ 0: \x{07}
+ \x{7f}
+ 0: \x{7f}
+ \x{389}
+No match
+ \x{20ac}
+No match
+
/-- End of testinput6 --/