[Pcre-svn] [1612] code/trunk: Fix negated POSIX class within…

Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [1612] code/trunk: Fix negated POSIX class within negated overall class UCP bug.
Revision: 1612
          http://vcs.pcre.org/viewvc?view=rev&revision=1612
Author:   ph10
Date:     2015-11-27 17:13:13 +0000 (Fri, 27 Nov 2015)
Log Message:
-----------
Fix negated POSIX class within negated overall class UCP bug.


Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/pcre_compile.c
    code/trunk/testdata/testinput6
    code/trunk/testdata/testoutput6


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2015-11-26 20:29:13 UTC (rev 1611)
+++ code/trunk/ChangeLog    2015-11-27 17:13:13 UTC (rev 1612)
@@ -10,6 +10,9 @@
 1.  If PCRE_AUTO_CALLOUT was set on a pattern that had a (?# comment between 
     an item and its qualifier (for example, A(?#comment)?B) pcre_compile() 
     misbehaved. This bug was found by the LLVM fuzzer.
+    
+2.  Further to 8.38/46, negated classes such as [^[:^ascii:]\d] were also not 
+    working correctly in UCP mode.



Version 8.38 23-November-2015

Modified: code/trunk/pcre_compile.c
===================================================================
--- code/trunk/pcre_compile.c    2015-11-26 20:29:13 UTC (rev 1611)
+++ code/trunk/pcre_compile.c    2015-11-27 17:13:13 UTC (rev 1612)
@@ -5063,20 +5063,22 @@
             ptr = tempptr + 1;
             continue;


-            /* For the other POSIX classes (ascii, xdigit) we are going to fall
-            through to the non-UCP case and build a bit map for characters with
-            code points less than 256. If we are in a negated POSIX class
-            within a non-negated overall class, characters with code points
-            greater than 255 must all match. In the special case where we have
-            not yet generated any xclass data, and this is the final item in
-            the overall class, we need do nothing: later on, the opcode
+            /* For the other POSIX classes (ascii, cntrl, xdigit) we are going
+            to fall through to the non-UCP case and build a bit map for
+            characters with code points less than 256. If we are in a negated
+            POSIX class, characters with code points greater than 255 must
+            either all match or all not match. In the special case where we
+            have not yet generated any xclass data, and this is the final item
+            in the overall class, we need do nothing: later on, the opcode
             OP_NCLASS will be used to indicate that characters greater than 255
             are acceptable. If we have already seen an xclass item or one may
             follow (we have to assume that it might if this is not the end of
-            the class), explicitly match all wide codepoints. */
+            the class), explicitly list all wide codepoints, which will then
+            either not match or match, depending on whether the class is or is
+            not negated. */


             default:
-            if (!negate_class && local_negate &&
+            if (local_negate &&
                 (xclass || tempptr[2] != CHAR_RIGHT_SQUARE_BRACKET))
               {
               *class_uchardata++ = XCL_RANGE;


Modified: code/trunk/testdata/testinput6
===================================================================
--- code/trunk/testdata/testinput6    2015-11-26 20:29:13 UTC (rev 1611)
+++ code/trunk/testdata/testinput6    2015-11-27 17:13:13 UTC (rev 1612)
@@ -1553,4 +1553,13 @@
     \x{200}
     \x{37e}


+/[^[:^ascii:]\d]/8W
+    a
+    ~
+    0
+    \a
+    \x{7f}
+    \x{389}
+    \x{20ac}
+
 /-- End of testinput6 --/


Modified: code/trunk/testdata/testoutput6
===================================================================
--- code/trunk/testdata/testoutput6    2015-11-26 20:29:13 UTC (rev 1611)
+++ code/trunk/testdata/testoutput6    2015-11-27 17:13:13 UTC (rev 1612)
@@ -2557,4 +2557,20 @@
     \x{37e}
  0: \x{37e}


+/[^[:^ascii:]\d]/8W
+    a
+ 0: a
+    ~
+ 0: ~
+    0
+No match
+    \a
+ 0: \x{07}
+    \x{7f}
+ 0: \x{7f}
+    \x{389}
+No match
+    \x{20ac}
+No match
+
 /-- End of testinput6 --/