[Pcre-svn] [962] code/trunk: Fix auto-possessifying bugs whe…

Author: Subversion repository
To: pcre-svn
Subject: [Pcre-svn] [962] code/trunk: Fix auto-possessifying bugs when PCRE_UCP is not set, but character tables
Revision: 962
          http://vcs.pcre.org/viewvc?view=rev&revision=962
Author:   ph10
Date:     2012-04-20 18:28:23 +0100 (Fri, 20 Apr 2012)


Log Message:
-----------
Fix auto-possessifying bugs when PCRE_UCP is not set, but character tables
specify characters in the range 127-255 are letters, spaces, etc.
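
To illustrate the failure mode, here is a minimal, self-contained sketch. It is
not the PCRE source: the bit value CTYPE_WORD, the helper init_demo_table(),
and the two function names are assumptions made up for this example. Only the
shape of the comparison mirrors the change to pcre_compile.c. Each function
answers the question "can the following character never match \w?", and a
"yes" is what allows a preceding \w+ to be made possessive.

```c
#include <string.h>

/* Hypothetical flag bit; the real value lives in PCRE's internal headers. */
#define CTYPE_WORD 0x10

/* Build a demo 256-entry type table in which 'a' and 0xC4 are word
   characters, as a locale-specific table might specify. */
static void init_demo_table(unsigned char *ctypes)
{
  memset(ctypes, 0, 256);
  ctypes['a'] = CTYPE_WORD;
  ctypes[0xC4] = CTYPE_WORD;
}

/* Old test: any code point above 127 is assumed not to be a word
   character, so the table is never consulted for 0xC4. */
static int cannot_match_word_old(const unsigned char *ctypes, unsigned int next)
{
  return next > 127 || (ctypes[next] & CTYPE_WORD) == 0;
}

/* New test: the table is consulted for every code point it covers (0-255). */
static int cannot_match_word_new(const unsigned char *ctypes, unsigned int next)
{
  return next > 255 || (ctypes[next] & CTYPE_WORD) == 0;
}
```

With the demo table, cannot_match_word_old(t, 0xC4) returns 1 (wrongly
permitting possessification), whereas cannot_match_word_new(t, 0xC4) returns 0.
The short-circuit on "next > 255" also keeps the lookup inside the 256-entry
table for wider code points.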

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/pcre_compile.c
    code/trunk/testdata/testinput15
    code/trunk/testdata/testinput18
    code/trunk/testdata/testoutput15
    code/trunk/testdata/testoutput18


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2012-04-20 11:49:13 UTC (rev 961)
+++ code/trunk/ChangeLog    2012-04-20 17:28:23 UTC (rev 962)
@@ -9,7 +9,7 @@
 2.  Removed a bashism from the RunTest script.


 3.  Add a cast to pcre_exec.c to fix the warning "unary minus operator applied
-    to unsigned type, result still unsigned" that was given by an MS compiler 
+    to unsigned type, result still unsigned" that was given by an MS compiler
     on encountering the code "-sizeof(xxx)".


 4.  Partial matching support is added to the JIT compiler.
@@ -18,13 +18,13 @@
     of more than one character:


     (a) /^(..)\1/ did not partially match "aba" because checking references was
-        done on an "all or nothing" basis. This also applied to repeated 
+        done on an "all or nothing" basis. This also applied to repeated
         references.


     (b) \R did not give a hard partial match if \r was found at the end of the
         subject.


-    (c) \X did not give a hard partial match after matching one or more 
+    (c) \X did not give a hard partial match after matching one or more
         characters at the end of the subject.


     (d) When newline was set to CRLF, a pattern such as /a$/ did not recognize
@@ -33,11 +33,11 @@
     (e) When newline was set to CRLF, the metacharacter "." did not recognize
         a partial match for a CR character at the end of the subject string.


-6.  If JIT is requested using /S++ or -s++ (instead of just /S+ or -s+) when 
-    running pcretest, the text "(JIT)" added to the output whenever JIT is 
+6.  If JIT is requested using /S++ or -s++ (instead of just /S+ or -s+) when
+    running pcretest, the text "(JIT)" added to the output whenever JIT is
     actually used to run the match.


-7.  Individual JIT compile options can be set in pcretest by following -s+[+] 
+7.  Individual JIT compile options can be set in pcretest by following -s+[+]
     or /S+[+] with a digit between 1 and 7.


8. OP_NOT now supports any UTF character not just single-byte ones.
@@ -46,21 +46,21 @@

 10. The command "./RunTest list" lists the available tests without actually
     running any of them. (Because I keep forgetting what they all are.)
-    
-11. Add PCRE_INFO_MAXLOOKBEHIND. 


+11. Add PCRE_INFO_MAXLOOKBEHIND.
+
 12. Applied a (slightly modified) user-supplied patch that improves performance
     when the heap is used for recursion (compiled with --disable-stack-for-
-    recursion). Instead of malloc and free for each heap frame each time a 
-    logical recursion happens, frames are retained on a chain and re-used where 
-    possible. This sometimes gives as much as 30% improvement.  
-    
+    recursion). Instead of malloc and free for each heap frame each time a
+    logical recursion happens, frames are retained on a chain and re-used where
+    possible. This sometimes gives as much as 30% improvement.
+
 13. As documented, (*COMMIT) is now confined to within a recursive subpattern
     call.


14. As documented, (*COMMIT) is now confined to within a positive assertion.

-15. It is now possible to link pcretest with libedit as an alternative to 
+15. It is now possible to link pcretest with libedit as an alternative to
     libreadline.


 16. (*COMMIT) control verb is now supported by the JIT compiler.
@@ -86,12 +86,17 @@
     matches in certain environments (the workspace was not being correctly
     retained). Also added to pcre_dfa_exec() a simple plausibility check on
     some of the workspace data at the beginning of a restart.
-    
+
 25. \s*\R was auto-possessifying the \s* when it should not, whereas \S*\R
     was not doing so when it should - probably a typo introduced by SVN 528
-    (change 8.10/14). 
+    (change 8.10/14).


+26. When PCRE_UCP was not set, \w+\x{c4} was incorrectly auto-possessifying the
+    \w+ when the character tables indicated that \x{c4} was a word character.
+    There were several related cases, all because the tests for doing a table
+    lookup were testing for characters less than 127 instead of 255.


+
Version 8.30 04-February-2012
-----------------------------


Modified: code/trunk/pcre_compile.c
===================================================================
--- code/trunk/pcre_compile.c    2012-04-20 11:49:13 UTC (rev 961)
+++ code/trunk/pcre_compile.c    2012-04-20 17:28:23 UTC (rev 962)
@@ -3132,22 +3132,22 @@
   When it is set, \d etc. are converted into OP_(NOT_)PROP codes. */


   case OP_DIGIT:
-  return next > 127 || (cd->ctypes[next] & ctype_digit) == 0;
+  return next > 255 || (cd->ctypes[next] & ctype_digit) == 0;

   case OP_NOT_DIGIT:
-  return next <= 127 && (cd->ctypes[next] & ctype_digit) != 0;
+  return next <= 255 && (cd->ctypes[next] & ctype_digit) != 0;

   case OP_WHITESPACE:
-  return next > 127 || (cd->ctypes[next] & ctype_space) == 0;
+  return next > 255 || (cd->ctypes[next] & ctype_space) == 0;

   case OP_NOT_WHITESPACE:
-  return next <= 127 && (cd->ctypes[next] & ctype_space) != 0;
+  return next <= 255 && (cd->ctypes[next] & ctype_space) != 0;

   case OP_WORDCHAR:
-  return next > 127 || (cd->ctypes[next] & ctype_word) == 0;
+  return next > 255 || (cd->ctypes[next] & ctype_word) == 0;

   case OP_NOT_WORDCHAR:
-  return next <= 127 && (cd->ctypes[next] & ctype_word) != 0;
+  return next <= 255 && (cd->ctypes[next] & ctype_word) != 0;

   case OP_HSPACE:
   case OP_NOT_HSPACE:
@@ -3225,22 +3225,22 @@
   switch(-next)
     {
     case ESC_d:
-    return c > 127 || (cd->ctypes[c] & ctype_digit) == 0;
+    return c > 255 || (cd->ctypes[c] & ctype_digit) == 0;


     case ESC_D:
-    return c <= 127 && (cd->ctypes[c] & ctype_digit) != 0;
+    return c <= 255 && (cd->ctypes[c] & ctype_digit) != 0;


     case ESC_s:
-    return c > 127 || (cd->ctypes[c] & ctype_space) == 0;
+    return c > 255 || (cd->ctypes[c] & ctype_space) == 0;


     case ESC_S:
-    return c <= 127 && (cd->ctypes[c] & ctype_space) != 0;
+    return c <= 255 && (cd->ctypes[c] & ctype_space) != 0;


     case ESC_w:
-    return c > 127 || (cd->ctypes[c] & ctype_word) == 0;
+    return c > 255 || (cd->ctypes[c] & ctype_word) == 0;


     case ESC_W:
-    return c <= 127 && (cd->ctypes[c] & ctype_word) != 0;
+    return c <= 255 && (cd->ctypes[c] & ctype_word) != 0;


     case ESC_h:
     case ESC_H:


Modified: code/trunk/testdata/testinput15
===================================================================
--- code/trunk/testdata/testinput15    2012-04-20 11:49:13 UTC (rev 961)
+++ code/trunk/testdata/testinput15    2012-04-20 17:28:23 UTC (rev 962)
@@ -277,4 +277,40 @@


/\777/8DZ

+/\w+\x{C4}/8BZ
+    a\x{C4}\x{C4}
+
+/\w+\x{C4}/8BZT1
+    a\x{C4}\x{C4}
+    
+/\W+\x{C4}/8BZ
+    !\x{C4}
+ 
+/\W+\x{C4}/8BZT1
+    !\x{C4}
+
+/\W+\x{A1}/8BZ
+    !\x{A1}
+ 
+/\W+\x{A1}/8BZT1
+    !\x{A1}
+
+/X\s+\x{A0}/8BZ
+    X\x20\x{A0}\x{A0}
+
+/X\s+\x{A0}/8BZT1
+    X\x20\x{A0}\x{A0}
+
+/\S+\x{A0}/8BZ
+    X\x{A0}\x{A0}
+
+/\S+\x{A0}/8BZT1
+    X\x{A0}\x{A0}
+
+/\x{a0}+\s!/8BZ
+    \x{a0}\x20!
+
+/\x{a0}+\s!/8BZT1
+    \x{a0}\x20!
+
 /-- End of testinput15 --/


Modified: code/trunk/testdata/testinput18
===================================================================
--- code/trunk/testdata/testinput18    2012-04-20 11:49:13 UTC (rev 961)
+++ code/trunk/testdata/testinput18    2012-04-20 17:28:23 UTC (rev 962)
@@ -240,4 +240,40 @@


/\x{DF00}/8

+/\w+\x{C4}/8BZ
+    a\x{C4}\x{C4}
+
+/\w+\x{C4}/8BZT1
+    a\x{C4}\x{C4}
+    
+/\W+\x{C4}/8BZ
+    !\x{C4}
+ 
+/\W+\x{C4}/8BZT1
+    !\x{C4}
+
+/\W+\x{A1}/8BZ
+    !\x{A1}
+ 
+/\W+\x{A1}/8BZT1
+    !\x{A1}
+
+/X\s+\x{A0}/8BZ
+    X\x20\x{A0}\x{A0}
+
+/X\s+\x{A0}/8BZT1
+    X\x20\x{A0}\x{A0}
+
+/\S+\x{A0}/8BZ
+    X\x{A0}\x{A0}
+
+/\S+\x{A0}/8BZT1
+    X\x{A0}\x{A0}
+
+/\x{a0}+\s!/8BZ
+    \x{a0}\x20!
+
+/\x{a0}+\s!/8BZT1
+    \x{a0}\x20!
+
 /-- End of testinput18 --/


Modified: code/trunk/testdata/testoutput15
===================================================================
--- code/trunk/testdata/testoutput15    2012-04-20 11:49:13 UTC (rev 961)
+++ code/trunk/testdata/testoutput15    2012-04-20 17:28:23 UTC (rev 962)
@@ -910,4 +910,140 @@
 First char = \x{c7}
 Need char = \x{bf}


+/\w+\x{C4}/8BZ
+------------------------------------------------------------------
+        Bra
+        \w++
+        \x{c4}
+        Ket
+        End
+------------------------------------------------------------------
+    a\x{C4}\x{C4}
+ 0: a\x{c4}
+
+/\w+\x{C4}/8BZT1
+------------------------------------------------------------------
+        Bra
+        \w+
+        \x{c4}
+        Ket
+        End
+------------------------------------------------------------------
+    a\x{C4}\x{C4}
+ 0: a\x{c4}\x{c4}
+    
+/\W+\x{C4}/8BZ
+------------------------------------------------------------------
+        Bra
+        \W+
+        \x{c4}
+        Ket
+        End
+------------------------------------------------------------------
+    !\x{C4}
+ 0: !\x{c4}
+ 
+/\W+\x{C4}/8BZT1
+------------------------------------------------------------------
+        Bra
+        \W++
+        \x{c4}
+        Ket
+        End
+------------------------------------------------------------------
+    !\x{C4}
+ 0: !\x{c4}
+
+/\W+\x{A1}/8BZ
+------------------------------------------------------------------
+        Bra
+        \W+
+        \x{a1}
+        Ket
+        End
+------------------------------------------------------------------
+    !\x{A1}
+ 0: !\x{a1}
+ 
+/\W+\x{A1}/8BZT1
+------------------------------------------------------------------
+        Bra
+        \W+
+        \x{a1}
+        Ket
+        End
+------------------------------------------------------------------
+    !\x{A1}
+ 0: !\x{a1}
+
+/X\s+\x{A0}/8BZ
+------------------------------------------------------------------
+        Bra
+        X
+        \s++
+        \x{a0}
+        Ket
+        End
+------------------------------------------------------------------
+    X\x20\x{A0}\x{A0}
+ 0: X \x{a0}
+
+/X\s+\x{A0}/8BZT1
+------------------------------------------------------------------
+        Bra
+        X
+        \s+
+        \x{a0}
+        Ket
+        End
+------------------------------------------------------------------
+    X\x20\x{A0}\x{A0}
+ 0: X \x{a0}\x{a0}
+
+/\S+\x{A0}/8BZ
+------------------------------------------------------------------
+        Bra
+        \S+
+        \x{a0}
+        Ket
+        End
+------------------------------------------------------------------
+    X\x{A0}\x{A0}
+ 0: X\x{a0}\x{a0}
+
+/\S+\x{A0}/8BZT1
+------------------------------------------------------------------
+        Bra
+        \S++
+        \x{a0}
+        Ket
+        End
+------------------------------------------------------------------
+    X\x{A0}\x{A0}
+ 0: X\x{a0}
+
+/\x{a0}+\s!/8BZ
+------------------------------------------------------------------
+        Bra
+        \x{a0}++
+        \s
+        !
+        Ket
+        End
+------------------------------------------------------------------
+    \x{a0}\x20!
+ 0: \x{a0} !
+
+/\x{a0}+\s!/8BZT1
+------------------------------------------------------------------
+        Bra
+        \x{a0}+
+        \s
+        !
+        Ket
+        End
+------------------------------------------------------------------
+    \x{a0}\x20!
+ 0: \x{a0} !
+
 /-- End of testinput15 --/


Modified: code/trunk/testdata/testoutput18
===================================================================
--- code/trunk/testdata/testoutput18    2012-04-20 11:49:13 UTC (rev 961)
+++ code/trunk/testdata/testoutput18    2012-04-20 17:28:23 UTC (rev 962)
@@ -845,4 +845,140 @@
 /\x{DF00}/8
 Failed: invalid UTF-16 string at offset 0


+/\w+\x{C4}/8BZ
+------------------------------------------------------------------
+        Bra
+        \w++
+        \xc4
+        Ket
+        End
+------------------------------------------------------------------
+    a\x{C4}\x{C4}
+ 0: a\x{c4}
+
+/\w+\x{C4}/8BZT1
+------------------------------------------------------------------
+        Bra
+        \w+
+        \xc4
+        Ket
+        End
+------------------------------------------------------------------
+    a\x{C4}\x{C4}
+ 0: a\x{c4}\x{c4}
+    
+/\W+\x{C4}/8BZ
+------------------------------------------------------------------
+        Bra
+        \W+
+        \xc4
+        Ket
+        End
+------------------------------------------------------------------
+    !\x{C4}
+ 0: !\x{c4}
+ 
+/\W+\x{C4}/8BZT1
+------------------------------------------------------------------
+        Bra
+        \W++
+        \xc4
+        Ket
+        End
+------------------------------------------------------------------
+    !\x{C4}
+ 0: !\x{c4}
+
+/\W+\x{A1}/8BZ
+------------------------------------------------------------------
+        Bra
+        \W+
+        \xa1
+        Ket
+        End
+------------------------------------------------------------------
+    !\x{A1}
+ 0: !\x{a1}
+ 
+/\W+\x{A1}/8BZT1
+------------------------------------------------------------------
+        Bra
+        \W+
+        \xa1
+        Ket
+        End
+------------------------------------------------------------------
+    !\x{A1}
+ 0: !\x{a1}
+
+/X\s+\x{A0}/8BZ
+------------------------------------------------------------------
+        Bra
+        X
+        \s++
+        \xa0
+        Ket
+        End
+------------------------------------------------------------------
+    X\x20\x{A0}\x{A0}
+ 0: X \x{a0}
+
+/X\s+\x{A0}/8BZT1
+------------------------------------------------------------------
+        Bra
+        X
+        \s+
+        \xa0
+        Ket
+        End
+------------------------------------------------------------------
+    X\x20\x{A0}\x{A0}
+ 0: X \x{a0}\x{a0}
+
+/\S+\x{A0}/8BZ
+------------------------------------------------------------------
+        Bra
+        \S+
+        \xa0
+        Ket
+        End
+------------------------------------------------------------------
+    X\x{A0}\x{A0}
+ 0: X\x{a0}\x{a0}
+
+/\S+\x{A0}/8BZT1
+------------------------------------------------------------------
+        Bra
+        \S++
+        \xa0
+        Ket
+        End
+------------------------------------------------------------------
+    X\x{A0}\x{A0}
+ 0: X\x{a0}
+
+/\x{a0}+\s!/8BZ
+------------------------------------------------------------------
+        Bra
+        \xa0++
+        \s
+        !
+        Ket
+        End
+------------------------------------------------------------------
+    \x{a0}\x20!
+ 0: \x{a0} !
+
+/\x{a0}+\s!/8BZT1
+------------------------------------------------------------------
+        Bra
+        \xa0+
+        \s
+        !
+        Ket
+        End
+------------------------------------------------------------------
+    \x{a0}\x20!
+ 0: \x{a0} !
+
 /-- End of testinput18 --/