[Pcre-svn] [627] code/trunk: Fix bug when a character > 0xff…

Author: Subversion repository
To: pcre-svn
Subject: [Pcre-svn] [627] code/trunk: Fix bug when a character > 0xffff appears in a lookbehind within a lookbehind.
Revision: 627
          http://www.exim.org/viewvc/pcre2?view=rev&revision=627
Author:   ph10
Date:     2016-12-24 16:25:11 +0000 (Sat, 24 Dec 2016)
Log Message:
-----------
Fix bug when a character > 0xffff appears in a lookbehind within a lookbehind.


Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/src/pcre2_compile.c
    code/trunk/testdata/testinput5
    code/trunk/testdata/testoutput5


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2016-12-23 18:34:10 UTC (rev 626)
+++ code/trunk/ChangeLog    2016-12-24 16:25:11 UTC (rev 627)
@@ -48,12 +48,12 @@
 given only for a callout at the end of the pattern. Automatic callouts are no
 longer inserted before and after explicit callouts in the pattern.


-A number of bugs in the refactored code were subsequently fixed before release,
-but after the code was made available in the repository. Many of the bugs were
-discovered by fuzzing testing. Several of them were related to the change from
-assuming a zero-terminated pattern (which previously had required non-zero
-terminated strings to be copied). These bugs were never in released code, but
-are noted here for the record.
+A number of bugs in the refactored code were subsequently fixed during testing
+before release, but after the code was made available in the repository. Many
+of the bugs were discovered by fuzzing testing. Several of them were related to
+the change from assuming a zero-terminated pattern (which previously had
+required non-zero terminated strings to be copied). These bugs were never in
+fully released code, but are noted here for the record.

   (a) An overall recursion such as (?0) inside a lookbehind assertion was not
       being diagnosed as an error.
@@ -107,14 +107,18 @@
       followed by '?' or '+', and there was at least one literal character
       between them, an internal error "unexpected repeat" occurred (example:
       /.+\QX\E+/).
-      
-  (p) A buffer overflow could occur while sorting the names in the group name 
-      list (depending on the order in which the names were seen). 
-      
+
+  (p) A buffer overflow could occur while sorting the names in the group name
+      list (depending on the order in which the names were seen).
+
   (q) A conditional group that started with a callout was not doing the right
       check for a following assertion, leading to compiling bad code. Example:
-      /(?(C'XX))?!XX/   
+      /(?(C'XX))?!XX/


+  (r) If a character whose code point was greater than 0xffff appeared within
+      a lookbehind that was within another lookbehind, the calculation of the
+      lookbehind length went wrong and could provoke an internal error.
+
 4. Back references are now permitted in lookbehind assertions when there are
 no duplicated group numbers (that is, (?| has not been used), and, if the
 reference is by name, there is only one group of that name. The referenced
@@ -231,24 +235,24 @@
 repeated back reference (example: /(Z)(a)\2{1,2}?(?-i)\1X/i should match ZaAAZX
 but didn't).


-35. When a pattern is too complicated, PCRE2 gives up trying to find a minimum
+35. When a pattern is too complicated, PCRE2 gives up trying to find a minimum
 matching length and just records zero. Typically this happens when there are
 too many nested or recursive back references. If the limit was reached in
-certain recursive cases it failed to be triggered and an internal error could
+certain recursive cases it failed to be triggered and an internal error could
 be the result.

-36. The pcre2_dfa_match() function now takes note of the recursion limit for
-the internal recursive calls that are used for lookrounds and recursions within
+36. The pcre2_dfa_match() function now takes note of the recursion limit for
+the internal recursive calls that are used for lookrounds and recursions within
 the pattern.

 37. More refactoring has got rid of the internal could_be_empty_branch()
 function (around 400 lines of code, including comments) by keeping track of
-could-be-emptiness as the pattern is compiled instead of scanning compiled
+could-be-emptiness as the pattern is compiled instead of scanning compiled
 groups. (This would have been much harder before the refactoring of #3 above.)
-This lifts a restriction on the number of branches in a group (more than about
+This lifts a restriction on the number of branches in a group (more than about
 1100 would give "pattern is too complicated").

-38. Add the "-ac" command line option to pcre2test as a synonym for "-pattern
+38. Add the "-ac" command line option to pcre2test as a synonym for "-pattern
 auto_callout".



Modified: code/trunk/src/pcre2_compile.c
===================================================================
--- code/trunk/src/pcre2_compile.c    2016-12-23 18:34:10 UTC (rev 626)
+++ code/trunk/src/pcre2_compile.c    2016-12-24 16:25:11 UTC (rev 627)
@@ -7924,6 +7924,7 @@


 Returns:     new value of pptr
              NULL if META_END is reached - should never occur
+               or for an unknown meta value - likewise 
 */


 static uint32_t *
@@ -7934,9 +7935,11 @@
 for (pptr += 1;; pptr++)
   {
   uint32_t meta = META_CODE(*pptr);
+  
   switch(meta)
     {
     default:  /* Just skip over most items */
+    if (meta < META_END) continue;  /* Literal */
     break;


     /* This should never occur. */
@@ -8007,7 +8010,7 @@


 /* The extra data item length for each meta is in a table. */

-  meta = (meta & 0x0fff0000u) >> 16;
+  meta = (meta >> 16) & 0x7fff;
   if (meta >= sizeof(meta_extra_lengths)) return NULL;
   pptr += meta_extra_lengths[meta];
   }
@@ -8497,7 +8500,7 @@
 for (pptr = cb->parsed_pattern; *pptr != META_END; pptr++)
   {
   if (*pptr < META_END) continue;  /* Literal */
-
+  
   switch (META_CODE(*pptr))
     {
     default:
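
The pcre2_compile.c hunks above can be read as follows: before this revision, a literal in the parsed pattern fell through to the meta_extra_lengths lookup, and a literal above 0xffff has non-zero bits in the half-word that the lookup treats as a meta code. The sketch below illustrates that under stated assumptions. The value of META_END and the mask in META_CODE are not shown in this diff and are assumed here, and old_table_index() and new_table_index() are hypothetical helper names, not functions in PCRE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout (not shown in the diff): items below META_END are literal
   code points; meta items have the top bit set and keep their code in the
   upper 16 bits. */
#define META_END      0x80000000u
#define META_CODE(x)  ((x) & 0xffff0000u)

/* Table index as computed before revision 627, with no literal check. */
static uint32_t old_table_index(uint32_t item)
{
  uint32_t meta = META_CODE(item);
  return (meta & 0x0fff0000u) >> 16;
}

/* Logic after revision 627. Returns 0 for a literal (nothing to look up),
   or 1 for a meta item, storing the table index in *index. */
static int new_table_index(uint32_t item, uint32_t *index)
{
  uint32_t meta = META_CODE(item);
  if (meta < META_END) return 0;      /* Literal */
  *index = (meta >> 16) & 0x7fff;
  return 1;
}
```

With these definitions, old_table_index(0x10385c) yields 0x10, so the literal from the new test case would have been used as a table index, whereas a literal such as 0x41 yields index 0. new_table_index() reports both as literals and only produces an index for values with the top bit set.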


Modified: code/trunk/testdata/testinput5
===================================================================
--- code/trunk/testdata/testinput5    2016-12-23 18:34:10 UTC (rev 626)
+++ code/trunk/testdata/testinput5    2016-12-24 16:25:11 UTC (rev 627)
@@ -1757,4 +1757,6 @@
 /[\P{Yi}]/utf,locale=C
 \x{2f000}


+/^(?<!(?=􃡜))/B,utf
+
 # End of testinput5

Modified: code/trunk/testdata/testoutput5
===================================================================
--- code/trunk/testdata/testoutput5    2016-12-23 18:34:10 UTC (rev 626)
+++ code/trunk/testdata/testoutput5    2016-12-24 16:25:11 UTC (rev 627)
@@ -4203,4 +4203,17 @@
 \x{2f000}
  0: \x{2f000}


+/^(?<!(?=􃡜))/B,utf
+------------------------------------------------------------------
+        Bra
+        ^
+        AssertB not
+        Assert
+        \x{10385c}
+        Ket
+        Ket
+        Ket
+        End
+------------------------------------------------------------------
+
 # End of testinput5
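
The character in the new test pattern is shown in testoutput5 as \x{10385c}, a code point above 0xffff. As a rough check, assuming testinput5 stores it as UTF-8 (which would make it the four bytes F4 83 A1 9C), the sketch below decodes one UTF-8 sequence. decode_utf8() is a hypothetical helper written for this note, not part of PCRE2, and it skips the overlong and surrogate checks that a real decoder needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the UTF-8 sequence starting at s, with len bytes available.
   Returns the code point, or 0xffffffff if the sequence is truncated or
   malformed. Illustrative only: no overlong or surrogate validation. */
static uint32_t decode_utf8(const unsigned char *s, size_t len)
{
  uint32_t c;
  size_t extra, i;

  if (len == 0) return 0xffffffffu;
  if (s[0] < 0x80) return s[0];                               /* ASCII */
  else if ((s[0] & 0xe0) == 0xc0) { c = s[0] & 0x1f; extra = 1; }
  else if ((s[0] & 0xf0) == 0xe0) { c = s[0] & 0x0f; extra = 2; }
  else if ((s[0] & 0xf8) == 0xf0) { c = s[0] & 0x07; extra = 3; }
  else return 0xffffffffu;                                    /* Bad lead byte */

  if (len < extra + 1) return 0xffffffffu;                    /* Truncated */
  for (i = 1; i <= extra; i++)
    {
    if ((s[i] & 0xc0) != 0x80) return 0xffffffffu;            /* Bad trail byte */
    c = (c << 6) | (s[i] & 0x3f);
    }
  return c;
}
```

Decoding the bytes F4 83 A1 9C with this function gives 0x10385c, which matches the value printed in the expected output and lies outside the 16-bit range.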