[Pcre-svn] [190] code/trunk: Disallow quantification of asse…

Top Page
Delete this message
Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [190] code/trunk: Disallow quantification of assertion conditions, for Perl compatibility (and in
Revision: 190
          http://www.exim.org/viewvc/pcre2?view=rev&revision=190
Author:   ph10
Date:     2015-01-28 17:31:11 +0000 (Wed, 28 Jan 2015)


Log Message:
-----------
Disallow quantification of assertion conditions, for Perl compatibility (and in
any case it didn't always work).

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/doc/pcre2pattern.3
    code/trunk/src/pcre2_compile.c
    code/trunk/src/pcre2_error.c
    code/trunk/src/pcre2_intmodedep.h
    code/trunk/src/pcre2_match.c
    code/trunk/testdata/testinput2
    code/trunk/testdata/testoutput2


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2015-01-28 11:28:53 UTC (rev 189)
+++ code/trunk/ChangeLog    2015-01-28 17:31:11 UTC (rev 190)
@@ -44,7 +44,18 @@
 segfault at compile time (while trying to find the minimum matching length). 
 The infinite loop is now broken (with the minimum length unset, that is, zero).


+9. If an assertion that was used as a condition was quantified with a minimum
+of zero, matching went wrong. In particular, if the whole group had unlimited
+repetition and could match an empty string, a segfault was likely. The pattern
+(?(?=0)?)+ is an example that caused this. Perl allows assertions to be
+quantified, but not if they are being used as conditions, so the above pattern
+is faulted by Perl. PCRE2 has now been changed so that it also rejects such
+patterns.

+10. The error message for an invalid quantifier has been changed from "nothing
+to repeat" to "quantifier does not follow a repeatable item".
+
+
Version 10.00 05-January-2015
-----------------------------


Modified: code/trunk/doc/pcre2pattern.3
===================================================================
--- code/trunk/doc/pcre2pattern.3    2015-01-28 11:28:53 UTC (rev 189)
+++ code/trunk/doc/pcre2pattern.3    2015-01-28 17:31:11 UTC (rev 190)
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "26 January 2015" "PCRE2 10.00"
+.TH PCRE2PATTERN 3 "28 January 2015" "PCRE2 10.00"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -1742,8 +1742,8 @@
   the \eR escape sequence
   an escape such as \ed or \epL that matches a single character
   a character class
-  a back reference (see next section)
-  a parenthesized subpattern (including assertions)
+  a back reference
+  a parenthesized subpattern (including most assertions)
   a subroutine call to a subpattern (recursive or otherwise)
 .sp
 The general repetition quantifier specifies a minimum and maximum number of
@@ -2152,10 +2152,11 @@
 capturing is carried out only for positive assertions. (Perl sometimes, but not
 always, does do capturing in negative assertions.)
 .P
-For compatibility with Perl, assertion subpatterns may be repeated; though
+For compatibility with Perl, most assertion subpatterns may be repeated; though
 it makes no sense to assert the same thing several times, the side effect of
-capturing parentheses may occasionally be useful. In practice, there only three
-cases:
+capturing parentheses may occasionally be useful. However, an assertion that 
+forms the condition for a conditional subpattern may not be quantified. In
+practice, for other assertions, there only three cases:
 .sp
 (1) If the quantifier is {0}, the assertion is never obeyed during matching.
 However, it may contain internal capturing parenthesized groups that are called
@@ -3301,6 +3302,6 @@
 .rs
 .sp
 .nf
-Last updated: 26 January 2015
+Last updated: 28 January 2015
 Copyright (c) 1997-2015 University of Cambridge.
 .fi


Modified: code/trunk/src/pcre2_compile.c
===================================================================
--- code/trunk/src/pcre2_compile.c    2015-01-28 11:28:53 UTC (rev 189)
+++ code/trunk/src/pcre2_compile.c    2015-01-28 17:31:11 UTC (rev 190)
@@ -5178,14 +5178,19 @@
           }


         /* For conditions that are assertions, check the syntax, and then exit
-        the switch. This will take control down to where bracketed groups,
-        including assertions, are processed. */
+        the switch. This will take control down to where bracketed groups
+        are processed. The assertion will be handled as part of the group,
+        but we need to identify this case because the conditional assertion may 
+        not be quantifier. */


         if (tempptr[1] == CHAR_QUESTION_MARK &&
               (tempptr[2] == CHAR_EQUALS_SIGN ||
                tempptr[2] == CHAR_EXCLAMATION_MARK ||
                tempptr[2] == CHAR_LESS_THAN_SIGN))
+          {
+          cb->iscondassert = TRUE;       
           break;
+          } 


         /* Other conditions use OP_CREF/OP_DNCREF/OP_RREF/OP_DNRREF, and all
         need to skip at least 1+IMM2_SIZE bytes at the start of the group. */
@@ -6098,12 +6103,22 @@
       goto FAILED;
       }


-    /* Assertions used not to be repeatable, but this was changed for Perl
-    compatibility, so all kinds can now be repeated. We copy code into a
+    /* All assertions used not to be repeatable, but this was changed for Perl
+    compatibility. All kinds can now be repeated except for assertions that are 
+    conditions (Perl also forbids these to be repeated). We copy code into a
     non-register variable (tempcode) in order to be able to pass its address
-    because some compilers complain otherwise. */
+    because some compilers complain otherwise. At the start of a conditional 
+    group whose condition is an assertion, cb->iscondassert is set. We unset it 
+    here so as to allow assertions later in the group to be quantified. */


-    previous = code;                      /* For handling repetition */
+    if (bravalue >= OP_ASSERT && bravalue <= OP_ASSERTBACK_NOT &&
+        cb->iscondassert)
+      {
+      previous = NULL;
+      cb->iscondassert = FALSE;
+      }
+    else previous = code;          
+      
     *code = bravalue;
     tempcode = code;
     tempreqvary = cb->req_varyopt;        /* Save value before bracket */
@@ -6121,9 +6136,9 @@
          skipbytes,                       /* Skip over bracket number */
          cond_depth +
            ((bravalue == OP_COND)?1:0),   /* Depth of condition subpatterns */
-         &subfirstcu,                   /* For possible first char */
+         &subfirstcu,                     /* For possible first char */
          &subfirstcuflags,
-         &subreqcu,                     /* For possible last char */
+         &subreqcu,                       /* For possible last char */
          &subreqcuflags,
          bcptr,                           /* Current branch chain */
          cb,                              /* Compile data block */
@@ -7474,6 +7489,7 @@
 cb.external_flags = 0;
 cb.external_options = options;
 cb.hwm = cworkspace;
+cb.iscondassert = FALSE;
 cb.max_lookbehind = 0;
 cb.name_entry_size = 0;
 cb.name_table = NULL;
@@ -7725,6 +7741,7 @@
 cb.name_table = (PCRE2_UCHAR *)((uint8_t *)re + sizeof(pcre2_real_code));
 cb.start_code = codestart;
 cb.hwm = (PCRE2_UCHAR *)(cb.start_workspace);
+cb.iscondassert = FALSE;
 cb.req_varyopt = 0;
 cb.had_accept = FALSE;
 cb.had_pruneorskip = FALSE;


Modified: code/trunk/src/pcre2_error.c
===================================================================
--- code/trunk/src/pcre2_error.c    2015-01-28 11:28:53 UTC (rev 189)
+++ code/trunk/src/pcre2_error.c    2015-01-28 17:31:11 UTC (rev 190)
@@ -74,7 +74,7 @@
   "missing terminating ] for character class\0"
   "invalid escape sequence in character class\0"
   "range out of order in character class\0"
-  "nothing to repeat\0"
+  "quantifier does not follow a repeatable item\0"
   /* 10 */
   "internal error: unexpected repeat\0"
   "unrecognized character after (? or (?-\0"


Modified: code/trunk/src/pcre2_intmodedep.h
===================================================================
--- code/trunk/src/pcre2_intmodedep.h    2015-01-28 11:28:53 UTC (rev 189)
+++ code/trunk/src/pcre2_intmodedep.h    2015-01-28 17:31:11 UTC (rev 190)
@@ -688,6 +688,7 @@
   BOOL had_pruneorskip;            /* (*PRUNE) or (*SKIP) encountered */
   BOOL check_lookbehind;           /* Lookbehinds need later checking */
   BOOL dupnames;                   /* Duplicate names exist */
+  BOOL iscondassert;               /* Next assert is a condition */ 
 } compile_block;


/* Structure for keeping the properties of the in-memory stack used

Modified: code/trunk/src/pcre2_match.c
===================================================================
--- code/trunk/src/pcre2_match.c    2015-01-28 11:28:53 UTC (rev 189)
+++ code/trunk/src/pcre2_match.c    2015-01-28 17:31:11 UTC (rev 190)
@@ -75,7 +75,7 @@
 #define OVFLMASK    0xffff0000    /* The bits used for the overflow flag */
 #define OVFLBIT     0x00010000    /* The bit that is set for overflow */


-/* Values for setting in mb->match_function_type to indicate two special types
+/* Bits for setting in mb->match_function_type to indicate two special types
of call to match(). We do it this way to save on using another stack variable,
as stack usage is to be discouraged. */

@@ -487,7 +487,7 @@

 do
   {
-  if (cbegroup) mb->match_function_type = MATCH_CBEGROUP;
+  if (cbegroup) mb->match_function_type |= MATCH_CBEGROUP;
   rrc = match(eptr, callpat + PRIV(OP_lengths)[*callpat], mstart, offset_top,
     mb, eptrb, rdepth + 1);
   memcpy(mb->ovector, new_recursive->ovec_save,
@@ -771,9 +771,9 @@
 if (rdepth >= mb->match_limit_recursion) RRETURN(PCRE2_ERROR_RECURSIONLIMIT);


/* At the start of a group with an unlimited repeat that may match an empty
-string, the variable mb->match_function_type is set to MATCH_CBEGROUP. It is
-done this way to save having to use another function argument, which would take
-up space on the stack. See also MATCH_CONDASSERT below.
+string, the variable mb->match_function_type contains the MATCH_CBEGROUP bit.
+It is done this way to save having to use another function argument, which
+would take up space on the stack. See also MATCH_CONDASSERT below.

When MATCH_CBEGROUP is set, add the current subject pointer to the chain of
such remembered pointers, to be checked when we hit the closing ket, in order
@@ -782,12 +782,12 @@
NOT be used with tail recursion, because the memory block that is used is on
the stack, so a new one may be required for each match(). */

-if (mb->match_function_type == MATCH_CBEGROUP)
+if ((mb->match_function_type & MATCH_CBEGROUP) != 0)
{
newptrb.epb_saved_eptr = eptr;
newptrb.epb_prev = eptrb;
eptrb = &newptrb;
- mb->match_function_type = 0;
+ mb->match_function_type &= ~MATCH_CBEGROUP;
}

/* Now, at last, we can start processing the opcodes. */
@@ -1016,7 +1016,7 @@

       for (;;)
         {
-        if (op >= OP_SBRA) mb->match_function_type = MATCH_CBEGROUP;
+        if (op >= OP_SBRA) mb->match_function_type |= MATCH_CBEGROUP;
         RMATCH(eptr, ecode + PRIV(OP_lengths)[*ecode], offset_top, mb,
           eptrb, RM1);
         if (rrc == MATCH_ONCE) break;  /* Backing up through an atomic group */
@@ -1091,7 +1091,7 @@
     for (;;)
       {
       if (op >= OP_SBRA || op == OP_ONCE)
-        mb->match_function_type = MATCH_CBEGROUP;
+        mb->match_function_type |= MATCH_CBEGROUP;


       /* If this is not a possibly empty group, and there are no (*THEN)s in
       the pattern, and this is the final alternative, optimize as described
@@ -1181,7 +1181,7 @@
       for (;;)
         {
         mb->ovector[mb->offset_end - number] = eptr - mb->start_subject;
-        if (op >= OP_SBRA) mb->match_function_type = MATCH_CBEGROUP;
+        if (op >= OP_SBRA) mb->match_function_type |= MATCH_CBEGROUP;
         RMATCH(eptr, ecode + PRIV(OP_lengths)[*ecode], offset_top, mb,
           eptrb, RM63);
         if (rrc == MATCH_KETRPOS)
@@ -1255,7 +1255,7 @@


     for (;;)
       {
-      if (op >= OP_SBRA) mb->match_function_type = MATCH_CBEGROUP;
+      if (op >= OP_SBRA) mb->match_function_type |= MATCH_CBEGROUP;
       RMATCH(eptr, ecode + PRIV(OP_lengths)[*ecode], offset_top, mb,
         eptrb, RM48);
       if (rrc == MATCH_KETRPOS)
@@ -1404,11 +1404,11 @@
       break;


       /* The condition is an assertion. Call match() to evaluate it - setting
-      mb->match_function_type to MATCH_CONDASSERT causes it to stop at the end
-      of an assertion. */
+      the MATCH_CONDASSERT bit in mb->match_function_type causes it to stop at
+      the end of an assertion. */


       default:
-      mb->match_function_type = MATCH_CONDASSERT;
+      mb->match_function_type |= MATCH_CONDASSERT;
       RMATCH(eptr, ecode, offset_top, mb, NULL, RM3);
       if (rrc == MATCH_MATCH)
         {
@@ -1459,7 +1459,7 @@
         goto TAIL_RECURSE;
         }


-      mb->match_function_type = MATCH_CBEGROUP;
+      mb->match_function_type |= MATCH_CBEGROUP;
       RMATCH(eptr, ecode, offset_top, mb, eptrb, RM49);
       RRETURN(rrc);
       }
@@ -1548,10 +1548,10 @@
     case OP_ASSERT:
     case OP_ASSERTBACK:
     save_mark = mb->mark;
-    if (mb->match_function_type == MATCH_CONDASSERT)
+    if ((mb->match_function_type & MATCH_CONDASSERT) != 0)
       {
       condassert = TRUE;
-      mb->match_function_type = 0;
+      mb->match_function_type &= ~MATCH_CONDASSERT;
       }
     else condassert = FALSE;


@@ -1619,10 +1619,10 @@
     case OP_ASSERT_NOT:
     case OP_ASSERTBACK_NOT:
     save_mark = mb->mark;
-    if (mb->match_function_type == MATCH_CONDASSERT)
+    if ((mb->match_function_type & MATCH_CONDASSERT) != 0)
       {
       condassert = TRUE;
-      mb->match_function_type = 0;
+      mb->match_function_type &= ~MATCH_CONDASSERT;
       }
     else condassert = FALSE;


@@ -1844,7 +1844,7 @@
       cbegroup = (*callpat >= OP_SBRA);
       do
         {
-        if (cbegroup) mb->match_function_type = MATCH_CBEGROUP;
+        if (cbegroup) mb->match_function_type |= MATCH_CBEGROUP;
         RMATCH(eptr, callpat + PRIV(OP_lengths)[*callpat], offset_top,
           mb, eptrb, RM6);
         memcpy(mb->ovector, new_recursive.ovec_save,


Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2    2015-01-28 11:28:53 UTC (rev 189)
+++ code/trunk/testdata/testinput2    2015-01-28 17:31:11 UTC (rev 190)
@@ -4078,10 +4078,10 @@


# End of substitute tests

-"((?=(?(?=(?(?=(?(?=())))*)))))"
+"((?=(?(?=(?(?=(?(?=()))))))))"
     a


-"(?(?=)?==)(((((((((?=)))))))))"
+"(?(?=)==)(((((((((?=)))))))))"
     a


/(a)(b)|(c)/
@@ -4138,4 +4138,18 @@

/(?<N>(?J)(?<N>))(?-J)\k<N>/

+# Quantifiers are not allowed on condition assertions, but are otherwise
+# OK in conditions.
+
+/(?(?=0)?)+/
+
+/(?(?=0)(?=00)?00765)/
+     00765
+
+/(?(?=0)(?=00)?00765|(?!3).56)/
+     00765
+     456
+     ** Failers
+     356   
+
 # End of testinput2 


Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2015-01-28 11:28:53 UTC (rev 189)
+++ code/trunk/testdata/testoutput2    2015-01-28 17:31:11 UTC (rev 190)
@@ -147,7 +147,7 @@
 Failed: error 108 at offset 3: range out of order in character class


/^*/
-Failed: error 109 at offset 1: nothing to repeat
+Failed: error 109 at offset 1: quantifier does not follow a repeatable item

/(abc/
Failed: error 114 at offset 4: missing closing parenthesis
@@ -230,7 +230,7 @@
Failed: error 115 at offset 6: reference to non-existent subpattern

/{4,5}abc/
-Failed: error 109 at offset 4: nothing to repeat
+Failed: error 109 at offset 4: quantifier does not follow a repeatable item

/(a)(b)(c)\2/I
Capturing subpattern count = 3
@@ -883,10 +883,10 @@
Failed: error 106 at offset 2: missing terminating ] for character class

/*a/
-Failed: error 109 at offset 0: nothing to repeat
+Failed: error 109 at offset 0: quantifier does not follow a repeatable item

/(*)b/
-Failed: error 109 at offset 1: nothing to repeat
+Failed: error 109 at offset 1: quantifier does not follow a repeatable item

/abc)/
Failed: error 122 at offset 3: unmatched closing parenthesis
@@ -895,7 +895,7 @@
Failed: error 114 at offset 4: missing closing parenthesis

/a**/
-Failed: error 109 at offset 2: nothing to repeat
+Failed: error 109 at offset 2: quantifier does not follow a repeatable item

/)(/
Failed: error 122 at offset 0: unmatched closing parenthesis
@@ -919,10 +919,10 @@
Failed: error 106 at offset 2: missing terminating ] for character class

/*a/Ii
-Failed: error 109 at offset 0: nothing to repeat
+Failed: error 109 at offset 0: quantifier does not follow a repeatable item

/(*)b/Ii
-Failed: error 109 at offset 1: nothing to repeat
+Failed: error 109 at offset 1: quantifier does not follow a repeatable item

/abc)/Ii
Failed: error 122 at offset 3: unmatched closing parenthesis
@@ -931,7 +931,7 @@
Failed: error 114 at offset 4: missing closing parenthesis

/a**/Ii
-Failed: error 109 at offset 2: nothing to repeat
+Failed: error 109 at offset 2: quantifier does not follow a repeatable item

/)(/Ii
Failed: error 122 at offset 0: unmatched closing parenthesis
@@ -3025,16 +3025,16 @@
Subject length lower bound = 3

/a+?+/I
-Failed: error 109 at offset 3: nothing to repeat
+Failed: error 109 at offset 3: quantifier does not follow a repeatable item

/a{2,3}?+b/I
-Failed: error 109 at offset 7: nothing to repeat
+Failed: error 109 at offset 7: quantifier does not follow a repeatable item

/(?U)a+?+/I
-Failed: error 109 at offset 7: nothing to repeat
+Failed: error 109 at offset 7: quantifier does not follow a repeatable item

/a{2,3}?+b/I,ungreedy
-Failed: error 109 at offset 7: nothing to repeat
+Failed: error 109 at offset 7: quantifier does not follow a repeatable item

/x(?U)a++b/IB
------------------------------------------------------------------
@@ -8816,7 +8816,7 @@
0: a

/a(*FAIL)+b/
-Failed: error 109 at offset 8: nothing to repeat
+Failed: error 109 at offset 8: quantifier does not follow a repeatable item

/(abc|pqr|123){0}[xyz]/I
Capturing subpattern count = 1
@@ -13724,13 +13724,13 @@

# End of substitute tests

-"((?=(?(?=(?(?=(?(?=())))*)))))"
+"((?=(?(?=(?(?=(?(?=()))))))))"
     a
  0: 
  1: 
  2: 


-"(?(?=)?==)(((((((((?=)))))))))"
+"(?(?=)==)(((((((((?=)))))))))"
     a
 No match


@@ -13897,4 +13897,24 @@

/(?<N>(?J)(?<N>))(?-J)\k<N>/

+# Quantifiers are not allowed on condition assertions, but are otherwise
+# OK in conditions.
+
+/(?(?=0)?)+/
+Failed: error 109 at offset 7: quantifier does not follow a repeatable item
+
+/(?(?=0)(?=00)?00765)/
+     00765
+ 0: 00765
+
+/(?(?=0)(?=00)?00765|(?!3).56)/
+     00765
+ 0: 00765
+     456
+ 0: 456
+     ** Failers
+No match
+     356   
+No match
+
 # End of testinput2