Revision: 190
http://www.exim.org/viewvc/pcre2?view=rev&revision=190
Author: ph10
Date: 2015-01-28 17:31:11 +0000 (Wed, 28 Jan 2015)
Log Message:
-----------
Disallow quantification of assertion conditions, for Perl compatibility (and in
any case it didn't always work).
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/doc/pcre2pattern.3
code/trunk/src/pcre2_compile.c
code/trunk/src/pcre2_error.c
code/trunk/src/pcre2_intmodedep.h
code/trunk/src/pcre2_match.c
code/trunk/testdata/testinput2
code/trunk/testdata/testoutput2
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2015-01-28 11:28:53 UTC (rev 189)
+++ code/trunk/ChangeLog 2015-01-28 17:31:11 UTC (rev 190)
@@ -44,7 +44,18 @@
segfault at compile time (while trying to find the minimum matching length).
The infinite loop is now broken (with the minimum length unset, that is, zero).
+9. If an assertion that was used as a condition was quantified with a minimum
+of zero, matching went wrong. In particular, if the whole group had unlimited
+repetition and could match an empty string, a segfault was likely. The pattern
+(?(?=0)?)+ is an example that caused this. Perl allows assertions to be
+quantified, but not if they are being used as conditions, so the above pattern
+is faulted by Perl. PCRE2 has now been changed so that it also rejects such
+patterns.
+10. The error message for an invalid quantifier has been changed from "nothing
+to repeat" to "quantifier does not follow a repeatable item".
+
+
Version 10.00 05-January-2015
-----------------------------
Modified: code/trunk/doc/pcre2pattern.3
===================================================================
--- code/trunk/doc/pcre2pattern.3 2015-01-28 11:28:53 UTC (rev 189)
+++ code/trunk/doc/pcre2pattern.3 2015-01-28 17:31:11 UTC (rev 190)
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "26 January 2015" "PCRE2 10.00"
+.TH PCRE2PATTERN 3 "28 January 2015" "PCRE2 10.00"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -1742,8 +1742,8 @@
the \eR escape sequence
an escape such as \ed or \epL that matches a single character
a character class
- a back reference (see next section)
- a parenthesized subpattern (including assertions)
+ a back reference
+ a parenthesized subpattern (including most assertions)
a subroutine call to a subpattern (recursive or otherwise)
.sp
The general repetition quantifier specifies a minimum and maximum number of
@@ -2152,10 +2152,11 @@
capturing is carried out only for positive assertions. (Perl sometimes, but not
always, does do capturing in negative assertions.)
.P
-For compatibility with Perl, assertion subpatterns may be repeated; though
+For compatibility with Perl, most assertion subpatterns may be repeated; though
it makes no sense to assert the same thing several times, the side effect of
-capturing parentheses may occasionally be useful. In practice, there only three
-cases:
+capturing parentheses may occasionally be useful. However, an assertion that
+forms the condition for a conditional subpattern may not be quantified. In
+practice, for other assertions, there are only three cases:
.sp
(1) If the quantifier is {0}, the assertion is never obeyed during matching.
However, it may contain internal capturing parenthesized groups that are called
@@ -3301,6 +3302,6 @@
.rs
.sp
.nf
-Last updated: 26 January 2015
+Last updated: 28 January 2015
Copyright (c) 1997-2015 University of Cambridge.
.fi
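The pcre2pattern rules for a quantified assertion can be condensed into a small C function. This is a sketch, not PCRE2 code: the enum, classify_assertion_quantifier() and the REPEAT_UNLIMITED sentinel are hypothetical names. Only case (1) appears in this hunk. Cases (2) and (3) follow the wording of the full pcre2pattern page as we understand it: {0,n} with n greater than zero acts like {0,1}, and a minimum greater than zero means the quantifier is ignored and the assertion is obeyed once.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sentinel for an unbounded maximum, as in *, + or {2,}. */
#define REPEAT_UNLIMITED (-1)

/* Hypothetical classification of a quantifier applied to an assertion. */
typedef enum {
  ASSERT_QUANT_REJECTED,  /* assertion is a condition: compile-time error */
  ASSERT_QUANT_NEVER,     /* {0}: never obeyed during matching */
  ASSERT_QUANT_OPTIONAL,  /* {0,n} with n > 0: behaves like {0,1} */
  ASSERT_QUANT_ONCE       /* minimum > 0: quantifier ignored, obeyed once */
} assert_quant_kind;

static assert_quant_kind classify_assertion_quantifier(bool is_condition,
                                                       int min, int max)
{
  /* Expect a well-formed quantifier: min >= 0, max >= min or unlimited. */
  assert(min >= 0 && (max == REPEAT_UNLIMITED || max >= min));

  if (is_condition) return ASSERT_QUANT_REJECTED;  /* new in this revision */
  if (min > 0) return ASSERT_QUANT_ONCE;
  if (max == 0) return ASSERT_QUANT_NEVER;
  return ASSERT_QUANT_OPTIONAL;  /* min == 0, max > 0 or unlimited */
}
```

For example, the (?=00)? inside /(?(?=0)(?=00)?00765)/ in the new tests is an ordinary assertion and falls into the optional case, whereas the ? in /(?(?=0)?)+/ follows the condition itself and is rejected.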
Modified: code/trunk/src/pcre2_compile.c
===================================================================
--- code/trunk/src/pcre2_compile.c 2015-01-28 11:28:53 UTC (rev 189)
+++ code/trunk/src/pcre2_compile.c 2015-01-28 17:31:11 UTC (rev 190)
@@ -5178,14 +5178,19 @@
}
/* For conditions that are assertions, check the syntax, and then exit
- the switch. This will take control down to where bracketed groups,
- including assertions, are processed. */
+ the switch. This will take control down to where bracketed groups
+ are processed. The assertion will be handled as part of the group,
+ but we need to identify this case because the conditional assertion may
not be quantified. */
if (tempptr[1] == CHAR_QUESTION_MARK &&
(tempptr[2] == CHAR_EQUALS_SIGN ||
tempptr[2] == CHAR_EXCLAMATION_MARK ||
tempptr[2] == CHAR_LESS_THAN_SIGN))
+ {
+ cb->iscondassert = TRUE;
break;
+ }
/* Other conditions use OP_CREF/OP_DNCREF/OP_RREF/OP_DNRREF, and all
need to skip at least 1+IMM2_SIZE bytes at the start of the group. */
@@ -6098,12 +6103,22 @@
goto FAILED;
}
- /* Assertions used not to be repeatable, but this was changed for Perl
- compatibility, so all kinds can now be repeated. We copy code into a
+ /* All assertions used not to be repeatable, but this was changed for Perl
+ compatibility. All kinds can now be repeated except for assertions that are
+ conditions (Perl also forbids these to be repeated). We copy code into a
non-register variable (tempcode) in order to be able to pass its address
- because some compilers complain otherwise. */
+ because some compilers complain otherwise. At the start of a conditional
+ group whose condition is an assertion, cb->iscondassert is set. We unset it
+ here so as to allow assertions later in the group to be quantified. */
- previous = code; /* For handling repetition */
+ if (bravalue >= OP_ASSERT && bravalue <= OP_ASSERTBACK_NOT &&
+ cb->iscondassert)
+ {
+ previous = NULL;
+ cb->iscondassert = FALSE;
+ }
+ else previous = code;
+
*code = bravalue;
tempcode = code;
tempreqvary = cb->req_varyopt; /* Save value before bracket */
@@ -6121,9 +6136,9 @@
skipbytes, /* Skip over bracket number */
cond_depth +
((bravalue == OP_COND)?1:0), /* Depth of condition subpatterns */
- &subfirstcu, /* For possible first char */
+ &subfirstcu, /* For possible first char */
&subfirstcuflags,
- &subreqcu, /* For possible last char */
+ &subreqcu, /* For possible last char */
&subreqcuflags,
bcptr, /* Current branch chain */
cb, /* Compile data block */
@@ -7474,6 +7489,7 @@
cb.external_flags = 0;
cb.external_options = options;
cb.hwm = cworkspace;
+cb.iscondassert = FALSE;
cb.max_lookbehind = 0;
cb.name_entry_size = 0;
cb.name_table = NULL;
@@ -7725,6 +7741,7 @@
cb.name_table = (PCRE2_UCHAR *)((uint8_t *)re + sizeof(pcre2_real_code));
cb.start_code = codestart;
cb.hwm = (PCRE2_UCHAR *)(cb.start_workspace);
+cb.iscondassert = FALSE;
cb.req_varyopt = 0;
cb.had_accept = FALSE;
cb.had_pruneorskip = FALSE;
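The hand-off in pcre2_compile.c works like this. The condition check sets cb->iscondassert, the next assertion bracket clears it and leaves previous as NULL, and a quantifier that finds no previous item fails with error 109. A toy model in C makes this concrete. It is not PCRE2 code: first_bad_quantifier(), starts_assertion() and the restricted syntax are hypothetical. The toy understands only literals, \x escapes, ( ), (?: ), lookaround assertions, assertion conditions, |, ^, $ and the ?, *, + quantifiers. Character classes, {n,m} and other condition kinds are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DEPTH 64

/* True if p[i] starts "(?=", "(?!", "(?<=" or "(?<!"; sets the prefix length. */
static bool starts_assertion(const char *p, size_t i, size_t *prefix_len)
{
  if (p[i] != '(' || p[i+1] != '?') return false;
  if (p[i+2] == '=' || p[i+2] == '!') { *prefix_len = 3; return true; }
  if (p[i+2] == '<' && (p[i+3] == '=' || p[i+3] == '!'))
    { *prefix_len = 4; return true; }
  return false;
}

/* Hypothetical toy: offset of the first quantifier with nothing repeatable
   before it, or -1 if there is none. */
static long first_bad_quantifier(const char *p)
{
  bool group_is_condassert[MAX_DEPTH];  /* one entry per open group */
  int depth = 0;
  bool iscondassert = false;    /* plays the role of cb->iscondassert */
  bool have_previous = false;   /* plays the role of previous != NULL */
  size_t n, i, len = 0;

  assert(p != NULL);
  n = strlen(p);
  for (i = 0; i < n; i++)
  {
    char c = p[i];
    if (c == '(')
    {
      bool mark = false;
      if (p[i+1] == '?' && p[i+2] == '(')
      {
        /* Conditional group: the same test compile.c applies to the condition. */
        if (p[i+3] == '?' && (p[i+4] == '=' || p[i+4] == '!' || p[i+4] == '<'))
          iscondassert = true;
        i += 1;  /* loop increment lands on the condition's '(' */
      }
      else if (starts_assertion(p, i, &len))
      {
        mark = iscondassert;    /* this assertion is the condition */
        iscondassert = false;   /* later assertions may be quantified */
        i += len - 1;
      }
      else if (p[i+1] == '?' && p[i+2] == ':') i += 2;  /* non-capturing */
      if (depth < MAX_DEPTH) group_is_condassert[depth] = mark;
      depth++;
      have_previous = false;
    }
    else if (c == ')')
    {
      bool was_condassert = false;
      if (depth > 0)
      {
        depth--;
        if (depth < MAX_DEPTH) was_condassert = group_is_condassert[depth];
      }
      have_previous = !was_condassert;  /* "previous = NULL" for a condition */
    }
    else if (c == '|' || c == '^' || c == '$') have_previous = false;
    else if (c == '?' || c == '*' || c == '+')
    {
      if (!have_previous) return (long)i;       /* error 109 in PCRE2 */
      if (p[i+1] == '?' || p[i+1] == '+') i++;  /* lazy/possessive suffix */
      have_previous = false;
    }
    else
    {
      if (c == '\\' && p[i+1] != '\0') i++;     /* escaped character */
      have_previous = true;
    }
  }
  return -1;
}
```

On the patterns added to testinput2, the toy reports offset 7 for /(?(?=0)?)+/, matching the offset in testoutput2, and accepts the two patterns whose quantified assertion is not the condition.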
Modified: code/trunk/src/pcre2_error.c
===================================================================
--- code/trunk/src/pcre2_error.c 2015-01-28 11:28:53 UTC (rev 189)
+++ code/trunk/src/pcre2_error.c 2015-01-28 17:31:11 UTC (rev 190)
@@ -74,7 +74,7 @@
"missing terminating ] for character class\0"
"invalid escape sequence in character class\0"
"range out of order in character class\0"
- "nothing to repeat\0"
+ "quantifier does not follow a repeatable item\0"
/* 10 */
"internal error: unexpected repeat\0"
"unrecognized character after (? or (?-\0"
Modified: code/trunk/src/pcre2_intmodedep.h
===================================================================
--- code/trunk/src/pcre2_intmodedep.h 2015-01-28 11:28:53 UTC (rev 189)
+++ code/trunk/src/pcre2_intmodedep.h 2015-01-28 17:31:11 UTC (rev 190)
@@ -688,6 +688,7 @@
BOOL had_pruneorskip; /* (*PRUNE) or (*SKIP) encountered */
BOOL check_lookbehind; /* Lookbehinds need later checking */
BOOL dupnames; /* Duplicate names exist */
+ BOOL iscondassert; /* Next assert is a condition */
} compile_block;
/* Structure for keeping the properties of the in-memory stack used
Modified: code/trunk/src/pcre2_match.c
===================================================================
--- code/trunk/src/pcre2_match.c 2015-01-28 11:28:53 UTC (rev 189)
+++ code/trunk/src/pcre2_match.c 2015-01-28 17:31:11 UTC (rev 190)
@@ -75,7 +75,7 @@
#define OVFLMASK 0xffff0000 /* The bits used for the overflow flag */
#define OVFLBIT 0x00010000 /* The bit that is set for overflow */
-/* Values for setting in mb->match_function_type to indicate two special types
+/* Bits for setting in mb->match_function_type to indicate two special types
of call to match(). We do it this way to save on using another stack variable,
as stack usage is to be discouraged. */
@@ -487,7 +487,7 @@
do
{
- if (cbegroup) mb->match_function_type = MATCH_CBEGROUP;
+ if (cbegroup) mb->match_function_type |= MATCH_CBEGROUP;
rrc = match(eptr, callpat + PRIV(OP_lengths)[*callpat], mstart, offset_top,
mb, eptrb, rdepth + 1);
memcpy(mb->ovector, new_recursive->ovec_save,
@@ -771,9 +771,9 @@
if (rdepth >= mb->match_limit_recursion) RRETURN(PCRE2_ERROR_RECURSIONLIMIT);
/* At the start of a group with an unlimited repeat that may match an empty
-string, the variable mb->match_function_type is set to MATCH_CBEGROUP. It is
-done this way to save having to use another function argument, which would take
-up space on the stack. See also MATCH_CONDASSERT below.
+string, the variable mb->match_function_type contains the MATCH_CBEGROUP bit.
+It is done this way to save having to use another function argument, which
+would take up space on the stack. See also MATCH_CONDASSERT below.
When MATCH_CBEGROUP is set, add the current subject pointer to the chain of
such remembered pointers, to be checked when we hit the closing ket, in order
@@ -782,12 +782,12 @@
NOT be used with tail recursion, because the memory block that is used is on
the stack, so a new one may be required for each match(). */
-if (mb->match_function_type == MATCH_CBEGROUP)
+if ((mb->match_function_type & MATCH_CBEGROUP) != 0)
{
newptrb.epb_saved_eptr = eptr;
newptrb.epb_prev = eptrb;
eptrb = &newptrb;
- mb->match_function_type = 0;
+ mb->match_function_type &= ~MATCH_CBEGROUP;
}
/* Now, at last, we can start processing the opcodes. */
@@ -1016,7 +1016,7 @@
for (;;)
{
- if (op >= OP_SBRA) mb->match_function_type = MATCH_CBEGROUP;
+ if (op >= OP_SBRA) mb->match_function_type |= MATCH_CBEGROUP;
RMATCH(eptr, ecode + PRIV(OP_lengths)[*ecode], offset_top, mb,
eptrb, RM1);
if (rrc == MATCH_ONCE) break; /* Backing up through an atomic group */
@@ -1091,7 +1091,7 @@
for (;;)
{
if (op >= OP_SBRA || op == OP_ONCE)
- mb->match_function_type = MATCH_CBEGROUP;
+ mb->match_function_type |= MATCH_CBEGROUP;
/* If this is not a possibly empty group, and there are no (*THEN)s in
the pattern, and this is the final alternative, optimize as described
@@ -1181,7 +1181,7 @@
for (;;)
{
mb->ovector[mb->offset_end - number] = eptr - mb->start_subject;
- if (op >= OP_SBRA) mb->match_function_type = MATCH_CBEGROUP;
+ if (op >= OP_SBRA) mb->match_function_type |= MATCH_CBEGROUP;
RMATCH(eptr, ecode + PRIV(OP_lengths)[*ecode], offset_top, mb,
eptrb, RM63);
if (rrc == MATCH_KETRPOS)
@@ -1255,7 +1255,7 @@
for (;;)
{
- if (op >= OP_SBRA) mb->match_function_type = MATCH_CBEGROUP;
+ if (op >= OP_SBRA) mb->match_function_type |= MATCH_CBEGROUP;
RMATCH(eptr, ecode + PRIV(OP_lengths)[*ecode], offset_top, mb,
eptrb, RM48);
if (rrc == MATCH_KETRPOS)
@@ -1404,11 +1404,11 @@
break;
/* The condition is an assertion. Call match() to evaluate it - setting
- mb->match_function_type to MATCH_CONDASSERT causes it to stop at the end
- of an assertion. */
+ the MATCH_CONDASSERT bit in mb->match_function_type causes it to stop at
+ the end of an assertion. */
default:
- mb->match_function_type = MATCH_CONDASSERT;
+ mb->match_function_type |= MATCH_CONDASSERT;
RMATCH(eptr, ecode, offset_top, mb, NULL, RM3);
if (rrc == MATCH_MATCH)
{
@@ -1459,7 +1459,7 @@
goto TAIL_RECURSE;
}
- mb->match_function_type = MATCH_CBEGROUP;
+ mb->match_function_type |= MATCH_CBEGROUP;
RMATCH(eptr, ecode, offset_top, mb, eptrb, RM49);
RRETURN(rrc);
}
@@ -1548,10 +1548,10 @@
case OP_ASSERT:
case OP_ASSERTBACK:
save_mark = mb->mark;
- if (mb->match_function_type == MATCH_CONDASSERT)
+ if ((mb->match_function_type & MATCH_CONDASSERT) != 0)
{
condassert = TRUE;
- mb->match_function_type = 0;
+ mb->match_function_type &= ~MATCH_CONDASSERT;
}
else condassert = FALSE;
@@ -1619,10 +1619,10 @@
case OP_ASSERT_NOT:
case OP_ASSERTBACK_NOT:
save_mark = mb->mark;
- if (mb->match_function_type == MATCH_CONDASSERT)
+ if ((mb->match_function_type & MATCH_CONDASSERT) != 0)
{
condassert = TRUE;
- mb->match_function_type = 0;
+ mb->match_function_type &= ~MATCH_CONDASSERT;
}
else condassert = FALSE;
@@ -1844,7 +1844,7 @@
cbegroup = (*callpat >= OP_SBRA);
do
{
- if (cbegroup) mb->match_function_type = MATCH_CBEGROUP;
+ if (cbegroup) mb->match_function_type |= MATCH_CBEGROUP;
RMATCH(eptr, callpat + PRIV(OP_lengths)[*callpat], offset_top,
mb, eptrb, RM6);
memcpy(mb->ovector, new_recursive.ovec_save,
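In pcre2_match.c, match_function_type changes from holding one value to holding independent bits. Each bit is now set with |= and cleared with &= ~, so one request no longer overwrites another that is still pending. The following C sketch shows only that generic difference. It does not trace a specific PCRE2 code path. The MATCH_* values are illustrative, not the real definitions, and old_set(), new_set(), take_bit() and pending_after() are hypothetical helpers.

```c
#include <assert.h>

/* Illustrative values only; the real definitions live in pcre2_match.c. */
#define MATCH_CONDASSERT 1u
#define MATCH_CBEGROUP   2u

/* Old style: plain assignment replaces whatever was pending. */
static unsigned old_set(unsigned type, unsigned bit) { (void)type; return bit; }

/* New style: OR in the bit so other pending bits survive. */
static unsigned new_set(unsigned type, unsigned bit) { return type | bit; }

/* Consume one bit: report whether it was set and clear only that bit. */
static int take_bit(unsigned *type, unsigned bit)
{
  int was_set = (*type & bit) != 0;
  *type &= ~bit;
  return was_set;
}

/* Hypothetical sequence: a second request is made while the first is
   still pending. */
static unsigned pending_after(unsigned (*set)(unsigned, unsigned))
{
  unsigned type = 0;
  type = set(type, MATCH_CBEGROUP);
  type = set(type, MATCH_CONDASSERT);
  return type;
}
```

With old_set() the CBEGROUP request is silently lost. With new_set() both bits survive, and take_bit() consumes CONDASSERT without disturbing CBEGROUP, which is the pattern the new & and &= ~ tests follow.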
Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2 2015-01-28 11:28:53 UTC (rev 189)
+++ code/trunk/testdata/testinput2 2015-01-28 17:31:11 UTC (rev 190)
@@ -4078,10 +4078,10 @@
# End of substitute tests
-"((?=(?(?=(?(?=(?(?=())))*)))))"
+"((?=(?(?=(?(?=(?(?=()))))))))"
a
-"(?(?=)?==)(((((((((?=)))))))))"
+"(?(?=)==)(((((((((?=)))))))))"
a
/(a)(b)|(c)/
@@ -4138,4 +4138,18 @@
/(?<N>(?J)(?<N>))(?-J)\k<N>/
+# Quantifiers are not allowed on condition assertions, but are otherwise
+# OK in conditions.
+
+/(?(?=0)?)+/
+
+/(?(?=0)(?=00)?00765)/
+ 00765
+
+/(?(?=0)(?=00)?00765|(?!3).56)/
+ 00765
+ 456
+ ** Failers
+ 356
+
# End of testinput2
Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2 2015-01-28 11:28:53 UTC (rev 189)
+++ code/trunk/testdata/testoutput2 2015-01-28 17:31:11 UTC (rev 190)
@@ -147,7 +147,7 @@
Failed: error 108 at offset 3: range out of order in character class
/^*/
-Failed: error 109 at offset 1: nothing to repeat
+Failed: error 109 at offset 1: quantifier does not follow a repeatable item
/(abc/
Failed: error 114 at offset 4: missing closing parenthesis
@@ -230,7 +230,7 @@
Failed: error 115 at offset 6: reference to non-existent subpattern
/{4,5}abc/
-Failed: error 109 at offset 4: nothing to repeat
+Failed: error 109 at offset 4: quantifier does not follow a repeatable item
/(a)(b)(c)\2/I
Capturing subpattern count = 3
@@ -883,10 +883,10 @@
Failed: error 106 at offset 2: missing terminating ] for character class
/*a/
-Failed: error 109 at offset 0: nothing to repeat
+Failed: error 109 at offset 0: quantifier does not follow a repeatable item
/(*)b/
-Failed: error 109 at offset 1: nothing to repeat
+Failed: error 109 at offset 1: quantifier does not follow a repeatable item
/abc)/
Failed: error 122 at offset 3: unmatched closing parenthesis
@@ -895,7 +895,7 @@
Failed: error 114 at offset 4: missing closing parenthesis
/a**/
-Failed: error 109 at offset 2: nothing to repeat
+Failed: error 109 at offset 2: quantifier does not follow a repeatable item
/)(/
Failed: error 122 at offset 0: unmatched closing parenthesis
@@ -919,10 +919,10 @@
Failed: error 106 at offset 2: missing terminating ] for character class
/*a/Ii
-Failed: error 109 at offset 0: nothing to repeat
+Failed: error 109 at offset 0: quantifier does not follow a repeatable item
/(*)b/Ii
-Failed: error 109 at offset 1: nothing to repeat
+Failed: error 109 at offset 1: quantifier does not follow a repeatable item
/abc)/Ii
Failed: error 122 at offset 3: unmatched closing parenthesis
@@ -931,7 +931,7 @@
Failed: error 114 at offset 4: missing closing parenthesis
/a**/Ii
-Failed: error 109 at offset 2: nothing to repeat
+Failed: error 109 at offset 2: quantifier does not follow a repeatable item
/)(/Ii
Failed: error 122 at offset 0: unmatched closing parenthesis
@@ -3025,16 +3025,16 @@
Subject length lower bound = 3
/a+?+/I
-Failed: error 109 at offset 3: nothing to repeat
+Failed: error 109 at offset 3: quantifier does not follow a repeatable item
/a{2,3}?+b/I
-Failed: error 109 at offset 7: nothing to repeat
+Failed: error 109 at offset 7: quantifier does not follow a repeatable item
/(?U)a+?+/I
-Failed: error 109 at offset 7: nothing to repeat
+Failed: error 109 at offset 7: quantifier does not follow a repeatable item
/a{2,3}?+b/I,ungreedy
-Failed: error 109 at offset 7: nothing to repeat
+Failed: error 109 at offset 7: quantifier does not follow a repeatable item
/x(?U)a++b/IB
------------------------------------------------------------------
@@ -8816,7 +8816,7 @@
0: a
/a(*FAIL)+b/
-Failed: error 109 at offset 8: nothing to repeat
+Failed: error 109 at offset 8: quantifier does not follow a repeatable item
/(abc|pqr|123){0}[xyz]/I
Capturing subpattern count = 1
@@ -13724,13 +13724,13 @@
# End of substitute tests
-"((?=(?(?=(?(?=(?(?=())))*)))))"
+"((?=(?(?=(?(?=(?(?=()))))))))"
a
0:
1:
2:
-"(?(?=)?==)(((((((((?=)))))))))"
+"(?(?=)==)(((((((((?=)))))))))"
a
No match
@@ -13897,4 +13897,24 @@
/(?<N>(?J)(?<N>))(?-J)\k<N>/
+# Quantifiers are not allowed on condition assertions, but are otherwise
+# OK in conditions.
+
+/(?(?=0)?)+/
+Failed: error 109 at offset 7: quantifier does not follow a repeatable item
+
+/(?(?=0)(?=00)?00765)/
+ 00765
+ 0: 00765
+
+/(?(?=0)(?=00)?00765|(?!3).56)/
+ 00765
+ 0: 00765
+ 456
+ 0: 456
+ ** Failers
+No match
+ 356
+No match
+
# End of testinput2
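The new text for error 109 comes from the pcre2_error.c table. That table is one long string of NUL-terminated messages, and the /* 10 */ marker places "nothing to repeat" (now the quantifier message) at index 9. The sketch below looks messages up by walking such a table. It assumes compile error numbers are offset by 100, as the "error 109" and "error 108" lines in testoutput2 suggest. It holds only the entries visible in the hunk, so EXCERPT_FIRST (6) is inferred from the marker. nth_message() and excerpt_message_for() are hypothetical helpers, not the PCRE2 API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Entries visible in the pcre2_error.c hunk, NUL-separated as in the source.
   EXCERPT_FIRST is the table index of the first entry shown here. */
#define EXCERPT_FIRST 6
static const char error_excerpt[] =
  "missing terminating ] for character class\0"
  "invalid escape sequence in character class\0"
  "range out of order in character class\0"
  "quantifier does not follow a repeatable item\0"
  /* 10 */
  "internal error: unexpected repeat\0"
  "unrecognized character after (? or (?-\0";

/* Return the index-th NUL-terminated message in table, or NULL. */
static const char *nth_message(const char *table, size_t size, int index)
{
  const char *p = table;
  const char *end = table + size;
  assert(table != NULL);
  if (index < 0) return NULL;
  while (index-- > 0)
  {
    p += strlen(p) + 1;              /* skip one message and its NUL */
    if (p >= end) return NULL;
  }
  return (p < end && *p != '\0') ? p : NULL;
}

/* Map a compile error number such as 109 to its text within the excerpt,
   assuming compile errors are numbered from 100. */
static const char *excerpt_message_for(int errorcode)
{
  return nth_message(error_excerpt, sizeof(error_excerpt),
                     errorcode - 100 - EXCERPT_FIRST);
}
```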