Revision: 1202
http://www.exim.org/viewvc/pcre2?view=rev&revision=1202
Author: ph10
Date: 2020-01-01 12:07:02 +0000 (Wed, 01 Jan 2020)
Log Message:
-----------
Allow real repetition of assertions.
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/doc/html/pcre2pattern.html
code/trunk/doc/pcre2.txt
code/trunk/doc/pcre2pattern.3
code/trunk/src/pcre2_compile.c
code/trunk/testdata/testinput1
code/trunk/testdata/testoutput1
code/trunk/testdata/testoutput2
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2019-12-29 11:56:45 UTC (rev 1201)
+++ code/trunk/ChangeLog 2020-01-01 12:07:02 UTC (rev 1202)
@@ -32,7 +32,14 @@
regex engine. The Perl regex folks are aware of this usage and have made a note
about it.
+9. When an assertion is repeated, PCRE2 used to limit the maximum repetition to
+1, believing that repeating an assertion is pointless. However, if a positive
+assertion contains capturing groups, repetition can be useful. In any case, an
+assertion could always be wrapped in a repeated group. The only restriction
+that is now imposed is that an unlimited maximum is changed to one more than
+the minimum.
+
Version 10.34 21-November-2019
------------------------------
Modified: code/trunk/doc/html/pcre2pattern.html
===================================================================
--- code/trunk/doc/html/pcre2pattern.html 2019-12-29 11:56:45 UTC (rev 1201)
+++ code/trunk/doc/html/pcre2pattern.html 2020-01-01 12:07:02 UTC (rev 1202)
@@ -1901,8 +1901,8 @@
(?|(?<AA>aa)|(?<AA>bb))
</pre>
The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES
-option at compile time, or by the use of (?J) within the pattern, as described
-in the section entitled
+option at compile time, or by the use of (?J) within the pattern, as described
+in the section entitled
<a href="#internaloptions">"Internal Option Setting"</a>
above.
</P>
@@ -1968,7 +1968,7 @@
an escape such as \d or \pL that matches a single character
a character class
a backreference
- a parenthesized group (including most assertions)
+ a parenthesized group (including lookaround assertions)
a subroutine call (recursive or otherwise)
</pre>
The general repetition quantifier specifies a minimum and maximum number of
@@ -2359,7 +2359,7 @@
For versions of PCRE2 less than 10.25, backreferences of this type used to
cause the group that they reference to be treated as an
<a href="#atomicgroup">atomic group.</a>
-This restriction no longer applies, and backtracking into such groups can occur
+This restriction no longer applies, and backtracking into such groups can occur
as normal.
<a name="bigassertions"></a></P>
<br><a name="SEC20" href="#TOC1">ASSERTIONS</a><br>
@@ -2420,26 +2420,13 @@
strings within the assertion.
</P>
<P>
-For compatibility with Perl, most assertion groups may be repeated; though it
-makes no sense to assert the same thing several times, the side effect of
-capturing may occasionally be useful. However, an assertion that forms the
-condition for a conditional group may not be quantified. In practice, for
-other assertions, there only three cases:
-<br>
-<br>
-(1) If the quantifier is {0}, the assertion is never obeyed during matching.
-However, it may contain internal capture groups that are called from elsewhere
-via the
-<a href="#groupsassubroutines">subroutine mechanism.</a>
-<br>
-<br>
-(2) If quantifier is {0,n} where n is greater than zero, it is treated as if it
-were {0,1}. At run time, the rest of the pattern match is tried with and
-without the assertion, the order depending on the greediness of the quantifier.
-<br>
-<br>
-(3) If the minimum repetition is greater than zero, the quantifier is ignored.
-The assertion is obeyed just once when encountered during matching.
+Most assertion groups may be repeated; though it makes no sense to assert the
+same thing several times, the side effect of capturing in positive assertions
+may occasionally be useful. However, an assertion that forms the condition for
+a conditional group may not be quantified. PCRE2 used to restrict the
+repetition of assertions, but from release 10.35 the only restriction is that
+an unlimited maximum repetition is changed to be one more than the minimum. For
+example, {3,} is treated as {3,4}.
</P>
<br><b>
Alphabetic assertion names
@@ -3840,9 +3827,9 @@
</P>
<br><a name="SEC32" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 29 December 2019
+Last updated: 01 January 2020
<br>
-Copyright © 1997-2019 University of Cambridge.
+Copyright © 1997-2020 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
Modified: code/trunk/doc/pcre2.txt
===================================================================
--- code/trunk/doc/pcre2.txt 2019-12-29 11:56:45 UTC (rev 1201)
+++ code/trunk/doc/pcre2.txt 2020-01-01 12:07:02 UTC (rev 1202)
@@ -7729,7 +7729,7 @@
an escape such as \d or \pL that matches a single character
a character class
a backreference
- a parenthesized group (including most assertions)
+ a parenthesized group (including lookaround assertions)
a subroutine call (recursive or otherwise)
The general repetition quantifier specifies a minimum and maximum num-
@@ -8162,25 +8162,15 @@
passes to the previous backtracking point, thus discarding any captured
strings within the assertion.
- For compatibility with Perl, most assertion groups may be repeated;
- though it makes no sense to assert the same thing several times, the
- side effect of capturing may occasionally be useful. However, an asser-
- tion that forms the condition for a conditional group may not be quan-
- tified. In practice, for other assertions, there only three cases:
+ Most assertion groups may be repeated; though it makes no sense to as-
+ sert the same thing several times, the side effect of capturing in pos-
+ itive assertions may occasionally be useful. However, an assertion that
+ forms the condition for a conditional group may not be quantified.
+ PCRE2 used to restrict the repetition of assertions, but from release
+ 10.35 the only restriction is that an unlimited maximum repetition is
+ changed to be one more than the minimum. For example, {3,} is treated
+ as {3,4}.
- (1) If the quantifier is {0}, the assertion is never obeyed during
- matching. However, it may contain internal capture groups that are
- called from elsewhere via the subroutine mechanism.
-
- (2) If quantifier is {0,n} where n is greater than zero, it is treated
- as if it were {0,1}. At run time, the rest of the pattern match is
- tried with and without the assertion, the order depending on the greed-
- iness of the quantifier.
-
- (3) If the minimum repetition is greater than zero, the quantifier is
- ignored. The assertion is obeyed just once when encountered during
- matching.
-
Alphabetic assertion names
Traditionally, symbolic sequences such as (?= and (?<= have been used
@@ -9490,8 +9480,8 @@
REVISION
- Last updated: 29 December 2019
- Copyright (c) 1997-2019 University of Cambridge.
+ Last updated: 01 January 2020
+ Copyright (c) 1997-2020 University of Cambridge.
------------------------------------------------------------------------------
Modified: code/trunk/doc/pcre2pattern.3
===================================================================
--- code/trunk/doc/pcre2pattern.3 2019-12-29 11:56:45 UTC (rev 1201)
+++ code/trunk/doc/pcre2pattern.3 2020-01-01 12:07:02 UTC (rev 1202)
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "29 December 2019" "PCRE2 10.35"
+.TH PCRE2PATTERN 3 "01 January 2020" "PCRE2 10.35"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -1902,8 +1902,8 @@
(?|(?<AA>aa)|(?<AA>bb))
.sp
The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES
-option at compile time, or by the use of (?J) within the pattern, as described
-in the section entitled
+option at compile time, or by the use of (?J) within the pattern, as described
+in the section entitled
.\" HTML <a href="#internaloptions">
.\" </a>
"Internal Option Setting"
@@ -1975,7 +1975,7 @@
an escape such as \ed or \epL that matches a single character
a character class
a backreference
- a parenthesized group (including most assertions)
+ a parenthesized group (including lookaround assertions)
a subroutine call (recursive or otherwise)
.sp
The general repetition quantifier specifies a minimum and maximum number of
@@ -2362,7 +2362,7 @@
.\" </a>
atomic group.
.\"
-This restriction no longer applies, and backtracking into such groups can occur
+This restriction no longer applies, and backtracking into such groups can occur
as normal.
.
.
@@ -2431,26 +2431,13 @@
control passes to the previous backtracking point, thus discarding any captured
strings within the assertion.
.P
-For compatibility with Perl, most assertion groups may be repeated; though it
-makes no sense to assert the same thing several times, the side effect of
-capturing may occasionally be useful. However, an assertion that forms the
-condition for a conditional group may not be quantified. In practice, for
-other assertions, there only three cases:
-.sp
-(1) If the quantifier is {0}, the assertion is never obeyed during matching.
-However, it may contain internal capture groups that are called from elsewhere
-via the
-.\" HTML <a href="#groupsassubroutines">
-.\" </a>
-subroutine mechanism.
-.\"
-.sp
-(2) If quantifier is {0,n} where n is greater than zero, it is treated as if it
-were {0,1}. At run time, the rest of the pattern match is tried with and
-without the assertion, the order depending on the greediness of the quantifier.
-.sp
-(3) If the minimum repetition is greater than zero, the quantifier is ignored.
-The assertion is obeyed just once when encountered during matching.
+Most assertion groups may be repeated; though it makes no sense to assert the
+same thing several times, the side effect of capturing in positive assertions
+may occasionally be useful. However, an assertion that forms the condition for
+a conditional group may not be quantified. PCRE2 used to restrict the
+repetition of assertions, but from release 10.35 the only restriction is that
+an unlimited maximum repetition is changed to be one more than the minimum. For
+example, {3,} is treated as {3,4}.
.
.
.SS "Alphabetic assertion names"
@@ -3884,6 +3871,6 @@
.rs
.sp
.nf
-Last updated: 29 December 2019
-Copyright (c) 1997-2019 University of Cambridge.
+Last updated: 01 January 2020
+Copyright (c) 1997-2020 University of Cambridge.
.fi
Modified: code/trunk/src/pcre2_compile.c
===================================================================
--- code/trunk/src/pcre2_compile.c 2019-12-29 11:56:45 UTC (rev 1201)
+++ code/trunk/src/pcre2_compile.c 2020-01-01 12:07:02 UTC (rev 1202)
@@ -7,7 +7,7 @@
Written by Philip Hazel
Original API code Copyright (c) 1997-2012 University of Cambridge
- New API code Copyright (c) 2016-2019 University of Cambridge
+ New API code Copyright (c) 2016-2020 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@@ -7074,15 +7074,18 @@
previous[GET(previous, 1)] != OP_ALT)
goto END_REPEAT;
- /* There is no sense in actually repeating assertions. The only
- potential use of repetition is in cases when the assertion is optional.
- Therefore, if the minimum is greater than zero, just ignore the repeat.
- If the maximum is not zero or one, set it to 1. */
+ /* Perl allows all assertions to be quantified, and when they contain
+ capturing parentheses and/or are optional there are potential uses for
+ this feature. PCRE2 used to force the maximum quantifier to 1 on the
+ invalid grounds that further repetition was never useful. This was
+ always a bit pointless, since an assertion could be wrapped with a
+ repeated group to achieve the effect. General repetition is now
+ permitted, but if the maximum is unlimited it is set to one more than
+ the minimum. */
if (op_previous < OP_ONCE) /* Assertion */
{
- if (repeat_min > 0) goto END_REPEAT;
- if (repeat_max > 1) repeat_max = 1;
+ if (repeat_max == REPEAT_UNLIMITED) repeat_max = repeat_min + 1;
}
/* The case of a zero minimum is special because of the need to stick
Modified: code/trunk/testdata/testinput1
===================================================================
--- code/trunk/testdata/testinput1 2019-12-29 11:56:45 UTC (rev 1201)
+++ code/trunk/testdata/testinput1 2020-01-01 12:07:02 UTC (rev 1202)
@@ -6393,4 +6393,13 @@
/^((\1+)|\d)+133X$/
111133X
+/^(?=.*(?=(([A-Z]).*(?(1)\1)))(?!.+\2)){26}/i
+ The quick brown fox jumps over the lazy dog.
+ Jackdaws love my big sphinx of quartz.
+ Pack my box with five dozen liquor jugs.
+\= Expect no match
+ The quick brown fox jumps over the lazy cat.
+ Hackdaws love my big sphinx of quartz.
+ Pack my fox with five dozen liquor jugs.
+
# End of testinput1
Modified: code/trunk/testdata/testoutput1
===================================================================
--- code/trunk/testdata/testoutput1 2019-12-29 11:56:45 UTC (rev 1201)
+++ code/trunk/testdata/testoutput1 2020-01-01 12:07:02 UTC (rev 1202)
@@ -10126,4 +10126,25 @@
1: 11
2: 11
+/^(?=.*(?=(([A-Z]).*(?(1)\1)))(?!.+\2)){26}/i
+ The quick brown fox jumps over the lazy dog.
+ 0:
+ 1: quick brown fox jumps over the lazy dog.
+ 2: q
+ Jackdaws love my big sphinx of quartz.
+ 0:
+ 1: Jackdaws love my big sphinx of quartz.
+ 2: J
+ Pack my box with five dozen liquor jugs.
+ 0:
+ 1: Pack my box with five dozen liquor jugs.
+ 2: P
+\= Expect no match
+ The quick brown fox jumps over the lazy cat.
+No match
+ Hackdaws love my big sphinx of quartz.
+No match
+ Pack my fox with five dozen liquor jugs.
+No match
+
# End of testinput1
Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2 2019-12-29 11:56:45 UTC (rev 1201)
+++ code/trunk/testdata/testoutput2 2020-01-01 12:07:02 UTC (rev 1202)
@@ -10962,8 +10962,14 @@
Assert
abc
Ket
+ Assert
abc
Ket
+ Assert
+ abc
+ Ket
+ abc
+ Ket
End
------------------------------------------------------------------
@@ -10973,8 +10979,12 @@
Assert
abc
Ket
+ Brazero
+ Assert
abc
Ket
+ abc
+ Ket
End
------------------------------------------------------------------
@@ -10981,11 +10991,17 @@
/(?=abc)++abc/B
------------------------------------------------------------------
Bra
+ Once
Assert
abc
Ket
+ Brazero
+ Assert
abc
Ket
+ Ket
+ abc
+ Ket
End
------------------------------------------------------------------
@@ -16610,6 +16626,19 @@
Assert
Any
Ket
+ Assert
+ Any
+ Ket
+ Assert
+ Any
+ Ket
+ Assert
+ Any
+ Ket
+ Brazero
+ Assert
+ Any
+ Ket
x
Ket
Ket
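
The quantifier adjustment made in pcre2_compile.c by this revision can be
modelled in isolation. What follows is a minimal sketch, not code from PCRE2:
the names sketch_repeat, sketch_old_rule, sketch_new_rule and
SKETCH_REPEAT_UNLIMITED are hypothetical, and the sentinel value used for
"unlimited" is an assumption (the real REPEAT_UNLIMITED is defined inside
pcre2_compile.c and may differ). The old "goto END_REPEAT" path is modelled as
an effective {1,1}, following the removed documentation text that says the
assertion is obeyed just once.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for PCRE2's internal REPEAT_UNLIMITED sentinel. */
#define SKETCH_REPEAT_UNLIMITED UINT32_MAX

/* Hypothetical holder for a {min,max} quantifier applied to an assertion. */
typedef struct {
  uint32_t min;
  uint32_t max;
} sketch_repeat;

/* Behaviour before r1202: a non-zero minimum caused the quantifier to be
ignored (modelled here as {1,1}); otherwise the maximum was capped at 1. */
sketch_repeat sketch_old_rule(sketch_repeat q)
{
  assert(q.min <= q.max);
  if (q.min > 0) {
    q.min = 1;
    q.max = 1;
  } else if (q.max > 1) {
    q.max = 1;
  }
  return q;
}

/* Behaviour from r1202: only an unlimited maximum is adjusted, becoming one
more than the minimum. Limited quantifiers pass through unchanged. */
sketch_repeat sketch_new_rule(sketch_repeat q)
{
  assert(q.min <= q.max);
  if (q.max == SKETCH_REPEAT_UNLIMITED) q.max = q.min + 1;
  return q;
}
```

Under this model, {3,} becomes {3,4} with the new rule, matching the example
in the updated documentation, whereas the old rule reduced it to a single
execution of the assertion. A limited quantifier such as {26} in the new
testinput1 pattern is left as written by the new rule.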