[Pcre-svn] [1202] code/trunk: Allow real repetition of asser…

Author: Subversion repository
To: pcre-svn
Subject: [Pcre-svn] [1202] code/trunk: Allow real repetition of assertions.
Revision: 1202
          http://www.exim.org/viewvc/pcre2?view=rev&revision=1202
Author:   ph10
Date:     2020-01-01 12:07:02 +0000 (Wed, 01 Jan 2020)
Log Message:
-----------
Allow real repetition of assertions.
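
The effect of this change on a quantifier applied to an assertion can be sketched in isolation. The following is a minimal illustration, not the code in pcre2_compile.c: the names repeat_t, adjust_before and adjust_after are hypothetical, and the value given to REPEAT_UNLIMITED is an assumed stand-in for the real definition in the PCRE2 sources.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sentinel meaning "no upper bound"; the real definition is in
   the PCRE2 sources and is only imitated here. */
#define REPEAT_UNLIMITED 65536u

/* Hypothetical container for an effective {min,max} quantifier. */
typedef struct { uint32_t min; uint32_t max; } repeat_t;

/* Behaviour before this revision: a minimum greater than zero caused the
   quantifier to be ignored (the assertion was obeyed exactly once, shown
   here as {1,1}); otherwise the maximum was capped at 1. */
static repeat_t adjust_before(uint32_t min, uint32_t max)
{
  repeat_t r = { min, max };
  assert(min <= max);
  if (min > 0) { r.min = 1; r.max = 1; }
  else if (max > 1) r.max = 1;
  return r;
}

/* Behaviour from this revision: the quantifier is kept as written, except
   that an unlimited maximum becomes one more than the minimum. */
static repeat_t adjust_after(uint32_t min, uint32_t max)
{
  repeat_t r = { min, max };
  assert(min <= max);
  if (r.max == REPEAT_UNLIMITED) r.max = r.min + 1;
  return r;
}
```

Under these assumptions, adjust_after(3, REPEAT_UNLIMITED) yields {3,4}, matching the documentation example, while the {26} quantifier in the new test pattern is preserved by adjust_after but would have been reduced to a single application by adjust_before.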


Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/doc/html/pcre2pattern.html
    code/trunk/doc/pcre2.txt
    code/trunk/doc/pcre2pattern.3
    code/trunk/src/pcre2_compile.c
    code/trunk/testdata/testinput1
    code/trunk/testdata/testoutput1
    code/trunk/testdata/testoutput2


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2019-12-29 11:56:45 UTC (rev 1201)
+++ code/trunk/ChangeLog    2020-01-01 12:07:02 UTC (rev 1202)
@@ -32,7 +32,14 @@
 regex engine. The Perl regex folks are aware of this usage and have made a note 
 about it.


+9. When an assertion is repeated, PCRE2 used to limit the maximum repetition to
+1, believing that repeating an assertion is pointless. However, if a positive
+assertion contains capturing groups, repetition can be useful. In any case, an
+assertion could always be wrapped in a repeated group. The only restriction
+that is now imposed is that an unlimited maximum is changed to one more than
+the minimum.

+
Version 10.34 21-November-2019
------------------------------


Modified: code/trunk/doc/html/pcre2pattern.html
===================================================================
--- code/trunk/doc/html/pcre2pattern.html    2019-12-29 11:56:45 UTC (rev 1201)
+++ code/trunk/doc/html/pcre2pattern.html    2020-01-01 12:07:02 UTC (rev 1202)
@@ -1901,8 +1901,8 @@
   (?|(?<AA>aa)|(?<AA>bb))
 </pre>
 The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES
-option at compile time, or by the use of (?J) within the pattern, as described 
-in the section entitled 
+option at compile time, or by the use of (?J) within the pattern, as described
+in the section entitled
 <a href="#internaloptions">"Internal Option Setting"</a>
 above.
 </P>
@@ -1968,7 +1968,7 @@
   an escape such as \d or \pL that matches a single character
   a character class
   a backreference
-  a parenthesized group (including most assertions)
+  a parenthesized group (including lookaround assertions)
   a subroutine call (recursive or otherwise)
 </pre>
 The general repetition quantifier specifies a minimum and maximum number of
@@ -2359,7 +2359,7 @@
 For versions of PCRE2 less than 10.25, backreferences of this type used to
 cause the group that they reference to be treated as an
 <a href="#atomicgroup">atomic group.</a>
-This restriction no longer applies, and backtracking into such groups can occur 
+This restriction no longer applies, and backtracking into such groups can occur
 as normal.
 <a name="bigassertions"></a></P>
 <br><a name="SEC20" href="#TOC1">ASSERTIONS</a><br>
@@ -2420,26 +2420,13 @@
 strings within the assertion.
 </P>
 <P>
-For compatibility with Perl, most assertion groups may be repeated; though it
-makes no sense to assert the same thing several times, the side effect of
-capturing may occasionally be useful. However, an assertion that forms the
-condition for a conditional group may not be quantified. In practice, for
-other assertions, there only three cases:
-<br>
-<br>
-(1) If the quantifier is {0}, the assertion is never obeyed during matching.
-However, it may contain internal capture groups that are called from elsewhere
-via the
-<a href="#groupsassubroutines">subroutine mechanism.</a>
-<br>
-<br>
-(2) If quantifier is {0,n} where n is greater than zero, it is treated as if it
-were {0,1}. At run time, the rest of the pattern match is tried with and
-without the assertion, the order depending on the greediness of the quantifier.
-<br>
-<br>
-(3) If the minimum repetition is greater than zero, the quantifier is ignored.
-The assertion is obeyed just once when encountered during matching.
+Most assertion groups may be repeated; though it makes no sense to assert the
+same thing several times, the side effect of capturing in positive assertions
+may occasionally be useful. However, an assertion that forms the condition for
+a conditional group may not be quantified. PCRE2 used to restrict the
+repetition of assertions, but from release 10.35 the only restriction is that
+an unlimited maximum repetition is changed to be one more than the minimum. For
+example, {3,} is treated as {3,4}.
 </P>
 <br><b>
 Alphabetic assertion names
@@ -3840,9 +3827,9 @@
 </P>
 <br><a name="SEC32" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 29 December 2019
+Last updated: 01 January 2020
 <br>
-Copyright &copy; 1997-2019 University of Cambridge.
+Copyright &copy; 1997-2020 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.


Modified: code/trunk/doc/pcre2.txt
===================================================================
--- code/trunk/doc/pcre2.txt    2019-12-29 11:56:45 UTC (rev 1201)
+++ code/trunk/doc/pcre2.txt    2020-01-01 12:07:02 UTC (rev 1202)
@@ -7729,7 +7729,7 @@
          an escape such as \d or \pL that matches a single character
          a character class
          a backreference
-         a parenthesized group (including most assertions)
+         a parenthesized group (including lookaround assertions)
          a subroutine call (recursive or otherwise)


        The  general repetition quantifier specifies a minimum and maximum num-
@@ -8162,25 +8162,15 @@
        passes to the previous backtracking point, thus discarding any captured
        strings within the assertion.


-       For  compatibility  with  Perl,  most assertion groups may be repeated;
-       though it makes no sense to assert the same thing  several  times,  the
-       side effect of capturing may occasionally be useful. However, an asser-
-       tion that forms the condition for a conditional group may not be  quan-
-       tified. In practice, for other assertions, there only three cases:
+       Most  assertion groups may be repeated; though it makes no sense to as-
+       sert the same thing several times, the side effect of capturing in pos-
+       itive assertions may occasionally be useful. However, an assertion that
+       forms the condition for a conditional  group  may  not  be  quantified.
+       PCRE2  used  to restrict the repetition of assertions, but from release
+       10.35 the only restriction is that an unlimited maximum  repetition  is
+       changed  to  be one more than the minimum. For example, {3,} is treated
+       as {3,4}.


-       (1)  If  the  quantifier  is  {0}, the assertion is never obeyed during
-       matching.  However, it may contain internal  capture  groups  that  are
-       called from elsewhere via the subroutine mechanism.
-
-       (2)  If quantifier is {0,n} where n is greater than zero, it is treated
-       as if it were {0,1}. At run time, the rest  of  the  pattern  match  is
-       tried with and without the assertion, the order depending on the greed-
-       iness of the quantifier.
-
-       (3) If the minimum repetition is greater than zero, the  quantifier  is
-       ignored.   The  assertion  is  obeyed just once when encountered during
-       matching.
-
    Alphabetic assertion names


        Traditionally, symbolic sequences such as (?= and (?<= have  been  used
@@ -9490,8 +9480,8 @@


REVISION

-       Last updated: 29 December 2019
-       Copyright (c) 1997-2019 University of Cambridge.
+       Last updated: 01 January 2020
+       Copyright (c) 1997-2020 University of Cambridge.
 ------------------------------------------------------------------------------




Modified: code/trunk/doc/pcre2pattern.3
===================================================================
--- code/trunk/doc/pcre2pattern.3    2019-12-29 11:56:45 UTC (rev 1201)
+++ code/trunk/doc/pcre2pattern.3    2020-01-01 12:07:02 UTC (rev 1202)
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "29 December 2019" "PCRE2 10.35"
+.TH PCRE2PATTERN 3 "01 January 2020" "PCRE2 10.35"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -1902,8 +1902,8 @@
   (?|(?<AA>aa)|(?<AA>bb))
 .sp
 The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES
-option at compile time, or by the use of (?J) within the pattern, as described 
-in the section entitled 
+option at compile time, or by the use of (?J) within the pattern, as described
+in the section entitled
 .\" HTML <a href="#internaloptions">
 .\" </a>
 "Internal Option Setting"
@@ -1975,7 +1975,7 @@
   an escape such as \ed or \epL that matches a single character
   a character class
   a backreference
-  a parenthesized group (including most assertions)
+  a parenthesized group (including lookaround assertions)
   a subroutine call (recursive or otherwise)
 .sp
 The general repetition quantifier specifies a minimum and maximum number of
@@ -2362,7 +2362,7 @@
 .\" </a>
 atomic group.
 .\"
-This restriction no longer applies, and backtracking into such groups can occur 
+This restriction no longer applies, and backtracking into such groups can occur
 as normal.
 .
 .
@@ -2431,26 +2431,13 @@
 control passes to the previous backtracking point, thus discarding any captured
 strings within the assertion.
 .P
-For compatibility with Perl, most assertion groups may be repeated; though it
-makes no sense to assert the same thing several times, the side effect of
-capturing may occasionally be useful. However, an assertion that forms the
-condition for a conditional group may not be quantified. In practice, for
-other assertions, there only three cases:
-.sp
-(1) If the quantifier is {0}, the assertion is never obeyed during matching.
-However, it may contain internal capture groups that are called from elsewhere
-via the
-.\" HTML <a href="#groupsassubroutines">
-.\" </a>
-subroutine mechanism.
-.\"
-.sp
-(2) If quantifier is {0,n} where n is greater than zero, it is treated as if it
-were {0,1}. At run time, the rest of the pattern match is tried with and
-without the assertion, the order depending on the greediness of the quantifier.
-.sp
-(3) If the minimum repetition is greater than zero, the quantifier is ignored.
-The assertion is obeyed just once when encountered during matching.
+Most assertion groups may be repeated; though it makes no sense to assert the
+same thing several times, the side effect of capturing in positive assertions
+may occasionally be useful. However, an assertion that forms the condition for
+a conditional group may not be quantified. PCRE2 used to restrict the
+repetition of assertions, but from release 10.35 the only restriction is that
+an unlimited maximum repetition is changed to be one more than the minimum. For
+example, {3,} is treated as {3,4}.
 .
 .
 .SS "Alphabetic assertion names"
@@ -3884,6 +3871,6 @@
 .rs
 .sp
 .nf
-Last updated: 29 December 2019
-Copyright (c) 1997-2019 University of Cambridge.
+Last updated: 01 January 2020
+Copyright (c) 1997-2020 University of Cambridge.
 .fi


Modified: code/trunk/src/pcre2_compile.c
===================================================================
--- code/trunk/src/pcre2_compile.c    2019-12-29 11:56:45 UTC (rev 1201)
+++ code/trunk/src/pcre2_compile.c    2020-01-01 12:07:02 UTC (rev 1202)
@@ -7,7 +7,7 @@


                        Written by Philip Hazel
      Original API code Copyright (c) 1997-2012 University of Cambridge
-          New API code Copyright (c) 2016-2019 University of Cambridge
+          New API code Copyright (c) 2016-2020 University of Cambridge


 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -7074,15 +7074,18 @@
             previous[GET(previous, 1)] != OP_ALT)
           goto END_REPEAT;


-        /* There is no sense in actually repeating assertions. The only
-        potential use of repetition is in cases when the assertion is optional.
-        Therefore, if the minimum is greater than zero, just ignore the repeat.
-        If the maximum is not zero or one, set it to 1. */
+        /* Perl allows all assertions to be quantified, and when they contain
+        capturing parentheses and/or are optional there are potential uses for
+        this feature. PCRE2 used to force the maximum quantifier to 1 on the
+        invalid grounds that further repetition was never useful. This was
+        always a bit pointless, since an assertion could be wrapped with a
+        repeated group to achieve the effect. General repetition is now
+        permitted, but if the maximum is unlimited it is set to one more than
+        the minimum. */


         if (op_previous < OP_ONCE)    /* Assertion */
           {
-          if (repeat_min > 0) goto END_REPEAT;
-          if (repeat_max > 1) repeat_max = 1;
+          if (repeat_max == REPEAT_UNLIMITED) repeat_max = repeat_min + 1;
           }


         /* The case of a zero minimum is special because of the need to stick


Modified: code/trunk/testdata/testinput1
===================================================================
--- code/trunk/testdata/testinput1    2019-12-29 11:56:45 UTC (rev 1201)
+++ code/trunk/testdata/testinput1    2020-01-01 12:07:02 UTC (rev 1202)
@@ -6393,4 +6393,13 @@
 /^((\1+)|\d)+133X$/
     111133X


+/^(?=.*(?=(([A-Z]).*(?(1)\1)))(?!.+\2)){26}/i
+    The quick brown fox jumps over the lazy dog.
+    Jackdaws love my big sphinx of quartz.
+    Pack my box with five dozen liquor jugs.
+\= Expect no match
+    The quick brown fox jumps over the lazy cat.
+    Hackdaws love my big sphinx of quartz.
+    Pack my fox with five dozen liquor jugs.
+
 # End of testinput1 


Modified: code/trunk/testdata/testoutput1
===================================================================
--- code/trunk/testdata/testoutput1    2019-12-29 11:56:45 UTC (rev 1201)
+++ code/trunk/testdata/testoutput1    2020-01-01 12:07:02 UTC (rev 1202)
@@ -10126,4 +10126,25 @@
  1: 11
  2: 11


+/^(?=.*(?=(([A-Z]).*(?(1)\1)))(?!.+\2)){26}/i
+    The quick brown fox jumps over the lazy dog.
+ 0: 
+ 1: quick brown fox jumps over the lazy dog.
+ 2: q
+    Jackdaws love my big sphinx of quartz.
+ 0: 
+ 1: Jackdaws love my big sphinx of quartz.
+ 2: J
+    Pack my box with five dozen liquor jugs.
+ 0: 
+ 1: Pack my box with five dozen liquor jugs.
+ 2: P
+\= Expect no match
+    The quick brown fox jumps over the lazy cat.
+No match
+    Hackdaws love my big sphinx of quartz.
+No match
+    Pack my fox with five dozen liquor jugs.
+No match
+
 # End of testinput1 


Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2019-12-29 11:56:45 UTC (rev 1201)
+++ code/trunk/testdata/testoutput2    2020-01-01 12:07:02 UTC (rev 1202)
@@ -10962,8 +10962,14 @@
         Assert
         abc
         Ket
+        Assert
         abc
         Ket
+        Assert
+        abc
+        Ket
+        abc
+        Ket
         End
 ------------------------------------------------------------------


@@ -10973,8 +10979,12 @@
         Assert
         abc
         Ket
+        Brazero
+        Assert
         abc
         Ket
+        abc
+        Ket
         End
 ------------------------------------------------------------------


@@ -10981,11 +10991,17 @@
 /(?=abc)++abc/B
 ------------------------------------------------------------------
         Bra
+        Once
         Assert
         abc
         Ket
+        Brazero
+        Assert
         abc
         Ket
+        Ket
+        abc
+        Ket
         End
 ------------------------------------------------------------------


@@ -16610,6 +16626,19 @@
         Assert
         Any
         Ket
+        Assert
+        Any
+        Ket
+        Assert
+        Any
+        Ket
+        Assert
+        Any
+        Ket
+        Brazero
+        Assert
+        Any
+        Ket
         x
         Ket
         Ket