[Pcre-svn] [962] code/trunk: Documentation and tests update …

Author: Subversion repository
To: pcre-svn
Subject: [Pcre-svn] [962] code/trunk: Documentation and tests update and minor tweak to perltest.sh.
Revision: 962
          http://www.exim.org/viewvc/pcre2?view=rev&revision=962
Author:   ph10
Date:     2018-07-12 18:04:43 +0100 (Thu, 12 Jul 2018)
Log Message:
-----------
Documentation and tests update and minor tweak to perltest.sh. 


Modified Paths:
--------------
    code/trunk/doc/html/pcre2pattern.html
    code/trunk/doc/pcre2.txt
    code/trunk/doc/pcre2pattern.3
    code/trunk/perltest.sh
    code/trunk/testdata/testinput1
    code/trunk/testdata/testoutput1


Modified: code/trunk/doc/html/pcre2pattern.html
===================================================================
--- code/trunk/doc/html/pcre2pattern.html    2018-07-11 10:06:51 UTC (rev 961)
+++ code/trunk/doc/html/pcre2pattern.html    2018-07-12 17:04:43 UTC (rev 962)
@@ -3227,13 +3227,13 @@
 </b><br>
 <P>
 The following verbs do nothing when they are encountered. Matching continues
-with what follows, but if there is no subsequent match, causing a backtrack to
-the verb, a failure is forced. That is, backtracking cannot pass to the left of
-the verb. However, when one of these verbs appears inside an atomic group or in
-an assertion that is true, its effect is confined to that group, because once
-the group has been matched, there is never any backtracking into it. In this
-situation, backtracking has to jump to the left of the entire atomic group or
-assertion.
+with what follows, but if there is a subsequent match failure, causing a
+backtrack to the verb, a failure is forced. That is, backtracking cannot pass
+to the left of the verb. However, when one of these verbs appears inside an
+atomic group or in a lookaround assertion that is true, its effect is confined
+to that group, because once the group has been matched, there is never any
+backtracking into it. Backtracking from beyond an assertion or an atomic group
+ignores the entire group, and seeks a preceding backtracking point.
 </P>
 <P>
 These verbs differ in exactly what kind of failure occurs when backtracking
@@ -3321,14 +3321,39 @@
 <pre>
   (*SKIP:NAME)
 </pre>
-When (*SKIP) has an associated name, its behaviour is modified. When it is
-triggered, the previous path through the pattern is searched for the most
-recent (*MARK) that has the same name. If one is found, the "bumpalong" advance
-is to the subject position that corresponds to that (*MARK) instead of to where
-(*SKIP) was encountered. If no (*MARK) with a matching name is found, the
-(*SKIP) is ignored.
+When (*SKIP) has an associated name, its behaviour is modified. When such a
+(*SKIP) is triggered, the previous path through the pattern is searched for the
+most recent (*MARK) that has the same name. If one is found, the "bumpalong"
+advance is to the subject position that corresponds to that (*MARK) instead of
+to where (*SKIP) was encountered. If no (*MARK) with a matching name is found,
+the (*SKIP) is ignored.
 </P>
 <P>
+The search for a (*MARK) name uses the normal backtracking mechanism, which
+means that it does not see (*MARK) settings that are inside atomic groups or
+assertions, because they are never re-entered by backtracking. Compare the
+following <b>pcre2test</b> examples:
+<pre>
+    re&#62; /a(?&#62;(*MARK:X))(*SKIP:X)(*F)|(.)/
+  data: abc
+   0: a
+   1: a
+  data:
+    re&#62; /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/
+  data: abc
+   0: b
+   1: b
+</pre>
+In the first example, the (*MARK) setting is in an atomic group, so it is not 
+seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored. This allows
+the second branch of the pattern to be tried at the first character position.
+In the second example, the (*MARK) setting is not in an atomic group. This
+allows (*SKIP:X) to immediately cause a new matching attempt to start at the
+second character. This time, the (*MARK) is never seen because "a" does not
+match "b", so the matcher immediately jumps to the second branch of the
+pattern.
+</P>
+<P>
 Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
 names that are set by (*PRUNE:NAME) or (*THEN:NAME).
 <pre>
@@ -3456,6 +3481,14 @@
 retained in both cases.
 </P>
 <P>
+The remaining verbs act only when a later failure causes a backtrack to 
+reach them. This means that their effect is confined to the assertion, 
+because lookaround assertions are atomic. A backtrack that occurs after an
+assertion is complete does not jump back into the assertion. Note in particular 
+that a (*MARK) name that is set in an assertion is not "seen" by an instance of 
+(*SKIP:NAME) later in the pattern.
+</P>
+<P>
 The effect of (*THEN) is not allowed to escape beyond an assertion. If there
 are no more branches to try, (*THEN) causes a positive assertion to be false,
 and a negative assertion to be true.
@@ -3463,10 +3496,10 @@
 <P>
 The other backtracking verbs are not treated specially if they appear in a
 standalone positive assertion. In a conditional positive assertion,
-backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the condition to be
-false. However, for both standalone and conditional negative assertions,
-backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the assertion to be
-true, without considering any further alternative branches.
+backtracking (from within the assertion) into (*COMMIT), (*SKIP), or (*PRUNE)
+causes the condition to be false. However, for both standalone and conditional
+negative assertions, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes
+the assertion to be true, without considering any further alternative branches.
 <a name="btsub"></a></P>
 <br><b>
 Backtracking verbs in subroutines
@@ -3509,7 +3542,7 @@
 </P>
 <br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 10 July 2018
+Last updated: 11 July 2018
 <br>
 Copyright &copy; 1997-2018 University of Cambridge.
 <br>


Modified: code/trunk/doc/pcre2.txt
===================================================================
--- code/trunk/doc/pcre2.txt    2018-07-11 10:06:51 UTC (rev 961)
+++ code/trunk/doc/pcre2.txt    2018-07-12 17:04:43 UTC (rev 962)
@@ -8695,14 +8695,14 @@
    Verbs that act after backtracking


        The following verbs do nothing when they are encountered. Matching con-
-       tinues with what follows, but if there is no subsequent match,  causing
-       a  backtrack  to  the  verb, a failure is forced. That is, backtracking
-       cannot pass to the left of the verb. However, when one of  these  verbs
-       appears  inside  an  atomic  group or in an assertion that is true, its
-       effect is confined to that group,  because  once  the  group  has  been
-       matched,  there  is  never any backtracking into it. In this situation,
-       backtracking has to jump to the left of  the  entire  atomic  group  or
-       assertion.
+       tinues with what follows, but if there is a subsequent  match  failure,
+       causing  a  backtrack  to the verb, a failure is forced. That is, back-
+       tracking cannot pass to the left of the  verb.  However,  when  one  of
+       these verbs appears inside an atomic group or in a lookaround assertion
+       that is true, its effect is confined to that group,  because  once  the
+       group  has been matched, there is never any backtracking into it. Back-
+       tracking from beyond an assertion or an atomic group ignores the entire
+       group, and seeks a preceding backtracking point.


        These  verbs  differ  in exactly what kind of failure occurs when back-
        tracking reaches them. The behaviour described below  is  what  happens
@@ -8790,13 +8790,37 @@


          (*SKIP:NAME)


-       When (*SKIP) has an associated name, its behaviour is modified. When it
-       is triggered, the previous path through the pattern is searched for the
-       most  recent  (*MARK)  that  has  the  same  name. If one is found, the
-       "bumpalong" advance is to the subject position that corresponds to that
-       (*MARK) instead of to where (*SKIP) was encountered. If no (*MARK) with
-       a matching name is found, the (*SKIP) is ignored.
+       When  (*SKIP)  has  an associated name, its behaviour is modified. When
+       such a (*SKIP) is triggered, the previous path through the  pattern  is
+       searched  for the most recent (*MARK) that has the same name. If one is
+       found, the "bumpalong" advance is to the subject position  that  corre-
+       sponds  to that (*MARK) instead of to where (*SKIP) was encountered. If
+       no (*MARK) with a matching name is found, the (*SKIP) is ignored.


+       The search for a (*MARK) name uses the normal  backtracking  mechanism,
+       which  means  that  it  does  not  see (*MARK) settings that are inside
+       atomic groups or assertions, because they are never re-entered by back-
+       tracking. Compare the following pcre2test examples:
+
+           re> /a(?>(*MARK:X))(*SKIP:X)(*F)|(.)/
+         data: abc
+          0: a
+          1: a
+         data:
+           re> /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/
+         data: abc
+          0: b
+          1: b
+
+       In  the first example, the (*MARK) setting is in an atomic group, so it
+       is not seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored.
+       This  allows  the second branch of the pattern to be tried at the first
+       character position.  In the second example, the (*MARK) setting is  not
+       in  an  atomic  group. This allows (*SKIP:X) to immediately cause a new
+       matching attempt to start at  the  second  character.  This  time,  the
+       (*MARK)  is  never  seen because "a" does not match "b", so the matcher
+       immediately jumps to the second branch of the pattern.
+
        Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME).  It
        ignores names that are set by (*PRUNE:NAME) or (*THEN:NAME).


@@ -8915,41 +8939,48 @@
        true  for  a  positive assertion and false for a negative one; captured
        substrings are retained in both cases.


-       The effect of (*THEN) is not allowed to escape beyond an assertion.  If
-       there  are no more branches to try, (*THEN) causes a positive assertion
+       The remaining verbs act only when a later failure causes a backtrack to
+       reach  them. This means that their effect is confined to the assertion,
+       because lookaround assertions are atomic. A backtrack that occurs after
+       an assertion is complete does not jump back into the assertion. Note in
+       particular that a (*MARK) name that is  set  in  an  assertion  is  not
+       "seen" by an instance of (*SKIP:NAME) later in the pattern.
+
+       The  effect of (*THEN) is not allowed to escape beyond an assertion. If
+       there are no more branches to try, (*THEN) causes a positive  assertion
        to be false, and a negative assertion to be true.


-       The other backtracking verbs are not treated specially if  they  appear
-       in  a  standalone  positive assertion. In a conditional positive asser-
-       tion, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the con-
-       dition  to be false. However, for both standalone and conditional nega-
-       tive assertions, backtracking  into  (*COMMIT),  (*SKIP),  or  (*PRUNE)
-       causes the assertion to be true, without considering any further alter-
-       native branches.
+       The  other  backtracking verbs are not treated specially if they appear
+       in a standalone positive assertion. In a  conditional  positive  asser-
+       tion, backtracking (from within the assertion) into (*COMMIT), (*SKIP),
+       or (*PRUNE) causes the condition to be false. However, for both  stand-
+       alone and conditional negative assertions, backtracking into (*COMMIT),
+       (*SKIP), or (*PRUNE) causes the assertion to be true, without consider-
+       ing any further alternative branches.


    Backtracking verbs in subroutines


-       These behaviours occur whether or not the subpattern is  called  recur-
+       These  behaviours  occur whether or not the subpattern is called recur-
        sively.  Perl's treatment of subroutines is different in some cases.


-       (*FAIL)  in  a subpattern called as a subroutine has its normal effect:
+       (*FAIL) in a subpattern called as a subroutine has its  normal  effect:
        it forces an immediate backtrack.


-       (*ACCEPT) in a subpattern called as a subroutine causes the  subroutine
-       match  to succeed without any further processing. Matching then contin-
+       (*ACCEPT)  in a subpattern called as a subroutine causes the subroutine
+       match to succeed without any further processing. Matching then  contin-
        ues after the subroutine call.


        (*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine
        cause the subroutine match to fail.


-       (*THEN)  skips to the next alternative in the innermost enclosing group
-       within the subpattern that has alternatives. If there is no such  group
+       (*THEN) skips to the next alternative in the innermost enclosing  group
+       within  the subpattern that has alternatives. If there is no such group
        within the subpattern, (*THEN) causes the subroutine match to fail.



SEE ALSO

-       pcre2api(3),    pcre2callout(3),    pcre2matching(3),   pcre2syntax(3),
+       pcre2api(3),   pcre2callout(3),    pcre2matching(3),    pcre2syntax(3),
        pcre2(3).



@@ -8962,7 +8993,7 @@

REVISION

-       Last updated: 10 July 2018
+       Last updated: 11 July 2018
        Copyright (c) 1997-2018 University of Cambridge.
 ------------------------------------------------------------------------------



Modified: code/trunk/doc/pcre2pattern.3
===================================================================
--- code/trunk/doc/pcre2pattern.3    2018-07-11 10:06:51 UTC (rev 961)
+++ code/trunk/doc/pcre2pattern.3    2018-07-12 17:04:43 UTC (rev 962)
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "10 July 2018" "PCRE2 10.32"
+.TH PCRE2PATTERN 3 "11 July 2018" "PCRE2 10.32"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -3262,13 +3262,13 @@
 .rs
 .sp
 The following verbs do nothing when they are encountered. Matching continues
-with what follows, but if there is no subsequent match, causing a backtrack to
-the verb, a failure is forced. That is, backtracking cannot pass to the left of
-the verb. However, when one of these verbs appears inside an atomic group or in
-an assertion that is true, its effect is confined to that group, because once
-the group has been matched, there is never any backtracking into it. In this
-situation, backtracking has to jump to the left of the entire atomic group or
-assertion.
+with what follows, but if there is a subsequent match failure, causing a
+backtrack to the verb, a failure is forced. That is, backtracking cannot pass
+to the left of the verb. However, when one of these verbs appears inside an
+atomic group or in a lookaround assertion that is true, its effect is confined
+to that group, because once the group has been matched, there is never any
+backtracking into it. Backtracking from beyond an assertion or an atomic group
+ignores the entire group, and seeks a preceding backtracking point.
 .P
 These verbs differ in exactly what kind of failure occurs when backtracking
 reaches them. The behaviour described below is what happens when the verb is
@@ -3352,13 +3352,37 @@
 .sp
   (*SKIP:NAME)
 .sp
-When (*SKIP) has an associated name, its behaviour is modified. When it is
-triggered, the previous path through the pattern is searched for the most
-recent (*MARK) that has the same name. If one is found, the "bumpalong" advance
-is to the subject position that corresponds to that (*MARK) instead of to where
-(*SKIP) was encountered. If no (*MARK) with a matching name is found, the
-(*SKIP) is ignored.
+When (*SKIP) has an associated name, its behaviour is modified. When such a
+(*SKIP) is triggered, the previous path through the pattern is searched for the
+most recent (*MARK) that has the same name. If one is found, the "bumpalong"
+advance is to the subject position that corresponds to that (*MARK) instead of
+to where (*SKIP) was encountered. If no (*MARK) with a matching name is found,
+the (*SKIP) is ignored.
 .P
+The search for a (*MARK) name uses the normal backtracking mechanism, which
+means that it does not see (*MARK) settings that are inside atomic groups or
+assertions, because they are never re-entered by backtracking. Compare the
+following \fBpcre2test\fP examples:
+.sp
+    re> /a(?>(*MARK:X))(*SKIP:X)(*F)|(.)/
+  data: abc
+   0: a
+   1: a
+  data:
+    re> /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/
+  data: abc
+   0: b
+   1: b
+.sp    
+In the first example, the (*MARK) setting is in an atomic group, so it is not 
+seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored. This allows
+the second branch of the pattern to be tried at the first character position.
+In the second example, the (*MARK) setting is not in an atomic group. This
+allows (*SKIP:X) to immediately cause a new matching attempt to start at the
+second character. This time, the (*MARK) is never seen because "a" does not
+match "b", so the matcher immediately jumps to the second branch of the
+pattern.
+.P
 Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
 names that are set by (*PRUNE:NAME) or (*THEN:NAME).
 .sp
@@ -3481,6 +3505,13 @@
 a positive assertion and false for a negative one; captured substrings are
 retained in both cases.
 .P
+The remaining verbs act only when a later failure causes a backtrack to 
+reach them. This means that their effect is confined to the assertion, 
+because lookaround assertions are atomic. A backtrack that occurs after an
+assertion is complete does not jump back into the assertion. Note in particular 
+that a (*MARK) name that is set in an assertion is not "seen" by an instance of 
+(*SKIP:NAME) later in the pattern.
+.P
 The effect of (*THEN) is not allowed to escape beyond an assertion. If there
 are no more branches to try, (*THEN) causes a positive assertion to be false,
 and a negative assertion to be true.
@@ -3487,10 +3518,10 @@
 .P
 The other backtracking verbs are not treated specially if they appear in a
 standalone positive assertion. In a conditional positive assertion,
-backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the condition to be
-false. However, for both standalone and conditional negative assertions,
-backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the assertion to be
-true, without considering any further alternative branches.
+backtracking (from within the assertion) into (*COMMIT), (*SKIP), or (*PRUNE)
+causes the condition to be false. However, for both standalone and conditional
+negative assertions, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes
+the assertion to be true, without considering any further alternative branches.
 .
 .
 .\" HTML <a name="btsub"></a>
@@ -3536,6 +3567,6 @@
 .rs
 .sp
 .nf
-Last updated: 10 July 2018
+Last updated: 11 July 2018
 Copyright (c) 1997-2018 University of Cambridge.
 .fi


Modified: code/trunk/perltest.sh
===================================================================
--- code/trunk/perltest.sh    2018-07-11 10:06:51 UTC (rev 961)
+++ code/trunk/perltest.sh    2018-07-12 17:04:43 UTC (rev 962)
@@ -43,7 +43,7 @@
 #   afteralltext       ignored
 #   dupnames           ignored (Perl always allows)
 #   jitstack           ignored
-#   mark               ignored
+#   mark               show mark information
 #   no_auto_possess    ignored
 #   no_start_optimize  ignored
 #   subject_literal    does not process subjects for escapes
@@ -172,9 +172,9 @@


$mod =~ s/jitstack=\d+,?//;

- # Remove "mark" (asks pcre2test to check MARK data) */
+ # The "mark" modifier requests checking of MARK data */

- $mod =~ s/mark,?//;
+ $show_mark = ($mod =~ s/mark,?//);

# "ucp" asks pcre2test to set PCRE2_UCP; change this to /u for Perl

@@ -279,7 +279,7 @@
     elsif (scalar(@subs) == 0)
       {
       printf $outfile "No match";
-      if (defined $REGERROR && $REGERROR != 1)
+      if ($show_mark && defined $REGERROR && $REGERROR != 1)
         { printf $outfile (", mark = %s", &pchars($REGERROR)); }
       printf $outfile "\n";
       }
@@ -307,7 +307,7 @@
       # set and the input pattern was a UTF-8 string. We can, however, force
       # it to be so marked.


-      if (defined $REGMARK && $REGMARK != 1)
+      if ($show_mark && defined $REGMARK && $REGMARK != 1)
         {
         $xx = $REGMARK;
         $xx = Encode::decode_utf8($xx) if $utf8;
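
For readers who do not read Perl, the perltest.sh change above can be sketched in Python. This is only an illustration and not part of the commit. The function names `strip_mark_modifier` and `no_match_line` are hypothetical, `None` stands in for a `$REGERROR` that is undefined or holds Perl's unnamed-verb value, and the `pchars` escaping step is left out.

```python
import re
from typing import Optional, Tuple


def strip_mark_modifier(mod: str) -> Tuple[str, bool]:
    """Remove the first "mark" modifier and report whether it was present.

    Mirrors the Perl idiom  $show_mark = ($mod =~ s/mark,?//);
    where s/// without /g replaces at most one occurrence and its
    return value is true only if a substitution happened.
    """
    new_mod, count = re.subn(r"mark,?", "", mod, count=1)
    return new_mod, count > 0


def no_match_line(show_mark: bool, mark: Optional[str]) -> str:
    """Build the "No match" output line, gated on show_mark.

    Assumption: None models "no usable mark name" (simplified from the
    Perl test  defined $REGERROR && $REGERROR != 1).
    """
    line = "No match"
    if show_mark and mark is not None:
        line += ", mark = %s" % mark
    return line


if __name__ == "__main__":
    mod, show_mark = strip_mark_modifier("mark,ucp")
    print(mod, show_mark)                # ucp True
    print(no_match_line(show_mark, "X")) # No match, mark = X
    print(no_match_line(False, "X"))     # No match
```

Before this revision the modifier was stripped and its presence discarded, so mark data was printed whenever Perl supplied it. Recording the result of the substitution lets the script print mark data only for tests that ask for it.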


Modified: code/trunk/testdata/testinput1
===================================================================
--- code/trunk/testdata/testinput1    2018-07-11 10:06:51 UTC (rev 961)
+++ code/trunk/testdata/testinput1    2018-07-12 17:04:43 UTC (rev 962)
@@ -6202,4 +6202,13 @@


/(?<=(?=.){4,5}x)/

+/a(?=.(*:X))(*SKIP:X)(*F)|(.)/
+    abc
+
+/a(?>(*:X))(*SKIP:X)(*F)|(.)/
+    abc
+
+/a(?:(*:X))(*SKIP:X)(*F)|(.)/
+    abc
+
 # End of testinput1 


Modified: code/trunk/testdata/testoutput1
===================================================================
--- code/trunk/testdata/testoutput1    2018-07-11 10:06:51 UTC (rev 961)
+++ code/trunk/testdata/testoutput1    2018-07-12 17:04:43 UTC (rev 962)
@@ -9841,4 +9841,19 @@


/(?<=(?=.){4,5}x)/

+/a(?=.(*:X))(*SKIP:X)(*F)|(.)/
+    abc
+ 0: a
+ 1: a
+
+/a(?>(*:X))(*SKIP:X)(*F)|(.)/
+    abc
+ 0: a
+ 1: a
+
+/a(?:(*:X))(*SKIP:X)(*F)|(.)/
+    abc
+ 0: b
+ 1: b
+
 # End of testinput1
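
The expected results of the three new tests can be traced by hand with a toy model. The Python sketch below is not a regex engine and not part of the commit. It hard-codes the shape of /a(GROUP)(*SKIP:X)(*F)|(.)/, where GROUP contains the (*MARK:X). The function name `simulate` and the `mark_visible` flag are hypothetical: the flag is False when the mark sits inside an atomic group or a lookaround assertion, and True when it sits in a plain non-capturing group.

```python
from typing import Optional


def simulate(subject: str, mark_visible: bool) -> Optional[str]:
    """Return the text matched by the toy pattern, or None if no match."""
    start = 0
    while start < len(subject):
        if subject[start] == "a":
            # Branch 1: "a" matches, (*MARK:X) is set just after it,
            # then (*F) forces a backtrack onto (*SKIP:X).
            mark_pos = start + 1
            if mark_visible:
                # (*SKIP:X) finds the mark: abandon this start position
                # and begin the next attempt where the mark was set.
                start = mark_pos
                continue
            # Mark hidden by an atomic group or assertion: (*SKIP:X) is
            # ignored and backtracking falls through to branch 2.
        # Branch 2: (.) matches any single character except newline.
        if subject[start] != "\n":
            return subject[start]
        start += 1
    return None


if __name__ == "__main__":
    print(simulate("abc", mark_visible=False))  # a  (atomic group, lookahead)
    print(simulate("abc", mark_visible=True))   # b  (non-capturing group)
```

With the mark hidden, the second branch is tried at the first character and matches "a". With the mark visible, the attempt restarts at the second character, where "a" fails against "b" and the second branch matches "b". These are the values recorded in testoutput1.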