Revision: 962
http://www.exim.org/viewvc/pcre2?view=rev&revision=962
Author: ph10
Date: 2018-07-12 18:04:43 +0100 (Thu, 12 Jul 2018)
Log Message:
-----------
Documentation and tests update and minor tweak to perltest.sh.
Modified Paths:
--------------
code/trunk/doc/html/pcre2pattern.html
code/trunk/doc/pcre2.txt
code/trunk/doc/pcre2pattern.3
code/trunk/perltest.sh
code/trunk/testdata/testinput1
code/trunk/testdata/testoutput1
Modified: code/trunk/doc/html/pcre2pattern.html
===================================================================
--- code/trunk/doc/html/pcre2pattern.html 2018-07-11 10:06:51 UTC (rev 961)
+++ code/trunk/doc/html/pcre2pattern.html 2018-07-12 17:04:43 UTC (rev 962)
@@ -3227,13 +3227,13 @@
</b><br>
<P>
The following verbs do nothing when they are encountered. Matching continues
-with what follows, but if there is no subsequent match, causing a backtrack to
-the verb, a failure is forced. That is, backtracking cannot pass to the left of
-the verb. However, when one of these verbs appears inside an atomic group or in
-an assertion that is true, its effect is confined to that group, because once
-the group has been matched, there is never any backtracking into it. In this
-situation, backtracking has to jump to the left of the entire atomic group or
-assertion.
+with what follows, but if there is a subsequent match failure, causing a
+backtrack to the verb, a failure is forced. That is, backtracking cannot pass
+to the left of the verb. However, when one of these verbs appears inside an
+atomic group or in a lookaround assertion that is true, its effect is confined
+to that group, because once the group has been matched, there is never any
+backtracking into it. Backtracking from beyond an assertion or an atomic group
+ignores the entire group and seeks a preceding backtracking point.
</P>
<P>
These verbs differ in exactly what kind of failure occurs when backtracking
@@ -3321,14 +3321,39 @@
<pre>
(*SKIP:NAME)
</pre>
-When (*SKIP) has an associated name, its behaviour is modified. When it is
-triggered, the previous path through the pattern is searched for the most
-recent (*MARK) that has the same name. If one is found, the "bumpalong" advance
-is to the subject position that corresponds to that (*MARK) instead of to where
-(*SKIP) was encountered. If no (*MARK) with a matching name is found, the
-(*SKIP) is ignored.
+When (*SKIP) has an associated name, its behaviour is modified. When such a
+(*SKIP) is triggered, the previous path through the pattern is searched for the
+most recent (*MARK) that has the same name. If one is found, the "bumpalong"
+advance is to the subject position that corresponds to that (*MARK) instead of
+to where (*SKIP) was encountered. If no (*MARK) with a matching name is found,
+the (*SKIP) is ignored.
</P>
<P>
+The search for a (*MARK) name uses the normal backtracking mechanism, which
+means that it does not see (*MARK) settings that are inside atomic groups or
+assertions, because they are never re-entered by backtracking. Compare the
+following <b>pcre2test</b> examples:
+<pre>
+ re> /a(?>(*MARK:X))(*SKIP:X)(*F)|(.)/
+ data: abc
+ 0: a
+ 1: a
+ data:
+ re> /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/
+ data: abc
+ 0: b
+ 1: b
+</pre>
+In the first example, the (*MARK) setting is in an atomic group, so it is not
+seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored. This allows
+the second branch of the pattern to be tried at the first character position.
+In the second example, the (*MARK) setting is not in an atomic group. This
+allows (*SKIP:X) to immediately cause a new matching attempt to start at the
+second character. This time, the (*MARK) is never seen because "a" does not
+match "b", so the matcher immediately jumps to the second branch of the
+pattern.
+</P>
+<P>
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
names that are set by (*PRUNE:NAME) or (*THEN:NAME).
<pre>
@@ -3456,6 +3481,14 @@
retained in both cases.
</P>
<P>
+The remaining verbs act only when a later failure causes a backtrack to
+reach them. This means that their effect is confined to the assertion,
+because lookaround assertions are atomic. A backtrack that occurs after an
+assertion is complete does not jump back into the assertion. Note in particular
+that a (*MARK) name that is set in an assertion is not "seen" by an instance of
+(*SKIP:NAME) later in the pattern.
+</P>
+<P>
The effect of (*THEN) is not allowed to escape beyond an assertion. If there
are no more branches to try, (*THEN) causes a positive assertion to be false,
and a negative assertion to be true.
@@ -3463,10 +3496,10 @@
<P>
The other backtracking verbs are not treated specially if they appear in a
standalone positive assertion. In a conditional positive assertion,
-backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the condition to be
-false. However, for both standalone and conditional negative assertions,
-backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the assertion to be
-true, without considering any further alternative branches.
+backtracking (from within the assertion) into (*COMMIT), (*SKIP), or (*PRUNE)
+causes the condition to be false. However, for both standalone and conditional
+negative assertions, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes
+the assertion to be true, without considering any further alternative branches.
<a name="btsub"></a></P>
<br><b>
Backtracking verbs in subroutines
@@ -3509,7 +3542,7 @@
</P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 10 July 2018
+Last updated: 11 July 2018
<br>
Copyright © 1997-2018 University of Cambridge.
<br>
Modified: code/trunk/doc/pcre2.txt
===================================================================
--- code/trunk/doc/pcre2.txt 2018-07-11 10:06:51 UTC (rev 961)
+++ code/trunk/doc/pcre2.txt 2018-07-12 17:04:43 UTC (rev 962)
@@ -8695,14 +8695,14 @@
Verbs that act after backtracking
The following verbs do nothing when they are encountered. Matching con-
- tinues with what follows, but if there is no subsequent match, causing
- a backtrack to the verb, a failure is forced. That is, backtracking
- cannot pass to the left of the verb. However, when one of these verbs
- appears inside an atomic group or in an assertion that is true, its
- effect is confined to that group, because once the group has been
- matched, there is never any backtracking into it. In this situation,
- backtracking has to jump to the left of the entire atomic group or
- assertion.
+ tinues with what follows, but if there is a subsequent match failure,
+ causing a backtrack to the verb, a failure is forced. That is, back-
+ tracking cannot pass to the left of the verb. However, when one of
+ these verbs appears inside an atomic group or in a lookaround assertion
+ that is true, its effect is confined to that group, because once the
+ group has been matched, there is never any backtracking into it. Back-
+ tracking from beyond an assertion or an atomic group ignores the entire
+ group and seeks a preceding backtracking point.
These verbs differ in exactly what kind of failure occurs when back-
tracking reaches them. The behaviour described below is what happens
@@ -8790,13 +8790,37 @@
(*SKIP:NAME)
- When (*SKIP) has an associated name, its behaviour is modified. When it
- is triggered, the previous path through the pattern is searched for the
- most recent (*MARK) that has the same name. If one is found, the
- "bumpalong" advance is to the subject position that corresponds to that
- (*MARK) instead of to where (*SKIP) was encountered. If no (*MARK) with
- a matching name is found, the (*SKIP) is ignored.
+ When (*SKIP) has an associated name, its behaviour is modified. When
+ such a (*SKIP) is triggered, the previous path through the pattern is
+ searched for the most recent (*MARK) that has the same name. If one is
+ found, the "bumpalong" advance is to the subject position that corre-
+ sponds to that (*MARK) instead of to where (*SKIP) was encountered. If
+ no (*MARK) with a matching name is found, the (*SKIP) is ignored.
+ The search for a (*MARK) name uses the normal backtracking mechanism,
+ which means that it does not see (*MARK) settings that are inside
+ atomic groups or assertions, because they are never re-entered by back-
+ tracking. Compare the following pcre2test examples:
+
+ re> /a(?>(*MARK:X))(*SKIP:X)(*F)|(.)/
+ data: abc
+ 0: a
+ 1: a
+ data:
+ re> /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/
+ data: abc
+ 0: b
+ 1: b
+
+ In the first example, the (*MARK) setting is in an atomic group, so it
+ is not seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored.
+ This allows the second branch of the pattern to be tried at the first
+ character position. In the second example, the (*MARK) setting is not
+ in an atomic group. This allows (*SKIP:X) to immediately cause a new
+ matching attempt to start at the second character. This time, the
+ (*MARK) is never seen because "a" does not match "b", so the matcher
+ immediately jumps to the second branch of the pattern.
+
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It
ignores names that are set by (*PRUNE:NAME) or (*THEN:NAME).
@@ -8915,41 +8939,48 @@
true for a positive assertion and false for a negative one; captured
substrings are retained in both cases.
- The effect of (*THEN) is not allowed to escape beyond an assertion. If
- there are no more branches to try, (*THEN) causes a positive assertion
+ The remaining verbs act only when a later failure causes a backtrack to
+ reach them. This means that their effect is confined to the assertion,
+ because lookaround assertions are atomic. A backtrack that occurs after
+ an assertion is complete does not jump back into the assertion. Note in
+ particular that a (*MARK) name that is set in an assertion is not
+ "seen" by an instance of (*SKIP:NAME) later in the pattern.
+
+ The effect of (*THEN) is not allowed to escape beyond an assertion. If
+ there are no more branches to try, (*THEN) causes a positive assertion
to be false, and a negative assertion to be true.
- The other backtracking verbs are not treated specially if they appear
- in a standalone positive assertion. In a conditional positive asser-
- tion, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the con-
- dition to be false. However, for both standalone and conditional nega-
- tive assertions, backtracking into (*COMMIT), (*SKIP), or (*PRUNE)
- causes the assertion to be true, without considering any further alter-
- native branches.
+ The other backtracking verbs are not treated specially if they appear
+ in a standalone positive assertion. In a conditional positive asser-
+ tion, backtracking (from within the assertion) into (*COMMIT), (*SKIP),
+ or (*PRUNE) causes the condition to be false. However, for both stand-
+ alone and conditional negative assertions, backtracking into (*COMMIT),
+ (*SKIP), or (*PRUNE) causes the assertion to be true, without consider-
+ ing any further alternative branches.
Backtracking verbs in subroutines
- These behaviours occur whether or not the subpattern is called recur-
+ These behaviours occur whether or not the subpattern is called recur-
sively. Perl's treatment of subroutines is different in some cases.
- (*FAIL) in a subpattern called as a subroutine has its normal effect:
+ (*FAIL) in a subpattern called as a subroutine has its normal effect:
it forces an immediate backtrack.
- (*ACCEPT) in a subpattern called as a subroutine causes the subroutine
- match to succeed without any further processing. Matching then contin-
+ (*ACCEPT) in a subpattern called as a subroutine causes the subroutine
+ match to succeed without any further processing. Matching then contin-
ues after the subroutine call.
(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine
cause the subroutine match to fail.
- (*THEN) skips to the next alternative in the innermost enclosing group
- within the subpattern that has alternatives. If there is no such group
+ (*THEN) skips to the next alternative in the innermost enclosing group
+ within the subpattern that has alternatives. If there is no such group
within the subpattern, (*THEN) causes the subroutine match to fail.
SEE ALSO
- pcre2api(3), pcre2callout(3), pcre2matching(3), pcre2syntax(3),
+ pcre2api(3), pcre2callout(3), pcre2matching(3), pcre2syntax(3),
pcre2(3).
@@ -8962,7 +8993,7 @@
REVISION
- Last updated: 10 July 2018
+ Last updated: 11 July 2018
Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------
Modified: code/trunk/doc/pcre2pattern.3
===================================================================
--- code/trunk/doc/pcre2pattern.3 2018-07-11 10:06:51 UTC (rev 961)
+++ code/trunk/doc/pcre2pattern.3 2018-07-12 17:04:43 UTC (rev 962)
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "10 July 2018" "PCRE2 10.32"
+.TH PCRE2PATTERN 3 "11 July 2018" "PCRE2 10.32"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -3262,13 +3262,13 @@
.rs
.sp
The following verbs do nothing when they are encountered. Matching continues
-with what follows, but if there is no subsequent match, causing a backtrack to
-the verb, a failure is forced. That is, backtracking cannot pass to the left of
-the verb. However, when one of these verbs appears inside an atomic group or in
-an assertion that is true, its effect is confined to that group, because once
-the group has been matched, there is never any backtracking into it. In this
-situation, backtracking has to jump to the left of the entire atomic group or
-assertion.
+with what follows, but if there is a subsequent match failure, causing a
+backtrack to the verb, a failure is forced. That is, backtracking cannot pass
+to the left of the verb. However, when one of these verbs appears inside an
+atomic group or in a lookaround assertion that is true, its effect is confined
+to that group, because once the group has been matched, there is never any
+backtracking into it. Backtracking from beyond an assertion or an atomic group
+ignores the entire group and seeks a preceding backtracking point.
.P
These verbs differ in exactly what kind of failure occurs when backtracking
reaches them. The behaviour described below is what happens when the verb is
@@ -3352,13 +3352,37 @@
.sp
(*SKIP:NAME)
.sp
-When (*SKIP) has an associated name, its behaviour is modified. When it is
-triggered, the previous path through the pattern is searched for the most
-recent (*MARK) that has the same name. If one is found, the "bumpalong" advance
-is to the subject position that corresponds to that (*MARK) instead of to where
-(*SKIP) was encountered. If no (*MARK) with a matching name is found, the
-(*SKIP) is ignored.
+When (*SKIP) has an associated name, its behaviour is modified. When such a
+(*SKIP) is triggered, the previous path through the pattern is searched for the
+most recent (*MARK) that has the same name. If one is found, the "bumpalong"
+advance is to the subject position that corresponds to that (*MARK) instead of
+to where (*SKIP) was encountered. If no (*MARK) with a matching name is found,
+the (*SKIP) is ignored.
.P
+The search for a (*MARK) name uses the normal backtracking mechanism, which
+means that it does not see (*MARK) settings that are inside atomic groups or
+assertions, because they are never re-entered by backtracking. Compare the
+following \fBpcre2test\fP examples:
+.sp
+ re> /a(?>(*MARK:X))(*SKIP:X)(*F)|(.)/
+ data: abc
+ 0: a
+ 1: a
+ data:
+ re> /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/
+ data: abc
+ 0: b
+ 1: b
+.sp
+In the first example, the (*MARK) setting is in an atomic group, so it is not
+seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored. This allows
+the second branch of the pattern to be tried at the first character position.
+In the second example, the (*MARK) setting is not in an atomic group. This
+allows (*SKIP:X) to immediately cause a new matching attempt to start at the
+second character. This time, the (*MARK) is never seen because "a" does not
+match "b", so the matcher immediately jumps to the second branch of the
+pattern.
+.P
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
names that are set by (*PRUNE:NAME) or (*THEN:NAME).
.sp
@@ -3481,6 +3505,13 @@
a positive assertion and false for a negative one; captured substrings are
retained in both cases.
.P
+The remaining verbs act only when a later failure causes a backtrack to
+reach them. This means that their effect is confined to the assertion,
+because lookaround assertions are atomic. A backtrack that occurs after an
+assertion is complete does not jump back into the assertion. Note in particular
+that a (*MARK) name that is set in an assertion is not "seen" by an instance of
+(*SKIP:NAME) later in the pattern.
+.P
The effect of (*THEN) is not allowed to escape beyond an assertion. If there
are no more branches to try, (*THEN) causes a positive assertion to be false,
and a negative assertion to be true.
@@ -3487,10 +3518,10 @@
.P
The other backtracking verbs are not treated specially if they appear in a
standalone positive assertion. In a conditional positive assertion,
-backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the condition to be
-false. However, for both standalone and conditional negative assertions,
-backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the assertion to be
-true, without considering any further alternative branches.
+backtracking (from within the assertion) into (*COMMIT), (*SKIP), or (*PRUNE)
+causes the condition to be false. However, for both standalone and conditional
+negative assertions, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes
+the assertion to be true, without considering any further alternative branches.
.
.
.\" HTML <a name="btsub"></a>
@@ -3536,6 +3567,6 @@
.rs
.sp
.nf
-Last updated: 10 July 2018
+Last updated: 11 July 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi
Modified: code/trunk/perltest.sh
===================================================================
--- code/trunk/perltest.sh 2018-07-11 10:06:51 UTC (rev 961)
+++ code/trunk/perltest.sh 2018-07-12 17:04:43 UTC (rev 962)
@@ -43,7 +43,7 @@
# afteralltext ignored
# dupnames ignored (Perl always allows)
# jitstack ignored
-# mark ignored
+# mark show mark information
# no_auto_possess ignored
# no_start_optimize ignored
# subject_literal does not process subjects for escapes
@@ -172,9 +172,9 @@
$mod =~ s/jitstack=\d+,?//;
- # Remove "mark" (asks pcre2test to check MARK data) */
+ # The "mark" modifier requests checking of MARK data
- $mod =~ s/mark,?//;
+ $show_mark = ($mod =~ s/mark,?//);
# "ucp" asks pcre2test to set PCRE2_UCP; change this to /u for Perl
@@ -279,7 +279,7 @@
elsif (scalar(@subs) == 0)
{
printf $outfile "No match";
- if (defined $REGERROR && $REGERROR != 1)
+ if ($show_mark && defined $REGERROR && $REGERROR != 1)
{ printf $outfile (", mark = %s", &pchars($REGERROR)); }
printf $outfile "\n";
}
@@ -307,7 +307,7 @@
# set and the input pattern was a UTF-8 string. We can, however, force
# it to be so marked.
- if (defined $REGMARK && $REGMARK != 1)
+ if ($show_mark && defined $REGMARK && $REGMARK != 1)
{
$xx = $REGMARK;
$xx = Encode::decode_utf8($xx) if $utf8;
Modified: code/trunk/testdata/testinput1
===================================================================
--- code/trunk/testdata/testinput1 2018-07-11 10:06:51 UTC (rev 961)
+++ code/trunk/testdata/testinput1 2018-07-12 17:04:43 UTC (rev 962)
@@ -6202,4 +6202,13 @@
/(?<=(?=.){4,5}x)/
+/a(?=.(*:X))(*SKIP:X)(*F)|(.)/
+ abc
+
+/a(?>(*:X))(*SKIP:X)(*F)|(.)/
+ abc
+
+/a(?:(*:X))(*SKIP:X)(*F)|(.)/
+ abc
+
# End of testinput1
Modified: code/trunk/testdata/testoutput1
===================================================================
--- code/trunk/testdata/testoutput1 2018-07-11 10:06:51 UTC (rev 961)
+++ code/trunk/testdata/testoutput1 2018-07-12 17:04:43 UTC (rev 962)
@@ -9841,4 +9841,19 @@
/(?<=(?=.){4,5}x)/
+/a(?=.(*:X))(*SKIP:X)(*F)|(.)/
+ abc
+ 0: a
+ 1: a
+
+/a(?>(*:X))(*SKIP:X)(*F)|(.)/
+ abc
+ 0: a
+ 1: a
+
+/a(?:(*:X))(*SKIP:X)(*F)|(.)/
+ abc
+ 0: b
+ 1: b
+
# End of testinput1