[Pcre-svn] [968] code/trunk: Allow :NAME on (*ACCEPT), (*FAI…

Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [968] code/trunk: Allow :NAME on (*ACCEPT), (*FAIL), and (*COMMIT) and fix bug with (*MARK)
Revision: 968
          http://www.exim.org/viewvc/pcre2?view=rev&revision=968
Author:   ph10
Date:     2018-07-21 15:34:51 +0100 (Sat, 21 Jul 2018)
Log Message:
-----------
Allow :NAME on (*ACCEPT), (*FAIL), and (*COMMIT) and fix bug with (*MARK) 
followed by (*ACCEPT) in an assertion. More small updates to perltest.sh.


Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/HACKING
    code/trunk/doc/html/pcre2pattern.html
    code/trunk/doc/html/pcre2syntax.html
    code/trunk/doc/html/pcre2test.html
    code/trunk/doc/pcre2.txt
    code/trunk/doc/pcre2pattern.3
    code/trunk/doc/pcre2syntax.3
    code/trunk/doc/pcre2test.1
    code/trunk/doc/pcre2test.txt
    code/trunk/perltest.sh
    code/trunk/src/pcre2.h.in
    code/trunk/src/pcre2_compile.c
    code/trunk/src/pcre2_dfa_match.c
    code/trunk/src/pcre2_error.c
    code/trunk/src/pcre2_internal.h
    code/trunk/src/pcre2_jit_compile.c
    code/trunk/src/pcre2_match.c
    code/trunk/src/pcre2test.c
    code/trunk/testdata/testinput1
    code/trunk/testdata/testinput2
    code/trunk/testdata/testoutput1
    code/trunk/testdata/testoutput2


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2018-07-17 16:00:09 UTC (rev 967)
+++ code/trunk/ChangeLog    2018-07-21 14:34:51 UTC (rev 968)
@@ -119,9 +119,16 @@
 shouldn't find a MARK (because is in an atomic group), but it did.


 26. Upgraded the perltest.sh script: (1) #pattern lines can now be used to set
-certain modifiers that the script recognizes; (2) Unsupported #command lines
-give a warning when they are ignored; (3) Mark data is output only if the
-"mark" modifier is present.
+a list of modifiers for all subsequent patterns - only those that the script
+recognizes are meaningful; (2) #subject lines can be used to set or unset a
+default "mark" modifier; (3) Unsupported #command lines give a warning when
+they are ignored; (4) Mark data is output only if the "mark" modifier is
+present.
+
+27. (*ACCEPT:ARG), (*FAIL:ARG), and (*COMMIT:ARG) are now supported.
+
+28. A (*MARK) name was not being passed back for positive assertions that were
+terminated by (*ACCEPT).


Version 10.31 12-February-2018

Modified: code/trunk/HACKING
===================================================================
--- code/trunk/HACKING    2018-07-17 16:00:09 UTC (rev 967)
+++ code/trunk/HACKING    2018-07-21 14:34:51 UTC (rev 968)
@@ -256,6 +256,7 @@
 values (which should match with the length):


 META_MARK             (*MARK:xxxx)
+META_COMMIT_ARG       )*COMMIT:xxxx)
 META_PRUNE_ARG        (*PRUNE:xxx)
 META_SKIP_ARG         (*SKIP:xxxx)
 META_THEN_ARG         (*THEN:xxxx)
@@ -382,7 +383,7 @@
 Opcodes with no following data
 ------------------------------


-These items are all just one unit long
+These items are all just one unit long:

   OP_END                 end of pattern
   OP_ANY                 match any one character other than newline
@@ -430,16 +431,24 @@
 (PCRE2_ALLOW_EMPTY_CLASS is set).



-Backtracking control verbs with optional data
----------------------------------------------
+Backtracking control verbs
+--------------------------

-(*THEN) without an argument generates the opcode OP_THEN and no following data.
-OP_MARK is followed by the mark name, preceded by a length in one code unit,
-and followed by a binary zero. For (*PRUNE), (*SKIP), and (*THEN) with
-arguments, the opcodes OP_PRUNE_ARG, OP_SKIP_ARG, and OP_THEN_ARG are used,
-with the name following in the same format as OP_MARK.
+Verbs with no arguments generate opcodes with no following data (as listed
+in the section above).

+(*MARK:NAME) generates OP_MARK followed by the mark name, preceded by a
+length in one code unit, and followed by a binary zero. The name length is
+limited by the size of the code unit.

+(*ACCEPT:NAME) and (*FAIL:NAME) are compiled as (*MARK:NAME)(*ACCEPT) and
+(*MARK:NAME)(*FAIL) respectively.
+
+For (*COMMIT:NAME), (*PRUNE:NAME), (*SKIP:NAME), and (*THEN:NAME), the opcodes
+OP_COMMIT_ARG, OP_PRUNE_ARG, OP_SKIP_ARG, and OP_THEN_ARG are used, with the
+name following in the same format as for OP_MARK.
+
+
 Matching literal characters
 ---------------------------

@@ -814,4 +823,4 @@
 opcode are the correct length, in order to catch updating errors.

 Philip Hazel
-21 April 2017
+20 July 2018
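
The HACKING text above describes how named verbs are laid out in compiled code. The following Python sketch is only an illustration of that description, not PCRE2 source: the opcode numbers are invented for the example (the real values are defined in pcre2_internal.h), the 8-bit library is assumed so that a code unit holds at most 255, and the helper names encode_named() and compile_verb() are hypothetical.

```python
# Hypothetical opcode numbers, for illustration only.
OPCODES = {
    "OP_MARK": 1,
    "OP_COMMIT_ARG": 2,
    "OP_PRUNE_ARG": 3,
    "OP_SKIP_ARG": 4,
    "OP_THEN_ARG": 5,
    "OP_ACCEPT": 6,
    "OP_FAIL": 7,
}

MAX_CODE_UNIT = 0xFF  # assumption: 8-bit code units


def encode_named(opcode, name):
    """Lay out: opcode, length (one code unit), name, binary zero."""
    units = list(name.encode("ascii"))
    if len(units) > MAX_CODE_UNIT:
        raise ValueError("name length does not fit in one code unit")
    return [OPCODES[opcode], len(units), *units, 0]


def compile_verb(verb, name):
    """Return code units for (*VERB:NAME) as HACKING describes them."""
    if verb == "MARK":
        return encode_named("OP_MARK", name)
    if verb in ("ACCEPT", "FAIL"):
        # Compiled as (*MARK:NAME) followed by the plain verb.
        return encode_named("OP_MARK", name) + [OPCODES["OP_" + verb]]
    if verb in ("COMMIT", "PRUNE", "SKIP", "THEN"):
        # These verbs have their own _ARG opcode in the OP_MARK format.
        return encode_named("OP_" + verb + "_ARG", name)
    raise ValueError("unknown verb: " + verb)
```

With these made-up numbers, compile_verb("ACCEPT", "X") yields the OP_MARK sequence for "X" followed by a single OP_ACCEPT unit, whereas compile_verb("COMMIT", "X") yields one OP_COMMIT_ARG sequence and no OP_MARK.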

Modified: code/trunk/doc/html/pcre2pattern.html
===================================================================
--- code/trunk/doc/html/pcre2pattern.html    2018-07-17 16:00:09 UTC (rev 967)
+++ code/trunk/doc/html/pcre2pattern.html    2018-07-21 14:34:51 UTC (rev 968)
@@ -3122,17 +3122,16 @@
 documentation.
 </P>
 <P>
-Experiments with Perl suggest that it too has similar optimizations, sometimes
-leading to anomalous results.
+Experiments with Perl suggest that it too has similar optimizations, and like 
+PCRE2, turning them off can change the result of a match.
 </P>
 <br><b>
 Verbs that act immediately
 </b><br>
 <P>
-The following verbs act as soon as they are encountered. They may not be
-followed by a name.
+The following verbs act as soon as they are encountered.
 <pre>
-   (*ACCEPT)
+   (*ACCEPT) or (*ACCEPT:NAME)
 </pre>
 This verb causes the match to end successfully, skipping the remainder of the
 pattern. However, when it is inside a subpattern that is called as a
@@ -3149,13 +3148,13 @@
 This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by
 the outer parentheses.
 <pre>
-  (*FAIL) or (*F)
+  (*FAIL) or (*FAIL:NAME)
 </pre>
-This verb causes a matching failure, forcing backtracking to occur. It is
-equivalent to (?!) but easier to read. The Perl documentation notes that it is
-probably useful only when combined with (?{}) or (??{}). Those are, of course,
-Perl features that are not present in PCRE2. The nearest equivalent is the
-callout feature, as for example in this pattern:
+This verb causes a matching failure, forcing backtracking to occur. It may be 
+abbreviated to (*F). It is equivalent to (?!) but easier to read. The Perl
+documentation notes that it is probably useful only when combined with (?{}) or
+(??{}). Those are, of course, Perl features that are not present in PCRE2. The
+nearest equivalent is the callout feature, as for example in this pattern:
 <pre>
   a+(?C)(*FAIL)
 </pre>
@@ -3162,6 +3161,10 @@
 A match with the string "aaaa" always fails, but the callout is taken before
 each backtrack happens (in this example, 10 times).
 </P>
+<P>
+(*ACCEPT:NAME) and (*FAIL:NAME) behave exactly the same as 
+(*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively.
+</P>
 <br><b>
 Recording which path was taken
 </b><br>
@@ -3186,9 +3189,9 @@
 (*MARK) is used in conjunction with (*SKIP) as described below.)
 </P>
 <P>
-As well as (*MARK), the (*PRUNE) and (*THEN) verbs may have associated NAME
-arguments. Whichever is last on the matching path is passed back. See below for 
-more details of these other verbs.
+As well as (*MARK), the (*COMMIT), (*PRUNE) and (*THEN) verbs may have
+associated NAME arguments. Whichever is last on the matching path is passed
+back. See below for more details of these other verbs.
 </P>
 <P>
 Here is an example of <b>pcre2test</b> output, where the "mark" modifier
@@ -3250,24 +3253,27 @@
 not in a subroutine or an assertion. Subsequent sections cover these special
 cases.
 <pre>
-  (*COMMIT)
+  (*COMMIT) or (*COMMIT:NAME)
 </pre>
-This verb, which may not be followed by a name, causes the whole match to fail
-outright if there is a later matching failure that causes backtracking to reach
-it. Even if the pattern is unanchored, no further attempts to find a match by
-advancing the starting point take place. If (*COMMIT) is the only backtracking
-verb that is encountered, once it has been passed <b>pcre2_match()</b> is
-committed to finding a match at the current starting point, or not at all. For
-example:
+This verb causes the whole match to fail outright if there is a later matching
+failure that causes backtracking to reach it. Even if the pattern is
+unanchored, no further attempts to find a match by advancing the starting point
+take place. If (*COMMIT) is the only backtracking verb that is encountered,
+once it has been passed <b>pcre2_match()</b> is committed to finding a match at
+the current starting point, or not at all. For example:
 <pre>
   a+(*COMMIT)b
 </pre>
 This matches "xxaab" but not "aacaab". It can be thought of as a kind of
-dynamic anchor, or "I've started, so I must finish." The name of the most
-recently passed (*MARK) in the path is passed back when (*COMMIT) forces a
-match failure.
+dynamic anchor, or "I've started, so I must finish." 
 </P>
 <P>
+The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COMMIT). It is
+like (*MARK:NAME) in that the name is remembered for passing back to the
+caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
+ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
+</P>
+<P>
 If there is more than one backtracking verb in a pattern, a different one that
 follows (*COMMIT) may be triggered first, so merely passing (*COMMIT) during a
 match does not always guarantee that a match must be at this starting point.
@@ -3309,7 +3315,7 @@
 The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is
 like (*MARK:NAME) in that the name is remembered for passing back to the
 caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
-ignoring those set by (*PRUNE) or (*THEN).
+ignoring those set by (*COMMIT), (*PRUNE) or (*THEN).
 <pre>
   (*SKIP)
 </pre>
@@ -3317,7 +3323,7 @@
 pattern is unanchored, the "bumpalong" advance is not to the next character,
 but to the position in the subject where (*SKIP) was encountered. (*SKIP)
 signifies that whatever text was matched leading up to it cannot be part of a
-successful match. Consider:
+successful match if there is a later mismatch. Consider:
 <pre>
   a+(*SKIP)b
 </pre>
@@ -3364,7 +3370,7 @@
 </P>
 <P>
 Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
-names that are set by (*PRUNE:NAME) or (*THEN:NAME).
+names that are set by (*COMMIT:NAME), (*PRUNE:NAME) or (*THEN:NAME).
 <pre>
   (*THEN) or (*THEN:NAME)
 </pre>
@@ -3383,10 +3389,10 @@
 group. If (*THEN) is not inside an alternation, it acts like (*PRUNE).
 </P>
 <P>
-The behaviour of (*THEN:NAME) is the not the same as (*MARK:NAME)(*THEN).
-It is like (*MARK:NAME) in that the name is remembered for passing back to the
+The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). It is
+like (*MARK:NAME) in that the name is remembered for passing back to the
 caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
-ignoring those set by (*PRUNE) and (*THEN).
+ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
 </P>
 <P>
 A subpattern that does not contain a | character is just a part of the
@@ -3461,13 +3467,14 @@
 Backtracking verbs in repeated groups
 </b><br>
 <P>
-PCRE2 differs from Perl in its handling of backtracking verbs in repeated
-groups. For example, consider:
+PCRE2 sometimes differs from Perl in its handling of backtracking verbs in
+repeated groups. For example, consider:
 <pre>
   /(a(*COMMIT)b)+ac/
 </pre>
-If the subject is "abac", Perl matches, but PCRE2 fails because the (*COMMIT)
-in the second repeat of the group acts.
+If the subject is "abac", Perl matches unless its optimizations are disabled,
+but PCRE2 always fails because the (*COMMIT) in the second repeat of the group
+acts.
 <a name="btassert"></a></P>
 <br><b>
 Backtracking verbs in assertions
@@ -3480,9 +3487,10 @@
 </P>
 <P>
 (*ACCEPT) in a standalone positive assertion causes the assertion to succeed
-without any further processing; captured strings are retained. In a standalone
-negative assertion, (*ACCEPT) causes the assertion to fail without any further
-processing; captured substrings are discarded.
+without any further processing; captured strings and a (*MARK) name (if set)
+are retained. In a standalone negative assertion, (*ACCEPT) causes the
+assertion to fail without any further processing; captured substrings and any 
+(*MARK) name are discarded.
 </P>
 <P>
 If the assertion is a condition, (*ACCEPT) causes the condition to be true for
@@ -3515,18 +3523,18 @@
 </b><br>
 <P>
 These behaviours occur whether or not the subpattern is called recursively.
-Perl's treatment of subroutines is different in some cases.
 </P>
 <P>
+(*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to
+succeed without any further processing. Matching then continues after the
+subroutine call. Perl documents this behaviour. Perl's treatment of the other
+verbs in subroutines is different in some cases.
+</P>
+<P>
 (*FAIL) in a subpattern called as a subroutine has its normal effect: it forces
 an immediate backtrack.
 </P>
 <P>
-(*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to
-succeed without any further processing. Matching then continues after the
-subroutine call.
-</P>
-<P>
 (*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine cause
 the subroutine match to fail.
 </P>
@@ -3551,7 +3559,7 @@
 </P>
 <br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 16 July 2018
+Last updated: 20 July 2018
 <br>
 Copyright &copy; 1997-2018 University of Cambridge.
 <br>
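
The (*COMMIT), (*PRUNE) and (*SKIP) descriptions in the pcre2pattern changes above differ mainly in where the next match attempt starts after a failure. The Python sketch below is a toy model, not the PCRE2 matcher: it hard-codes the single pattern shape a+(*VERB)b used in the documentation examples, and the function name search() is hypothetical. It records every starting position that an unanchored search would try.

```python
def search(subject, verb):
    """Model an unanchored search for a+(*VERB)b.

    Returns (match_span_or_None, list_of_start_positions_tried).
    """
    if verb not in ("COMMIT", "PRUNE", "SKIP"):
        raise ValueError("unsupported verb: " + verb)
    tried = []
    start = 0
    while start < len(subject):
        tried.append(start)
        pos = start
        while pos < len(subject) and subject[pos] == "a":
            pos += 1                      # greedy a+
        if pos == start:
            start += 1                    # a+ failed, verb never reached
            continue
        if pos < len(subject) and subject[pos] == "b":
            return (start, pos + 1), tried
        # "b" failed, so backtracking reaches the verb.
        if verb == "COMMIT":
            return None, tried            # whole match fails outright
        if verb == "SKIP":
            start = pos                   # bump along to where (*SKIP) was
        else:
            start += 1                    # (*PRUNE): next character
    return None, tried
```

In this model, search("aacaab", "COMMIT") gives up after the first attempt, while search("aaaac", "SKIP") tries positions 0 and 4 only, matching the documentation's statement that the next attempt starts at "c".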


Modified: code/trunk/doc/html/pcre2syntax.html
===================================================================
--- code/trunk/doc/html/pcre2syntax.html    2018-07-17 16:00:09 UTC (rev 967)
+++ code/trunk/doc/html/pcre2syntax.html    2018-07-21 14:34:51 UTC (rev 968)
@@ -569,7 +569,11 @@
 </P>
 <br><a name="SEC23" href="#TOC1">BACKTRACKING CONTROL</a><br>
 <P>
-The following act immediately they are reached:
+All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the
+name is mandatory, for the others it is optional. (*SKIP) changes its behaviour
+if :NAME is present. The others just set a name for passing back to the caller,
+but this is not a name that (*SKIP) can see. The following act immediately they
+are reached:
 <pre>
   (*ACCEPT)       force successful match
   (*FAIL)         force backtrack; synonym (*F)
@@ -582,13 +586,13 @@
 <pre>
   (*COMMIT)       overall failure, no advance of starting point
   (*PRUNE)        advance to next starting character
-  (*PRUNE:NAME)   equivalent to (*MARK:NAME)(*PRUNE)
   (*SKIP)         advance to current matching position
   (*SKIP:NAME)    advance to position corresponding to an earlier
                   (*MARK:NAME); if not found, the (*SKIP) is ignored
   (*THEN)         local failure, backtrack to next alternation
-  (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)
-</PRE>
+</pre>
+The effect of one of these verbs in a group called as a subroutine is confined 
+to the subroutine call.   
 </P>
 <br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>
 <P>
@@ -617,7 +621,7 @@
 </P>
 <br><a name="SEC27" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 07 July 2018
+Last updated: 21 July 2018
 <br>
 Copyright &copy; 1997-2018 University of Cambridge.
 <br>
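
The pcre2syntax summary above distinguishes between the name that is passed back to the caller and the names that (*SKIP:NAME) can see. The Python sketch below is a simplified model of that rule, assuming the matching path is given as a list of (verb, name) pairs in the order they were passed; the function names are hypothetical and nothing here calls PCRE2.

```python
# Verbs whose :NAME is remembered for passing back to the caller.
NAME_SETTERS = ("MARK", "COMMIT", "PRUNE", "THEN")


def name_passed_back(path):
    """Last name set on the matching path by any name-setting verb."""
    for verb, name in reversed(path):
        if verb in NAME_SETTERS and name is not None:
            return name
    return None


def skip_target(path, wanted):
    """Index of the most recent (*MARK:wanted) on the path, else None.

    Names set by (*COMMIT), (*PRUNE) or (*THEN) are deliberately ignored.
    """
    for index in range(len(path) - 1, -1, -1):
        verb, name = path[index]
        if verb == "MARK" and name == wanted:
            return index
    return None
```

For a path such as [("MARK", "A"), ("PRUNE", "B")], the name passed back is "B", but a later (*SKIP:B) finds nothing in this model and would therefore be ignored, while (*SKIP:A) finds the (*MARK:A) at index 0.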


Modified: code/trunk/doc/html/pcre2test.html
===================================================================
--- code/trunk/doc/html/pcre2test.html    2018-07-17 16:00:09 UTC (rev 967)
+++ code/trunk/doc/html/pcre2test.html    2018-07-21 14:34:51 UTC (rev 968)
@@ -410,10 +410,11 @@
 The appearance of this line causes all subsequent modifier settings to be
 checked for compatibility with the <b>perltest.sh</b> script, which is used to
 confirm that Perl gives the same results as PCRE2. Also, apart from comment
-lines, none of the other command lines are permitted, because they and many
-of the modifiers are specific to <b>pcre2test</b>, and should not be used in
-test files that are also processed by <b>perltest.sh</b>. The <b>#perltest</b>
-command helps detect tests that are accidentally put in the wrong file.
+lines, #pattern commands, and #subject commands that set or unset "mark", no
+command lines are permitted, because they and many of the modifiers are
+specific to <b>pcre2test</b>, and should not be used in test files that are also
+processed by <b>perltest.sh</b>. The <b>#perltest</b> command helps detect tests
+that are accidentally put in the wrong file.
 <pre>
   #pop [&#60;modifiers&#62;]
   #popcopy [&#60;modifiers&#62;]
@@ -2003,7 +2004,7 @@
 </P>
 <br><a name="SEC21" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 16 July 2018
+Last updated: 21 July 2018
 <br>
 Copyright &copy; 1997-2018 University of Cambridge.
 <br>


Modified: code/trunk/doc/pcre2.txt
===================================================================
--- code/trunk/doc/pcre2.txt    2018-07-17 16:00:09 UTC (rev 967)
+++ code/trunk/doc/pcre2.txt    2018-07-21 14:34:51 UTC (rev 968)
@@ -8601,44 +8601,46 @@
        in the pcre2api documentation.


        Experiments  with  Perl  suggest that it too has similar optimizations,
-       sometimes leading to anomalous results.
+       and like PCRE2, turning them off can change the result of a match.


    Verbs that act immediately


-       The following verbs act as soon as they are encountered. They  may  not
-       be followed by a name.
+       The following verbs act as soon as they are encountered.


-          (*ACCEPT)
+          (*ACCEPT) or (*ACCEPT:NAME)


-       This  verb causes the match to end successfully, skipping the remainder
-       of the pattern. However, when it is inside a subpattern that is  called
-       as  a  subroutine, only that subpattern is ended successfully. Matching
+       This verb causes the match to end successfully, skipping the  remainder
+       of  the pattern. However, when it is inside a subpattern that is called
+       as a subroutine, only that subpattern is ended  successfully.  Matching
        then continues at the outer level. If (*ACCEPT) in triggered in a posi-
-       tive  assertion,  the  assertion succeeds; in a negative assertion, the
+       tive assertion, the assertion succeeds; in a  negative  assertion,  the
        assertion fails.


-       If (*ACCEPT) is inside capturing parentheses, the data so far  is  cap-
+       If  (*ACCEPT)  is inside capturing parentheses, the data so far is cap-
        tured. For example:


          A((?:A|B(*ACCEPT)|C)D)


-       This  matches  "AB", "AAD", or "ACD"; when it matches "AB", "B" is cap-
+       This matches "AB", "AAD", or "ACD"; when it matches "AB", "B"  is  cap-
        tured by the outer parentheses.


-         (*FAIL) or (*F)
+         (*FAIL) or (*FAIL:NAME)


-       This verb causes a matching failure, forcing backtracking to occur.  It
-       is  equivalent to (?!) but easier to read. The Perl documentation notes
-       that it is probably useful only when combined  with  (?{})  or  (??{}).
-       Those  are, of course, Perl features that are not present in PCRE2. The
-       nearest equivalent is the callout feature, as for example in this  pat-
-       tern:
+       This  verb causes a matching failure, forcing backtracking to occur. It
+       may be abbreviated to (*F). It is equivalent  to  (?!)  but  easier  to
+       read. The Perl documentation notes that it is probably useful only when
+       combined with (?{}) or (??{}). Those are, of course, Perl features that
+       are  not  present  in PCRE2. The nearest equivalent is the callout fea-
+       ture, as for example in this pattern:


          a+(?C)(*FAIL)


-       A  match  with the string "aaaa" always fails, but the callout is taken
+       A match with the string "aaaa" always fails, but the callout  is  taken
        before each backtrack happens (in this example, 10 times).


+       (*ACCEPT:NAME)   and   (*FAIL:NAME)   behave   exactly   the   same  as
+       (*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively.
+
    Recording which path was taken


        There is one verb whose main purpose  is  to  track  how  a  match  was
@@ -8659,9 +8661,9 @@
        cases  when  (*MARK)  is  used in conjunction with (*SKIP) as described
        below.)


-       As well as (*MARK), the (*PRUNE) and (*THEN) verbs may have  associated
-       NAME  arguments. Whichever is last on the matching path is passed back.
-       See below for more details of these other verbs.
+       As well as (*MARK), the (*COMMIT), (*PRUNE) and (*THEN) verbs may  have
+       associated  NAME  arguments.  Whichever is last on the matching path is
+       passed back. See below for more details of these other verbs.


        Here is an example of  pcre2test  output,  where  the  "mark"  modifier
        requests the retrieval and outputting of (*MARK) data:
@@ -8717,23 +8719,27 @@
        when  the  verb is not in a subroutine or an assertion. Subsequent sec-
        tions cover these special cases.


-         (*COMMIT)
+         (*COMMIT) or (*COMMIT:NAME)


-       This verb, which may not be followed by a name, causes the whole  match
-       to fail outright if there is a later matching failure that causes back-
-       tracking to reach it. Even if the pattern  is  unanchored,  no  further
-       attempts to find a match by advancing the starting point take place. If
-       (*COMMIT) is the only backtracking verb that is  encountered,  once  it
-       has  been  passed  pcre2_match() is committed to finding a match at the
-       current starting point, or not at all. For example:
+       This verb causes the whole match to fail outright if there is  a  later
+       matching failure that causes backtracking to reach it. Even if the pat-
+       tern is unanchored, no further attempts to find a  match  by  advancing
+       the  starting  point  take place. If (*COMMIT) is the only backtracking
+       verb that is encountered, once it has been passed pcre2_match() is com-
+       mitted to finding a match at the current starting point, or not at all.
+       For example:


          a+(*COMMIT)b


        This matches "xxaab" but not "aacaab". It can be thought of as  a  kind
-       of dynamic anchor, or "I've started, so I must finish." The name of the
-       most recently passed (*MARK) in the path is passed back when  (*COMMIT)
-       forces a match failure.
+       of dynamic anchor, or "I've started, so I must finish."


+       The  behaviour  of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COM-
+       MIT). It is like (*MARK:NAME) in that the name is remembered for  pass-
+       ing  back  to the caller. However, (*SKIP:NAME) searches only for names
+       set with  (*MARK),  ignoring  those  set  by  (*COMMIT),  (*PRUNE)  and
+       (*THEN).
+
        If  there  is more than one backtracking verb in a pattern, a different
        one that follows (*COMMIT) may be triggered first,  so  merely  passing
        (*COMMIT) during a match does not always guarantee that a match must be
@@ -8776,7 +8782,7 @@
        The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE).
        It is like (*MARK:NAME) in that the name is remembered for passing back
        to the caller. However, (*SKIP:NAME) searches only for names  set  with
-       (*MARK), ignoring those set by (*PRUNE) or (*THEN).
+       (*MARK), ignoring those set by (*COMMIT), (*PRUNE) or (*THEN).


          (*SKIP)


@@ -8784,29 +8790,30 @@
        the pattern is unanchored, the "bumpalong" advance is not to  the  next
        character, but to the position in the subject where (*SKIP) was encoun-
        tered. (*SKIP) signifies that whatever text was matched leading  up  to
-       it cannot be part of a successful match. Consider:
+       it  cannot  be part of a successful match if there is a later mismatch.
+       Consider:


          a+(*SKIP)b


-       If  the  subject  is  "aaaac...",  after  the first match attempt fails
-       (starting at the first character in the  string),  the  starting  point
+       If the subject is "aaaac...",  after  the  first  match  attempt  fails
+       (starting  at  the  first  character in the string), the starting point
        skips on to start the next attempt at "c". Note that a possessive quan-
-       tifer does not have the same effect as this example; although it  would
-       suppress  backtracking  during  the  first  match  attempt,  the second
-       attempt would start at the second character instead of skipping  on  to
+       tifer  does not have the same effect as this example; although it would
+       suppress backtracking  during  the  first  match  attempt,  the  second
+       attempt  would  start at the second character instead of skipping on to
        "c".


          (*SKIP:NAME)


-       When  (*SKIP)  has  an associated name, its behaviour is modified. When
-       such a (*SKIP) is triggered, the previous path through the  pattern  is
-       searched  for the most recent (*MARK) that has the same name. If one is
-       found, the "bumpalong" advance is to the subject position  that  corre-
-       sponds  to that (*MARK) instead of to where (*SKIP) was encountered. If
+       When (*SKIP) has an associated name, its behaviour  is  modified.  When
+       such  a  (*SKIP) is triggered, the previous path through the pattern is
+       searched for the most recent (*MARK) that has the same name. If one  is
+       found,  the  "bumpalong" advance is to the subject position that corre-
+       sponds to that (*MARK) instead of to where (*SKIP) was encountered.  If
        no (*MARK) with a matching name is found, the (*SKIP) is ignored.


-       The search for a (*MARK) name uses the normal  backtracking  mechanism,
-       which  means  that  it  does  not  see (*MARK) settings that are inside
+       The  search  for a (*MARK) name uses the normal backtracking mechanism,
+       which means that it does not  see  (*MARK)  settings  that  are  inside
        atomic groups or assertions, because they are never re-entered by back-
        tracking. Compare the following pcre2test examples:


@@ -8820,18 +8827,19 @@
           0: b
           1: b


-       In  the first example, the (*MARK) setting is in an atomic group, so it
+       In the first example, the (*MARK) setting is in an atomic group, so  it
        is not seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored.
-       This  allows  the second branch of the pattern to be tried at the first
-       character position.  In the second example, the (*MARK) setting is  not
-       in  an  atomic group. This allows (*SKIP:X) to find the (*MARK) when it
+       This allows the second branch of the pattern to be tried at  the  first
+       character  position.  In the second example, the (*MARK) setting is not
+       in an atomic group. This allows (*SKIP:X) to find the (*MARK)  when  it
        backtracks, and this causes a new matching attempt to start at the sec-
-       ond  character.  This  time, the (*MARK) is never seen because "a" does
+       ond character. This time, the (*MARK) is never seen  because  "a"  does
        not match "b", so the matcher immediately jumps to the second branch of
        the pattern.


-       Note  that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It
-       ignores names that are set by (*PRUNE:NAME) or (*THEN:NAME).
+       Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME).  It
+       ignores   names  that  are  set  by  (*COMMIT:NAME),  (*PRUNE:NAME)  or
+       (*THEN:NAME).


          (*THEN) or (*THEN:NAME)


@@ -8850,87 +8858,87 @@
        track to whatever came before the  entire  group.  If  (*THEN)  is  not
        inside an alternation, it acts like (*PRUNE).


-       The    behaviour   of   (*THEN:NAME)   is   the   not   the   same   as
-       (*MARK:NAME)(*THEN).  It is like  (*MARK:NAME)  in  that  the  name  is
-       remembered  for  passing  back  to  the  caller.  However, (*SKIP:NAME)
-       searches only for  names  set  with  (*MARK),  ignoring  those  set  by
-       (*PRUNE) and (*THEN).
+       The  behaviour  of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN).
+       It is like (*MARK:NAME) in that the name is remembered for passing back
+       to  the  caller. However, (*SKIP:NAME) searches only for names set with
+       (*MARK), ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).


-       A  subpattern that does not contain a | character is just a part of the
-       enclosing alternative; it is not a nested  alternation  with  only  one
-       alternative.  The effect of (*THEN) extends beyond such a subpattern to
-       the enclosing alternative. Consider this pattern, where A, B, etc.  are
-       complex  pattern fragments that do not contain any | characters at this
+       A subpattern that does not contain a | character is just a part of  the
+       enclosing  alternative;  it  is  not a nested alternation with only one
+       alternative. The effect of (*THEN) extends beyond such a subpattern  to
+       the  enclosing alternative. Consider this pattern, where A, B, etc. are
+       complex pattern fragments that do not contain any | characters at  this
        level:


          A (B(*THEN)C) | D


-       If A and B are matched, but there is a failure in C, matching does  not
+       If  A and B are matched, but there is a failure in C, matching does not
        backtrack into A; instead it moves to the next alternative, that is, D.
-       However, if the subpattern containing (*THEN) is given an  alternative,
+       However,  if the subpattern containing (*THEN) is given an alternative,
        it behaves differently:


          A (B(*THEN)C | (*FAIL)) | D


-       The  effect of (*THEN) is now confined to the inner subpattern. After a
+       The effect of (*THEN) is now confined to the inner subpattern. After  a
        failure in C, matching moves to (*FAIL), which causes the whole subpat-
-       tern  to  fail  because  there are no more alternatives to try. In this
+       tern to fail because there are no more alternatives  to  try.  In  this
        case, matching does now backtrack into A.


-       Note that a conditional subpattern is  not  considered  as  having  two
-       alternatives,  because  only  one  is  ever used. In other words, the |
+       Note  that  a  conditional  subpattern  is not considered as having two
+       alternatives, because only one is ever used.  In  other  words,  the  |
        character in a conditional subpattern has a different meaning. Ignoring
        white space, consider:


          ^.*? (?(?=a) a | b(*THEN)c )


-       If  the  subject  is  "ba", this pattern does not match. Because .*? is
-       ungreedy, it initially matches zero  characters.  The  condition  (?=a)
-       then  fails,  the  character  "b"  is  matched, but "c" is not. At this
-       point, matching does not backtrack to .*? as might perhaps be  expected
-       from  the  presence  of  the | character. The conditional subpattern is
+       If the subject is "ba", this pattern does not  match.  Because  .*?  is
+       ungreedy,  it  initially  matches  zero characters. The condition (?=a)
+       then fails, the character "b" is matched,  but  "c"  is  not.  At  this
+       point,  matching does not backtrack to .*? as might perhaps be expected
+       from the presence of the | character.  The  conditional  subpattern  is
        part of the single alternative that comprises the whole pattern, and so
-       the  match  fails.  (If  there was a backtrack into .*?, allowing it to
+       the match fails. (If there was a backtrack into  .*?,  allowing  it  to
        match "b", the match would succeed.)


-       The verbs just described provide four different "strengths" of  control
+       The  verbs just described provide four different "strengths" of control
        when subsequent matching fails. (*THEN) is the weakest, carrying on the
-       match at the next alternative. (*PRUNE) comes next, failing  the  match
-       at  the  current starting position, but allowing an advance to the next
-       character (for an unanchored pattern). (*SKIP) is similar, except  that
+       match  at  the next alternative. (*PRUNE) comes next, failing the match
+       at the current starting position, but allowing an advance to  the  next
+       character  (for an unanchored pattern). (*SKIP) is similar, except that
        the advance may be more than one character. (*COMMIT) is the strongest,
        causing the entire match to fail.


    More than one backtracking verb


-       If more than one backtracking verb is present in  a  pattern,  the  one
-       that  is  backtracked  onto first acts. For example, consider this pat-
+       If  more  than  one  backtracking verb is present in a pattern, the one
+       that is backtracked onto first acts. For example,  consider  this  pat-
        tern, where A, B, etc. are complex pattern fragments:


          (A(*COMMIT)B(*THEN)C|ABD)


-       If A matches but B fails, the backtrack to (*COMMIT) causes the  entire
+       If  A matches but B fails, the backtrack to (*COMMIT) causes the entire
        match to fail. However, if A and B match, but C fails, the backtrack to
-       (*THEN) causes the next alternative (ABD) to be tried.  This  behaviour
-       is  consistent,  but is not always the same as Perl's. It means that if
-       two or more backtracking verbs appear in succession, all the  the  last
+       (*THEN)  causes  the next alternative (ABD) to be tried. This behaviour
+       is consistent, but is not always the same as Perl's. It means  that  if
+       two  or  more backtracking verbs appear in succession, all the the last
        of them has no effect. Consider this example:


          ...(*COMMIT)(*PRUNE)...


        If there is a matching failure to the right, backtracking onto (*PRUNE)
-       causes it to be triggered, and its action is taken. There can never  be
+       causes  it to be triggered, and its action is taken. There can never be
        a backtrack onto (*COMMIT).


    Backtracking verbs in repeated groups


-       PCRE2  differs  from  Perl  in  its  handling  of backtracking verbs in
-       repeated groups. For example, consider:
+       PCRE2 sometimes differs from Perl in its handling of backtracking verbs
+       in repeated groups. For example, consider:


          /(a(*COMMIT)b)+ac/


-       If the subject is "abac", Perl matches, but  PCRE2  fails  because  the
-       (*COMMIT) in the second repeat of the group acts.
+       If  the  subject  is  "abac", Perl matches unless its optimizations are
+       disabled, but PCRE2 always fails because the (*COMMIT)  in  the  second
+       repeat of the group acts.


    Backtracking verbs in assertions


@@ -8940,29 +8948,30 @@
        in a conditional subpattern.


        (*ACCEPT) in a standalone positive assertion causes  the  assertion  to
-       succeed  without any further processing; captured strings are retained.
-       In a standalone negative assertion, (*ACCEPT) causes the  assertion  to
-       fail without any further processing; captured substrings are discarded.
+       succeed  without any further processing; captured strings and a (*MARK)
+       name (if  set)  are  retained.  In  a  standalone  negative  assertion,
+       (*ACCEPT)  causes the assertion to fail without any further processing;
+       captured substrings and any (*MARK) name are discarded.


-       If  the  assertion is a condition, (*ACCEPT) causes the condition to be
-       true for a positive assertion and false for a  negative  one;  captured
+       If the assertion is a condition, (*ACCEPT) causes the condition  to  be
+       true  for  a  positive assertion and false for a negative one; captured
        substrings are retained in both cases.


        The remaining verbs act only when a later failure causes a backtrack to
-       reach them. This means that their effect is confined to the  assertion,
+       reach  them. This means that their effect is confined to the assertion,
        because lookaround assertions are atomic. A backtrack that occurs after
        an assertion is complete does not jump back into the assertion. Note in
-       particular  that  a  (*MARK)  name  that  is set in an assertion is not
+       particular that a (*MARK) name that is  set  in  an  assertion  is  not
        "seen" by an instance of (*SKIP:NAME) latter in the pattern.


-       The effect of (*THEN) is not allowed to escape beyond an assertion.  If
-       there  are no more branches to try, (*THEN) causes a positive assertion
+       The  effect of (*THEN) is not allowed to escape beyond an assertion. If
+       there are no more branches to try, (*THEN) causes a positive  assertion
        to be false, and a negative assertion to be true.


-       The other backtracking verbs are not treated specially if  they  appear
-       in  a  standalone  positive assertion. In a conditional positive asser-
+       The  other  backtracking verbs are not treated specially if they appear
+       in a standalone positive assertion. In a  conditional  positive  asser-
        tion, backtracking (from within the assertion) into (*COMMIT), (*SKIP),
-       or  (*PRUNE) causes the condition to be false. However, for both stand-
+       or (*PRUNE) causes the condition to be false. However, for both  stand-
        alone and conditional negative assertions, backtracking into (*COMMIT),
        (*SKIP), or (*PRUNE) causes the assertion to be true, without consider-
        ing any further alternative branches.
@@ -8969,16 +8978,17 @@


    Backtracking verbs in subroutines


-       These behaviours occur whether or not the subpattern is  called  recur-
-       sively.  Perl's treatment of subroutines is different in some cases.
+       These  behaviours  occur whether or not the subpattern is called recur-
+       sively.


+       (*ACCEPT) in a subpattern called as a subroutine causes the  subroutine
+       match  to succeed without any further processing. Matching then contin-
+       ues after the subroutine call. Perl documents  this  behaviour.  Perl's
+       treatment of the other verbs in subroutines is different in some cases.
+
        (*FAIL)  in  a subpattern called as a subroutine has its normal effect:
        it forces an immediate backtrack.


-       (*ACCEPT) in a subpattern called as a subroutine causes the  subroutine
-       match  to succeed without any further processing. Matching then contin-
-       ues after the subroutine call.
-
        (*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine
        cause the subroutine match to fail.


@@ -9002,7 +9012,7 @@

REVISION

-       Last updated: 16 July 2018
+       Last updated: 20 July 2018
        Copyright (c) 1997-2018 University of Cambridge.
 ------------------------------------------------------------------------------


@@ -10226,7 +10236,11 @@

BACKTRACKING CONTROL

-       The following act immediately they are reached:
+       All backtracking control verbs may be in  the  form  (*VERB:NAME).  For
+       (*MARK)  the  name is mandatory, for the others it is optional. (*SKIP)
+       changes its behaviour if :NAME is present. The others just set  a  name
+       for passing back to the caller, but this is not a name that (*SKIP) can
+       see. The following act immediately they are reached:


          (*ACCEPT)       force successful match
          (*FAIL)         force backtrack; synonym (*F)
@@ -10239,14 +10253,15 @@


          (*COMMIT)       overall failure, no advance of starting point
          (*PRUNE)        advance to next starting character
-         (*PRUNE:NAME)   equivalent to (*MARK:NAME)(*PRUNE)
          (*SKIP)         advance to current matching position
          (*SKIP:NAME)    advance to position corresponding to an earlier
                          (*MARK:NAME); if not found, the (*SKIP) is ignored
          (*THEN)         local failure, backtrack to next alternation
-         (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)


+       The  effect  of one of these verbs in a group called as a subroutine is
+       confined to the subroutine call.


+
CALLOUTS

          (?C)            callout (assumed number 0)
@@ -10254,14 +10269,14 @@
          (?C"text")      callout with string data


        The allowed string delimiters are ` ' " ^ % # $ (which are the same for
-       the start and the end), and the starting delimiter { matched  with  the
-       ending  delimiter  }. To encode the ending delimiter within the string,
+       the  start  and the end), and the starting delimiter { matched with the
+       ending delimiter }. To encode the ending delimiter within  the  string,
        double it.



SEE ALSO

-       pcre2pattern(3),   pcre2api(3),   pcre2callout(3),    pcre2matching(3),
+       pcre2pattern(3),    pcre2api(3),   pcre2callout(3),   pcre2matching(3),
        pcre2(3).



@@ -10274,7 +10289,7 @@

REVISION

-       Last updated: 07 July 2018
+       Last updated: 21 July 2018
        Copyright (c) 1997-2018 University of Cambridge.
 ------------------------------------------------------------------------------



Modified: code/trunk/doc/pcre2pattern.3
===================================================================
--- code/trunk/doc/pcre2pattern.3    2018-07-17 16:00:09 UTC (rev 967)
+++ code/trunk/doc/pcre2pattern.3    2018-07-21 14:34:51 UTC (rev 968)
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "16 July 2018" "PCRE2 10.32"
+.TH PCRE2PATTERN 3 "20 July 2018" "PCRE2 10.32"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -3154,17 +3154,16 @@
 .\"
 documentation.
 .P
-Experiments with Perl suggest that it too has similar optimizations, sometimes
-leading to anomalous results.
+Experiments with Perl suggest that it too has similar optimizations, and like 
+PCRE2, turning them off can change the result of a match.
 .
 .
 .SS "Verbs that act immediately"
 .rs
 .sp
-The following verbs act as soon as they are encountered. They may not be
-followed by a name.
+The following verbs act as soon as they are encountered.
 .sp
-   (*ACCEPT)
+   (*ACCEPT) or (*ACCEPT:NAME)
 .sp
 This verb causes the match to end successfully, skipping the remainder of the
 pattern. However, when it is inside a subpattern that is called as a
@@ -3180,18 +3179,21 @@
 This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by
 the outer parentheses.
 .sp
-  (*FAIL) or (*F)
+  (*FAIL) or (*FAIL:NAME)
 .sp
-This verb causes a matching failure, forcing backtracking to occur. It is
-equivalent to (?!) but easier to read. The Perl documentation notes that it is
-probably useful only when combined with (?{}) or (??{}). Those are, of course,
-Perl features that are not present in PCRE2. The nearest equivalent is the
-callout feature, as for example in this pattern:
+This verb causes a matching failure, forcing backtracking to occur. It may be 
+abbreviated to (*F). It is equivalent to (?!) but easier to read. The Perl
+documentation notes that it is probably useful only when combined with (?{}) or
+(??{}). Those are, of course, Perl features that are not present in PCRE2. The
+nearest equivalent is the callout feature, as for example in this pattern:
 .sp
   a+(?C)(*FAIL)
 .sp
 A match with the string "aaaa" always fails, but the callout is taken before
 each backtrack happens (in this example, 10 times).
+.P
+(*ACCEPT:NAME) and (*FAIL:NAME) behave exactly the same as 
+(*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively.
 .
 .
 .SS "Recording which path was taken"
@@ -3220,9 +3222,9 @@
 assertions and atomic groups. (There are differences in those cases when 
 (*MARK) is used in conjunction with (*SKIP) as described below.)
 .P
-As well as (*MARK), the (*PRUNE) and (*THEN) verbs may have associated NAME
-arguments. Whichever is last on the matching path is passed back. See below for 
-more details of these other verbs.
+As well as (*MARK), the (*COMMIT), (*PRUNE) and (*THEN) verbs may have
+associated NAME arguments. Whichever is last on the matching path is passed
+back. See below for more details of these other verbs.
 .P
 Here is an example of \fBpcre2test\fP output, where the "mark" modifier
 requests the retrieval and outputting of (*MARK) data:
@@ -3282,23 +3284,25 @@
 not in a subroutine or an assertion. Subsequent sections cover these special
 cases.
 .sp
-  (*COMMIT)
+  (*COMMIT) or (*COMMIT:NAME)
 .sp
-This verb, which may not be followed by a name, causes the whole match to fail
-outright if there is a later matching failure that causes backtracking to reach
-it. Even if the pattern is unanchored, no further attempts to find a match by
-advancing the starting point take place. If (*COMMIT) is the only backtracking
-verb that is encountered, once it has been passed \fBpcre2_match()\fP is
-committed to finding a match at the current starting point, or not at all. For
-example:
+This verb causes the whole match to fail outright if there is a later matching
+failure that causes backtracking to reach it. Even if the pattern is
+unanchored, no further attempts to find a match by advancing the starting point
+take place. If (*COMMIT) is the only backtracking verb that is encountered,
+once it has been passed \fBpcre2_match()\fP is committed to finding a match at
+the current starting point, or not at all. For example:
 .sp
   a+(*COMMIT)b
 .sp
 This matches "xxaab" but not "aacaab". It can be thought of as a kind of
-dynamic anchor, or "I've started, so I must finish." The name of the most
-recently passed (*MARK) in the path is passed back when (*COMMIT) forces a
-match failure.
+dynamic anchor, or "I've started, so I must finish." 
 .P
+The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COMMIT). It is
+like (*MARK:NAME) in that the name is remembered for passing back to the
+caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
+ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
+.P
 If there is more than one backtracking verb in a pattern, a different one that
 follows (*COMMIT) may be triggered first, so merely passing (*COMMIT) during a
 match does not always guarantee that a match must be at this starting point.
@@ -3338,7 +3342,7 @@
 The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is
 like (*MARK:NAME) in that the name is remembered for passing back to the
 caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
-ignoring those set by (*PRUNE) or (*THEN).
+ignoring those set by (*COMMIT), (*PRUNE) or (*THEN).
 .sp
   (*SKIP)
 .sp
@@ -3346,7 +3350,7 @@
 pattern is unanchored, the "bumpalong" advance is not to the next character,
 but to the position in the subject where (*SKIP) was encountered. (*SKIP)
 signifies that whatever text was matched leading up to it cannot be part of a
-successful match. Consider:
+successful match if there is a later mismatch. Consider:
 .sp
   a+(*SKIP)b
 .sp
@@ -3391,7 +3395,7 @@
 the second branch of the pattern.
 .P
 Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
-names that are set by (*PRUNE:NAME) or (*THEN:NAME).
+names that are set by (*COMMIT:NAME), (*PRUNE:NAME) or (*THEN:NAME).
 .sp
   (*THEN) or (*THEN:NAME)
 .sp
@@ -3409,10 +3413,10 @@
 more alternatives, so there is a backtrack to whatever came before the entire
 group. If (*THEN) is not inside an alternation, it acts like (*PRUNE).
 .P
-The behaviour of (*THEN:NAME) is the not the same as (*MARK:NAME)(*THEN).
-It is like (*MARK:NAME) in that the name is remembered for passing back to the
+The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). It is
+like (*MARK:NAME) in that the name is remembered for passing back to the
 caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
-ignoring those set by (*PRUNE) and (*THEN).
+ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
 .P
 A subpattern that does not contain a | character is just a part of the
 enclosing alternative; it is not a nested alternation with only one
@@ -3485,13 +3489,14 @@
 .SS "Backtracking verbs in repeated groups"
 .rs
 .sp
-PCRE2 differs from Perl in its handling of backtracking verbs in repeated
-groups. For example, consider:
+PCRE2 sometimes differs from Perl in its handling of backtracking verbs in
+repeated groups. For example, consider:
 .sp
   /(a(*COMMIT)b)+ac/
 .sp
-If the subject is "abac", Perl matches, but PCRE2 fails because the (*COMMIT)
-in the second repeat of the group acts.
+If the subject is "abac", Perl matches unless its optimizations are disabled,
+but PCRE2 always fails because the (*COMMIT) in the second repeat of the group
+acts.
 .
 .
 .\" HTML <a name="btassert"></a>
@@ -3504,9 +3509,10 @@
 subpattern.
 .P
 (*ACCEPT) in a standalone positive assertion causes the assertion to succeed
-without any further processing; captured strings are retained. In a standalone
-negative assertion, (*ACCEPT) causes the assertion to fail without any further
-processing; captured substrings are discarded.
+without any further processing; captured strings and a (*MARK) name (if set)
+are retained. In a standalone negative assertion, (*ACCEPT) causes the
+assertion to fail without any further processing; captured substrings and any 
+(*MARK) name are discarded.
 .P
 If the assertion is a condition, (*ACCEPT) causes the condition to be true for
 a positive assertion and false for a negative one; captured substrings are
@@ -3536,15 +3542,15 @@
 .rs
 .sp
 These behaviours occur whether or not the subpattern is called recursively.
-Perl's treatment of subroutines is different in some cases.
 .P
+(*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to
+succeed without any further processing. Matching then continues after the
+subroutine call. Perl documents this behaviour. Perl's treatment of the other
+verbs in subroutines is different in some cases.
+.P
 (*FAIL) in a subpattern called as a subroutine has its normal effect: it forces
 an immediate backtrack.
 .P
-(*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to
-succeed without any further processing. Matching then continues after the
-subroutine call.
-.P
 (*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine cause
 the subroutine match to fail.
 .P
@@ -3574,6 +3580,6 @@
 .rs
 .sp
 .nf
-Last updated: 16 July 2018
+Last updated: 20 July 2018
 Copyright (c) 1997-2018 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcre2syntax.3
===================================================================
--- code/trunk/doc/pcre2syntax.3    2018-07-17 16:00:09 UTC (rev 967)
+++ code/trunk/doc/pcre2syntax.3    2018-07-21 14:34:51 UTC (rev 968)
@@ -1,4 +1,4 @@
-.TH PCRE2SYNTAX 3 "07 July 2018" "PCRE2 10.32"
+.TH PCRE2SYNTAX 3 "21 July 2018" "PCRE2 10.32"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@@ -410,8 +410,6 @@
   (?>...)         atomic, non-capturing group
 .
 .
-.
-.
 .SH "COMMENT"
 .rs
 .sp
@@ -552,7 +550,11 @@
 .SH "BACKTRACKING CONTROL"
 .rs
 .sp
-The following act immediately they are reached:
+All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the
+name is mandatory, for the others it is optional. (*SKIP) changes its behaviour
+if :NAME is present. The others just set a name for passing back to the caller,
+but this is not a name that (*SKIP) can see. The following act immediately they
+are reached:
 .sp
   (*ACCEPT)       force successful match
   (*FAIL)         force backtrack; synonym (*F)
@@ -565,12 +567,13 @@
 .sp
   (*COMMIT)       overall failure, no advance of starting point
   (*PRUNE)        advance to next starting character
-  (*PRUNE:NAME)   equivalent to (*MARK:NAME)(*PRUNE)
   (*SKIP)         advance to current matching position
   (*SKIP:NAME)    advance to position corresponding to an earlier
                   (*MARK:NAME); if not found, the (*SKIP) is ignored
   (*THEN)         local failure, backtrack to next alternation
-  (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)
+.sp
+The effect of one of these verbs in a group called as a subroutine is confined 
+to the subroutine call.   
 .
 .
 .SH "CALLOUTS"
@@ -606,6 +609,6 @@
 .rs
 .sp
 .nf
-Last updated: 07 July 2018
+Last updated: 21 July 2018
 Copyright (c) 1997-2018 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcre2test.1
===================================================================
--- code/trunk/doc/pcre2test.1    2018-07-17 16:00:09 UTC (rev 967)
+++ code/trunk/doc/pcre2test.1    2018-07-21 14:34:51 UTC (rev 968)
@@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "16 July 2018" "PCRE 10.32"
+.TH PCRE2TEST 1 "21 July 2018" "PCRE 10.32"
 .SH NAME
 pcre2test - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -360,10 +360,11 @@
 The appearance of this line causes all subsequent modifier settings to be
 checked for compatibility with the \fBperltest.sh\fP script, which is used to
 confirm that Perl gives the same results as PCRE2. Also, apart from comment
-lines, none of the other command lines are permitted, because they and many
-of the modifiers are specific to \fBpcre2test\fP, and should not be used in
-test files that are also processed by \fBperltest.sh\fP. The \fB#perltest\fP
-command helps detect tests that are accidentally put in the wrong file.
+lines, #pattern commands, and #subject commands that set or unset "mark", no
+command lines are permitted, because they and many of the modifiers are
+specific to \fBpcre2test\fP, and should not be used in test files that are also
+processed by \fBperltest.sh\fP. The \fB#perltest\fP command helps detect tests
+that are accidentally put in the wrong file.
 .sp
   #pop [<modifiers>]
   #popcopy [<modifiers>]
@@ -1981,6 +1982,6 @@
 .rs
 .sp
 .nf
-Last updated: 16 July 2018
+Last updated: 21 July 2018
 Copyright (c) 1997-2018 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcre2test.txt
===================================================================
--- code/trunk/doc/pcre2test.txt    2018-07-17 16:00:09 UTC (rev 967)
+++ code/trunk/doc/pcre2test.txt    2018-07-21 14:34:51 UTC (rev 968)
@@ -344,11 +344,11 @@
        The  appearance of this line causes all subsequent modifier settings to
        be checked for compatibility with the perltest.sh script, which is used
        to  confirm that Perl gives the same results as PCRE2. Also, apart from
-       comment lines, none of the other command lines are  permitted,  because
-       they  and  many  of the modifiers are specific to pcre2test, and should
-       not be used in test files that are also processed by  perltest.sh.  The
-       #perltest  command  helps detect tests that are accidentally put in the
-       wrong file.
+       comment lines, #pattern commands, and #subject  commands  that  set  or
+       unset  "mark", no command lines are permitted, because they and many of
+       the modifiers are specific to pcre2test, and should not be used in test
+       files  that  are  also  processed by perltest.sh. The #perltest command
+       helps detect tests that are accidentally put in the wrong file.


          #pop [<modifiers>]
          #popcopy [<modifiers>]
@@ -1818,5 +1818,5 @@


REVISION

-       Last updated: 16 July 2018
+       Last updated: 21 July 2018
        Copyright (c) 1997-2018 University of Cambridge.


Modified: code/trunk/perltest.sh
===================================================================
--- code/trunk/perltest.sh    2018-07-17 16:00:09 UTC (rev 967)
+++ code/trunk/perltest.sh    2018-07-21 14:34:51 UTC (rev 968)
@@ -45,17 +45,19 @@
 #   jitstack           ignored
 #   mark               show mark information
 #   no_auto_possess    ignored
-#   no_start_optimize  insert ({""}) at pattern start (disable Perl optimizing)
+#   no_start_optimize  insert (??{""}) at pattern start (disables optimizing)
 #   subject_literal    does not process subjects for escapes
 #   ucp                sets Perl's /u modifier
 #   utf                invoke UTF-8 functionality
 #
 # Comment lines are ignored. The #pattern command can be used to set modifiers
-# that will be added to each subsequent pattern. NOTE: this is different to
-# pcre2test where #pattern sets defaults, some of which can be overridden on
-# individual patterns. The #perltest, #forbid_utf, and #newline_default
-# commands, which are needed in the relevant pcre2test files, are ignored. Any
-# other #-command is ignored, with a warning message.
+# that will be added to each subsequent pattern, after any modifiers it may
+# already have. NOTE: this is different to pcre2test where #pattern sets
+# defaults which can be overridden on individual patterns. The #subject command
+# may be used to set or unset a default "mark" modifier for data lines. This is
+# the only use of #subject that is supported. The #perltest, #forbid_utf, and
+# #newline_default commands, which are needed in the relevant pcre2test files,
+# are ignored. Any other #-command is ignored, with a warning message.
 #
 # The data lines must not have any pcre2test modifiers. Unless
 # "subject_literal" is on the pattern, data lines are processed as
@@ -135,23 +137,39 @@
   last if ! ($_ = <$infile>);
   printf $outfile "$_" if ! $interact;
   next if ($_ =~ /^\s*$/ || $_ =~ /^#[\s!]/);
-  
+
   # A few of pcre2test's #-commands are supported, or just ignored. Any others
-  # cause an error.  
-   
+  # cause an error.
+
   if ($_ =~ /^#pattern(.*)/)
     {
     $extra_modifiers = $1;
-    chomp($extra_modifiers); 
+    chomp($extra_modifiers);
     $extra_modifiers =~ s/\s+$//;
     next;
-    }  
+    }
+  elsif ($_ =~ /^#subject(.*)/)
+    {
+    $mod = $1;
+    chomp($mod);
+    $mod =~ s/\s+$//;
+    if ($mod =~ s/(-?)mark,?//)
+      {
+      $minus = $1;
+      $default_show_mark = ($minus =~ /^$/);
+      }
+    if ($mod !~ /^\s*$/)
+      {
+      printf $outfile "** Warning: \"$mod\" in #subject ignored\n";
+      }
+    next;
+    }
   elsif ($_ =~ /^#/)
     {
-    if ($_ !~ /^#newline_default|^#perltest|^#forbid_utf/)    
+    if ($_ !~ /^#newline_default|^#perltest|^#forbid_utf/)
       {
       printf $outfile "** Warning: #-command ignored: %s", $_;
-      }   
+      }
     next;
     }


@@ -172,9 +190,9 @@

   $pattern =~ /^\s*((.).*\2)(.*)$/s;
   $pat = $1;
+  $del = $2;
   $mod = "$3,$extra_modifiers";
-  $mod =~ s/^,\s*//;
-  $del = $2;
+  $mod =~ s/^,\s*//;

   # The private "aftertext" modifier means "print $' afterwards".

@@ -202,7 +220,7 @@

   # The "mark" modifier requests checking of MARK data */

-  $show_mark = ($mod =~ s/mark,?//);
+  $show_mark = $default_show_mark | ($mod =~ s/mark,?//);

   # "ucp" asks pcre2test to set PCRE2_UCP; change this to /u for Perl

@@ -214,7 +232,7 @@

   # Use no_start_optimize (disable PCRE2 start-up optimization) to disable Perl
   # optimization by inserting (??{""}) at the start of the pattern.
-
+
   if ($mod =~ s/no_start_optimize,?//) { $pat =~ s/$del/$del(??{""})/; }

   # Add back retained modifiers and check that the pattern is valid.

Modified: code/trunk/src/pcre2.h.in
===================================================================
--- code/trunk/src/pcre2.h.in    2018-07-17 16:00:09 UTC (rev 967)
+++ code/trunk/src/pcre2.h.in    2018-07-21 14:34:51 UTC (rev 968)
@@ -281,6 +281,7 @@
 #define PCRE2_ERROR_INTERNAL_UNKNOWN_NEWLINE       156
 #define PCRE2_ERROR_BACKSLASH_G_SYNTAX             157
 #define PCRE2_ERROR_PARENS_QUERY_R_MISSING_CLOSING 158
+/* Error 159 is obsolete and should now never occur */
 #define PCRE2_ERROR_VERB_ARGUMENT_NOT_ALLOWED      159
 #define PCRE2_ERROR_VERB_UNKNOWN                   160
 #define PCRE2_ERROR_SUBPATTERN_NUMBER_TOO_BIG      161


Modified: code/trunk/src/pcre2_compile.c
===================================================================
--- code/trunk/src/pcre2_compile.c    2018-07-17 16:00:09 UTC (rev 967)
+++ code/trunk/src/pcre2_compile.c    2018-07-21 14:34:51 UTC (rev 968)
@@ -7,7 +7,7 @@


                        Written by Philip Hazel
      Original API code Copyright (c) 1997-2012 University of Cambridge
-          New API code Copyright (c) 2016-2017 University of Cambridge
+          New API code Copyright (c) 2016-2018 University of Cambridge


 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -250,34 +250,35 @@
 #define META_LOOKBEHINDNOT    0x80250000u  /* (?<! */


 /* These must be kept in this order, with consecutive values, and the _ARG
-versions of PRUNE, SKIP, and THEN immediately after their non-argument
+versions of COMMIT, PRUNE, SKIP, and THEN immediately after their non-argument
 versions. */
 #define META_MARK             0x80260000u  /* (*MARK) */
 #define META_ACCEPT           0x80270000u  /* (*ACCEPT) */
-#define META_COMMIT           0x80280000u  /* (*COMMIT) */
-#define META_FAIL             0x80290000u  /* (*FAIL) */
-#define META_PRUNE            0x802a0000u  /* These pairs must    */
-#define META_PRUNE_ARG        0x802b0000u  /*   be                */
-#define META_SKIP             0x802c0000u  /*     kept            */
-#define META_SKIP_ARG         0x802d0000u  /*         in          */
-#define META_THEN             0x802e0000u  /*           this      */
-#define META_THEN_ARG         0x802f0000u  /*               order */
+#define META_FAIL             0x80280000u  /* (*FAIL) */
+#define META_COMMIT           0x80290000u  /* These               */
+#define META_COMMIT_ARG       0x802a0000u  /*   pairs             */
+#define META_PRUNE            0x802b0000u  /*     must            */
+#define META_PRUNE_ARG        0x802c0000u  /*       be            */
+#define META_SKIP             0x802d0000u  /*         kept        */
+#define META_SKIP_ARG         0x802e0000u  /*           in        */
+#define META_THEN             0x802f0000u  /*             this    */
+#define META_THEN_ARG         0x80300000u  /*               order */  


 /* These must be kept in groups of adjacent 3 values, and all together. */

-#define META_ASTERISK         0x80300000u  /* *  */
-#define META_ASTERISK_PLUS    0x80310000u  /* *+ */
-#define META_ASTERISK_QUERY   0x80320000u  /* *? */
-#define META_PLUS             0x80330000u  /* +  */
-#define META_PLUS_PLUS        0x80340000u  /* ++ */
-#define META_PLUS_QUERY       0x80350000u  /* +? */
-#define META_QUERY            0x80360000u  /* ?  */
-#define META_QUERY_PLUS       0x80370000u  /* ?+ */
-#define META_QUERY_QUERY      0x80380000u  /* ?? */
-#define META_MINMAX           0x80390000u  /* {n,m}  repeat */
-#define META_MINMAX_PLUS      0x803a0000u  /* {n,m}+ repeat */
-#define META_MINMAX_QUERY     0x803b0000u  /* {n,m}? repeat */
+#define META_ASTERISK         0x80310000u  /* *  */
+#define META_ASTERISK_PLUS    0x80320000u  /* *+ */
+#define META_ASTERISK_QUERY   0x80330000u  /* *? */
+#define META_PLUS             0x80340000u  /* +  */
+#define META_PLUS_PLUS        0x80350000u  /* ++ */
+#define META_PLUS_QUERY       0x80360000u  /* +? */
+#define META_QUERY            0x80370000u  /* ?  */
+#define META_QUERY_PLUS       0x80380000u  /* ?+ */
+#define META_QUERY_QUERY      0x80390000u  /* ?? */
+#define META_MINMAX           0x803a0000u  /* {n,m}  repeat */
+#define META_MINMAX_PLUS      0x803b0000u  /* {n,m}+ repeat */
+#define META_MINMAX_QUERY     0x803c0000u  /* {n,m}? repeat */


 #define META_FIRST_QUANTIFIER META_ASTERISK
 #define META_LAST_QUANTIFIER  META_MINMAX_QUERY
@@ -327,8 +328,9 @@
   SIZEOFFSET,    /* META_LOOKBEHINDNOT */
   1,             /* META_MARK - plus the string length */
   0,             /* META_ACCEPT */
+  0,             /* META_FAIL */
   0,             /* META_COMMIT */
-  0,             /* META_FAIL */
+  1,             /* META_COMMIT_ARG - plus the string length */ 
   0,             /* META_PRUNE */
   1,             /* META_PRUNE_ARG - plus the string length */
   0,             /* META_SKIP */
@@ -586,9 +588,9 @@
   "\0"                       /* Empty name is a shorthand for MARK */
   STRING_MARK0
   STRING_ACCEPT0
-  STRING_COMMIT0
   STRING_F0
   STRING_FAIL0
+  STRING_COMMIT0
   STRING_PRUNE0
   STRING_SKIP0
   STRING_THEN;
@@ -596,11 +598,11 @@
 static const verbitem verbs[] = {
   { 0, META_MARK,   +1 },  /* > 0 => must have an argument */
   { 4, META_MARK,   +1 },
-  { 6, META_ACCEPT, -1 },  /* < 0 => must not have an argument */
-  { 6, META_COMMIT, -1 },
+  { 6, META_ACCEPT, -1 },  /* < 0 => Optional argument, convert to pre-MARK */
   { 1, META_FAIL,   -1 },
   { 4, META_FAIL,   -1 },
-  { 5, META_PRUNE,   0 },  /* Argument is optional; bump META code if found */
+  { 6, META_COMMIT,  0 },
+  { 5, META_PRUNE,   0 },  /* Optional argument; bump META code if found */
   { 4, META_SKIP,    0 },
   { 4, META_THEN,    0 }
 };
@@ -610,8 +612,8 @@
 /* Verb opcodes, indexed by their META code offset from META_MARK. */


static const uint32_t verbops[] = {
- OP_MARK, OP_ACCEPT, OP_COMMIT, OP_FAIL, OP_PRUNE, OP_PRUNE_ARG, OP_SKIP,
- OP_SKIP_ARG, OP_THEN, OP_THEN_ARG };
+ OP_MARK, OP_ACCEPT, OP_FAIL, OP_COMMIT, OP_COMMIT_ARG, OP_PRUNE,
+ OP_PRUNE_ARG, OP_SKIP, OP_SKIP_ARG, OP_THEN, OP_THEN_ARG };

/* Offsets from OP_STAR for case-independent and negative repeat opcodes. */

@@ -976,8 +978,8 @@
     case META_POSIX_NEG: fprintf(stderr, "META_POSIX_NEG %d", *pptr++); break;


     case META_ACCEPT: fprintf(stderr, "META (*ACCEPT)"); break;
+    case META_FAIL: fprintf(stderr, "META (*FAIL)"); break;
     case META_COMMIT: fprintf(stderr, "META (*COMMIT)"); break;
-    case META_FAIL: fprintf(stderr, "META (*FAIL)"); break;
     case META_PRUNE: fprintf(stderr, "META (*PRUNE)"); break;
     case META_SKIP: fprintf(stderr, "META (*SKIP)"); break;
     case META_THEN: fprintf(stderr, "META (*THEN)"); break;
@@ -1067,6 +1069,10 @@
     fprintf(stderr, "META (*MARK:");
     goto SHOWARG;


+    case META_COMMIT_ARG:
+    fprintf(stderr, "META (*COMMIT:");
+    goto SHOWARG;
+
     case META_PRUNE_ARG:
     fprintf(stderr, "META (*PRUNE:");
     goto SHOWARG;
@@ -2290,6 +2296,7 @@
 uint32_t *parsed_pattern = cb->parsed_pattern;
 uint32_t *parsed_pattern_end = cb->parsed_pattern_end;
 uint32_t meta_quantifier = 0;
+uint32_t add_after_mark = 0;
 uint16_t nest_depth = 0;
 int after_manual_callout = 0;
 int expect_cond_assert = 0;
@@ -2461,6 +2468,16 @@
         goto FAILED;
         }
       *verblengthptr = (uint32_t)verbnamelength;
+      
+      /* If this name was on a verb such as (*ACCEPT) which does not continue,
+      a (*MARK) was generated for the name. We now add the original verb as the 
+      next item. */  
+
+      if (add_after_mark != 0)
+        {
+        *parsed_pattern++ = add_after_mark;
+        add_after_mark = 0;   
+        }
       break;


       case CHAR_BACKSLASH:
@@ -3454,13 +3471,25 @@


         if (*ptr++ == CHAR_COLON)   /* Skip past : or ) */
           {
-          if (verbs[i].has_arg < 0)  /* Argument is forbidden */
+          /* Some optional arguments can be treated as a preceding (*MARK) */
+ 
+          if (verbs[i].has_arg < 0)
             {
-            errorcode = ERR59;
-            goto FAILED;
+            add_after_mark = verbs[i].meta;
+            *parsed_pattern++ = META_MARK; 
             }
-          *parsed_pattern++ = verbs[i].meta +
-            ((verbs[i].meta != META_MARK)? 0x00010000u:0);
+            
+          /* The remaining verbs with arguments (except *MARK) need a different
+          opcode. */
+          
+          else
+            {  
+            *parsed_pattern++ = verbs[i].meta +
+              ((verbs[i].meta != META_MARK)? 0x00010000u:0);
+            }   
+            
+          /* Set up for reading the name in the main loop. */
+
           verblengthptr = parsed_pattern++;
           verbnamestart = ptr;
           inverbname = TRUE;
@@ -5654,6 +5683,7 @@
     cb->had_pruneorskip = TRUE;
     /* Fall through */
     case META_MARK:
+    case META_COMMIT_ARG: 
     VERB_ARG:
     *code++ = verbops[(meta - META_MARK) >> 16];
     /* The length is in characters. */
@@ -8002,6 +8032,7 @@
       break;


       case OP_MARK:
+      case OP_COMMIT_ARG: 
       case OP_PRUNE_ARG:
       case OP_SKIP_ARG:
       case OP_THEN_ARG:
@@ -8310,6 +8341,7 @@
     break;


     case META_MARK:     /* Add the length of the name. */
+    case META_COMMIT_ARG: 
     case META_PRUNE_ARG:
     case META_SKIP_ARG:
     case META_THEN_ARG:
@@ -8500,6 +8532,7 @@
     goto EXIT;


     case META_MARK:
+    case META_COMMIT_ARG: 
     case META_PRUNE_ARG:
     case META_SKIP_ARG:
     case META_THEN_ARG:
@@ -8967,6 +9000,7 @@
     break;


     case META_MARK:
+    case META_COMMIT_ARG: 
     case META_PRUNE_ARG:
     case META_SKIP_ARG:
     case META_THEN_ARG:


Modified: code/trunk/src/pcre2_dfa_match.c
===================================================================
--- code/trunk/src/pcre2_dfa_match.c    2018-07-17 16:00:09 UTC (rev 967)
+++ code/trunk/src/pcre2_dfa_match.c    2018-07-21 14:34:51 UTC (rev 968)
@@ -181,7 +181,8 @@
   0, 0, 0,                       /* BRAZERO, BRAMINZERO, BRAPOSZERO        */
   0, 0, 0,                       /* MARK, PRUNE, PRUNE_ARG                 */
   0, 0, 0, 0,                    /* SKIP, SKIP_ARG, THEN, THEN_ARG         */
-  0, 0, 0, 0,                    /* COMMIT, FAIL, ACCEPT, ASSERT_ACCEPT    */
+  0, 0,                          /* COMMIT, COMMIT_ARG                     */
+  0, 0, 0,                       /* FAIL, ACCEPT, ASSERT_ACCEPT            */
   0, 0, 0                        /* CLOSE, SKIPZERO, DEFINE                */
 };


@@ -254,7 +255,8 @@
   0, 0, 0,                       /* BRAZERO, BRAMINZERO, BRAPOSZERO        */
   0, 0, 0,                       /* MARK, PRUNE, PRUNE_ARG                 */
   0, 0, 0, 0,                    /* SKIP, SKIP_ARG, THEN, THEN_ARG         */
-  0, 0, 0, 0,                    /* COMMIT, FAIL, ACCEPT, ASSERT_ACCEPT    */
+  0, 0,                          /* COMMIT, COMMIT_ARG                     */
+  0, 0, 0,                       /* FAIL, ACCEPT, ASSERT_ACCEPT            */
   0, 0, 0                        /* CLOSE, SKIPZERO, DEFINE                */
 };



Modified: code/trunk/src/pcre2_error.c
===================================================================
--- code/trunk/src/pcre2_error.c    2018-07-17 16:00:09 UTC (rev 967)
+++ code/trunk/src/pcre2_error.c    2018-07-21 14:34:51 UTC (rev 968)
@@ -133,7 +133,8 @@
   "internal error: unknown newline setting\0"
   "\\g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number\0"
   "(?R (recursive pattern call) must be followed by a closing parenthesis\0"
-  "an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT)\0"
+  /* "an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT)\0" */
+  "obsolete error (should not occur)\0"  /* Was the above */
   /* 60 */
   "(*VERB) not recognized or malformed\0"
   "group number is too big\0"


Modified: code/trunk/src/pcre2_internal.h
===================================================================
--- code/trunk/src/pcre2_internal.h    2018-07-17 16:00:09 UTC (rev 967)
+++ code/trunk/src/pcre2_internal.h    2018-07-21 14:34:51 UTC (rev 968)
@@ -7,7 +7,7 @@


                        Written by Philip Hazel
      Original API code Copyright (c) 1997-2012 University of Cambridge
-          New API code Copyright (c) 2016-2017 University of Cambridge
+          New API code Copyright (c) 2016-2018 University of Cambridge


-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@@ -253,7 +253,7 @@

#define START_FRAMES_SIZE 20480

-/* Similarly, for DFA matching, an initial internal workspace vector is
+/* Similarly, for DFA matching, an initial internal workspace vector is
allocated on the stack. */

 #define DFA_START_RWS_SIZE 30720
@@ -1583,23 +1583,26 @@
   OP_THEN,           /* 155 */
   OP_THEN_ARG,       /* 156 same, but with argument */
   OP_COMMIT,         /* 157 */
+  OP_COMMIT_ARG,     /* 158 same, but with argument */


- /* These are forced failure and success verbs */
+ /* These are forced failure and success verbs. FAIL and ACCEPT do accept an
+ argument, but these cases can be compiled as, for example, (*MARK:X)(*FAIL)
+ without the need for a special opcode. */

-  OP_FAIL,           /* 158 */
-  OP_ACCEPT,         /* 159 */
-  OP_ASSERT_ACCEPT,  /* 160 Used inside assertions */
-  OP_CLOSE,          /* 161 Used before OP_ACCEPT to close open captures */
+  OP_FAIL,           /* 159 */
+  OP_ACCEPT,         /* 160 */
+  OP_ASSERT_ACCEPT,  /* 161 Used inside assertions */
+  OP_CLOSE,          /* 162 Used before OP_ACCEPT to close open captures */


/* This is used to skip a subpattern with a {0} quantifier */

-  OP_SKIPZERO,       /* 162 */
+  OP_SKIPZERO,       /* 163 */


/* This is used to identify a DEFINE group during compilation so that it can
be checked for having only one branch. It is changed to OP_FALSE before
compilation finishes. */

-  OP_DEFINE,         /* 163 */
+  OP_DEFINE,         /* 164 */


   /* This is not an opcode, but is used to check that tables indexed by opcode
   are the correct length, in order to catch updating errors - there have been
@@ -1655,7 +1658,7 @@
   "Cond false", "Cond true",                                      \
   "Brazero", "Braminzero", "Braposzero",                          \
   "*MARK", "*PRUNE", "*PRUNE", "*SKIP", "*SKIP",                  \
-  "*THEN", "*THEN", "*COMMIT", "*FAIL",                           \
+  "*THEN", "*THEN", "*COMMIT", "*COMMIT", "*FAIL",                \
   "*ACCEPT", "*ASSERT_ACCEPT",                                    \
   "Close", "Skip zero", "Define"


@@ -1747,7 +1750,8 @@
   3, 1, 3,                       /* MARK, PRUNE, PRUNE_ARG                 */ \
   1, 3,                          /* SKIP, SKIP_ARG                         */ \
   1, 3,                          /* THEN, THEN_ARG                         */ \
-  1, 1, 1, 1,                    /* COMMIT, FAIL, ACCEPT, ASSERT_ACCEPT    */ \
+  1, 3,                          /* COMMIT, COMMIT_ARG                     */ \
+  1, 1, 1,                       /* FAIL, ACCEPT, ASSERT_ACCEPT            */ \
   1+IMM2_SIZE, 1,                /* CLOSE, SKIPZERO                        */ \
   1                              /* DEFINE                                 */



Modified: code/trunk/src/pcre2_jit_compile.c
===================================================================
--- code/trunk/src/pcre2_jit_compile.c    2018-07-17 16:00:09 UTC (rev 967)
+++ code/trunk/src/pcre2_jit_compile.c    2018-07-21 14:34:51 UTC (rev 968)
@@ -839,6 +839,7 @@
 #endif


   case OP_MARK:
+  case OP_COMMIT_ARG: 
   case OP_PRUNE_ARG:
   case OP_SKIP_ARG:
   case OP_THEN_ARG:
@@ -939,6 +940,7 @@
     common->control_head_ptr = 1;
     /* Fall through. */


+    case OP_COMMIT_ARG: 
     case OP_PRUNE_ARG:
     case OP_MARK:
     if (common->mark_ptr == 0)
@@ -1553,6 +1555,7 @@
     break;


     case OP_MARK:
+    case OP_COMMIT_ARG: 
     case OP_PRUNE_ARG:
     case OP_THEN_ARG:
     SLJIT_ASSERT(common->mark_ptr != 0);
@@ -1733,6 +1736,7 @@
     break;


     case OP_MARK:
+    case OP_COMMIT_ARG: 
     case OP_PRUNE_ARG:
     case OP_THEN_ARG:
     SLJIT_ASSERT(common->mark_ptr != 0);
@@ -2041,6 +2045,7 @@
     break;


     case OP_MARK:
+    case OP_COMMIT_ARG: 
     case OP_PRUNE_ARG:
     case OP_THEN_ARG:
     SLJIT_ASSERT(common->mark_ptr != 0);
@@ -2428,6 +2433,7 @@
     break;


     case OP_MARK:
+    case OP_COMMIT_ARG: 
     case OP_PRUNE_ARG:
     case OP_THEN_ARG:
     SLJIT_ASSERT(common->mark_ptr != 0);
@@ -10350,7 +10356,8 @@
 PCRE2_UCHAR opcode = *cc;
 PCRE2_SPTR ccend = cc + 1;


-if (opcode == OP_PRUNE_ARG || opcode == OP_SKIP_ARG || opcode == OP_THEN_ARG)
+if (opcode == OP_COMMIT_ARG || opcode == OP_PRUNE_ARG || 
+    opcode == OP_SKIP_ARG || opcode == OP_THEN_ARG)
   ccend += 2 + cc[1];


PUSH_BACKTRACK(sizeof(backtrack_common), cc, NULL);
@@ -10362,7 +10369,7 @@
return ccend;
}

-if (opcode == OP_PRUNE_ARG || opcode == OP_THEN_ARG)
+if (opcode == OP_COMMIT_ARG || opcode == OP_PRUNE_ARG || opcode == OP_THEN_ARG)
   {
   OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0);
   OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, (sljit_sw)(cc + 2));
@@ -10681,6 +10688,7 @@
     case OP_THEN:
     case OP_THEN_ARG:
     case OP_COMMIT:
+    case OP_COMMIT_ARG: 
     cc = compile_control_verb_matchingpath(common, cc, parent);
     break;


@@ -11755,6 +11763,7 @@
     break;


     case OP_COMMIT:
+    case OP_COMMIT_ARG: 
     if (!common->local_quit_available)
       OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, PCRE2_ERROR_NOMATCH);
     if (common->quit_label == NULL)


Modified: code/trunk/src/pcre2_match.c
===================================================================
--- code/trunk/src/pcre2_match.c    2018-07-17 16:00:09 UTC (rev 967)
+++ code/trunk/src/pcre2_match.c    2018-07-21 14:34:51 UTC (rev 968)
@@ -149,7 +149,7 @@
 enum { RM1=1, RM2,  RM3,  RM4,  RM5,  RM6,  RM7,  RM8,  RM9,  RM10,
        RM11,  RM12, RM13, RM14, RM15, RM16, RM17, RM18, RM19, RM20,
        RM21,  RM22, RM23, RM24, RM25, RM26, RM27, RM28, RM29, RM30,
-       RM31,  RM32, RM33, RM34, RM35 };
+       RM31,  RM32, RM33, RM34, RM35, RM36 };


 #ifdef SUPPORT_WIDE_CHARS
 enum { RM100=100, RM101 };
@@ -770,7 +770,7 @@
     /* ===================================================================== */
     /* Real or forced end of the pattern, assertion, or recursion. In an
     assertion ACCEPT, update the last used pointer and remember the current
-    frame so that the captures can be fished out of it. */
+    frame so that the captures and mark can be fished out of it. */


     case OP_ASSERT_ACCEPT:
     if (Feptr > mb->last_used_ptr) mb->last_used_ptr = Feptr;
@@ -5119,7 +5119,7 @@
     /* Positive assertions are like other groups except that PCRE doesn't allow
     the effect of (*THEN) to escape beyond an assertion; it is therefore
     treated as NOMATCH. (*ACCEPT) is treated as successful assertion, with its
-    captures retained. Any other return is an error. */
+    captures and mark retained. Any other return is an error. */


#define Lframe_type F->temp_32[0]

@@ -5136,6 +5136,7 @@
               (char *)assert_accept_frame + offsetof(heapframe, ovector),
               assert_accept_frame->offset_top * sizeof(PCRE2_SIZE));
         Foffset_top = assert_accept_frame->offset_top;
+        Fmark = assert_accept_frame->mark; 
         break;
         }
       if (rrc != MATCH_NOMATCH && rrc != MATCH_THEN) RRETURN(rrc);
@@ -5837,6 +5838,13 @@
     mb->verb_current_recurse = Fcurrent_recurse;
     RRETURN(MATCH_COMMIT);


+    case OP_COMMIT_ARG:
+    Fmark = mb->nomatch_mark = Fecode + 2;
+    RMATCH(Fecode + PRIV(OP_lengths)[*Fecode] + Fecode[1], RM36);
+    if (rrc != MATCH_NOMATCH) RRETURN(rrc);
+    mb->verb_current_recurse = Fcurrent_recurse;
+    RRETURN(MATCH_COMMIT);
+
     case OP_PRUNE:
     RMATCH(Fecode + PRIV(OP_lengths)[*Fecode], RM14);
     if (rrc != MATCH_NOMATCH) RRETURN(rrc);
@@ -5942,7 +5950,7 @@
   LBL( 9) LBL(10) LBL(11) LBL(12) LBL(13) LBL(14) LBL(15) LBL(16)
   LBL(17) LBL(18) LBL(19) LBL(20) LBL(21) LBL(22) LBL(23) LBL(24)
   LBL(25) LBL(26) LBL(27) LBL(28) LBL(29) LBL(30) LBL(31) LBL(32)
-  LBL(33) LBL(34) LBL(35)
+  LBL(33) LBL(34) LBL(35) LBL(36)


#ifdef SUPPORT_WIDE_CHARS
LBL(100) LBL(101)

Modified: code/trunk/src/pcre2test.c
===================================================================
--- code/trunk/src/pcre2test.c    2018-07-17 16:00:09 UTC (rev 967)
+++ code/trunk/src/pcre2test.c    2018-07-21 14:34:51 UTC (rev 968)
@@ -4678,12 +4678,6 @@
 const char *cmdname;
 uint8_t *argptr, *serial;


-if (restrict_for_perl_test)
- {
- fprintf(outfile, "** #-commands are not allowed after #perltest\n");
- return PR_ABEND;
- }
-
yield = PR_OK;
cmd = CMD_UNKNOWN;
cmdlen = 0;
@@ -4702,6 +4696,12 @@

argptr = buffer + cmdlen + 1;

+if (restrict_for_perl_test && cmd != CMD_PATTERN && cmd != CMD_SUBJECT)
+ {
+ fprintf(outfile, "** #%s is not allowed after #perltest\n", cmdname);
+ return PR_ABEND;
+ }
+
switch(cmd)
{
case CMD_UNKNOWN:

Modified: code/trunk/testdata/testinput1
===================================================================
--- code/trunk/testdata/testinput1    2018-07-17 16:00:09 UTC (rev 967)
+++ code/trunk/testdata/testinput1    2018-07-21 14:34:51 UTC (rev 968)
@@ -6203,10 +6203,47 @@
 /a(?:(*:X))(*SKIP:X)(*F)|(.)/
     abc


-/(?>a(*:1))(?>b(*:1))(*SKIP:1)x|.*/no_start_optimize
+#pattern no_start_optimize
+
+/(?>a(*:1))(?>b(*:1))(*SKIP:1)x|.*/
     abc


-/(?>a(*:1))(?>b)(*SKIP:1)x|.*/no_start_optimize
+/(?>a(*:1))(?>b)(*SKIP:1)x|.*/
     abc


+#subject mark
+
+/a(*ACCEPT:X)b/
+    abc
+    
+/(?=a(*ACCEPT:QQ)bc)axyz/
+    axyz
+
+/(?(DEFINE)(a(*ACCEPT:X)))(?1)b/
+    abc
+    
+/a(*F:X)b/
+    abc
+    
+/(?(DEFINE)(a(*F:X)))(?1)b/
+    abc
+
+/a(*COMMIT:X)b/
+    abc
+    
+/(?(DEFINE)(a(*COMMIT:X)))(?1)b/
+    abc
+    
+/a+(*:Z)b(*COMMIT:X)(*SKIP:Z)c|.*/
+    aaaabd
+
+/a+(*:Z)b(*COMMIT:X)(*SKIP:X)c|.*/
+    aaaabd
+
+/a(*COMMIT:X)b/
+    axabc
+
+#pattern -no_start_optimize
+#subject -mark 
+
 # End of testinput1 


Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2    2018-07-17 16:00:09 UTC (rev 967)
+++ code/trunk/testdata/testinput2    2018-07-21 14:34:51 UTC (rev 968)
@@ -2949,10 +2949,9 @@


/abc(*:)pqr/

-/abc(*FAIL:123)xyz/
-
# This should, and does, fail. In Perl, it does not, which I think is a
# bug because replacing the B in the pattern by (B|D) does make it fail.
+# Turning off Perl's optimization by inserting (??{""}) also makes it fail.

/A(*COMMIT)B/aftertext,mark
\= Expect no match

Modified: code/trunk/testdata/testoutput1
===================================================================
--- code/trunk/testdata/testoutput1    2018-07-17 16:00:09 UTC (rev 967)
+++ code/trunk/testdata/testoutput1    2018-07-21 14:34:51 UTC (rev 968)
@@ -9846,12 +9846,64 @@
  0: b
  1: b


-/(?>a(*:1))(?>b(*:1))(*SKIP:1)x|.*/no_start_optimize
+#pattern no_start_optimize
+
+/(?>a(*:1))(?>b(*:1))(*SKIP:1)x|.*/
     abc
  0: abc


-/(?>a(*:1))(?>b)(*SKIP:1)x|.*/no_start_optimize
+/(?>a(*:1))(?>b)(*SKIP:1)x|.*/
     abc
  0: abc


+#subject mark
+
+/a(*ACCEPT:X)b/
+    abc
+ 0: a
+MK: X
+    
+/(?=a(*ACCEPT:QQ)bc)axyz/
+    axyz
+ 0: axyz
+MK: QQ
+
+/(?(DEFINE)(a(*ACCEPT:X)))(?1)b/
+    abc
+ 0: ab
+MK: X
+    
+/a(*F:X)b/
+    abc
+No match, mark = X
+    
+/(?(DEFINE)(a(*F:X)))(?1)b/
+    abc
+No match, mark = X
+
+/a(*COMMIT:X)b/
+    abc
+ 0: ab
+MK: X
+    
+/(?(DEFINE)(a(*COMMIT:X)))(?1)b/
+    abc
+ 0: ab
+MK: X
+    
+/a+(*:Z)b(*COMMIT:X)(*SKIP:Z)c|.*/
+    aaaabd
+ 0: bd
+
+/a+(*:Z)b(*COMMIT:X)(*SKIP:X)c|.*/
+    aaaabd
+No match, mark = X
+
+/a(*COMMIT:X)b/
+    axabc
+No match, mark = X
+
+#pattern -no_start_optimize
+#subject -mark 
+
 # End of testinput1 


Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2018-07-17 16:00:09 UTC (rev 967)
+++ code/trunk/testdata/testoutput2    2018-07-21 14:34:51 UTC (rev 968)
@@ -10154,11 +10154,9 @@
 /abc(*:)pqr/
 Failed: error 166 at offset 6: (*MARK) must have an argument


-/abc(*FAIL:123)xyz/
-Failed: error 159 at offset 10: an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT)
-
# This should, and does, fail. In Perl, it does not, which I think is a
# bug because replacing the B in the pattern by (B|D) does make it fail.
+# Turning off Perl's optimization by inserting (??{""}) also makes it fail.

/A(*COMMIT)B/aftertext,mark
\= Expect no match