[Pcre-svn] [1039] code/trunk: Upgrade the as yet unreleased …

Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [1039] code/trunk: Upgrade the as yet unreleased substitute callout facility.
Revision: 1039
          http://www.exim.org/viewvc/pcre2?view=rev&revision=1039
Author:   ph10
Date:     2018-11-12 16:02:01 +0000 (Mon, 12 Nov 2018)
Log Message:
-----------
Upgrade the as yet unreleased substitute callout facility.


Modified Paths:
--------------
    code/trunk/doc/html/pcre2_set_substitute_callout.html
    code/trunk/doc/html/pcre2api.html
    code/trunk/doc/html/pcre2test.html
    code/trunk/doc/pcre2.txt
    code/trunk/doc/pcre2_set_substitute_callout.3
    code/trunk/doc/pcre2api.3
    code/trunk/doc/pcre2test.1
    code/trunk/doc/pcre2test.txt
    code/trunk/src/pcre2.h.in
    code/trunk/src/pcre2_context.c
    code/trunk/src/pcre2_intmodedep.h
    code/trunk/src/pcre2_substitute.c
    code/trunk/src/pcre2test.c
    code/trunk/testdata/testinput2
    code/trunk/testdata/testoutput10
    code/trunk/testdata/testoutput12-16
    code/trunk/testdata/testoutput12-32
    code/trunk/testdata/testoutput2


Modified: code/trunk/doc/html/pcre2_set_substitute_callout.html
===================================================================
--- code/trunk/doc/html/pcre2_set_substitute_callout.html    2018-11-09 18:10:25 UTC (rev 1038)
+++ code/trunk/doc/html/pcre2_set_substitute_callout.html    2018-11-12 16:02:01 UTC (rev 1039)
@@ -20,7 +20,7 @@
 </P>
 <P>
 <b>int pcre2_set_substitute_callout(pcre2_match_context *<i>mcontext</i>,</b>
-<b>  void (*<i>callout_function</i>)(pcre2_substitute_callout_block *),</b>
+<b>  int (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b>
 <b>  void *<i>callout_data</i>);</b>
 </P>
 <br><b>
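In this revision the callout changes from returning void to returning int. The C sketch below shows a callout written against the new signature. It makes some assumptions. The struct and typedefs are local stand-ins that mirror the fields listed in pcre2api, and real code would include pcre2.h and use pcre2_substitute_callout_block. The name limit_callout is hypothetical. The callout uses its second argument (callout_data) to cap how many replacements are accepted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins mirroring the fields documented for
   pcre2_substitute_callout_block (8-bit code units assumed). Real code
   would include pcre2.h and use the library's own types instead. */
typedef size_t PCRE2_SIZE;
typedef const uint8_t *PCRE2_SPTR;

typedef struct {
  uint32_t    version;
  uint32_t    subscount;          /* 1 for the first callout, 2 for the second, ... */
  PCRE2_SPTR  input;
  PCRE2_SPTR  output;
  PCRE2_SIZE *ovector;
  uint32_t    oveccount;
  PCRE2_SIZE  output_offsets[2];
} sub_callout_block;

/* Hypothetical callout: accept the first *limit replacements, then return
   a negative value, which rejects the current replacement and ends
   processing. callout_data is the pointer supplied at registration. */
static int limit_callout(sub_callout_block *block, void *callout_data)
{
  const uint32_t *limit = callout_data;
  assert(block != NULL && limit != NULL);
  return (block->subscount <= *limit) ? 0 : -1;
}
```

With the real pcre2.h types, such a function would be registered with pcre2_set_substitute_callout(mcontext, limit_callout, &limit).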


Modified: code/trunk/doc/html/pcre2api.html
===================================================================
--- code/trunk/doc/html/pcre2api.html    2018-11-09 18:10:25 UTC (rev 1038)
+++ code/trunk/doc/html/pcre2api.html    2018-11-12 16:02:01 UTC (rev 1039)
@@ -183,7 +183,7 @@
 <br>
 <br>
 <b>int pcre2_set_substitute_callout(pcre2_match_context *<i>mcontext</i>,</b>
-<b>  void (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b>
+<b>  int (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b>
 <b>  void *<i>callout_data</i>);</b>
 <br>
 <br>
@@ -924,7 +924,7 @@
 <br>
 <br>
 <b>int pcre2_set_substitute_callout(pcre2_match_context *<i>mcontext</i>,</b>
-<b>  void (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b>
+<b>  int (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b>
 <b>  void *<i>callout_data</i>);</b>
 <br>
 <br>
@@ -3413,9 +3413,9 @@
 groups in the extended syntax forms to be treated as unset.
 </P>
 <P>
-If successful, <b>pcre2_substitute()</b> returns the number of replacements that
-were made. This may be zero if no matches were found, and is never greater than
-1 unless PCRE2_SUBSTITUTE_GLOBAL is set.
+If successful, <b>pcre2_substitute()</b> returns the number of successful 
+matches. This may be zero if no matches were found, and is never greater than 1
+unless PCRE2_SUBSTITUTE_GLOBAL is set.
 </P>
 <P>
 In the event of an error, a negative error code is returned. Except for
@@ -3457,16 +3457,16 @@
 </b><br>
 <P>
 <b>int pcre2_set_substitute_callout(pcre2_match_context *<i>mcontext</i>,</b>
-<b>  void (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b>
+<b>  int (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b>
 <b>  void *<i>callout_data</i>);</b>
 <br>
 <br>
 The <b>pcre2_set_substitute_callout()</b> function can be used to specify a
 callout function for <b>pcre2_substitute()</b>. This information is passed in
-a match context. The callout function is called after each substitution. It is
-not called for simulated substitutions that happen as a result of the
-PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option. A callout function should not return 
-any value.
+a match context. The callout function is called after each substitution has 
+been processed, but it can cause the replacement not to happen. The callout 
+function is not called for simulated substitutions that happen as a result of
+the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option.
 </P>
 <P>
 The first argument of the callout function is a pointer to a substitute callout
@@ -3474,7 +3474,11 @@
 order:
 <pre>
   uint32_t    <i>version</i>;
-  PCRE2_SIZE  <i>input_offsets[2]</i>;
+  uint32_t    <i>subscount</i>;  
+  PCRE2_SPTR  <i>input</i>;
+  PCRE2_SPTR  <i>output</i>; 
+  PCRE2_SIZE <i>*ovector</i>; 
+  uint32_t    <i>oveccount</i>;
   PCRE2_SIZE  <i>output_offsets[2]</i>;
 </pre>
 The <i>version</i> field contains the version number of the block format. The
@@ -3482,14 +3486,35 @@
 are added, but the intention is never to remove any of the existing fields.
 </P>
 <P>
-The <i>input_offsets</i> vector contains the code unit offsets in the input
-string of the matched substring, and the <i>output_offsets</i> vector contains
-the offsets of the replacement in the output string.
+The <i>subscount</i> field is the number of the current match. It is 1 for the
+first callout, 2 for the second, and so on. The <i>input</i> and <i>output</i>
+pointers are copies of the values passed to <b>pcre2_substitute()</b>.
 </P>
 <P>
+The <i>ovector</i> field points to the ovector, which contains the result of the 
+most recent match. The <i>oveccount</i> field contains the number of pairs that 
+are set in the ovector, and is always greater than zero. 
+</P>
+<P>
+The <i>output_offsets</i> vector contains the offsets of the replacement in the
+output string. This has already been processed for dollar and (if requested)
+backslash substitutions as described above.
+</P>
+<P>
 The second argument of the callout function is the value passed as
-<i>callout_data</i> when the function was registered.
+<i>callout_data</i> when the function was registered. The value returned by the
+callout function is interpreted as follows:
 </P>
+<P>
+If the value is zero, the replacement is accepted, and, if
+PCRE2_SUBSTITUTE_GLOBAL is set, processing continues with a search for the next
+match. If the value is not zero, the current replacement is not accepted. If
+the value is greater than zero, processing continues when
+PCRE2_SUBSTITUTE_GLOBAL is set. Otherwise (the value is less than zero or
+PCRE2_SUBSTITUTE_GLOBAL is not set), the rest of the input is copied to the
+output and the call to <b>pcre2_substitute()</b> exits, returning the number of
+matches so far.
+</P>
 <br><a name="SEC37" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
 <P>
 <b>int pcre2_substring_nametable_scan(const pcre2_code *<i>code</i>,</b>
@@ -3757,7 +3782,7 @@
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 19 October 2018
+Last updated: 12 November 2018
 <br>
 Copyright &copy; 1997-2018 University of Cambridge.
 <br>
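The return-value rules above reduce to two independent decisions. The first is whether the current replacement is kept. The second is whether searching continues. The C sketch below models that reading of the text and is not PCRE2 source code. The names interpret_callout_rc and callout_outcome are hypothetical, and the global parameter stands for PCRE2_SUBSTITUTE_GLOBAL being set.

```c
#include <stdbool.h>

/* Illustrative model of how pcre2_substitute() treats a callout's
   return value, as described in the documentation text. */
typedef struct {
  bool accept_replacement;  /* keep the replacement text for this match? */
  bool keep_searching;      /* look for another match afterwards? */
} callout_outcome;

static callout_outcome interpret_callout_rc(int rc, bool global)
{
  callout_outcome o;
  /* Only a zero return accepts the replacement. */
  o.accept_replacement = (rc == 0);
  /* Searching continues only when global is set and rc >= 0; otherwise
     the rest of the subject is copied and the call returns. */
  o.keep_searching = global && rc >= 0;
  return o;
}
```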


Modified: code/trunk/doc/html/pcre2test.html
===================================================================
--- code/trunk/doc/html/pcre2test.html    2018-11-09 18:10:25 UTC (rev 1038)
+++ code/trunk/doc/html/pcre2test.html    2018-11-12 16:02:01 UTC (rev 1039)
@@ -1052,7 +1052,9 @@
       startchar                  show starting character when relevant
       substitute_callout         use substitution callouts 
       substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
+      substitute_skip=&#60;n&#62;        skip substitution number n 
       substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+      substitute_stop=&#60;n&#62;        skip substitution number n and greater
       substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
       substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
 </pre>
@@ -1220,7 +1222,9 @@
       startoffset=&#60;n&#62;            same as offset=&#60;n&#62;
       substitute_callout         use substitution callouts 
       substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
+      substitute_skip=&#60;n&#62;        skip substitution number n 
       substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+      substitute_stop=&#60;n&#62;        skip substitution number n and greater
       substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
       substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
       zero_terminate             pass the subject as zero-terminated
@@ -1410,16 +1414,6 @@
       =abc=abc=\=global
    2: =xxx=xxx=
 </pre>
-If the <b>substitute_callout</b> modifier is set, a substitution callout 
-function is set up. When it is called (after each substitution), the offsets in
-the input and output strings are output. For example:
-<pre>
-  /abc/g,replace=&#60;$0&#62;,substitute_callout
-      abcdefabcpqr
-  Old 0 3  New 0 5
-  Old 6 9  New 8 13
-   2: &#60;abc&#62;def&#60;abc&#62;pqr
-</pre>
 Subject and replacement strings should be kept relatively short (fewer than 256
 characters) for substitution tests, as fixed-size buffers are used. To make it
 easy to test for buffer overflow, if the replacement string starts with a
@@ -1451,6 +1445,47 @@
 <b>pcre2_substitute()</b>.
 </P>
 <br><b>
+Testing substitute callouts
+</b><br>
+<P>
+If the <b>substitute_callout</b> modifier is set, a substitution callout 
+function is set up. When it is called (after each substitution), details of the
+input and output strings are shown. For example:
+<pre>
+  /abc/g,replace=&#60;$0&#62;,substitute_callout
+      abcdefabcpqr
+   1(1) Old 0 3 "abc" New 0 5 "&#60;abc&#62;"
+   2(1) Old 6 9 "abc" New 8 13 "&#60;abc&#62;" 
+   2: &#60;abc&#62;def&#60;abc&#62;pqr
+</pre>
+The first number on each callout line is the count of matches. The
+parenthesized number is the number of pairs that are set in the ovector (that
+is, one more than the number of the highest-numbered capturing group that was
+set). The offsets of the old substring and its contents follow, and then the
+same information for the replacement.
+</P>
+<P>
+By default, the substitution callout function returns zero, which accepts the 
+replacement and causes matching to continue if /g was used. Two further 
+modifiers can be used to test other return values. If <b>substitute_skip</b> is 
+set to a value greater than zero, the callout function returns +1 for the match
+of that number, and similarly <b>substitute_stop</b> returns -1. These cause the
+replacement to be rejected, and -1 causes no further matching to take place. If
+either of them is set, <b>substitute_callout</b> is assumed. For example:
+<pre>
+  /abc/g,replace=&#60;$0&#62;,substitute_skip=1
+      abcdefabcpqr
+   1(1) Old 0 3 "abc" New 0 5 "&#60;abc&#62; SKIPPED"
+   2(1) Old 6 9 "abc" New 6 11 "&#60;abc&#62;"
+   2: abcdef&#60;abc&#62;pqr
+      abcdefabcpqr\=substitute_stop=1
+   1(1) Old 0 3 "abc" New 0 5 "&#60;abc&#62; STOPPED"
+   1: abcdefabcpqr
+</pre>
+If both are set for the same number, stop takes precedence. Only a single skip 
+or stop is supported, which is sufficient for testing that the feature works.
+</P>
+<br><b>
 Setting the JIT stack size
 </b><br>
 <P>
@@ -2040,7 +2075,7 @@
 </P>
 <br><a name="SEC21" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 21 September 2018
+Last updated: 12 November 2018
 <br>
 Copyright &copy; 1997-2018 University of Cambridge.
 <br>


Modified: code/trunk/doc/pcre2.txt
===================================================================
--- code/trunk/doc/pcre2.txt    2018-11-09 18:10:25 UTC (rev 1038)
+++ code/trunk/doc/pcre2.txt    2018-11-12 16:02:01 UTC (rev 1039)
@@ -294,7 +294,7 @@
          void *callout_data);


        int pcre2_set_substitute_callout(pcre2_match_context *mcontext,
-         void (*callout_function)(pcre2_substitute_callout_block *, void *),
+         int (*callout_function)(pcre2_substitute_callout_block *, void *),
          void *callout_data);


        int pcre2_set_offset_limit(pcre2_match_context *mcontext,
@@ -942,7 +942,7 @@
        umentation.


        int pcre2_set_substitute_callout(pcre2_match_context *mcontext,
-         void (*callout_function)(pcre2_substitute_callout_block *, void *),
+         int (*callout_function)(pcre2_substitute_callout_block *, void *),
          void *callout_data);


        This  sets up a callout function for PCRE2 to call after each substitu-
@@ -3318,8 +3318,8 @@
        substitutions.   However,   PCRE2_SUBSTITUTE_UNKNOWN_UNSET  does  cause
        unknown groups in the extended syntax forms to be treated as unset.


-       If successful, pcre2_substitute() returns the  number  of  replacements
-       that were made. This may be zero if no matches were found, and is never
+       If successful, pcre2_substitute()  returns  the  number  of  successful
+       matches.  This  may  be  zero  if  no  matches were found, and is never
        greater than 1 unless PCRE2_SUBSTITUTE_GLOBAL is set.


        In the event of an error, a negative error code is returned. Except for
@@ -3355,15 +3355,15 @@
    Substitution callouts


        int pcre2_set_substitute_callout(pcre2_match_context *mcontext,
-         void (*callout_function)(pcre2_substitute_callout_block *, void *),
+         int (*callout_function)(pcre2_substitute_callout_block *, void *),
          void *callout_data);


       The pcre2_set_substitute_callout() function can be used to specify a
        callout  function for pcre2_substitute(). This information is passed in
-       a match context. The callout function is called  after  each  substitu-
-       tion.  It  is  not  called for simulated substitutions that happen as a
-       result of the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option. A callout  func-
-       tion should not return any value.
+       a match context. The callout function is called after each substitution
+       has been processed, but it can cause the replacement not to happen. The
+       callout function is not called for simulated substitutions that  happen
+       as a result of the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option.


        The first argument of the callout function is a pointer to a substitute
        callout block structure, which contains the following fields, not  nec-
@@ -3370,7 +3370,11 @@
        essarily in this order:


          uint32_t    version;
-         PCRE2_SIZE  input_offsets[2];
+         uint32_t    subscount;
+         PCRE2_SPTR  input;
+         PCRE2_SPTR  output;
+         PCRE2_SIZE *ovector;
+         uint32_t    oveccount;
          PCRE2_SIZE  output_offsets[2];


        The  version field contains the version number of the block format. The
@@ -3378,14 +3382,32 @@
        more  fields are added, but the intention is never to remove any of the
        existing fields.


-       The input_offsets vector contains the code unit offsets  in  the  input
-       string of the matched substring, and the output_offsets vector contains
-       the offsets of the replacement in the output string.
+       The subscount field is the number of the current match. It is 1 for the
+       first callout, 2 for the second, and so on. The input and output point-
+       ers are copies of the values passed to pcre2_substitute().


+       The ovector field points to the ovector, which contains the  result  of
+       the most recent match. The oveccount field contains the number of pairs
+       that are set in the ovector, and is always greater than zero.
+
+       The output_offsets vector contains the offsets of  the  replacement  in
+       the  output  string. This has already been processed for dollar and (if
+       requested) backslash substitutions as described above.
+
        The second argument of the callout function  is  the  value  passed  as
-       callout_data when the function was registered.
+       callout_data  when  the  function was registered. The value returned by
+       the callout function is interpreted as follows:


+       If the value is zero, the replacement is accepted, and,  if  PCRE2_SUB-
+       STITUTE_GLOBAL  is set, processing continues with a search for the next
+       match. If the value  is  not  zero,  the  current  replacement  is  not
+       accepted.  If the value is greater than zero, processing continues when
+       PCRE2_SUBSTITUTE_GLOBAL is set. Otherwise (the value is less than  zero
+       or  PCRE2_SUBSTITUTE_GLOBAL  is  not set), the rest of the input is
+       copied to the output and the call to pcre2_substitute() exits,  return-
+       ing the number of matches so far.


+
DUPLICATE SUBPATTERN NAMES

        int pcre2_substring_nametable_scan(const pcre2_code *code,
@@ -3633,7 +3655,7 @@


REVISION

-       Last updated: 19 October 2018
+       Last updated: 12 November 2018
        Copyright (c) 1997-2018 University of Cambridge.
 ------------------------------------------------------------------------------
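The callout block fields listed above carry everything needed to describe one substitution. In the input, ovector[0] and ovector[1] delimit the match. In the output, output_offsets delimits the replacement. The C sketch below uses them to build a line in the style of the pcre2test callout output. It assumes 8-bit code units and uses local stand-in types in place of pcre2.h. The name format_callout is hypothetical, and the exact layout of real pcre2test output may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Local stand-ins for the pcre2.h types (8-bit code units assumed). */
typedef size_t PCRE2_SIZE;
typedef const uint8_t *PCRE2_SPTR;

typedef struct {
  uint32_t    version;
  uint32_t    subscount;
  PCRE2_SPTR  input;
  PCRE2_SPTR  output;
  PCRE2_SIZE *ovector;
  uint32_t    oveccount;
  PCRE2_SIZE  output_offsets[2];
} sub_callout_block;

/* Writes "N(P) Old s e "match" New s e "replacement"" into buf and
   returns the snprintf result. */
static int format_callout(const sub_callout_block *b, char *buf, size_t size)
{
  PCRE2_SIZE ms = b->ovector[0], me = b->ovector[1];                /* match in input */
  PCRE2_SIZE rs = b->output_offsets[0], re = b->output_offsets[1];  /* replacement in output */
  assert(b->oveccount > 0 && ms <= me && rs <= re);
  return snprintf(buf, size, "%2u(%u) Old %zu %zu \"%.*s\" New %zu %zu \"%.*s\"",
                  (unsigned)b->subscount, (unsigned)b->oveccount,
                  ms, me, (int)(me - ms), (const char *)(b->input + ms),
                  rs, re, (int)(re - rs), (const char *)(b->output + rs));
}
```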



Modified: code/trunk/doc/pcre2_set_substitute_callout.3
===================================================================
--- code/trunk/doc/pcre2_set_substitute_callout.3    2018-11-09 18:10:25 UTC (rev 1038)
+++ code/trunk/doc/pcre2_set_substitute_callout.3    2018-11-12 16:02:01 UTC (rev 1039)
@@ -1,4 +1,4 @@
-.TH PCRE2_SET_SUBSTITUTE_CALLOUT 3 "17 September 2018" "PCRE2 10.33"
+.TH PCRE2_SET_SUBSTITUTE_CALLOUT 3 "12 November 2018" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -8,7 +8,7 @@
 .PP
 .nf
 .B int pcre2_set_substitute_callout(pcre2_match_context *\fImcontext\fP,
-.B "  void (*\fIcallout_function\fP)(pcre2_substitute_callout_block *),"
+.B "  int (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *),"
 .B "  void *\fIcallout_data\fP);"
 .fi
 .


Modified: code/trunk/doc/pcre2api.3
===================================================================
--- code/trunk/doc/pcre2api.3    2018-11-09 18:10:25 UTC (rev 1038)
+++ code/trunk/doc/pcre2api.3    2018-11-12 16:02:01 UTC (rev 1039)
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "19 October 2018" "PCRE2 10.33"
+.TH PCRE2API 3 "12 November 2018" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -124,7 +124,7 @@
 .B "  void *\fIcallout_data\fP);"
 .sp
 .B int pcre2_set_substitute_callout(pcre2_match_context *\fImcontext\fP,
-.B "  void (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *),"
+.B "  int (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *),"
 .B "  void *\fIcallout_data\fP);"
 .sp
 .B int pcre2_set_offset_limit(pcre2_match_context *\fImcontext\fP,
@@ -860,7 +860,7 @@
 .sp
 .nf
 .B int pcre2_set_substitute_callout(pcre2_match_context *\fImcontext\fP,
-.B "  void (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *),"
+.B "  int (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *),"
 .B "  void *\fIcallout_data\fP);"
 .fi
 .sp
@@ -3412,9 +3412,9 @@
 substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause unknown
 groups in the extended syntax forms to be treated as unset.
 .P
-If successful, \fBpcre2_substitute()\fP returns the number of replacements that
-were made. This may be zero if no matches were found, and is never greater than
-1 unless PCRE2_SUBSTITUTE_GLOBAL is set.
+If successful, \fBpcre2_substitute()\fP returns the number of successful 
+matches. This may be zero if no matches were found, and is never greater than 1
+unless PCRE2_SUBSTITUTE_GLOBAL is set.
 .P
 In the event of an error, a negative error code is returned. Except for
 PCRE2_ERROR_NOMATCH (which is never returned), errors from \fBpcre2_match()\fP
@@ -3454,16 +3454,16 @@
 .sp
 .nf
 .B int pcre2_set_substitute_callout(pcre2_match_context *\fImcontext\fP,
-.B "  void (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *),"
+.B "  int (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *),"
 .B "  void *\fIcallout_data\fP);"
 .fi
 .sp
 The \fBpcre2_set_substitute_callout()\fP function can be used to specify a
 callout function for \fBpcre2_substitute()\fP. This information is passed in
-a match context. The callout function is called after each substitution. It is
-not called for simulated substitutions that happen as a result of the
-PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option. A callout function should not return 
-any value.
+a match context. The callout function is called after each substitution has 
+been processed, but it can cause the replacement not to happen. The callout 
+function is not called for simulated substitutions that happen as a result of
+the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option.
 .P
 The first argument of the callout function is a pointer to a substitute callout
 block structure, which contains the following fields, not necessarily in this
@@ -3470,7 +3470,11 @@
 order:
 .sp
   uint32_t    \fIversion\fP;
-  PCRE2_SIZE  \fIinput_offsets[2]\fP;
+  uint32_t    \fIsubscount\fP;  
+  PCRE2_SPTR  \fIinput\fP;
+  PCRE2_SPTR  \fIoutput\fP; 
+  PCRE2_SIZE \fI*ovector\fP; 
+  uint32_t    \fIoveccount\fP;
   PCRE2_SIZE  \fIoutput_offsets[2]\fP;
 .sp
 The \fIversion\fP field contains the version number of the block format. The
@@ -3477,12 +3481,30 @@
 current version is 0. The version number will increase in future if more fields
 are added, but the intention is never to remove any of the existing fields.
 .P
-The \fIinput_offsets\fP vector contains the code unit offsets in the input
-string of the matched substring, and the \fIoutput_offsets\fP vector contains
-the offsets of the replacement in the output string.
+The \fIsubscount\fP field is the number of the current match. It is 1 for the
+first callout, 2 for the second, and so on. The \fIinput\fP and \fIoutput\fP
+pointers are copies of the values passed to \fBpcre2_substitute()\fP.
 .P
+The \fIovector\fP field points to the ovector, which contains the result of the 
+most recent match. The \fIoveccount\fP field contains the number of pairs that 
+are set in the ovector, and is always greater than zero. 
+.P
+The \fIoutput_offsets\fP vector contains the offsets of the replacement in the
+output string. This has already been processed for dollar and (if requested)
+backslash substitutions as described above.
+.P
 The second argument of the callout function is the value passed as
-\fIcallout_data\fP when the function was registered.
+\fIcallout_data\fP when the function was registered. The value returned by the
+callout function is interpreted as follows:
+.P
+If the value is zero, the replacement is accepted, and, if
+PCRE2_SUBSTITUTE_GLOBAL is set, processing continues with a search for the next
+match. If the value is not zero, the current replacement is not accepted. If
+the value is greater than zero, processing continues when
+PCRE2_SUBSTITUTE_GLOBAL is set. Otherwise (the value is less than zero or
+PCRE2_SUBSTITUTE_GLOBAL is not set), the rest of the input is copied to the
+output and the call to \fBpcre2_substitute()\fP exits, returning the number of
+matches so far.
 .
 .
 .SH "DUPLICATE SUBPATTERN NAMES"
@@ -3768,6 +3790,6 @@
 .rs
 .sp
 .nf
-Last updated: 19 October 2018
+Last updated: 12 November 2018
 Copyright (c) 1997-2018 University of Cambridge.
 .fi
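The revised result is "the number of successful matches", so a match whose replacement the callout rejects is still counted. The C sketch below illustrates how the callout's return value interacts with that count, using a model of global substitution of a literal string. It is not the PCRE2 implementation, and it differs from it in several ways:

- It uses strstr() instead of pcre2_match().
- It always behaves as if PCRE2_SUBSTITUTE_GLOBAL were set.
- It passes the callout only the match number.
- It assumes out is large enough.

The names model_substitute, model_callout, skip_first and stop_first are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callout type for this model: gets the 1-based match number
   and returns 0 (accept), >0 (reject, continue) or <0 (reject, stop). */
typedef int (*model_callout)(unsigned subscount, void *data);

/* Replaces each non-overlapping occurrence of needle (non-empty) in subject
   with repl, consulting cb for every match. A rejected match keeps its
   original text. Returns the number of matches, rejected ones included. */
static int model_substitute(const char *subject, const char *needle,
                            const char *repl, model_callout cb, void *data,
                            char *out)
{
  size_t nlen = strlen(needle), rlen = strlen(repl), o = 0;
  const char *p = subject, *m;
  unsigned count = 0;
  assert(nlen > 0);
  while ((m = strstr(p, needle)) != NULL) {
    int rc;
    memcpy(out + o, p, (size_t)(m - p));      /* text preceding the match */
    o += (size_t)(m - p);
    count++;
    rc = (cb != NULL) ? cb(count, data) : 0;
    if (rc == 0) { memcpy(out + o, repl, rlen); o += rlen; }  /* accepted */
    else         { memcpy(out + o, m, nlen);    o += nlen; }  /* rejected */
    p = m + nlen;
    if (rc < 0) break;                        /* stop searching */
  }
  strcpy(out + o, p);                         /* copy the rest of the subject */
  return (int)count;
}

/* Callouts mirroring substitute_skip=1 and substitute_stop=1. */
static int skip_first(unsigned n, void *d) { (void)d; return n == 1 ? 1 : 0; }
static int stop_first(unsigned n, void *d) { (void)d; return n == 1 ? -1 : 0; }
```

Take the subject abcdefabcpqr, the needle abc and the replacement <abc>. With skip_first, the result is abcdef<abc>pqr with a count of 2. With stop_first, the result is abcdefabcpqr with a count of 1. Both agree with the pcre2test examples in this commit.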


Modified: code/trunk/doc/pcre2test.1
===================================================================
--- code/trunk/doc/pcre2test.1    2018-11-09 18:10:25 UTC (rev 1038)
+++ code/trunk/doc/pcre2test.1    2018-11-12 16:02:01 UTC (rev 1039)
@@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "21 September 2018" "PCRE 10.33"
+.TH PCRE2TEST 1 "12 November 2018" "PCRE 10.33"
 .SH NAME
 pcre2test - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -1014,7 +1014,9 @@
       startchar                  show starting character when relevant
       substitute_callout         use substitution callouts 
       substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
+      substitute_skip=<n>        skip substitution number n 
       substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+      substitute_stop=<n>        skip substitution number n and greater
       substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
       substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
 .sp
@@ -1189,7 +1191,9 @@
       startoffset=<n>            same as offset=<n>
       substitute_callout         use substitution callouts 
       substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
+      substitute_skip=<n>        skip substitution number n 
       substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+      substitute_stop=<n>        skip substitution number n and greater
       substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
       substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
       zero_terminate             pass the subject as zero-terminated
@@ -1377,16 +1381,6 @@
       =abc=abc=\e=global
    2: =xxx=xxx=
 .sp
-If the \fBsubstitute_callout\fP modifier is set, a substitution callout 
-function is set up. When it is called (after each substitution), the offsets in
-the input and output strings are output. For example:
-.sp
-  /abc/g,replace=<$0>,substitute_callout
-      abcdefabcpqr
-  Old 0 3  New 0 5
-  Old 6 9  New 8 13
-   2: <abc>def<abc>pqr
-.sp
 Subject and replacement strings should be kept relatively short (fewer than 256
 characters) for substitution tests, as fixed-size buffers are used. To make it
 easy to test for buffer overflow, if the replacement string starts with a
@@ -1418,6 +1412,46 @@
 \fBpcre2_substitute()\fP.
 .
 .
+.SS "Testing substitute callouts"
+.rs
+.sp
+If the \fBsubstitute_callout\fP modifier is set, a substitution callout 
+function is set up. When it is called (after each substitution), details of the
+input and output strings are shown. For example:
+.sp
+  /abc/g,replace=<$0>,substitute_callout
+      abcdefabcpqr
+   1(1) Old 0 3 "abc" New 0 5 "<abc>"
+   2(1) Old 6 9 "abc" New 8 13 "<abc>" 
+   2: <abc>def<abc>pqr
+.sp
+The first number on each callout line is the count of matches. The
+parenthesized number is the number of pairs that are set in the ovector (that
+is, one more than the number of the highest-numbered capturing group that was
+set). The offsets of the old substring and its contents follow, and then the
+same information for the replacement.
+.P
+By default, the substitution callout function returns zero, which accepts the 
+replacement and causes matching to continue if /g was used. Two further 
+modifiers can be used to test other return values. If \fBsubstitute_skip\fP is 
+set to a value greater than zero, the callout function returns +1 for the match
+of that number, and similarly \fBsubstitute_stop\fP returns -1. These cause the
+replacement to be rejected, and -1 causes no further matching to take place. If
+either of them is set, \fBsubstitute_callout\fP is assumed. For example:
+.sp
+  /abc/g,replace=<$0>,substitute_skip=1
+      abcdefabcpqr
+   1(1) Old 0 3 "abc" New 0 5 "<abc> SKIPPED"
+   2(1) Old 6 9 "abc" New 6 11 "<abc>"
+   2: abcdef<abc>pqr
+      abcdefabcpqr\e=substitute_stop=1
+   1(1) Old 0 3 "abc" New 0 5 "<abc> STOPPED"
+   1: abcdefabcpqr
+.sp
+If both are set for the same number, stop takes precedence. Only a single skip 
+or stop is supported, which is sufficient for testing that the feature works.
+.
+.
 .SS "Setting the JIT stack size"
 .rs
 .sp
@@ -2022,6 +2056,6 @@
 .rs
 .sp
 .nf
-Last updated: 21 September 2018
+Last updated: 12 November 2018
 Copyright (c) 1997-2018 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcre2test.txt
===================================================================
--- code/trunk/doc/pcre2test.txt    2018-11-09 18:10:25 UTC (rev 1038)
+++ code/trunk/doc/pcre2test.txt    2018-11-12 16:02:01 UTC (rev 1039)
@@ -940,7 +940,9 @@
              startchar                  show starting character when relevant
              substitute_callout         use substitution callouts
              substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
+             substitute_skip=<n>        skip substitution number n
              substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+             substitute_stop=<n>        skip substitution number n and greater
              substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
              substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY


@@ -1092,7 +1094,9 @@
              startoffset=<n>            same as offset=<n>
              substitute_callout         use substitution callouts
              substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
+             substitute_skip=<n>        skip substitution number n
              substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+             substitute_stop=<n>        skip substitution number n and greater
              substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
              substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
              zero_terminate             pass the subject as zero-terminated
@@ -1263,16 +1267,6 @@
              =abc=abc=\=global
           2: =xxx=xxx=


-       If the substitute_callout modifier is set, a substitution callout func-
-       tion  is  set up. When it is called (after each substitution), the off-
-       sets in the input and output strings are output. For example:
-
-         /abc/g,replace=<$0>,substitute_callout
-             abcdefabcpqr
-         Old 0 3  New 0 5
-         Old 6 9  New 8 13
-          2: <abc>def<abc>pqr
-
        Subject and replacement strings should be kept relatively short  (fewer
        than  256 characters) for substitution tests, as fixed-size buffers are
        used. To make it easy to test for buffer overflow, if  the  replacement
@@ -1305,57 +1299,97 @@
        partial  matching  provokes  an  error return ("bad option value") from
        pcre2_substitute().


+   Testing substitute callouts
+
+       If the substitute_callout modifier is set, a substitution callout func-
+       tion is set up. When it is called (after each substitution), details of
+       the input and output strings are shown. For example:
+
+         /abc/g,replace=<$0>,substitute_callout
+             abcdefabcpqr
+          1(1) Old 0 3 "abc" New 0 5 "<abc>"
+          2(1) Old 6 9 "abc" New 8 13 "<abc>"
+          2: <abc>def<abc>pqr
+
+       The first number on each callout line is the count of matches. The
+       parenthesized number is the number of pairs that are set in the ovector
+       (that is, one more than the number of the highest-numbered capturing
+       group that was set). The offsets of the old substring and its contents
+       follow, and then the same information for the replacement.
+
+       By default, the  substitution  callout  function  returns  zero,  which
+       accepts the replacement and causes matching to continue if /g was used.
+       Two further modifiers can be used to test other return values. If sub-
+       stitute_skip is set to a value greater than zero, the callout function
+       returns +1 for the match of that number, and similarly substitute_stop
+       returns -1. These cause the replacement to be rejected, and -1 causes
+       no further matching to take place. If either of them is set, substi-
+       tute_callout is assumed. For example:
+
+         /abc/g,replace=<$0>,substitute_skip=1
+             abcdefabcpqr
+          1(1) Old 0 3 "abc" New 0 5 "<abc> SKIPPED"
+          2(1) Old 6 9 "abc" New 6 11 "<abc>"
+          2: abcdef<abc>pqr
+             abcdefabcpqr\=substitute_stop=1
+          1(1) Old 0 3 "abc" New 0 5 "<abc> STOPPED"
+          1: abcdefabcpqr
+
+       If both are set for the same number, stop takes precedence. Only a sin-
+       gle skip or stop is supported, which is sufficient for testing that the
+       feature works.
+
    Setting the JIT stack size


-       The jitstack modifier provides a way of setting the maximum stack  size
-       that  is  used  by the just-in-time optimization code. It is ignored if
-       JIT optimization is not being used. The value is a number of  kibibytes
-       (units  of  1024  bytes). Setting zero reverts to the default of 32KiB.
+       The  jitstack modifier provides a way of setting the maximum stack size
+       that is used by the just-in-time optimization code. It  is  ignored  if
+       JIT  optimization is not being used. The value is a number of kibibytes
+       (units of 1024 bytes). Setting zero reverts to the  default  of  32KiB.
        Providing a stack that is larger than the default is necessary only for
-       very  complicated  patterns.  If  jitstack is set non-zero on a subject
+       very complicated patterns. If jitstack is set  non-zero  on  a  subject
        line it overrides any value that was set on the pattern.


    Setting heap, match, and depth limits


-       The heap_limit, match_limit, and depth_limit modifiers set  the  appro-
-       priate  limits  in the match context. These values are ignored when the
+       The  heap_limit,  match_limit, and depth_limit modifiers set the appro-
+       priate limits in the match context. These values are ignored  when  the
        find_limits modifier is specified.


    Finding minimum limits


-       If the find_limits modifier is present on  a  subject  line,  pcre2test
-       calls  the  relevant matching function several times, setting different
-       values   in   the    match    context    via    pcre2_set_heap_limit(),
-       pcre2_set_match_limit(),  or pcre2_set_depth_limit() until it finds the
-       minimum values for each parameter that allows  the  match  to  complete
+       If  the  find_limits  modifier  is present on a subject line, pcre2test
+       calls the relevant matching function several times,  setting  different
+       values    in    the    match    context   via   pcre2_set_heap_limit(),
+       pcre2_set_match_limit(), or pcre2_set_depth_limit() until it finds  the
+       minimum values for each parameter that allow the match to complete
        without error. If JIT is being used, only the match limit is relevant.
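The text does not say how pcre2test chooses its trial values. One common way to find a minimum is to double an upper bound until the match completes and then bisect. This assumes success is monotonic, meaning any limit above a working one also works. The sketch below shows that generic technique and is not pcre2test's actual code. limit_ok_fn and needs_at_least() are hypothetical stand-ins for running the matching function after pcre2_set_match_limit() or a similar setter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical predicate: nonzero if the match completes without a limit
   error when the limit is `limit` (in pcre2test, a real match call). */
typedef int (*limit_ok_fn)(uint32_t limit, void *data);

/* Toy predicate for testing: succeeds once the limit reaches *data. */
static int needs_at_least(uint32_t limit, void *data)
{
  return limit >= *(const uint32_t *)data;
}

/* Find the smallest limit in [0, cap] for which ok() succeeds, assuming
   monotonic success. Returns 1 and stores it in *result, or 0 if even `cap`
   fails. `cap` must be at least 1. */
static int find_min_limit(limit_ok_fn ok, void *data, uint32_t cap,
                          uint32_t *result)
{
  uint32_t lo = 0, hi = 1;

  if (ok(0, data)) {            /* zero can be valid, e.g. for heap_limit */
    *result = 0;
    return 1;
  }
  /* Grow hi until it succeeds; lo always holds a value known to fail. */
  while (!ok(hi, data)) {
    if (hi >= cap) return 0;
    lo = hi;
    hi = (hi > cap / 2) ? cap : hi * 2;
  }
  /* ok(lo) fails and ok(hi) succeeds, so the answer lies in (lo, hi]. */
  while (hi - lo > 1) {
    uint32_t mid = lo + (hi - lo) / 2;
    if (ok(mid, data)) hi = mid; else lo = mid;
  }
  *result = hi;
  return 1;
}
```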


        When using this modifier, the pattern should not contain any limit set-
-       tings such as (*LIMIT_MATCH=...)  within  it.  If  such  a  setting  is
+       tings  such  as  (*LIMIT_MATCH=...)  within  it.  If  such a setting is
        present and is lower than the minimum matching value, the minimum value
-       cannot be found because pcre2_set_match_limit() etc. are only  able  to
+       cannot  be  found because pcre2_set_match_limit() etc. are only able to
        reduce the value of an in-pattern limit; they cannot increase it.


-       For  non-DFA  matching,  the minimum depth_limit number is a measure of
+       For non-DFA matching, the minimum depth_limit number is  a  measure  of
        how much nested backtracking happens (that is, how deeply the pattern's
-       tree  is  searched).  In the case of DFA matching, depth_limit controls
-       the depth of recursive calls of the internal function that is used  for
+       tree is searched). In the case of DFA  matching,  depth_limit  controls
+       the  depth of recursive calls of the internal function that is used for
        handling pattern recursion, lookaround assertions, and atomic groups.


        For non-DFA matching, the match_limit number is a measure of the amount
        of backtracking that takes place, and learning the minimum value can be
-       instructive.  For  most  simple matches, the number is quite small, but
-       for patterns with very large numbers of matching possibilities, it  can
-       become  large very quickly with increasing length of subject string. In
-       the case of DFA matching, match_limit  controls  the  total  number  of
+       instructive. For most simple matches, the number is  quite  small,  but
+       for  patterns with very large numbers of matching possibilities, it can
+       become large very quickly with increasing length of subject string.  In
+       the  case  of  DFA  matching,  match_limit controls the total number of
        calls, both recursive and non-recursive, to the internal matching func-
        tion, thus controlling the overall amount of computing resource that is
        used.


-       For  both  kinds  of  matching,  the  heap_limit  number,  which  is in
-       kibibytes (units of 1024 bytes), limits the amount of heap memory  used
+       For both  kinds  of  matching,  the  heap_limit  number,  which  is  in
+       kibibytes  (units of 1024 bytes), limits the amount of heap memory used
        for matching. A value of zero disables the use of any heap memory; many
-       simple pattern matches can be done without using the heap, so  zero  is
+       simple  pattern  matches can be done without using the heap, so zero is
        not an unreasonable setting.


    Showing MARK names
@@ -1362,50 +1396,50 @@



        The mark modifier causes the names from backtracking control verbs that
-       are returned from calls to pcre2_match() to be displayed. If a mark  is
-       returned  for a match, non-match, or partial match, pcre2test shows it.
-       For a match, it is on a line by itself, tagged with  "MK:".  Otherwise,
+       are  returned from calls to pcre2_match() to be displayed. If a mark is
+       returned for a match, non-match, or partial match, pcre2test shows  it.
+       For  a  match, it is on a line by itself, tagged with "MK:". Otherwise,
        it is added to the non-match message.


    Showing memory usage


-       The  memory modifier causes pcre2test to log the sizes of all heap mem-
-       ory  allocation  and  freeing  calls  that  occur  during  a  call   to
-       pcre2_match()  or  pcre2_dfa_match().  These  occur  only  when a match
-       requires a bigger vector than the default for remembering  backtracking
-       points  (pcre2_match())  or for internal workspace (pcre2_dfa_match()).
-       In many cases there will be no heap memory used and therefore no  addi-
+       The memory modifier causes pcre2test to log the sizes of all heap  mem-
+       ory   allocation  and  freeing  calls  that  occur  during  a  call  to
+       pcre2_match() or pcre2_dfa_match().  These  occur  only  when  a  match
+       requires  a bigger vector than the default for remembering backtracking
+       points (pcre2_match()) or for internal  workspace  (pcre2_dfa_match()).
+       In  many cases there will be no heap memory used and therefore no addi-
        tional output. No heap memory is allocated during matching with JIT, so
-       in that case the memory modifier never has any effect. For  this  modi-
-       fier  to  work,  the  null_context modifier must not be set on both the
+       in  that  case the memory modifier never has any effect. For this modi-
+       fier to work, the null_context modifier must not be  set  on  both  the
        pattern and the subject, though it can be set on one or the other.


    Setting a starting offset


-       The offset modifier sets an offset  in  the  subject  string  at  which
+       The  offset  modifier  sets  an  offset  in the subject string at which
        matching starts. Its value is a number of code units, not characters.


    Setting an offset limit


-       The  offset_limit  modifier  sets  a limit for unanchored matches. If a
+       The offset_limit modifier sets a limit for  unanchored  matches.  If  a
        match cannot be found starting at or before this offset in the subject,
        a "no match" return is given. The data value is a number of code units,
-       not characters. When this modifier is used, the use_offset_limit  modi-
+       not  characters. When this modifier is used, the use_offset_limit modi-
        fier must have been set for the pattern; if not, an error is generated.


    Setting the size of the output vector


-       The  ovector  modifier  applies  only  to  the subject line in which it
-       appears, though of course it can also be used to set  a  default  in  a
-       #subject  command. It specifies the number of pairs of offsets that are
+       The ovector modifier applies only to  the  subject  line  in  which  it
+       appears,  though  of  course  it can also be used to set a default in a
+       #subject command. It specifies the number of pairs of offsets that  are
        available for storing matching information. The default is 15.


-       A value of zero is useful when testing the POSIX API because it  causes
+       A  value of zero is useful when testing the POSIX API because it causes
        regexec() to be called with a NULL capture vector. When not testing the
-       POSIX API, a value of  zero  is  used  to  cause  pcre2_match_data_cre-
-       ate_from_pattern()  to  be  called, in order to create a match block of
+       POSIX  API,  a  value  of  zero  is used to cause pcre2_match_data_cre-
+       ate_from_pattern() to be called, in order to create a  match  block  of
        exactly the right size for the pattern. (It is not possible to create a
-       match  block  with  a zero-length ovector; there is always at least one
+       match block with a zero-length ovector; there is always  at  least  one
        pair of offsets.)


    Passing the subject as zero-terminated
@@ -1412,55 +1446,55 @@


        By default, the subject string is passed to a native API matching func-
        tion with its correct length. In order to test the facility for passing
-       a zero-terminated string, the zero_terminate modifier is  provided.  It
-       causes  the length to be passed as PCRE2_ZERO_TERMINATED. When matching
+       a  zero-terminated  string, the zero_terminate modifier is provided. It
+       causes the length to be passed as PCRE2_ZERO_TERMINATED. When  matching
        via the POSIX interface, this modifier is ignored, with a warning.


-       When testing pcre2_substitute(), this modifier also has the  effect  of
+       When  testing  pcre2_substitute(), this modifier also has the effect of
        passing the replacement string as zero-terminated.


    Passing a NULL context


-       Normally,   pcre2test   passes   a   context  block  to  pcre2_match(),
+       Normally,  pcre2test  passes  a   context   block   to   pcre2_match(),
        pcre2_dfa_match() or pcre2_jit_match(). If the null_context modifier is
-       set,  however,  NULL  is  passed. This is for testing that the matching
+       set, however, NULL is passed. This is for  testing  that  the  matching
        functions behave correctly in this case (they use default values). This
-       modifier  cannot  be used with the find_limits modifier or when testing
+       modifier cannot be used with the find_limits modifier or  when  testing
        the substitution function.



THE ALTERNATIVE MATCHING FUNCTION

-       By default,  pcre2test  uses  the  standard  PCRE2  matching  function,
+       By  default,  pcre2test  uses  the  standard  PCRE2  matching function,
        pcre2_match() to match each subject line. PCRE2 also supports an alter-
-       native matching function, pcre2_dfa_match(), which operates in  a  dif-
-       ferent  way, and has some restrictions. The differences between the two
+       native  matching  function, pcre2_dfa_match(), which operates in a dif-
+       ferent way, and has some restrictions. The differences between the  two
        functions are described in the pcre2matching documentation.


-       If the dfa modifier is set, the alternative matching function is  used.
-       This  function  finds all possible matches at a given point in the sub-
-       ject. If, however, the dfa_shortest modifier is set,  processing  stops
-       after  the  first  match is found. This is always the shortest possible
+       If  the dfa modifier is set, the alternative matching function is used.
+       This function finds all possible matches at a given point in  the  sub-
+       ject.  If,  however, the dfa_shortest modifier is set, processing stops
+       after the first match is found. This is always  the  shortest  possible
        match.



DEFAULT OUTPUT FROM pcre2test

-       This section describes the output when the  normal  matching  function,
+       This  section  describes  the output when the normal matching function,
        pcre2_match(), is being used.


-       When  a  match  succeeds,  pcre2test  outputs the list of captured sub-
-       strings, starting with number 0 for the string that matched  the  whole
-       pattern.    Otherwise,  it  outputs  "No  match"  when  the  return  is
-       PCRE2_ERROR_NOMATCH, or "Partial  match:"  followed  by  the  partially
-       matching  substring  when the return is PCRE2_ERROR_PARTIAL. (Note that
-       this is the entire substring that  was  inspected  during  the  partial
-       match;  it  may  include  characters before the actual match start if a
+       When a match succeeds, pcre2test outputs  the  list  of  captured  sub-
+       strings,  starting  with number 0 for the string that matched the whole
+       pattern.   Otherwise,  it  outputs  "No  match"  when  the  return   is
+       PCRE2_ERROR_NOMATCH,  or  "Partial  match:"  followed  by the partially
+       matching substring when the return is PCRE2_ERROR_PARTIAL.  (Note  that
+       this  is  the  entire  substring  that was inspected during the partial
+       match; it may include characters before the actual  match  start  if  a
        lookbehind assertion, \K, \b, or \B was involved.)


        For any other return, pcre2test outputs the PCRE2 negative error number
-       and  a  short  descriptive  phrase. If the error is a failed UTF string
-       check, the code unit offset of the start of the  failing  character  is
+       and a short descriptive phrase. If the error is  a  failed  UTF  string
+       check,  the  code  unit offset of the start of the failing character is
        also output. Here is an example of an interactive pcre2test run.


          $ pcre2test
@@ -1476,8 +1510,8 @@
        Unset capturing substrings that are not followed by one that is set are
        not shown by pcre2test unless the allcaptures modifier is specified. In
        the following example, there are two capturing substrings, but when the
-       first data line is matched, the second, unset substring is  not  shown.
-       An  "internal" unset substring is shown as "<unset>", as for the second
+       first  data  line is matched, the second, unset substring is not shown.
+       An "internal" unset substring is shown as "<unset>", as for the  second
        data line.


            re> /(a)|(b)/
@@ -1489,11 +1523,11 @@
           1: <unset>
           2: b


-       If the strings contain any non-printing characters, they are output  as
-       \xhh  escapes  if  the  value is less than 256 and UTF mode is not set.
+       If  the strings contain any non-printing characters, they are output as
+       \xhh escapes if the value is less than 256 and UTF  mode  is  not  set.
        Otherwise they are output as \x{hh...} escapes. See below for the defi-
-       nition  of  non-printing  characters. If the aftertext modifier is set,
-       the output for substring 0 is followed by the the rest of  the  subject
+       nition of non-printing characters. If the aftertext  modifier  is  set,
+       the output for substring 0 is followed by the rest of the subject
        string, identified by "0+" like this:


            re> /cat/aftertext
@@ -1501,7 +1535,7 @@
           0: cat
           0+ aract


-       If  global  matching  is  requested, the results of successive matching
+       If global matching is requested, the  results  of  successive  matching
        attempts are output in sequence, like this:


            re> /\Bi(\w\w)/g
@@ -1513,8 +1547,8 @@
           0: ipp
           1: pp


-       "No match" is output only if the first match attempt fails. Here is  an
-       example  of  a  failure  message (the offset 4 that is specified by the
+       "No  match" is output only if the first match attempt fails. Here is an
+       example of a failure message (the offset 4 that  is  specified  by  the
        offset modifier is past the end of the subject string):


            re> /xyz/
@@ -1522,7 +1556,7 @@
          Error -24 (bad offset value)


        Note that whereas patterns can be continued over several lines (a plain
-       ">"  prompt  is used for continuations), subject lines may not. However
+       ">" prompt is used for continuations), subject lines may  not.  However
        newlines can be included in a subject by means of the \n escape (or \r,
        \r\n, etc., depending on the newline sequence setting).


@@ -1530,7 +1564,7 @@
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION

        When the alternative matching function, pcre2_dfa_match(), is used, the
-       output consists of a list of all the matches that start  at  the  first
+       output  consists  of  a list of all the matches that start at the first
        point in the subject where there is at least one match. For example:


            re> /(tang|tangerine|tan)/
@@ -1539,11 +1573,11 @@
           1: tang
           2: tan


-       Using  the normal matching function on this data finds only "tang". The
-       longest matching string is always  given  first  (and  numbered  zero).
-       After  a  PCRE2_ERROR_PARTIAL  return,  the output is "Partial match:",
-       followed by the partially matching substring. Note  that  this  is  the
-       entire  substring  that  was inspected during the partial match; it may
+       Using the normal matching function on this data finds only "tang".  The
+       longest  matching  string  is  always  given first (and numbered zero).
+       After a PCRE2_ERROR_PARTIAL return, the  output  is  "Partial  match:",
+       followed  by  the  partially  matching substring. Note that this is the
+       entire substring that was inspected during the partial  match;  it  may
        include characters before the actual match start if a lookbehind asser-
        tion, \b, or \B was involved. (\K is not supported for DFA matching.)


@@ -1559,16 +1593,16 @@
           1: tan
           0: tan


-       The alternative matching function does not support  substring  capture,
-       so  the  modifiers  that are concerned with captured substrings are not
+       The  alternative  matching function does not support substring capture,
+       so the modifiers that are concerned with captured  substrings  are  not
        relevant.
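Consider a pattern that is only an alternation of literals. For that case, the behaviour described above can be modelled in a few lines: the function reports all matches at the first matching position, longest first, and with dfa_shortest keeps only the shortest. This C toy illustrates that restriction only and does not show how pcre2_dfa_match() works internally. dfa_literals() is a hypothetical name, and the subject used in the test is made-up data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: longer strings sort first. */
static int by_length_desc(const void *a, const void *b)
{
  size_t la = strlen(*(const char *const *)a);
  size_t lb = strlen(*(const char *const *)b);
  return (la < lb) - (la > lb);
}

/* Fill `found` with the alternatives that match at the first position where
   any of them matches, longest first, and return how many (0 for no match).
   *pos receives that position. If `shortest` is nonzero, only the shortest
   match is reported, as with the dfa_shortest modifier. */
static size_t dfa_literals(const char *subject, const char *const *alts,
                           size_t nalts, int shortest,
                           const char **found, size_t *pos)
{
  for (size_t p = 0; subject[p] != '\0'; p++) {
    size_t n = 0;
    for (size_t i = 0; i < nalts; i++)
      if (strncmp(subject + p, alts[i], strlen(alts[i])) == 0)
        found[n++] = alts[i];
    if (n > 0) {
      qsort(found, n, sizeof found[0], by_length_desc);
      if (shortest) { found[0] = found[n - 1]; n = 1; }
      *pos = p;
      return n;
    }
  }
  return 0;
}
```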



RESTARTING AFTER A PARTIAL MATCH

-       When the alternative matching function has given  the  PCRE2_ERROR_PAR-
+       When  the  alternative matching function has given the PCRE2_ERROR_PAR-
        TIAL return, indicating that the subject partially matched the pattern,
-       you can restart the match with additional subject data by means of  the
+       you  can restart the match with additional subject data by means of the
        dfa_restart modifier. For example:


            re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
@@ -1577,7 +1611,7 @@
          data> n05\=dfa,dfa_restart
           0: n05


-       For  further  information  about partial matching, see the pcre2partial
+       For further information about partial matching,  see  the  pcre2partial
        documentation.



@@ -1584,30 +1618,30 @@
CALLOUTS

        If the pattern contains any callout requests, pcre2test's callout func-
-       tion  is  called during matching unless callout_none is specified. This
+       tion is called during matching unless callout_none is  specified.  This
        works with both matching functions, and with JIT, though there are some
-       differences  in behaviour. The output for callouts with numerical argu-
+       differences in behaviour. The output for callouts with numerical  argu-
        ments and those with string arguments is slightly different.


    Callouts with numerical arguments


        By default, the callout function displays the callout number, the start
-       and  current positions in the subject text at the callout time, and the
+       and current positions in the subject text at the callout time, and  the
        next pattern item to be tested. For example:


          --->pqrabcdef
            0    ^  ^     \d


-       This output indicates that  callout  number  0  occurred  for  a  match
-       attempt  starting  at  the fourth character of the subject string, when
-       the pointer was at the seventh character, and  when  the  next  pattern
-       item  was  \d.  Just  one circumflex is output if the start and current
-       positions are the same, or if the current position precedes  the  start
+       This  output  indicates  that  callout  number  0  occurred for a match
+       attempt starting at the fourth character of the  subject  string,  when
+       the  pointer  was  at  the seventh character, and when the next pattern
+       item was \d. Just one circumflex is output if  the  start  and  current
+       positions  are  the same, or if the current position precedes the start
        position, which can happen if the callout is in a lookbehind assertion.


        Callouts numbered 255 are assumed to be automatic callouts, inserted as
        a result of the auto_callout pattern modifier. In this case, instead of
-       showing  the  callout  number, the offset in the pattern, preceded by a
+       showing the callout number, the offset in the pattern,  preceded  by  a
        plus, is output. For example:


            re> /\d?[A-E]\*/auto_callout
@@ -1620,7 +1654,7 @@
           0: E*


        If a pattern contains (*MARK) items, an additional line is output when-
-       ever  a  change  of  latest mark is passed to the callout function. For
+       ever a change of latest mark is passed to  the  callout  function.  For
        example:


            re> /a(*MARK:X)bc/auto_callout
@@ -1634,17 +1668,17 @@
          +12 ^  ^
           0: abc


-       The mark changes between matching "a" and "b", but stays the  same  for
-       the  rest  of  the match, so nothing more is output. If, as a result of
-       backtracking, the mark reverts to being unset, the  text  "<unset>"  is
+       The  mark  changes between matching "a" and "b", but stays the same for
+       the rest of the match, so nothing more is output. If, as  a  result  of
+       backtracking,  the  mark  reverts to being unset, the text "<unset>" is
        output.


    Callouts with string arguments


        The output for a callout with a string argument is similar, except that
-       instead of outputting a callout number before the position  indicators,
-       the  callout  string  and  its  offset in the pattern string are output
-       before the reflection of the subject string, and the subject string  is
+       instead  of outputting a callout number before the position indicators,
+       the callout string and its offset in  the  pattern  string  are  output
+       before  the reflection of the subject string, and the subject string is
        reflected for each callout. For example:


            re> /^ab(?C'first')cd(?C"second")ef/
@@ -1660,26 +1694,26 @@


    Callout modifiers


-       The  callout  function in pcre2test returns zero (carry on matching) by
-       default, but you can use a callout_fail modifier in a subject  line  to
+       The callout function in pcre2test returns zero (carry on  matching)  by
+       default,  but  you can use a callout_fail modifier in a subject line to
        change this and other parameters of the callout (see below).


        If the callout_capture modifier is set, the current captured groups are
        output when a callout occurs. This is useful only for non-DFA matching,
-       as  pcre2_dfa_match()  does  not  support capturing, so no captures are
+       as pcre2_dfa_match() does not support capturing,  so  no  captures  are
        ever shown.


        The normal callout output, showing the callout number or pattern offset
-       (as  described above) is suppressed if the callout_no_where modifier is
+       (as described above) is suppressed if the callout_no_where modifier  is
        set.


-       When using the interpretive  matching  function  pcre2_match()  without
-       JIT,  setting  the callout_extra modifier causes additional output from
-       pcre2test's callout function to be generated. For the first callout  in
-       a  match  attempt at a new starting position in the subject, "New match
-       attempt" is output. If there has been a backtrack since the last  call-
+       When  using  the  interpretive  matching function pcre2_match() without
+       JIT, setting the callout_extra modifier causes additional  output  from
+       pcre2test's  callout function to be generated. For the first callout in
+       a match attempt at a new starting position in the subject,  "New  match
+       attempt"  is output. If there has been a backtrack since the last call-
        out (or start of matching if this is the first callout), "Backtrack" is
-       output, followed by "No other matching paths" if  the  backtrack  ended
+       output,  followed  by  "No other matching paths" if the backtrack ended
        the previous match attempt. For example:


           re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
@@ -1716,39 +1750,39 @@
           +1    ^    a+
          No match


-       Notice  that  various  optimizations must be turned off if you want all
-       possible matching paths to be  scanned.  If  no_start_optimize  is  not
-       used,  there  is an immediate "no match", without any callouts, because
-       the starting optimization fails to find "b" in the  subject,  which  it
-       knows  must  be  present for any match. If no_auto_possess is not used,
-       the "a+" item is turned into "a++", which reduces the number  of  back-
+       Notice that various optimizations must be turned off if  you  want  all
+       possible  matching  paths  to  be  scanned. If no_start_optimize is not
+       used, there is an immediate "no match", without any  callouts,  because
+       the  starting  optimization  fails to find "b" in the subject, which it
+       knows must be present for any match. If no_auto_possess  is  not  used,
+       the  "a+"  item is turned into "a++", which reduces the number of back-
        tracks.


-       The  callout_extra modifier has no effect if used with the DFA matching
+       The callout_extra modifier has no effect if used with the DFA  matching
        function, or with JIT.


    Return values from callouts


-       The default return from the callout  function  is  zero,  which  allows
+       The  default  return  from  the  callout function is zero, which allows
        matching to continue. The callout_fail modifier can be given one or two
        numbers. If there is only one number, 1 is returned instead of 0 (caus-
        ing matching to backtrack) when a callout of that number is reached. If
-       two numbers (<n>:<m>) are given, 1 is  returned  when  callout  <n>  is
-       reached  and  there  have been at least <m> callouts. The callout_error
+       two  numbers  (<n>:<m>)  are  given,  1 is returned when callout <n> is
+       reached and there have been at least <m>  callouts.  The  callout_error
        modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, caus-
-       ing  the entire matching process to be aborted. If both these modifiers
-       are set for the same callout number,  callout_error  takes  precedence.
-       Note  that  callouts  with string arguments are always given the number
+       ing the entire matching process to be aborted. If both these  modifiers
+       are  set  for  the same callout number, callout_error takes precedence.
+       Note that callouts with string arguments are always  given  the  number
        zero.
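The decision rules for callout_fail and callout_error can be summarized as a small function. This is a hedged sketch. callout_settings, callout_return() and MODEL_ERROR_CALLOUT are hypothetical, and real code would return PCRE2_ERROR_CALLOUT from pcre2.h. The sketch also makes two assumptions: "at least <m> callouts" counts the current callout, and precedence is implemented by testing the error condition first.

```c
#include <assert.h>

/* Placeholder for PCRE2_ERROR_CALLOUT (real code takes it from pcre2.h). */
#define MODEL_ERROR_CALLOUT (-999)

/* Hypothetical settings for callout_fail=<n>[:<m>] and
   callout_error=<n>[:<m>]. A negative callout number means "not set"; m is 0
   when only one number was given. Number 0 is valid (string callouts). */
typedef struct {
  int fail_n, fail_m;
  int error_n, error_m;
} callout_settings;

/* Return value for callout `number`; `count` is the number of callouts so
   far, including this one (an assumed reading of "at least <m>"). */
static int callout_return(const callout_settings *s, int number, int count)
{
  /* callout_error is checked first so that it takes precedence */
  if (s->error_n >= 0 && number == s->error_n && count >= s->error_m)
    return MODEL_ERROR_CALLOUT;   /* abort the whole matching process */
  if (s->fail_n >= 0 && number == s->fail_n && count >= s->fail_m)
    return 1;                     /* fail at this point: backtrack */
  return 0;                       /* carry on matching */
}
```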


-       The callout_data modifier can be given an unsigned or a  negative  num-
-       ber.   This  is  set  as the "user data" that is passed to the matching
-       function, and passed back when the callout  function  is  invoked.  Any
-       value  other  than  zero  is  used as a return from pcre2test's callout
+       The  callout_data  modifier can be given an unsigned or a negative num-
+       ber.  This is set as the "user data" that is  passed  to  the  matching
+       function,  and  passed  back  when the callout function is invoked. Any
+       value other than zero is used as  a  return  from  pcre2test's  callout
        function.


        Inserting callouts can be helpful when using pcre2test to check compli-
-       cated  regular expressions. For further information about callouts, see
+       cated regular expressions. For further information about callouts,  see
        the pcre2callout documentation.



@@ -1755,47 +1789,47 @@
NON-PRINTING CHARACTERS

        When pcre2test is outputting text in the compiled version of a pattern,
-       bytes  other  than 32-126 are always treated as non-printing characters
+       bytes other than 32-126 are always treated as  non-printing  characters
        and are therefore shown as hex escapes.


-       When pcre2test is outputting text that is a matched part of  a  subject
-       string,  it behaves in the same way, unless a different locale has been
-       set for the pattern (using the locale  modifier).  In  this  case,  the
-       isprint()  function  is  used  to distinguish printing and non-printing
+       When  pcre2test  is outputting text that is a matched part of a subject
+       string, it behaves in the same way, unless a different locale has  been
+       set  for  the  pattern  (using  the locale modifier). In this case, the
+       isprint() function is used to  distinguish  printing  and  non-printing
        characters.
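The escaping rules can be sketched as a small function. These are the printable range 32-126, \xhh for values below 256 outside UTF mode, and \x{hh...} otherwise, as described earlier for matched substrings. format_char() is a hypothetical helper. Lowercase hex digits and a two-digit minimum inside the braces are assumptions, and the locale-dependent isprint() path is not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Write the displayed form of character value `c` into `buf`. */
static void format_char(uint32_t c, int utf, char *buf, size_t size)
{
  if (c >= 32 && c <= 126)                 /* printing character */
    snprintf(buf, size, "%c", (int)c);
  else if (c < 256 && !utf)                /* \xhh form */
    snprintf(buf, size, "\\x%02lx", (unsigned long)c);
  else                                     /* \x{hh...} form */
    snprintf(buf, size, "\\x{%02lx}", (unsigned long)c);
}
```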



SAVING AND RESTORING COMPILED PATTERNS

-       It is possible to save compiled patterns  on  disc  or  elsewhere,  and
+       It  is  possible  to  save  compiled patterns on disc or elsewhere, and
        reload them later, subject to a number of restrictions. JIT data cannot
-       be saved. The host on which the patterns are reloaded must  be  running
+       be  saved.  The host on which the patterns are reloaded must be running
        the same version of PCRE2, with the same code unit width, and must also
-       have the same endianness, pointer width  and  PCRE2_SIZE  type.  Before
-       compiled  patterns  can be saved they must be serialized, that is, con-
-       verted to a stream of bytes. A single byte stream may contain any  num-
-       ber  of  compiled  patterns,  but  they must all use the same character
+       have  the  same  endianness,  pointer width and PCRE2_SIZE type. Before
+       compiled patterns can be saved they must be serialized, that  is,  con-
+       verted  to a stream of bytes. A single byte stream may contain any num-
+       ber of compiled patterns, but they must  all  use  the  same  character
        tables. A single copy of the tables is included in the byte stream (its
        size is 1088 bytes).


-       The  functions  whose  names  begin  with pcre2_serialize_ are used for
-       serializing and de-serializing. They are described in the  pcre2serial-
+       The functions whose names begin  with  pcre2_serialize_  are  used  for
+       serializing  and de-serializing. They are described in the pcre2serial-
        ize  documentation.  In  this  section  we  describe  the  features  of
        pcre2test that can be used to test these functions.


-       Note that "serialization" in PCRE2 does not convert  compiled  patterns
-       to  an  abstract  format  like Java or .NET. It just makes a reloadable
+       Note  that  "serialization" in PCRE2 does not convert compiled patterns
+       to an abstract format like Java or .NET. It  just  makes  a  reloadable
        byte code stream.  Hence the restrictions on reloading mentioned above.


-       In pcre2test, when a pattern with push modifier  is  successfully  com-
-       piled,  it  is  pushed onto a stack of compiled patterns, and pcre2test
-       expects the next line to contain a new pattern (or command) instead  of
+       In  pcre2test,  when  a pattern with push modifier is successfully com-
+       piled, it is pushed onto a stack of compiled  patterns,  and  pcre2test
+       expects  the next line to contain a new pattern (or command) instead of
        a subject line. By contrast, the pushcopy modifier causes a copy of the
-       compiled pattern to be stacked,  leaving  the  original  available  for
+       compiled  pattern  to  be  stacked,  leaving the original available for
        immediate matching. By using push and/or pushcopy, a number of patterns
-       can be compiled and retained. These  modifiers  are  incompatible  with
+       can  be  compiled  and  retained. These modifiers are incompatible with
        posix, and control modifiers that act at match time are ignored (with a
-       message) for the stacked patterns. The jitverify modifier applies  only
+       message)  for the stacked patterns. The jitverify modifier applies only
        at compile time.


        The command
@@ -1803,21 +1837,21 @@
          #save <filename>


        causes all the stacked patterns to be serialized and the result written
-       to the named file. Afterwards, all the stacked patterns are freed.  The
+       to  the named file. Afterwards, all the stacked patterns are freed. The
        command


          #load <filename>


-       reads  the  data in the file, and then arranges for it to be de-serial-
-       ized, with the resulting compiled patterns added to the pattern  stack.
-       The  pattern  on the top of the stack can be retrieved by the #pop com-
-       mand, which must be followed by  lines  of  subjects  that  are  to  be
-       matched  with  the pattern, terminated as usual by an empty line or end
-       of file. This command may be followed by  a  modifier  list  containing
-       only  control  modifiers that act after a pattern has been compiled. In
+       reads the data in the file, and then arranges for it to  be  de-serial-
+       ized,  with the resulting compiled patterns added to the pattern stack.
+       The pattern on the top of the stack can be retrieved by the  #pop  com-
+       mand,  which  must  be  followed  by  lines  of subjects that are to be
+       matched with the pattern, terminated as usual by an empty line  or  end
+       of  file.  This  command  may be followed by a modifier list containing
+       only control modifiers that act after a pattern has been  compiled.  In
        particular,  hex,  posix,  posix_nosub,  push,  and  pushcopy  are  not
-       allowed,  nor are any option-setting modifiers.  The JIT modifiers are,
-       however, permitted. Here is an example that saves and reloads two  pat-
+       allowed, nor are any option-setting modifiers.  The JIT modifiers  are,
+       however, permitted.  Here is an example that saves and reloads two pat-
        terns.


          /abc/push
@@ -1830,10 +1864,10 @@
          #pop jit,bincode
          abc


-       If  jitverify  is  used with #pop, it does not automatically imply jit,
+       If jitverify is used with #pop, it does not  automatically  imply  jit,
        which is different behaviour from when it is used on a pattern.


-       The #popcopy command is analogous to the pushcopy modifier in  that  it
+       The  #popcopy  command is analogous to the pushcopy modifier in that  it
        makes current a copy of the topmost stack pattern, leaving the original
        still on the stack.


@@ -1853,5 +1887,5 @@

REVISION

-       Last updated: 21 September 2018
+       Last updated: 12 November 2018
        Copyright (c) 1997-2018 University of Cambridge.
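The push, pushcopy, #pop and #popcopy behaviour described in the pcre2test documentation above can be modelled as a small stack. This is a hypothetical sketch. `pattern_stack`, `do_push()` and the other helpers are invented names, and a heap string stands in for a compiled pattern. Nothing here calls PCRE2 or reproduces pcre2test's own implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of pcre2test's pattern stack. A heap-allocated
   string stands in for a compiled pattern; PCRE2 is not called. */
#define STACK_MAX 16

typedef struct {
  char *items[STACK_MAX];
  int   top;                 /* Number of patterns on the stack */
} pattern_stack;

/* Duplicate a "pattern"; abort if memory runs out (sketch only). */
static char *copy_pattern(const char *p)
{
  size_t n = strlen(p) + 1;
  char *c = malloc(n);
  if (c == NULL) abort();
  memcpy(c, p, n);
  return c;
}

/* push: the pattern goes onto the stack and is no longer current, so
   the next input line must be a pattern or command, not a subject. */
static void do_push(pattern_stack *s, char **current)
{
  assert(s->top < STACK_MAX && *current != NULL);
  s->items[s->top++] = *current;
  *current = NULL;
}

/* pushcopy: a copy is stacked and the original stays current. */
static void do_pushcopy(pattern_stack *s, const char *current)
{
  assert(s->top < STACK_MAX && current != NULL);
  s->items[s->top++] = copy_pattern(current);
}

/* #pop: the top pattern leaves the stack and becomes current. */
static char *do_pop(pattern_stack *s)
{
  assert(s->top > 0);
  return s->items[--s->top];
}

/* #popcopy: a copy of the top pattern becomes current and the
   original stays on the stack. */
static char *do_popcopy(const pattern_stack *s)
{
  assert(s->top > 0);
  return copy_pattern(s->items[s->top - 1]);
}
```

Pushing "abc" and then pushcopy-ing "xyz" leaves two entries on the stack with "xyz" still current. #popcopy then returns a copy of "xyz" without shrinking the stack, and two #pop operations return "xyz" and then "abc".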


Modified: code/trunk/src/pcre2.h.in
===================================================================
--- code/trunk/src/pcre2.h.in    2018-11-09 18:10:25 UTC (rev 1038)
+++ code/trunk/src/pcre2.h.in    2018-11-12 16:02:01 UTC (rev 1039)
@@ -549,8 +549,12 @@
 typedef struct pcre2_substitute_callout_block { \
   uint32_t      version;           /* Identifies version of block */ \
   /* ------------------------ Version 0 ------------------------------- */ \
-  PCRE2_SIZE    input_offsets[2];  /* Matched portion of the input */ \
+  PCRE2_SPTR    input;             /* Pointer to input subject string */ \
+  PCRE2_SPTR    output;            /* Pointer to output buffer */ \
   PCRE2_SIZE    output_offsets[2]; /* Changed portion of the output */ \
+  PCRE2_SIZE   *ovector;           /* Pointer to current ovector */ \
+  uint32_t      oveccount;         /* Count of pairs set in ovector */ \
+  uint32_t      subscount;         /* Substitution number */ \
   /* ------------------------------------------------------------------ */ \
 } pcre2_substitute_callout_block;


@@ -609,7 +613,7 @@
     int (*)(pcre2_callout_block *, void *), void *); \
 PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
   pcre2_set_substitute_callout(pcre2_match_context *, \
-    void (*)(pcre2_substitute_callout_block *, void *), void *); \
+    int (*)(pcre2_substitute_callout_block *, void *), void *); \
 PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
   pcre2_set_depth_limit(pcre2_match_context *, uint32_t); \
 PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
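In the revised version 0 block above, input_offsets is gone. A callout now finds the match position through ovector, and oveccount tells it how many pairs may be set. In pcre2_substitute.c, oveccount is the match return code, or the ovector size when that code is 0. The sketch below shows how a callout might pull a capture group and the replacement text out of those fields. It uses a hypothetical 8-bit mirror struct, `scb_sketch`, and hypothetical helpers, `group_text()` and `replacement_text()`, so it compiles without pcre2.h. Real callout code would use pcre2_substitute_callout_block and PCRE2_UNSET from the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stands in for PCRE2_UNSET in this sketch: marks an unset pair. */
#define SKETCH_UNSET (~(size_t)0)

/* Hypothetical 8-bit mirror of the version 0 fields, for illustration. */
typedef struct {
  uint32_t      version;            /* 0 for this layout */
  const char   *input;              /* Subject string */
  const char   *output;             /* Output buffer */
  size_t        output_offsets[2];  /* Replacement just written */
  const size_t *ovector;            /* Offset pairs for the current match */
  uint32_t      oveccount;          /* Pairs that may be set */
  uint32_t      subscount;          /* Substitution number, from 1 */
} scb_sketch;

/* Copy capture group n (0 = whole match) into buf as a C string.
   Returns its length, or -1 if the group did not participate. */
static int group_text(const scb_sketch *scb, uint32_t n, char *buf,
                      size_t bufsize)
{
  size_t start, len;
  if (n >= scb->oveccount) return -1;        /* Beyond the pairs set */
  start = scb->ovector[2*n];
  if (start == SKETCH_UNSET) return -1;      /* Unset group below the count */
  len = scb->ovector[2*n + 1] - start;
  assert(len < bufsize);
  memcpy(buf, scb->input + start, len);
  buf[len] = '\0';
  return (int)len;
}

/* Copy the replacement just written, output[offsets[0]..offsets[1]). */
static int replacement_text(const scb_sketch *scb, char *buf, size_t bufsize)
{
  size_t len = scb->output_offsets[1] - scb->output_offsets[0];
  assert(len < bufsize);
  memcpy(buf, scb->output + scb->output_offsets[0], len);
  buf[len] = '\0';
  return (int)len;
}
```

For /a(b)c|xyz/, the "xyz" alternative leaves group 1 unset and oveccount is 1. This matches the "(1)" shown for those lines in testoutput2, so in that case `group_text()` reports -1 for group 1.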


Modified: code/trunk/src/pcre2_context.c
===================================================================
--- code/trunk/src/pcre2_context.c    2018-11-09 18:10:25 UTC (rev 1038)
+++ code/trunk/src/pcre2_context.c    2018-11-12 16:02:01 UTC (rev 1039)
@@ -407,7 +407,7 @@


 PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
 pcre2_set_substitute_callout(pcre2_match_context *mcontext,
-  void (*substitute_callout)(pcre2_substitute_callout_block *, void *), 
+  int (*substitute_callout)(pcre2_substitute_callout_block *, void *), 
     void *substitute_callout_data)
 {
 mcontext->substitute_callout = substitute_callout;


Modified: code/trunk/src/pcre2_intmodedep.h
===================================================================
--- code/trunk/src/pcre2_intmodedep.h    2018-11-09 18:10:25 UTC (rev 1038)
+++ code/trunk/src/pcre2_intmodedep.h    2018-11-12 16:02:01 UTC (rev 1039)
@@ -585,7 +585,7 @@
 #endif
   int    (*callout)(pcre2_callout_block *, void *);
   void    *callout_data;
-  void    (*substitute_callout)(pcre2_substitute_callout_block *, void *);
+  int    (*substitute_callout)(pcre2_substitute_callout_block *, void *);
   void    *substitute_callout_data;
   PCRE2_SIZE offset_limit;
   uint32_t heap_limit;


Modified: code/trunk/src/pcre2_substitute.c
===================================================================
--- code/trunk/src/pcre2_substitute.c    2018-11-09 18:10:25 UTC (rev 1038)
+++ code/trunk/src/pcre2_substitute.c    2018-11-12 16:02:01 UTC (rev 1039)
@@ -241,13 +241,15 @@
 PCRE2_SIZE ovecsave[3];
 pcre2_substitute_callout_block scb;


-scb.version = 0;
+/* General initialization */
+
buff_offset = 0;
lengthleft = buff_length = *blength;
*blength = PCRE2_UNSET;
ovecsave[0] = ovecsave[1] = ovecsave[2] = PCRE2_UNSET;

-/* Partial matching is not valid. */
+/* Partial matching is not valid. This must come after setting *blength to
+PCRE2_UNSET, so as not to imply an offset in the replacement. */

if ((options & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0)
return PCRE2_ERROR_BADOPTION;
@@ -266,6 +268,13 @@
ovector = pcre2_get_ovector_pointer(match_data);
ovector_count = pcre2_get_ovector_count(match_data);

+/* Fixed things in the callout block */
+
+scb.version = 0;
+scb.input = subject;
+scb.output = (PCRE2_SPTR)buffer;
+scb.ovector = ovector;
+
/* Find lengths of zero-terminated strings and the end of the replacement. */

 if (length == PCRE2_ZERO_TERMINATED) length = PRIV(strlen)(subject);
@@ -393,11 +402,6 @@
     goto EXIT;   
     }   


-  /* Save the match point for a possible callout */
-  
-  scb.input_offsets[0] = ovector[0];
-  scb.input_offsets[1] = ovector[1];   
-    
   /* Count substitutions with a paranoid check for integer overflow; surely no
   real call to this function would ever hit this! */


@@ -409,12 +413,13 @@
subs++;

/* Copy the text leading up to the match, and remember where the insert
- begins. */
+ begins and how many ovector pairs are set. */

if (rc == 0) rc = ovector_count;
fraglength = ovector[0] - start_offset;
CHECKMEMCPY(subject + start_offset, fraglength);
scb.output_offsets[0] = buff_offset;
+ scb.oveccount = rc;

/* Process the replacement string. Literal mode is set by \Q, but only in
extended mode when backslashes are being interpreted. In extended mode we
@@ -836,8 +841,26 @@

   if (!overflowed && mcontext->substitute_callout != NULL)
     {
+    scb.subscount = subs;  
     scb.output_offsets[1] = buff_offset;
-    mcontext->substitute_callout(&scb, mcontext->substitute_callout_data); 
+    rc = mcontext->substitute_callout(&scb, mcontext->substitute_callout_data); 
+
+    /* A non-zero return means cancel this substitution. Instead, copy the 
+    matched string fragment. */
+
+    if (rc != 0)
+      {
+      PCRE2_SIZE newlength = scb.output_offsets[1] - scb.output_offsets[0];
+      PCRE2_SIZE oldlength = ovector[1] - ovector[0];
+      
+      buff_offset -= newlength;
+      lengthleft += newlength;
+      CHECKMEMCPY(subject + ovector[0], oldlength);    
+      
+      /* A negative return means do not do any more. */
+      
+      if (rc < 0) suboptions &= (~PCRE2_SUBSTITUTE_GLOBAL);
+      }
     }   


/* Save the details of this match. See above for how this data is used. If we
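The new code in pcre2_substitute() writes each replacement first. If the callout then returns non-zero, it winds buff_offset back by the replacement's length and copies the matched fragment instead. A negative return also clears PCRE2_SUBSTITUTE_GLOBAL. The model below reproduces that control flow for 8-bit text under simplifying assumptions:

- the hypothetical `find_match()` recognizes only the literals "abc" and "xyz", standing in for /a(b)c|xyz/;
- the replacement is always <$0>;
- overflow is treated as a programming error rather than reported.

`model_substitute()`, `model_scb` and `skip_stop_callout()` are illustrative names, not PCRE2 API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of the callout block used by this model. */
typedef struct {
  const char *input;            /* Subject string */
  const char *output;           /* Output buffer */
  size_t output_offsets[2];     /* Replacement just written */
  size_t ovector[2];            /* Matched portion of the input */
  uint32_t subscount;           /* Substitution number, from 1 */
} model_scb;

typedef int (*model_callout)(const model_scb *, void *);

/* Stand-in for pcre2_match(): leftmost "abc" or "xyz" at or after start. */
static int find_match(const char *s, size_t start, size_t *m0, size_t *m1)
{
  for (size_t i = start; s[i] != '\0'; i++)
    if (strncmp(s + i, "abc", 3) == 0 || strncmp(s + i, "xyz", 3) == 0) {
      *m0 = i;
      *m1 = i + 3;
      return 1;
    }
  return 0;
}

/* Replace matches with "<match>", calling cb after each replacement. A
   non-zero return undoes that replacement; a negative one also stops.
   Returns the number of substitutions, including undone ones. */
static int model_substitute(const char *subject, int global, model_callout cb,
                            void *cb_data, char *out, size_t outsize)
{
  size_t start = 0, pos = 0, m0, m1;
  int subs = 0;
  model_scb scb = { subject, out, {0, 0}, {0, 0}, 0 };

  while (find_match(subject, start, &m0, &m1)) {
    size_t len = m1 - m0;
    /* Sketch only: a real implementation must report overflow. */
    assert(pos + (m0 - start) + len + 2 < outsize);
    memcpy(out + pos, subject + start, m0 - start);   /* Text before match */
    pos += m0 - start;
    scb.output_offsets[0] = pos;
    out[pos++] = '<';                                 /* The replacement */
    memcpy(out + pos, subject + m0, len);
    pos += len;
    out[pos++] = '>';
    subs++;
    start = m1;

    if (cb != NULL) {
      int rc;
      scb.ovector[0] = m0;
      scb.ovector[1] = m1;
      scb.output_offsets[1] = pos;
      scb.subscount = (uint32_t)subs;
      rc = cb(&scb, cb_data);
      if (rc != 0) {
        /* Discard the replacement and copy the matched text instead. */
        pos -= scb.output_offsets[1] - scb.output_offsets[0];
        memcpy(out + pos, subject + m0, len);
        pos += len;
        if (rc < 0) global = 0;                       /* Do no more */
      }
    }
    if (!global) break;
  }
  assert(pos + strlen(subject + start) < outsize);
  strcpy(out + pos, subject + start);                 /* Rest of subject */
  return subs;
}

/* Callout in the style of pcre2test's substitute_skip/substitute_stop
   (0 disables a test; stop is checked first). */
typedef struct { uint32_t skip, stop; } skip_stop;

static int skip_stop_callout(const model_scb *scb, void *data)
{
  const skip_stop *ss = data;
  if (scb->subscount == ss->stop) return -1;
  if (scb->subscount == ss->skip) return +1;
  return 0;
}
```

Take `skip_stop` set to {1, 0} and the call `model_substitute("12abc34xyz99abc55", 1, skip_stop_callout, ...)`. The result is "12abc34<xyz>99<abc>55" with a return value of 3, which matches the substitute_skip=1 case in testoutput2. Note that the cancelled substitution is still counted.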

Modified: code/trunk/src/pcre2test.c
===================================================================
--- code/trunk/src/pcre2test.c    2018-11-09 18:10:25 UTC (rev 1038)
+++ code/trunk/src/pcre2test.c    2018-11-12 16:02:01 UTC (rev 1039)
@@ -531,12 +531,14 @@
 subject must be at the start and in the same order in both cases so that the
 same offset in the big table below works for both. */


-typedef struct patctl {    /* Structure for pattern modifiers. */
-  uint32_t  options;       /* Must be in same position as datctl */
-  uint32_t  control;       /* Must be in same position as datctl */
-  uint32_t  control2;      /* Must be in same position as datctl */
-  uint32_t  jitstack;      /* Must be in same position as datctl */
+typedef struct patctl {       /* Structure for pattern modifiers. */
+  uint32_t  options;          /* Must be in same position as datctl */
+  uint32_t  control;          /* Must be in same position as datctl */
+  uint32_t  control2;         /* Must be in same position as datctl */
+  uint32_t  jitstack;         /* Must be in same position as datctl */
    uint8_t  replacement[REPLACE_MODSIZE];  /* So must this */
+  uint32_t  substitute_skip;  /* Must be in same position as datctl */
+  uint32_t  substitute_stop;  /* Must be in same position as datctl */
   uint32_t  jit;
   uint32_t  stackguard_test;
   uint32_t  tables_id;
@@ -551,12 +553,14 @@
 #define MAXCPYGET 10
 #define LENCPYGET 64


-typedef struct datctl {    /* Structure for data line modifiers. */
-  uint32_t  options;       /* Must be in same position as patctl */
-  uint32_t  control;       /* Must be in same position as patctl */
-  uint32_t  control2;      /* Must be in same position as patctl */
-  uint32_t  jitstack;      /* Must be in same position as patctl */
+typedef struct datctl {       /* Structure for data line modifiers. */
+  uint32_t  options;          /* Must be in same position as patctl */
+  uint32_t  control;          /* Must be in same position as patctl */
+  uint32_t  control2;         /* Must be in same position as patctl */
+  uint32_t  jitstack;         /* Must be in same position as patctl */
    uint8_t  replacement[REPLACE_MODSIZE];  /* So must this */
+  uint32_t  substitute_skip;  /* Must be in same position as patctl */
+  uint32_t  substitute_stop;  /* Must be in same position as patctl */ 
   uint32_t  startend[2];
   uint32_t  cerror[2];
   uint32_t  cfail[2];
@@ -704,6 +708,8 @@
   { "substitute_callout",         MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_CALLOUT,    PO(control2) },
   { "substitute_extended",        MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_EXTENDED,   PO(control2) },
   { "substitute_overflow_length", MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_OVERFLOW_LENGTH, PO(control2) },
+  { "substitute_skip",            MOD_PND,  MOD_INT, 0,                          PO(substitute_skip) },
+  { "substitute_stop",            MOD_PND,  MOD_INT, 0,                          PO(substitute_stop) },
   { "substitute_unknown_unset",   MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_UNKNOWN_UNSET, PO(control2) },
   { "substitute_unset_empty",     MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_UNSET_EMPTY, PO(control2) },
   { "tables",                     MOD_PAT,  MOD_INT, 0,                          PO(tables_id) },
@@ -1370,13 +1376,13 @@
 #define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \
   if (test_mode == PCRE8_MODE) \
     pcre2_set_substitute_callout_8(G(a,8), \
-      (void (*)(pcre2_substitute_callout_block_8 *, void *))b,c); \
+      (int (*)(pcre2_substitute_callout_block_8 *, void *))b,c); \
   else if (test_mode == PCRE16_MODE) \
     pcre2_set_substitute_callout_16(G(a,16), \
-      (void (*)(pcre2_substitute_callout_block_16 *, void *))b,c); \
+      (int (*)(pcre2_substitute_callout_block_16 *, void *))b,c); \
   else \
     pcre2_set_substitute_callout_32(G(a,32), \
-      (void (*)(pcre2_substitute_callout_block_32 *, void *))b,c)
+      (int (*)(pcre2_substitute_callout_block_32 *, void *))b,c)


 #define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \
   if (test_mode == PCRE8_MODE) \
@@ -1850,10 +1856,10 @@
 #define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \
   if (test_mode == G(G(PCRE,BITONE),_MODE)) \
     G(pcre2_set_substitute_callout_,BITONE)(G(a,BITONE), \
-      (void (*)(G(pcre2_substitute_callout_block_,BITONE) *, void *))b,c); \
+      (int (*)(G(pcre2_substitute_callout_block_,BITONE) *, void *))b,c); \
   else \
     G(pcre2_set_substitute_callout_,BITTWO)(G(a,BITTWO), \
-      (void (*)(G(pcre2_substitute_callout_block_,BITTWO) *, void *))b,c)
+      (int (*)(G(pcre2_substitute_callout_block_,BITTWO) *, void *))b,c)


 #define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \
   if (test_mode == G(G(PCRE,BITONE),_MODE)) \
@@ -2058,7 +2064,7 @@
 #define PCRE2_SET_PARENS_NEST_LIMIT(a,b) pcre2_set_parens_nest_limit_8(G(a,8),b)
 #define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \
   pcre2_set_substitute_callout_8(G(a,8), \
-    (void (*)(pcre2_substitute_callout_block_8 *, void *))b,c)
+    (int (*)(pcre2_substitute_callout_block_8 *, void *))b,c)
 #define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \
   a = pcre2_substitute_8(G(b,8),(PCRE2_SPTR8)c,d,e,f,G(g,8),G(h,8), \
     (PCRE2_SPTR8)i,j,(PCRE2_UCHAR8 *)k,l)
@@ -2165,7 +2171,7 @@
 #define PCRE2_SET_PARENS_NEST_LIMIT(a,b) pcre2_set_parens_nest_limit_16(G(a,16),b)
 #define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \
   pcre2_set_substitute_callout_16(G(a,16), \
-    (void (*)(pcre2_substitute_callout_block_16 *, void *))b,c)
+    (int (*)(pcre2_substitute_callout_block_16 *, void *))b,c)
 #define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \
   a = pcre2_substitute_16(G(b,16),(PCRE2_SPTR16)c,d,e,f,G(g,16),G(h,16), \
     (PCRE2_SPTR16)i,j,(PCRE2_UCHAR16 *)k,l)
@@ -2272,7 +2278,7 @@
 #define PCRE2_SET_PARENS_NEST_LIMIT(a,b) pcre2_set_parens_nest_limit_32(G(a,32),b)
 #define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \
   pcre2_set_substitute_callout_32(G(a,32), \
-    (void (*)(pcre2_substitute_callout_block_32 *, void *))b,c)
+    (int (*)(pcre2_substitute_callout_block_32 *, void *))b,c)
 #define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \
   a = pcre2_substitute_32(G(b,32),(PCRE2_SPTR32)c,d,e,f,G(g,32),G(h,32), \
     (PCRE2_SPTR32)i,j,(PCRE2_UCHAR32 *)k,l)
@@ -5955,17 +5961,40 @@
 Returns:      0, +1 (cancel this substitution) or -1 (cancel and stop)
 */


-static void
+static int
substitute_callout_function(pcre2_substitute_callout_block_8 *scb,
void *data_ptr)
{
+int yield = 0;
+BOOL utf = (FLD(compiled_code, overall_options) & PCRE2_UTF) != 0;
(void)data_ptr; /* Not used */
-fprintf(outfile, "Old %" SIZ_FORM " %" SIZ_FORM " New %" SIZ_FORM
- " %" SIZ_FORM "\n",
- SIZ_CAST scb->input_offsets[0],
- SIZ_CAST scb->input_offsets[1],
- SIZ_CAST scb->output_offsets[0],
- SIZ_CAST scb->output_offsets[1]);
+
+fprintf(outfile, "%2d(%d) Old %" SIZ_FORM " %" SIZ_FORM " \"",
+ scb->subscount, scb->oveccount,
+ SIZ_CAST scb->ovector[0], SIZ_CAST scb->ovector[1]);
+
+PCHARSV(scb->input, scb->ovector[0], scb->ovector[1] - scb->ovector[0],
+ utf, outfile);
+
+fprintf(outfile, "\" New %" SIZ_FORM " %" SIZ_FORM " \"",
+ SIZ_CAST scb->output_offsets[0], SIZ_CAST scb->output_offsets[1]);
+
+PCHARSV(scb->output, scb->output_offsets[0],
+ scb->output_offsets[1] - scb->output_offsets[0], utf, outfile);
+
+if (scb->subscount == dat_datctl.substitute_stop)
+ {
+ yield = -1;
+ fprintf(outfile, " STOPPED");
+ }
+else if (scb->subscount == dat_datctl.substitute_skip)
+ {
+ yield = +1;
+ fprintf(outfile, " SKIPPED");
+ }
+
+fprintf(outfile, "\"\n");
+return yield;
}


@@ -6494,6 +6523,11 @@
strcpy((char *)dat_datctl.replacement, (char *)pat_patctl.replacement);
if (dat_datctl.jitstack == 0) dat_datctl.jitstack = pat_patctl.jitstack;

+if (dat_datctl.substitute_skip == 0)
+    dat_datctl.substitute_skip = pat_patctl.substitute_skip;
+if (dat_datctl.substitute_stop == 0)
+    dat_datctl.substitute_stop = pat_patctl.substitute_stop;
+
 /* Initialize for scanning the data line. */


#ifdef SUPPORT_PCRE2_8
@@ -6832,7 +6866,12 @@

if (p[-1] != 0 && !decode_modifiers(p, CTX_DAT, NULL, &dat_datctl))
return PR_OK;
+
+/* Setting substitute_{skip,stop} implies a substitute callout. */

+if (dat_datctl.substitute_skip != 0 || dat_datctl.substitute_stop != 0)
+ dat_datctl.control2 |= CTL2_SUBSTITUTE_CALLOUT;
+
/* Check for mutually exclusive modifiers. At present, these are all in the
first control word. */


Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2    2018-11-09 18:10:25 UTC (rev 1038)
+++ code/trunk/testdata/testinput2    2018-11-12 16:02:01 UTC (rev 1039)
@@ -5516,7 +5516,22 @@


 /a(b)c|xyz/g,replace=<$0>,substitute_callout
     abcdefabcpqr
+    abxyzpqrabcxyz
+    12abc34xyz99abc55\=substitute_stop=2
+    12abc34xyz99abc55\=substitute_skip=1
+    12abc34xyz99abc55\=substitute_skip=2


+/a(b)c|xyz/g,replace=<$0>
+    abcdefabcpqr
+    abxyzpqrabcxyz
+    12abc34xyz\=substitute_stop=2
+    12abc34xyz\=substitute_skip=1
+
+/a(b)c|xyz/replace=<$0>
+    abcdefabcpqr
+    12abc34xyz\=substitute_skip=1
+    12abc34xyz\=substitute_stop=1
+
 /abc\rdef/
     abc\ndef



Modified: code/trunk/testdata/testoutput10
===================================================================
--- code/trunk/testdata/testoutput10    2018-11-09 18:10:25 UTC (rev 1038)
+++ code/trunk/testdata/testoutput10    2018-11-12 16:02:01 UTC (rev 1039)
@@ -1630,10 +1630,10 @@


 /(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout
     123abcáyzabcdef789abcሴqr
-Old 6 6  New 6 8
-Old 13 13  New 15 17
-Old 13 16  New 17 22
-Old 22 22  New 28 30
+ 1(2) Old 6 6 "" New 6 8 "<>"
+ 2(2) Old 13 13 "" New 15 17 "<>"
+ 3(2) Old 13 16 "def" New 17 22 "<def>"
+ 4(2) Old 22 22 "" New 28 30 "<>"
  4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr


# End of testinput10

Modified: code/trunk/testdata/testoutput12-16
===================================================================
--- code/trunk/testdata/testoutput12-16    2018-11-09 18:10:25 UTC (rev 1038)
+++ code/trunk/testdata/testoutput12-16    2018-11-12 16:02:01 UTC (rev 1039)
@@ -1475,10 +1475,10 @@


 /(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout
     123abcáyzabcdef789abcሴqr
-Old 6 6  New 6 8
-Old 12 12  New 14 16
-Old 12 15  New 16 21
-Old 21 21  New 27 29
+ 1(2) Old 6 6 "" New 6 8 "<>"
+ 2(2) Old 12 12 "" New 14 16 "<>"
+ 3(2) Old 12 15 "def" New 16 21 "<def>"
+ 4(2) Old 21 21 "" New 27 29 "<>"
  4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr


# A few script run tests in non-UTF mode (but they need Unicode support)

Modified: code/trunk/testdata/testoutput12-32
===================================================================
--- code/trunk/testdata/testoutput12-32    2018-11-09 18:10:25 UTC (rev 1038)
+++ code/trunk/testdata/testoutput12-32    2018-11-12 16:02:01 UTC (rev 1039)
@@ -1472,10 +1472,10 @@


 /(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout
     123abcáyzabcdef789abcሴqr
-Old 6 6  New 6 8
-Old 12 12  New 14 16
-Old 12 15  New 16 21
-Old 21 21  New 27 29
+ 1(2) Old 6 6 "" New 6 8 "<>"
+ 2(2) Old 12 12 "" New 14 16 "<>"
+ 3(2) Old 12 15 "def" New 16 21 "<def>"
+ 4(2) Old 21 21 "" New 27 29 "<>"
  4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr


# A few script run tests in non-UTF mode (but they need Unicode support)

Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2018-11-09 18:10:25 UTC (rev 1038)
+++ code/trunk/testdata/testoutput2    2018-11-12 16:02:01 UTC (rev 1039)
@@ -16797,10 +16797,53 @@


 /a(b)c|xyz/g,replace=<$0>,substitute_callout
     abcdefabcpqr
-Old 0 3  New 0 5
-Old 6 9  New 8 13
+ 1(2) Old 0 3 "abc" New 0 5 "<abc>"
+ 2(2) Old 6 9 "abc" New 8 13 "<abc>"
  2: <abc>def<abc>pqr
+    abxyzpqrabcxyz
+ 1(1) Old 2 5 "xyz" New 2 7 "<xyz>"
+ 2(2) Old 8 11 "abc" New 10 15 "<abc>"
+ 3(1) Old 11 14 "xyz" New 15 20 "<xyz>"
+ 3: ab<xyz>pqr<abc><xyz>
+    12abc34xyz99abc55\=substitute_stop=2
+ 1(2) Old 2 5 "abc" New 2 7 "<abc>"
+ 2(1) Old 7 10 "xyz" New 9 14 "<xyz> STOPPED"
+ 2: 12<abc>34xyz99abc55
+    12abc34xyz99abc55\=substitute_skip=1
+ 1(2) Old 2 5 "abc" New 2 7 "<abc> SKIPPED"
+ 2(1) Old 7 10 "xyz" New 7 12 "<xyz>"
+ 3(2) Old 12 15 "abc" New 14 19 "<abc>"
+ 3: 12abc34<xyz>99<abc>55
+    12abc34xyz99abc55\=substitute_skip=2
+ 1(2) Old 2 5 "abc" New 2 7 "<abc>"
+ 2(1) Old 7 10 "xyz" New 9 14 "<xyz> SKIPPED"
+ 3(2) Old 12 15 "abc" New 14 19 "<abc>"
+ 3: 12<abc>34xyz99<abc>55


+/a(b)c|xyz/g,replace=<$0>
+    abcdefabcpqr
+ 2: <abc>def<abc>pqr
+    abxyzpqrabcxyz
+ 3: ab<xyz>pqr<abc><xyz>
+    12abc34xyz\=substitute_stop=2
+ 1(2) Old 2 5 "abc" New 2 7 "<abc>"
+ 2(1) Old 7 10 "xyz" New 9 14 "<xyz> STOPPED"
+ 2: 12<abc>34xyz
+    12abc34xyz\=substitute_skip=1
+ 1(2) Old 2 5 "abc" New 2 7 "<abc> SKIPPED"
+ 2(1) Old 7 10 "xyz" New 7 12 "<xyz>"
+ 2: 12abc34<xyz>
+
+/a(b)c|xyz/replace=<$0>
+    abcdefabcpqr
+ 1: <abc>defabcpqr
+    12abc34xyz\=substitute_skip=1
+ 1(2) Old 2 5 "abc" New 2 7 "<abc> SKIPPED"
+ 1: 12abc34xyz
+    12abc34xyz\=substitute_stop=1
+ 1(2) Old 2 5 "abc" New 2 7 "<abc> STOPPED"
+ 1: 12abc34xyz
+
 /abc\rdef/
     abc\ndef
 No match