[Pcre-svn] [1196] code/trunk: Implement PCRE2_SUBSTITUTE_MAT…

Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [1196] code/trunk: Implement PCRE2_SUBSTITUTE_MATCHED.
Revision: 1196
          http://www.exim.org/viewvc/pcre2?view=rev&revision=1196
Author:   ph10
Date:     2019-12-27 13:35:17 +0000 (Fri, 27 Dec 2019)
Log Message:
-----------
Implement PCRE2_SUBSTITUTE_MATCHED.
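
The documentation changes below describe the new option as using the data already held in a match_data block (return code, offset vector) for the first substitution, instead of matching again. The following is only a toy model of that idea in plain C, not PCRE2's code and not its API: the function name, the error values, and the restriction to "$$" and single-digit "$<n>" are all assumptions made for illustration. It takes an offset vector that some earlier match is assumed to have produced and builds the output from it.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for PCRE2 error codes; the values are arbitrary. */
enum {
    TOY_ERROR_NOMEMORY       = -1,
    TOY_ERROR_BADREPLACEMENT = -2,
    TOY_ERROR_NOSUBSTRING    = -3
};

/* Append len bytes to out, always keeping room for the trailing zero. */
static int put(char *out, size_t cap, size_t *n, const char *src, size_t len)
{
    if (*n + len + 1 > cap) return 0;
    memcpy(out + *n, src, len);
    *n += len;
    return 1;
}

/*
 * Hypothetical helper: copy subject to out, replacing the first match,
 * using offsets recorded by an earlier match rather than matching again.
 *   ovector  (start, end) offset pairs; pair 0 is the whole match
 *   pairs    number of pairs available in ovector
 *   outlen   in: size of out; out: length written, excluding the zero
 * Returns 1 (one substitution) or a negative toy error code.
 */
int toy_substitute_matched(const char *subject, const size_t *ovector,
                           int pairs, const char *replacement,
                           char *out, size_t *outlen)
{
    size_t cap = *outlen, n = 0, slen = strlen(subject);

    /* Text before the match is copied unchanged. */
    if (!put(out, cap, &n, subject, ovector[0])) return TOY_ERROR_NOMEMORY;

    for (const char *r = replacement; *r != '\0'; r++) {
        if (*r != '$') {                       /* ordinary character */
            if (!put(out, cap, &n, r, 1)) return TOY_ERROR_NOMEMORY;
            continue;
        }
        r++;                                   /* look at what follows '$' */
        if (*r == '$') {                       /* "$$" inserts a dollar */
            if (!put(out, cap, &n, r, 1)) return TOY_ERROR_NOMEMORY;
        } else if (*r >= '0' && *r <= '9') {   /* "$<n>" inserts group n */
            int g = *r - '0';
            if (g >= pairs) return TOY_ERROR_NOSUBSTRING;
            if (!put(out, cap, &n, subject + ovector[2 * g],
                     ovector[2 * g + 1] - ovector[2 * g]))
                return TOY_ERROR_NOMEMORY;
        } else {
            return TOY_ERROR_BADREPLACEMENT;
        }
    }

    /* Text after the match is copied unchanged. */
    if (!put(out, cap, &n, subject + ovector[1], slen - ovector[1]))
        return TOY_ERROR_NOMEMORY;

    out[n] = '\0';
    *outlen = n;
    return 1;
}
```

With the documentation's own example (pattern a(b)c matched against "=abc=", so the offset pairs are 1,4 and 2,3) and the replacement "+$1$0$1+", this model produces "=+babcb+=". In real code the corresponding step would be a pcre2_substitute() call with PCRE2_SUBSTITUTE_MATCHED set in the options argument and the match_data block from the earlier pcre2_match() call passed in.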


Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/doc/html/pcre2_substitute.html
    code/trunk/doc/html/pcre2api.html
    code/trunk/doc/pcre2.txt
    code/trunk/doc/pcre2_substitute.3
    code/trunk/doc/pcre2api.3
    code/trunk/src/pcre2.h.in
    code/trunk/src/pcre2_substitute.c
    code/trunk/src/pcre2test.c
    code/trunk/testdata/testinput2
    code/trunk/testdata/testoutput2


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2019-12-26 15:10:26 UTC (rev 1195)
+++ code/trunk/ChangeLog    2019-12-27 13:35:17 UTC (rev 1196)
@@ -26,7 +26,9 @@


6. Avoid some VS compiler warnings.

+7. Added PCRE2_SUBSTITUTE_MATCHED.

+
Version 10.34 21-November-2019
------------------------------


Modified: code/trunk/doc/html/pcre2_substitute.html
===================================================================
--- code/trunk/doc/html/pcre2_substitute.html    2019-12-26 15:10:26 UTC (rev 1195)
+++ code/trunk/doc/html/pcre2_substitute.html    2019-12-27 13:35:17 UTC (rev 1196)
@@ -48,8 +48,8 @@
   <i>outlengthptr</i>  Points to the length of the output buffer
 </pre>
 A match data block is needed only if you want to inspect the data from the
-match that is returned in that block. A match context is needed only if you
-want to:
+match that is returned in that block or if PCRE2_SUBSTITUTE_MATCHED is set. A
+match context is needed only if you want to:
 <pre>
   Set up a callout function
   Set a matching offset limit
@@ -75,16 +75,17 @@
   PCRE2_SUBSTITUTE_EXTENDED  Do extended replacement processing
   PCRE2_SUBSTITUTE_GLOBAL    Replace all occurrences in the subject
   PCRE2_SUBSTITUTE_LITERAL   The replacement string is literal 
+  PCRE2_SUBSTITUTE_MATCHED   Use pre-existing match data for 1st match 
   PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  If overflow, compute needed length
   PCRE2_SUBSTITUTE_UNKNOWN_UNSET  Treat unknown group as unset
   PCRE2_SUBSTITUTE_UNSET_EMPTY  Simple unset insert = empty string
 </pre>
-PCRE2_SUBSTITUTE_LITERAL overrides PCRE2_SUBSTITUTE_EXTENDED, 
-PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and PCRE2_SUBSTITUTE_UNSET_EMPTY.
+If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_EXTENDED, 
+PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and PCRE2_SUBSTITUTE_UNSET_EMPTY are ignored.
 </P>
 <P>
 The function returns the number of substitutions, which may be zero if there
-were no matches. The result can be greater than one only when
+are no matches. The result may be greater than one only when
 PCRE2_SUBSTITUTE_GLOBAL is set. In the event of an error, a negative error code
 is returned.
 </P>


Modified: code/trunk/doc/html/pcre2api.html
===================================================================
--- code/trunk/doc/html/pcre2api.html    2019-12-26 15:10:26 UTC (rev 1195)
+++ code/trunk/doc/html/pcre2api.html    2019-12-27 13:35:17 UTC (rev 1196)
@@ -3302,14 +3302,21 @@
 <b>  PCRE2_SIZE *<i>outlengthptr</i>);</b>
 </P>
 <P>
-This function calls <b>pcre2_match()</b> and then makes a copy of the subject
-string in <i>outputbuffer</i>, replacing one or more parts that were matched
-with the <i>replacement</i> string, whose length is supplied in <b>rlength</b>.
-This can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string.
-The default is to perform just one replacement, but there is an option that
-requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below for details).
+This function optionally calls <b>pcre2_match()</b> and then makes a copy of the
+subject string in <i>outputbuffer</i>, replacing parts that were matched with
+the <i>replacement</i> string, whose length is supplied in <b>rlength</b>. This
+can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. The default
+is to perform just one replacement if the pattern matches, but there is an
+option that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
+for details).
 </P>
 <P>
+If successful, <b>pcre2_substitute()</b> returns the number of substitutions
+that were carried out. This may be zero if no match was found, and is never
+greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A negative value is
+returned if an error is detected (see below for details).
+</P>
+<P>
 Matches in which a \K item in a lookahead in the pattern causes the match to
 end before it starts are not supported, and give rise to an error return. For
 global replacements, matches in which \K in a lookbehind causes the match to
@@ -3327,16 +3334,33 @@
 <P>
 If an external <i>match_data</i> block is provided, its contents afterwards
 are those set by the final call to <b>pcre2_match()</b>. For global changes,
-this will have ended in a matching error. The contents of the ovector within
+this will have ended in a no-match error. The contents of the ovector within
 the match data block may or may not have been changed.
 </P>
 <P>
-The <i>outlengthptr</i> argument must point to a variable that contains the
-length, in code units, of the output buffer. If the function is successful, the
-value is updated to contain the length of the new string, excluding the
-trailing zero that is automatically added.
+As well as the usual options for <b>pcre2_match()</b>, a number of additional
+options can be set in the <i>options</i> argument of <b>pcre2_substitute()</b>.
+One such option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external
+<i>match_data</i> block must be provided, and it must have been used for an
+external call to <b>pcre2_match()</b>. The data in the <i>match_data</i> block
+(return code, offset vector) is used for the first substitution instead of
+calling <b>pcre2_match()</b> from within <b>pcre2_substitute()</b>. This allows
+an application to check for a match before choosing to substitute, without
+having to repeat the match.
 </P>
 <P>
+The <i>code</i> argument is not used for the first substitution, but if
+PCRE2_SUBSTITUTE_GLOBAL is set, <b>pcre2_match()</b> will be called after the
+first substitution to check for further matches, and the contents of the
+<i>match_data</i> block will be changed.
+</P>
+<P>
+The <i>outlengthptr</i> argument of <b>pcre2_substitute()</b> must point to a
+variable that contains the length, in code units, of the output buffer. If the
+function is successful, the value is updated to contain the length of the new
+string, excluding the trailing zero that is automatically added.
+</P>
+<P>
 If the function is not successful, the value set via <i>outlengthptr</i> depends
 on the type of error. For syntax errors in the replacement string, the value is
 the offset in the replacement string where the error was detected. For other
@@ -3353,7 +3377,7 @@
 is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set. If the
 PCRE2_SUBSTITUTE_LITERAL option is set, it is not interpreted in any way. By
 default, however, a dollar character is an escape character that can specify
-the insertion of characters from capture groups or names from (*MARK) or other
+the insertion of characters from capture groups and names from (*MARK) or other
 control verbs in the pattern. The following forms are always recognized:
 <pre>
   $$                  insert a dollar character
@@ -3378,16 +3402,6 @@
       apple lemon
    2: pear orange
 </pre>
-As well as the usual options for <b>pcre2_match()</b>, a number of additional
-options can be set in the <i>options</i> argument of <b>pcre2_substitute()</b>.
-</P>
-<P>
-As mentioned above, PCRE2_SUBSTITUTE_LITERAL causes the replacement string to 
-be treated as a literal, with no interpretation. If this option is set, 
-PCRE2_SUBSTITUTE_EXTENDED, PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and 
-PCRE2_SUBSTITUTE_UNSET_EMPTY are irrelevant and are ignored.
-</P>
-<P>
 PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject string,
 replacing every matching substring. If this option is not set, only the first
 matching substring is replaced. The search for matches takes place in the
@@ -3501,14 +3515,17 @@
 groups in the extended syntax forms to be treated as unset.
 </P>
 <P>
-If successful, <b>pcre2_substitute()</b> returns the number of successful
-matches. This may be zero if no matches were found, and is never greater than 1
-unless PCRE2_SUBSTITUTE_GLOBAL is set.
+If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_UNKNOWN_UNSET,
+PCRE2_SUBSTITUTE_UNSET_EMPTY, and PCRE2_SUBSTITUTE_EXTENDED are irrelevant and
+are ignored.
 </P>
+<br><b>
+Substitution errors
+</b><br>
 <P>
-In the event of an error, a negative error code is returned. Except for
-PCRE2_ERROR_NOMATCH (which is never returned), errors from <b>pcre2_match()</b>
-are passed straight back.
+In the event of an error, <b>pcre2_substitute()</b> returns a negative error
+code. Except for PCRE2_ERROR_NOMATCH (which is never returned), errors from
+<b>pcre2_match()</b> are passed straight back.
 </P>
 <P>
 PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring insertion,
@@ -3526,6 +3543,10 @@
 default.
 </P>
 <P>
+PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the
+<i>match_data</i> argument is NULL.
+</P>
+<P>
 PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in the
 replacement string, with more particular errors being PCRE2_ERROR_BADREPESCAPE
 (invalid escape sequence), PCRE2_ERROR_REPMISSINGBRACE (closing curly bracket
@@ -3876,7 +3897,7 @@
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 26 December 2019
+Last updated: 27 December 2019
 <br>
 Copyright &copy; 1997-2019 University of Cambridge.
 <br>


Modified: code/trunk/doc/pcre2.txt
===================================================================
--- code/trunk/doc/pcre2.txt    2019-12-26 15:10:26 UTC (rev 1195)
+++ code/trunk/doc/pcre2.txt    2019-12-27 13:35:17 UTC (rev 1196)
@@ -3193,73 +3193,94 @@
          PCRE2_SIZE rlength, PCRE2_UCHAR *outputbuffer,
          PCRE2_SIZE *outlengthptr);


-       This function calls pcre2_match() and then makes a copy of the  subject
-       string  in  outputbuffer, replacing one or more parts that were matched
+       This function optionally calls pcre2_match() and then makes a  copy  of
+       the  subject  string in outputbuffer, replacing parts that were matched
        with the replacement string, whose length is supplied in rlength.  This
-       can  be  given  as  PCRE2_ZERO_TERMINATED for a zero-terminated string.
-       The default is to perform just one replacement, but there is an  option
-       that  requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
-       for details).
+       can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. The
+       default is to perform just one replacement if the pattern matches,  but
+       there  is an option that requests multiple replacements (see PCRE2_SUB-
+       STITUTE_GLOBAL below for details).


-       Matches in which a \K item in a lookahead in  the  pattern  causes  the
-       match  to  end  before it starts are not supported, and give rise to an
+       If successful, pcre2_substitute() returns the number  of  substitutions
+       that  were  carried out. This may be zero if no match was found, and is
+       never greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set.  A  nega-
+       tive value is returned if an error is detected (see below for details).
+
+       Matches  in  which  a  \K item in a lookahead in the pattern causes the
+       match to end before it starts are not supported, and give  rise  to  an
        error return. For global replacements, matches in which \K in a lookbe-
-       hind  causes the match to start earlier than the point that was reached
+       hind causes the match to start earlier than the point that was  reached
        in the previous iteration are also not supported.


-       The first seven arguments of pcre2_substitute() are  the  same  as  for
+       The  first  seven  arguments  of pcre2_substitute() are the same as for
        pcre2_match(), except that the partial matching options are not permit-
-       ted, and match_data may be passed as NULL, in which case a  match  data
-       block  is obtained and freed within this function, using memory manage-
-       ment functions from the match context, if provided, or else those  that
+       ted,  and  match_data may be passed as NULL, in which case a match data
+       block is obtained and freed within this function, using memory  manage-
+       ment  functions from the match context, if provided, or else those that
        were used to allocate memory for the compiled code.


-       If  an  external  match_data block is provided, its contents afterwards
-       are those set by the final call to pcre2_match(). For  global  changes,
-       this  will  have ended in a matching error. The contents of the ovector
+       If an external match_data block is provided,  its  contents  afterwards
+       are  those  set by the final call to pcre2_match(). For global changes,
+       this will have ended in a no-match error. The contents of  the  ovector
        within the match data block may or may not have been changed.


-       The outlengthptr argument must point to a variable  that  contains  the
-       length,  in  code  units, of the output buffer. If the function is suc-
-       cessful, the value is updated to contain the length of the new  string,
-       excluding the trailing zero that is automatically added.
+       As  well as the usual options for pcre2_match(), a number of additional
+       options can be set in the options argument of pcre2_substitute().   One
+       such  option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external
+       match_data block must be provided, and it must have been  used  for  an
+       external  call  to pcre2_match(). The data in the match_data block (re-
+       turn code, offset vector) is used for the first substitution instead of
+       calling  pcre2_match()  from  within pcre2_substitute(). This allows an
+       application to check for a match before choosing to substitute, without
+       having to repeat the match.


-       If  the  function is not successful, the value set via outlengthptr de-
-       pends on the type of  error.  For  syntax  errors  in  the  replacement
+       The  code  argument  is  not  used  for  the first substitution, but if
+       PCRE2_SUBSTITUTE_GLOBAL is set, pcre2_match() will be called after  the
+       first  substitution  to  check for further matches, and the contents of
+       the match_data block will be changed.
+
+       The outlengthptr argument of pcre2_substitute() must point to  a  vari-
+       able  that contains the length, in code units, of the output buffer. If
+       the function is successful, the value is updated to contain the  length
+       of  the  new  string, excluding the trailing zero that is automatically
+       added.
+
+       If the function is not successful, the value set via  outlengthptr  de-
+       pends  on  the  type  of  error.  For  syntax errors in the replacement
        string, the value is the offset in the replacement string where the er-
-       ror was detected. For other errors, the value  is  PCRE2_UNSET  by  de-
+       ror  was  detected.  For  other errors, the value is PCRE2_UNSET by de-
        fault. This includes the case of the output buffer being too small, un-
        less PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set (see below), in which case
-       the  value is the minimum length needed, including space for the trail-
+       the value is the minimum length needed, including space for the  trail-
        ing zero. Note that in order to compute the required length, pcre2_sub-
        stitute() has to simulate all the matching and copying, instead of giv-
        ing an error return as soon as the buffer overflows. Note also that the
        length is in code units, not bytes.


-       The  replacement  string,  which  is interpreted as a UTF string in UTF
-       mode, is checked for UTF validity unless the PCRE2_NO_UTF_CHECK  option
+       The replacement string, which is interpreted as a  UTF  string  in  UTF
+       mode,  is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option
        is set. If the PCRE2_SUBSTITUTE_LITERAL option is set, it is not inter-
        preted in any way. By default, however, a dollar character is an escape
-       character  that  can  specify  the insertion of characters from capture
-       groups or names from (*MARK) or other control verbs in the pattern. The
-       following forms are always recognized:
+       character that can specify the insertion  of  characters  from  capture
+       groups  and  names  from (*MARK) or other control verbs in the pattern.
+       The following forms are always recognized:


          $$                  insert a dollar character
          $<n> or ${<n>}      insert the contents of group <n>
          $*MARK or ${*MARK}  insert a control verb name


-       Either  a  group  number  or  a  group name can be given for <n>. Curly
-       brackets are required only if the following character would  be  inter-
+       Either a group number or a group name  can  be  given  for  <n>.  Curly
+       brackets  are  required only if the following character would be inter-
        preted as part of the number or name. The number may be zero to include
-       the entire matched string.   For  example,  if  the  pattern  a(b)c  is
-       matched  with "=abc=" and the replacement string "+$1$0$1+", the result
+       the  entire  matched  string.   For  example,  if  the pattern a(b)c is
+       matched with "=abc=" and the replacement string "+$1$0$1+", the  result
        is "=+babcb+=".


-       $*MARK inserts the name from the last encountered backtracking  control
-       verb  on the matching path that has a name. (*MARK) must always include
-       a name, but the other verbs need not.  For  example,  in  the  case  of
+       $*MARK  inserts the name from the last encountered backtracking control
+       verb on the matching path that has a name. (*MARK) must always  include
+       a  name,  but  the  other  verbs  need not. For example, in the case of
        (*MARK:A)(*PRUNE) the name inserted is "A", but for (*MARK:A)(*PRUNE:B)
-       the relevant name is "B". This facility can be used to  perform  simple
+       the  relevant  name is "B". This facility can be used to perform simple
        simultaneous substitutions, as this pcre2test example shows:


          /(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
@@ -3266,24 +3287,16 @@
              apple lemon
           2: pear orange


-       As  well as the usual options for pcre2_match(), a number of additional
-       options can be set in the options argument of pcre2_substitute().
-
-       As mentioned above,  PCRE2_SUBSTITUTE_LITERAL  causes  the  replacement
-       string  to be treated as a literal, with no interpretation. If this op-
-       tion is set, PCRE2_SUBSTITUTE_EXTENDED, PCRE2_SUBSTITUTE_UNKNOWN_UNSET,
-       and PCRE2_SUBSTITUTE_UNSET_EMPTY are irrelevant and are ignored.
-
        PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject
-       string, replacing every matching substring. If this option is not  set,
-       only  the  first matching substring is replaced. The search for matches
-       takes place in the original subject string (that is, previous  replace-
-       ments  do  not  affect  it).  Iteration is implemented by advancing the
-       startoffset value for each search, which is always  passed  the  entire
+       string,  replacing every matching substring. If this option is not set,
+       only the first matching substring is replaced. The search  for  matches
+       takes  place in the original subject string (that is, previous replace-
+       ments do not affect it).  Iteration is  implemented  by  advancing  the
+       startoffset  value  for  each search, which is always passed the entire
        subject string. If an offset limit is set in the match context, search-
        ing stops when that limit is reached.


-       You can restrict the effect of a global substitution to  a  portion  of
+       You  can  restrict  the effect of a global substitution to a portion of
        the subject string by setting either or both of startoffset and an off-
        set limit. Here is a pcre2test example:


@@ -3291,87 +3304,87 @@
          ABC ABC ABC ABC\=offset=3,offset_limit=12
           2: ABC A!C A!C ABC


-       When continuing with global substitutions after  matching  a  substring
+       When  continuing  with  global substitutions after matching a substring
        with zero length, an attempt to find a non-empty match at the same off-
        set is performed.  If this is not successful, the offset is advanced by
        one character except when CRLF is a valid newline sequence and the next
-       two characters are CR, LF. In this case, the offset is advanced by  two
+       two  characters are CR, LF. In this case, the offset is advanced by two
        characters.


-       PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  changes  what happens when the output
+       PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when  the  output
        buffer is too small. The default action is to return PCRE2_ERROR_NOMEM-
-       ORY  immediately.  If  this  option is set, however, pcre2_substitute()
+       ORY immediately. If this option  is  set,  however,  pcre2_substitute()
        continues to go through the motions of matching and substituting (with-
-       out,  of course, writing anything) in order to compute the size of buf-
-       fer that is needed. This value is  passed  back  via  the  outlengthptr
-       variable,  with  the  result  of  the  function  still  being PCRE2_ER-
+       out, of course, writing anything) in order to compute the size of  buf-
+       fer  that  is  needed.  This  value is passed back via the outlengthptr
+       variable, with  the  result  of  the  function  still  being  PCRE2_ER-
        ROR_NOMEMORY.


-       Passing a buffer size of zero is a permitted way  of  finding  out  how
-       much  memory  is needed for given substitution. However, this does mean
+       Passing  a  buffer  size  of zero is a permitted way of finding out how
+       much memory is needed for given substitution. However, this  does  mean
        that the entire operation is carried out twice. Depending on the appli-
-       cation,  it  may  be more efficient to allocate a large buffer and free
-       the  excess  afterwards,  instead   of   using   PCRE2_SUBSTITUTE_OVER-
+       cation, it may be more efficient to allocate a large  buffer  and  free
+       the   excess   afterwards,   instead  of  using  PCRE2_SUBSTITUTE_OVER-
        FLOW_LENGTH.


        PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that
        do not appear in the pattern to be treated as unset groups. This option
-       should  be used with care, because it means that a typo in a group name
+       should be used with care, because it means that a typo in a group  name
        or number no longer causes the PCRE2_ERROR_NOSUBSTRING error.


        PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capture groups (including un-
-       known  groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated
-       as empty strings when inserted as described above. If  this  option  is
+       known groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be  treated
+       as  empty  strings  when inserted as described above. If this option is
        not set, an attempt to insert an unset group causes the PCRE2_ERROR_UN-
-       SET error. This option does not  influence  the  extended  substitution
+       SET  error.  This  option  does not influence the extended substitution
        syntax described below.


-       PCRE2_SUBSTITUTE_EXTENDED  causes extra processing to be applied to the
-       replacement string. Without this option, only the dollar  character  is
-       special,  and  only  the  group insertion forms listed above are valid.
+       PCRE2_SUBSTITUTE_EXTENDED causes extra processing to be applied to  the
+       replacement  string.  Without this option, only the dollar character is
+       special, and only the group insertion forms  listed  above  are  valid.
        When PCRE2_SUBSTITUTE_EXTENDED is set, two things change:


-       Firstly, backslash in a replacement string is interpreted as an  escape
+       Firstly,  backslash in a replacement string is interpreted as an escape
        character. The usual forms such as \n or \x{ddd} can be used to specify
-       particular character codes, and backslash followed by any  non-alphanu-
-       meric  character  quotes  that character. Extended quoting can be coded
+       particular  character codes, and backslash followed by any non-alphanu-
+       meric character quotes that character. Extended quoting  can  be  coded
        using \Q...\E, exactly as in pattern strings.


-       There are also four escape sequences for forcing the case  of  inserted
-       letters.   The  insertion  mechanism has three states: no case forcing,
+       There  are  also four escape sequences for forcing the case of inserted
+       letters.  The insertion mechanism has three states:  no  case  forcing,
        force upper case, and force lower case. The escape sequences change the
        current state: \U and \L change to upper or lower case forcing, respec-
-       tively, and \E (when not terminating a \Q quoted sequence)  reverts  to
-       no  case  forcing. The sequences \u and \l force the next character (if
-       it is a letter) to upper or lower  case,  respectively,  and  then  the
+       tively,  and  \E (when not terminating a \Q quoted sequence) reverts to
+       no case forcing. The sequences \u and \l force the next  character  (if
+       it  is  a  letter)  to  upper or lower case, respectively, and then the
        state automatically reverts to no case forcing. Case forcing applies to
-       all inserted  characters, including those from capture groups and  let-
+       all  inserted  characters, including those from capture groups and let-
        ters within \Q...\E quoted sequences.


        Note that case forcing sequences such as \U...\E do not nest. For exam-
-       ple, the result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc";  the  final
-       \E  has  no  effect.  Note  also  that the PCRE2_ALT_BSUX and PCRE2_EX-
+       ple,  the  result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final
+       \E has no effect. Note  also  that  the  PCRE2_ALT_BSUX  and  PCRE2_EX-
        TRA_ALT_BSUX options do not apply to replacement strings.


-       The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to  add  more
-       flexibility  to  capture  group  substitution. The syntax is similar to
+       The  second  effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
+       flexibility to capture group substitution. The  syntax  is  similar  to
        that used by Bash:


          ${<n>:-<string>}
          ${<n>:+<string1>:<string2>}


-       As before, <n> may be a group number or a name. The first  form  speci-
-       fies  a  default  value. If group <n> is set, its value is inserted; if
-       not, <string> is expanded and the  result  inserted.  The  second  form
-       specifies  strings that are expanded and inserted when group <n> is set
-       or unset, respectively. The first form is just a  convenient  shorthand
+       As  before,  <n> may be a group number or a name. The first form speci-
+       fies a default value. If group <n> is set, its value  is  inserted;  if
+       not,  <string>  is  expanded  and  the result inserted. The second form
+       specifies strings that are expanded and inserted when group <n> is  set
+       or  unset,  respectively. The first form is just a convenient shorthand
        for


          ${<n>:+${<n>}:<string>}


-       Backslash  can  be  used to escape colons and closing curly brackets in
-       the replacement strings. A change of the case forcing  state  within  a
-       replacement  string  remains  in  force  afterwards,  as  shown in this
+       Backslash can be used to escape colons and closing  curly  brackets  in
+       the  replacement  strings.  A change of the case forcing state within a
+       replacement string remains  in  force  afterwards,  as  shown  in  this
        pcre2test example:


          /(some)?(body)/substitute_extended,replace=${1:+\U:\L}HeLLo
@@ -3380,31 +3393,36 @@
              somebody
           1: HELLO


-       The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these  extended
-       substitutions.  However,  PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause un-
+       The  PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended
+       substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does  cause  un-
        known groups in the extended syntax forms to be treated as unset.


-       If successful, pcre2_substitute()  returns  the  number  of  successful
-       matches.  This  may  be  zero  if  no  matches were found, and is never
-       greater than 1 unless PCRE2_SUBSTITUTE_GLOBAL is set.
+       If  PCRE2_SUBSTITUTE_LITERAL  is  set,  PCRE2_SUBSTITUTE_UNKNOWN_UNSET,
+       PCRE2_SUBSTITUTE_UNSET_EMPTY, and PCRE2_SUBSTITUTE_EXTENDED are irrele-
+       vant and are ignored.


-       In the event of an error, a negative error code is returned. Except for
-       PCRE2_ERROR_NOMATCH    (which   is   never   returned),   errors   from
-       pcre2_match() are passed straight back.
+   Substitution errors


+       In  the  event of an error, pcre2_substitute() returns a negative error
+       code. Except for PCRE2_ERROR_NOMATCH (which is never returned),  errors
+       from pcre2_match() are passed straight back.
+
        PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring inser-
        tion, unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.


        PCRE2_ERROR_UNSET is returned for an unset substring insertion (includ-
-       ing an unknown substring when  PCRE2_SUBSTITUTE_UNKNOWN_UNSET  is  set)
-       when  the simple (non-extended) syntax is used and PCRE2_SUBSTITUTE_UN-
+       ing  an  unknown  substring when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set)
+       when the simple (non-extended) syntax is used and  PCRE2_SUBSTITUTE_UN-
        SET_EMPTY is not set.


-       PCRE2_ERROR_NOMEMORY is returned  if  the  output  buffer  is  not  big
+       PCRE2_ERROR_NOMEMORY  is  returned  if  the  output  buffer  is not big
        enough. If the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size
-       of buffer that is needed is returned via outlengthptr. Note  that  this
+       of  buffer  that is needed is returned via outlengthptr. Note that this
        does not happen by default.


+       PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the
+       match_data argument is NULL.
+
        PCRE2_ERROR_BADREPLACEMENT  is  used for miscellaneous syntax errors in
        the replacement string, with more  particular  errors  being  PCRE2_ER-
        ROR_BADREPESCAPE (invalid escape sequence), PCRE2_ERROR_REPMISSINGBRACE
@@ -3727,7 +3745,7 @@


REVISION

-       Last updated: 26 December 2019
+       Last updated: 27 December 2019
        Copyright (c) 1997-2019 University of Cambridge.
 ------------------------------------------------------------------------------
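
The pcre2.txt text above describes case forcing in extended replacement strings as a three-state mechanism (no forcing, force upper, force lower) driven by \U, \L, \E, \u and \l. The sketch below is a toy model of that description in plain C, not PCRE2's implementation: the function name is hypothetical, only ASCII letters are handled, \Q...\E quoting and group insertion are left out, and the caller is assumed to supply an output buffer at least as large as the input string plus one.

```c
#include <ctype.h>

enum toy_case { CASE_NONE, CASE_UPPER, CASE_LOWER };

/*
 * Hypothetical helper: copy in to out, applying \U \L \E \u \l.
 * out must have room for strlen(in) + 1 bytes.
 */
void toy_force_case(const char *in, char *out)
{
    enum toy_case state = CASE_NONE;
    int single = 0;               /* set by \u or \l: applies to one char */

    for (; *in != '\0'; in++) {
        char c = *in;

        if (c == '\\' && in[1] != '\0') {
            in++;
            switch (*in) {
            case 'U': state = CASE_UPPER; single = 0; continue;
            case 'L': state = CASE_LOWER; single = 0; continue;
            case 'E': state = CASE_NONE;  single = 0; continue;
            case 'u': state = CASE_UPPER; single = 1; continue;
            case 'l': state = CASE_LOWER; single = 1; continue;
            default:  c = *in; break; /* any other escaped char is literal */
            }
        }

        if (state == CASE_UPPER) c = (char)toupper((unsigned char)c);
        else if (state == CASE_LOWER) c = (char)tolower((unsigned char)c);
        *out++ = c;

        /* After \u or \l, revert to no case forcing, as the text says. */
        if (single) { state = CASE_NONE; single = 0; }
    }
    *out = '\0';
}
```

Fed the documentation's example string \Uaa\LBB\Ecc\E, this model yields "AAbbcc": the \L simply replaces the upper-case state rather than nesting inside it, and the final \E has nothing left to undo.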



Modified: code/trunk/doc/pcre2_substitute.3
===================================================================
--- code/trunk/doc/pcre2_substitute.3    2019-12-26 15:10:26 UTC (rev 1195)
+++ code/trunk/doc/pcre2_substitute.3    2019-12-27 13:35:17 UTC (rev 1196)
@@ -1,4 +1,4 @@
-.TH PCRE2_SUBSTITUTE 3 "26 December 2019" "PCRE2 10.35"
+.TH PCRE2_SUBSTITUTE 3 "27 December 2019" "PCRE2 10.35"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -36,8 +36,8 @@
   \fIoutlengthptr\fP  Points to the length of the output buffer
 .sp
 A match data block is needed only if you want to inspect the data from the
-match that is returned in that block. A match context is needed only if you
-want to:
+match that is returned in that block or if PCRE2_SUBSTITUTE_MATCHED is set. A
+match context is needed only if you want to:
 .sp
   Set up a callout function
   Set a matching offset limit
@@ -67,15 +67,16 @@
   PCRE2_SUBSTITUTE_EXTENDED  Do extended replacement processing
   PCRE2_SUBSTITUTE_GLOBAL    Replace all occurrences in the subject
   PCRE2_SUBSTITUTE_LITERAL   The replacement string is literal 
+  PCRE2_SUBSTITUTE_MATCHED   Use pre-existing match data for 1st match 
   PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  If overflow, compute needed length
   PCRE2_SUBSTITUTE_UNKNOWN_UNSET  Treat unknown group as unset
   PCRE2_SUBSTITUTE_UNSET_EMPTY  Simple unset insert = empty string
 .sp
-PCRE2_SUBSTITUTE_LITERAL overrides PCRE2_SUBSTITUTE_EXTENDED, 
-PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and PCRE2_SUBSTITUTE_UNSET_EMPTY.
+If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_EXTENDED, 
+PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and PCRE2_SUBSTITUTE_UNSET_EMPTY are ignored.
 .P
 The function returns the number of substitutions, which may be zero if there
-were no matches. The result can be greater than one only when
+are no matches. The result may be greater than one only when
 PCRE2_SUBSTITUTE_GLOBAL is set. In the event of an error, a negative error code
 is returned.
 .P


Modified: code/trunk/doc/pcre2api.3
===================================================================
--- code/trunk/doc/pcre2api.3    2019-12-26 15:10:26 UTC (rev 1195)
+++ code/trunk/doc/pcre2api.3    2019-12-27 13:35:17 UTC (rev 1196)
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "26 December 2019" "PCRE2 10.35"
+.TH PCRE2API 3 "27 December 2019" "PCRE2 10.35"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -3321,13 +3321,19 @@
 .B "  PCRE2_SIZE *\fIoutlengthptr\fP);"
 .fi
 .P
-This function calls \fBpcre2_match()\fP and then makes a copy of the subject
-string in \fIoutputbuffer\fP, replacing one or more parts that were matched
-with the \fIreplacement\fP string, whose length is supplied in \fBrlength\fP.
-This can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string.
-The default is to perform just one replacement, but there is an option that
-requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below for details).
+This function optionally calls \fBpcre2_match()\fP and then makes a copy of the
+subject string in \fIoutputbuffer\fP, replacing parts that were matched with
+the \fIreplacement\fP string, whose length is supplied in \fBrlength\fP. This
+can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. The default
+is to perform just one replacement if the pattern matches, but there is an
+option that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
+for details).
 .P
+If successful, \fBpcre2_substitute()\fP returns the number of substitutions
+that were carried out. This may be zero if no match was found, and is never
+greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A negative value is
+returned if an error is detected (see below for details).
+.P
 Matches in which a \eK item in a lookahead in the pattern causes the match to
 end before it starts are not supported, and give rise to an error return. For
 global replacements, matches in which \eK in a lookbehind causes the match to
@@ -3343,14 +3349,29 @@
 .P
 If an external \fImatch_data\fP block is provided, its contents afterwards
 are those set by the final call to \fBpcre2_match()\fP. For global changes,
-this will have ended in a matching error. The contents of the ovector within
+this will have ended in a no-match error. The contents of the ovector within
 the match data block may or may not have been changed.
 .P
-The \fIoutlengthptr\fP argument must point to a variable that contains the
-length, in code units, of the output buffer. If the function is successful, the
-value is updated to contain the length of the new string, excluding the
-trailing zero that is automatically added.
+As well as the usual options for \fBpcre2_match()\fP, a number of additional
+options can be set in the \fIoptions\fP argument of \fBpcre2_substitute()\fP.
+One such option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external
+\fImatch_data\fP block must be provided, and it must have been used for an
+external call to \fBpcre2_match()\fP. The data in the \fImatch_data\fP block
+(return code, offset vector) is used for the first substitution instead of
+calling \fBpcre2_match()\fP from within \fBpcre2_substitute()\fP. This allows
+an application to check for a match before choosing to substitute, without
+having to repeat the match.
 .P
+The \fIcode\fP argument is not used for the first substitution, but if
+PCRE2_SUBSTITUTE_GLOBAL is set, \fBpcre2_match()\fP will be called after the
+first substitution to check for further matches, and the contents of the
+\fImatch_data\fP block will be changed.
+.P
+The \fIoutlengthptr\fP argument of \fBpcre2_substitute()\fP must point to a
+variable that contains the length, in code units, of the output buffer. If the
+function is successful, the value is updated to contain the length of the new
+string, excluding the trailing zero that is automatically added.
+.P
 If the function is not successful, the value set via \fIoutlengthptr\fP depends
 on the type of error. For syntax errors in the replacement string, the value is
 the offset in the replacement string where the error was detected. For other
@@ -3366,7 +3387,7 @@
 is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set. If the
 PCRE2_SUBSTITUTE_LITERAL option is set, it is not interpreted in any way. By
 default, however, a dollar character is an escape character that can specify
-the insertion of characters from capture groups or names from (*MARK) or other
+the insertion of characters from capture groups and names from (*MARK) or other
 control verbs in the pattern. The following forms are always recognized:
 .sp
   $$                  insert a dollar character
@@ -3390,14 +3411,6 @@
       apple lemon
    2: pear orange
 .sp
-As well as the usual options for \fBpcre2_match()\fP, a number of additional
-options can be set in the \fIoptions\fP argument of \fBpcre2_substitute()\fP.
-.P
-As mentioned above, PCRE2_SUBSTITUTE_LITERAL causes the replacement string to 
-be treated as a literal, with no interpretation. If this option is set, 
-PCRE2_SUBSTITUTE_EXTENDED, PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and 
-PCRE2_SUBSTITUTE_UNSET_EMPTY are irrelevant and are ignored.
-.P
 PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject string,
 replacing every matching substring. If this option is not set, only the first
 matching substring is replaced. The search for matches takes place in the
@@ -3500,14 +3513,18 @@
 substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause unknown
 groups in the extended syntax forms to be treated as unset.
 .P
-If successful, \fBpcre2_substitute()\fP returns the number of successful
-matches. This may be zero if no matches were found, and is never greater than 1
-unless PCRE2_SUBSTITUTE_GLOBAL is set.
+If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_UNKNOWN_UNSET,
+PCRE2_SUBSTITUTE_UNSET_EMPTY, and PCRE2_SUBSTITUTE_EXTENDED are irrelevant and
+are ignored.
+.
+.
+.SS "Substitution errors"
+.rs
+.sp
+In the event of an error, \fBpcre2_substitute()\fP returns a negative error
+code. Except for PCRE2_ERROR_NOMATCH (which is never returned), errors from
+\fBpcre2_match()\fP are passed straight back.
 .P
-In the event of an error, a negative error code is returned. Except for
-PCRE2_ERROR_NOMATCH (which is never returned), errors from \fBpcre2_match()\fP
-are passed straight back.
-.P
 PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring insertion,
 unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.
 .P
@@ -3520,6 +3537,9 @@
 needed is returned via \fIoutlengthptr\fP. Note that this does not happen by
 default.
 .P
+PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the
+\fImatch_data\fP argument is NULL.
+.P
 PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in the
 replacement string, with more particular errors being PCRE2_ERROR_BADREPESCAPE
 (invalid escape sequence), PCRE2_ERROR_REPMISSINGBRACE (closing curly bracket
@@ -3884,6 +3904,6 @@
 .rs
 .sp
 .nf
-Last updated: 26 December 2019
+Last updated: 27 December 2019
 Copyright (c) 1997-2019 University of Cambridge.
 .fi


Modified: code/trunk/src/pcre2.h.in
===================================================================
--- code/trunk/src/pcre2.h.in    2019-12-26 15:10:26 UTC (rev 1195)
+++ code/trunk/src/pcre2.h.in    2019-12-27 13:35:17 UTC (rev 1196)
@@ -182,6 +182,7 @@
 #define PCRE2_NO_JIT                      0x00002000u  /* Not for pcre2_dfa_match() */
 #define PCRE2_COPY_MATCHED_SUBJECT        0x00004000u
 #define PCRE2_SUBSTITUTE_LITERAL          0x00008000u  /* pcre2_substitute() only */
+#define PCRE2_SUBSTITUTE_MATCHED          0x00010000u  /* pcre2_substitute() only */


/* Options for pcre2_pattern_convert(). */


Modified: code/trunk/src/pcre2_substitute.c
===================================================================
--- code/trunk/src/pcre2_substitute.c    2019-12-26 15:10:26 UTC (rev 1195)
+++ code/trunk/src/pcre2_substitute.c    2019-12-27 13:35:17 UTC (rev 1196)
@@ -49,8 +49,9 @@


#define SUBSTITUTE_OPTIONS \
(PCRE2_SUBSTITUTE_EXTENDED|PCRE2_SUBSTITUTE_GLOBAL| \
- PCRE2_SUBSTITUTE_LITERAL|PCRE2_SUBSTITUTE_OVERFLOW_LENGTH| \
- PCRE2_SUBSTITUTE_UNKNOWN_UNSET|PCRE2_SUBSTITUTE_UNSET_EMPTY)
+ PCRE2_SUBSTITUTE_LITERAL|PCRE2_SUBSTITUTE_MATCHED| \
+ PCRE2_SUBSTITUTE_OVERFLOW_LENGTH|PCRE2_SUBSTITUTE_UNKNOWN_UNSET| \
+ PCRE2_SUBSTITUTE_UNSET_EMPTY)



@@ -229,6 +230,7 @@
BOOL match_data_created = FALSE;
BOOL escaped_literal = FALSE;
BOOL overflowed = FALSE;
+BOOL use_existing_match;
#ifdef SUPPORT_UNICODE
BOOL utf = (code->overall_options & PCRE2_UTF) != 0;
#endif
@@ -248,16 +250,26 @@
*blength = PCRE2_UNSET;
ovecsave[0] = ovecsave[1] = ovecsave[2] = PCRE2_UNSET;

-/* Partial matching is not valid. This must come after setting *blength to
+/* Partial matching is not valid. This must come after setting *blength to
PCRE2_UNSET, so as not to imply an offset in the replacement. */

if ((options & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0)
return PCRE2_ERROR_BADOPTION;

-/* If no match data block is provided, create one. */
+/* Check for using a match that has already happened. Note that the subject
+pointer in the match data may be NULL after a no-match. */

-if (match_data == NULL)
+use_existing_match = ((options & PCRE2_SUBSTITUTE_MATCHED) != 0);
+
+if (use_existing_match)
   {
+  if (match_data == NULL) return PCRE2_ERROR_NULL;
+  }
+
+/* Otherwise, if no match data block is provided, create one. */
+
+else if (match_data == NULL)
+  {
   pcre2_general_context *gcontext = (mcontext == NULL)?
     (pcre2_general_context *)code :
     (pcre2_general_context *)mcontext;
@@ -310,7 +322,8 @@
   }
 CHECKMEMCPY(subject, start_offset);


-/* Loop for global substituting. */
+/* Loop for global substituting. If PCRE2_SUBSTITUTE_MATCHED is set, the first
+match is taken from the match_data that was passed in. */

subs = 0;
do
@@ -318,8 +331,13 @@
PCRE2_SPTR ptrstack[PTR_STACK_SIZE];
uint32_t ptrstackptr = 0;

-  rc = pcre2_match(code, subject, length, start_offset, options|goptions,
-    match_data, mcontext);
+  if (use_existing_match)
+    {
+    rc = match_data->rc;
+    use_existing_match = FALSE;
+    }
+  else rc = pcre2_match(code, subject, length, start_offset, options|goptions,
+      match_data, mcontext);


#ifdef SUPPORT_UNICODE
if (utf) options |= PCRE2_NO_UTF_CHECK; /* Only need to check once */
@@ -375,14 +393,14 @@

   /* Handle a successful match. Matches that use \K to end before they start
   or start before the current point in the subject are not supported. */
-  
+
   if (ovector[1] < ovector[0] || ovector[0] < start_offset)
     {
     rc = PCRE2_ERROR_BADSUBSPATTERN;
     goto EXIT;
     }
-    
-  /* Check for the same match as previous. This is legitimate after matching an 
+
+  /* Check for the same match as previous. This is legitimate after matching an
   empty string that starts after the initial match offset. We have tried again
   at the match point in case the pattern is one like /(?<=\G.)/ which can never
   match at its starting point, so running the match achieves the bumpalong. If
@@ -389,19 +407,19 @@
   we do get the same (null) match at the original match point, it isn't such a
   pattern, so we now do the empty string magic. In all other cases, a repeat
   match should never occur. */
-    
+
   if (ovecsave[0] == ovector[0] && ovecsave[1] == ovector[1])
-    {                                                                        
-    if (ovector[0] == ovector[1] && ovecsave[2] != start_offset)     
-      {                                                                   
-      goptions = PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED;                 
-      ovecsave[2] = start_offset;                                     
-      continue;    /* Back to the top of the loop */                        
+    {
+    if (ovector[0] == ovector[1] && ovecsave[2] != start_offset)
+      {
+      goptions = PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED;
+      ovecsave[2] = start_offset;
+      continue;    /* Back to the top of the loop */
       }
     rc = PCRE2_ERROR_INTERNAL_DUPMATCH;
-    goto EXIT;   
-    }   
-    
+    goto EXIT;
+    }
+
   /* Count substitutions with a paranoid check for integer overflow; surely no
   real call to this function would ever hit this! */


@@ -421,20 +439,20 @@
scb.output_offsets[0] = buff_offset;
scb.oveccount = rc;

-  /* Process the replacement string. If the entire replacement is literal, just 
+  /* Process the replacement string. If the entire replacement is literal, just
   copy it with length check. */
-  
+
   ptr = replacement;
   if ((suboptions & PCRE2_SUBSTITUTE_LITERAL) != 0)
     {
-    CHECKMEMCPY(ptr, rlength);  
+    CHECKMEMCPY(ptr, rlength);
     }


-  /* Within a non-literal replacement, which must be scanned character by 
+  /* Within a non-literal replacement, which must be scanned character by
   character, local literal mode can be set by \Q, but only in extended mode
   when backslashes are being interpreted. In extended mode we must handle
   nested substrings that are to be reprocessed. */
- 
+
   else for (;;)
     {
     uint32_t ch;
@@ -844,17 +862,17 @@
       } /* End handling a literal code unit */
     }   /* End of loop for scanning the replacement. */


-  /* The replacement has been copied to the output, or its size has been 
-  remembered. Do the callout if there is one and we have done an actual 
+  /* The replacement has been copied to the output, or its size has been
+  remembered. Do the callout if there is one and we have done an actual
   replacement. */
-  
+
   if (!overflowed && mcontext != NULL && mcontext->substitute_callout != NULL)
     {
-    scb.subscount = subs;  
+    scb.subscount = subs;
     scb.output_offsets[1] = buff_offset;
-    rc = mcontext->substitute_callout(&scb, mcontext->substitute_callout_data); 
+    rc = mcontext->substitute_callout(&scb, mcontext->substitute_callout_data);


-    /* A non-zero return means cancel this substitution. Instead, copy the 
+    /* A non-zero return means cancel this substitution. Instead, copy the
     matched string fragment. */


     if (rc != 0)
@@ -861,25 +879,25 @@
       {
       PCRE2_SIZE newlength = scb.output_offsets[1] - scb.output_offsets[0];
       PCRE2_SIZE oldlength = ovector[1] - ovector[0];
-      
+
       buff_offset -= newlength;
       lengthleft += newlength;
-      CHECKMEMCPY(subject + ovector[0], oldlength);    
-      
+      CHECKMEMCPY(subject + ovector[0], oldlength);
+
       /* A negative return means do not do any more. */
-      
+
       if (rc < 0) suboptions &= (~PCRE2_SUBSTITUTE_GLOBAL);
       }
-    }   
- 
+    }
+
   /* Save the details of this match. See above for how this data is used. If we
   matched an empty string, do the magic for global matches. Finally, update the
   start offset to point to the rest of the subject string. */
-  
-  ovecsave[0] = ovector[0];                                
-  ovecsave[1] = ovector[1];                                        
+
+  ovecsave[0] = ovector[0];
+  ovecsave[1] = ovector[1];
   ovecsave[2] = start_offset;
-   
+
   goptions = (ovector[0] != ovector[1] || ovector[0] > start_offset)? 0 :
     PCRE2_ANCHORED|PCRE2_NOTEMPTY_ATSTART;
   start_offset = ovector[1];
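
Note on the hunks above: the control flow they add can be modelled without PCRE2 itself. The following is only a sketch under stated assumptions, not the library's code. Every demo_ name, the DEMO_ option and error values, and the literal-string "matcher" are hypothetical stand-ins invented for illustration. The sketch mirrors just three things from the patch: the NULL check when the option is set, taking rc from the caller's match data on the first pass through the loop, and falling back to the matcher on later passes when a global replace is requested.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical values for this sketch only; they are not PCRE2's. */
#define DEMO_GLOBAL          0x1u  /* models PCRE2_SUBSTITUTE_GLOBAL  */
#define DEMO_MATCHED         0x2u  /* models PCRE2_SUBSTITUTE_MATCHED */
#define DEMO_NOMATCH         (-1)
#define DEMO_ERROR_NULL      (-2)
#define DEMO_ERROR_NOMEMORY  (-3)

/* Toy match data: a return code plus one pair of offsets. */
typedef struct {
  int rc;             /* 1 after a match, DEMO_NOMATCH otherwise */
  size_t start, end;  /* offsets of the match within the subject */
} demo_match_data;

static int demo_match_calls = 0;  /* counts calls to the toy matcher */

/* Toy matcher: looks for a literal needle at or after start_offset. */
static int demo_match(const char *needle, const char *subject,
                      size_t start_offset, demo_match_data *md)
{
  const char *hit;
  demo_match_calls++;
  hit = (needle[0] == '\0') ? NULL : strstr(subject + start_offset, needle);
  if (hit == NULL) return md->rc = DEMO_NOMATCH;
  md->start = (size_t)(hit - subject);
  md->end = md->start + strlen(needle);
  return md->rc = 1;
}

/* Returns the number of substitutions, or a negative DEMO_ error code. */
static int demo_substitute(const char *needle, const char *subject,
                           unsigned options, demo_match_data *md,
                           const char *replacement, char *out, size_t outsize)
{
  demo_match_data local;
  int use_existing = (options & DEMO_MATCHED) != 0;
  size_t start_offset = 0, used = 0, tail;
  size_t rlen, slen;
  int subs = 0;

  assert(needle != NULL && subject != NULL);
  assert(replacement != NULL && out != NULL);
  rlen = strlen(replacement);
  slen = strlen(subject);

  /* With the "matched" option the caller must supply match data. */
  if (use_existing) { if (md == NULL) return DEMO_ERROR_NULL; }
  else if (md == NULL) md = &local;

  do
    {
    int rc;
    size_t gap;

    /* First pass only: reuse the caller's result instead of matching. */
    if (use_existing) { rc = md->rc; use_existing = 0; }
    else rc = demo_match(needle, subject, start_offset, md);
    if (rc < 0) break;  /* no (further) match */

    gap = md->start - start_offset;
    if (used + gap + rlen >= outsize) return DEMO_ERROR_NOMEMORY;
    memcpy(out + used, subject + start_offset, gap);  used += gap;
    memcpy(out + used, replacement, rlen);            used += rlen;
    start_offset = md->end;
    subs++;
    }
  while ((options & DEMO_GLOBAL) != 0);

  /* Copy the rest of the subject, including its terminating zero. */
  tail = slen - start_offset;
  if (used + tail >= outsize) return DEMO_ERROR_NOMEMORY;
  memcpy(out + used, subject + start_offset, tail + 1);
  return subs;
}
```

In this model a caller runs demo_match() once, inspects the result, and then passes the same demo_match_data to demo_substitute() with DEMO_MATCHED set. The first match is not repeated; with DEMO_GLOBAL also set, the matcher is called again only for the remainder of the subject, which overwrites the caller's match data, as the documentation change above describes for the real function.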


Modified: code/trunk/src/pcre2test.c
===================================================================
--- code/trunk/src/pcre2test.c    2019-12-26 15:10:26 UTC (rev 1195)
+++ code/trunk/src/pcre2test.c    2019-12-27 13:35:17 UTC (rev 1196)
@@ -503,13 +503,14 @@
 #define CTL2_SUBSTITUTE_CALLOUT          0x00000001u
 #define CTL2_SUBSTITUTE_EXTENDED         0x00000002u
 #define CTL2_SUBSTITUTE_LITERAL          0x00000004u
-#define CTL2_SUBSTITUTE_OVERFLOW_LENGTH  0x00000008u
-#define CTL2_SUBSTITUTE_UNKNOWN_UNSET    0x00000010u
-#define CTL2_SUBSTITUTE_UNSET_EMPTY      0x00000020u
-#define CTL2_SUBJECT_LITERAL             0x00000040u
-#define CTL2_CALLOUT_NO_WHERE            0x00000080u
-#define CTL2_CALLOUT_EXTRA               0x00000100u
-#define CTL2_ALLVECTOR                   0x00000200u
+#define CTL2_SUBSTITUTE_MATCHED          0x00000008u
+#define CTL2_SUBSTITUTE_OVERFLOW_LENGTH  0x00000010u
+#define CTL2_SUBSTITUTE_UNKNOWN_UNSET    0x00000020u
+#define CTL2_SUBSTITUTE_UNSET_EMPTY      0x00000040u
+#define CTL2_SUBJECT_LITERAL             0x00000080u
+#define CTL2_CALLOUT_NO_WHERE            0x00000100u
+#define CTL2_CALLOUT_EXTRA               0x00000200u
+#define CTL2_ALLVECTOR                   0x00000400u


 #define CTL2_NL_SET                      0x40000000u  /* Informational */
 #define CTL2_BSR_SET                     0x80000000u  /* Informational */
@@ -532,6 +533,7 @@
 #define CTL2_ALLPD (CTL2_SUBSTITUTE_CALLOUT|\
                     CTL2_SUBSTITUTE_EXTENDED|\
                     CTL2_SUBSTITUTE_LITERAL|\
+                    CTL2_SUBSTITUTE_MATCHED|\
                     CTL2_SUBSTITUTE_OVERFLOW_LENGTH|\
                     CTL2_SUBSTITUTE_UNKNOWN_UNSET|\
                     CTL2_SUBSTITUTE_UNSET_EMPTY|\
@@ -721,6 +723,7 @@
   { "substitute_callout",         MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_CALLOUT,    PO(control2) },
   { "substitute_extended",        MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_EXTENDED,   PO(control2) },
   { "substitute_literal",         MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_LITERAL,    PO(control2) },
+  { "substitute_matched",         MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_MATCHED,    PO(control2) },
   { "substitute_overflow_length", MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_OVERFLOW_LENGTH, PO(control2) },
   { "substitute_skip",            MOD_PND,  MOD_INT, 0,                          PO(substitute_skip) },
   { "substitute_stop",            MOD_PND,  MOD_INT, 0,                          PO(substitute_stop) },
@@ -4088,7 +4091,7 @@
 static void
 show_controls(uint32_t controls, uint32_t controls2, const char *before)
 {
-fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
+fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
   before,
   ((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
   ((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
@@ -4127,6 +4130,7 @@
   ((controls2 & CTL2_SUBSTITUTE_CALLOUT) != 0)? " substitute_callout" : "",
   ((controls2 & CTL2_SUBSTITUTE_EXTENDED) != 0)? " substitute_extended" : "",
   ((controls2 & CTL2_SUBSTITUTE_LITERAL) != 0)? " substitute_literal" : "",
+  ((controls2 & CTL2_SUBSTITUTE_MATCHED) != 0)? " substitute_matched" : "",
   ((controls2 & CTL2_SUBSTITUTE_OVERFLOW_LENGTH) != 0)? " substitute_overflow_length" : "",
   ((controls2 & CTL2_SUBSTITUTE_UNKNOWN_UNSET) != 0)? " substitute_unknown_unset" : "",
   ((controls2 & CTL2_SUBSTITUTE_UNSET_EMPTY) != 0)? " substitute_unset_empty" : "",
@@ -7232,6 +7236,7 @@
   uint8_t rbuffer[REPLACE_BUFFSIZE];
   uint8_t nbuffer[REPLACE_BUFFSIZE];
   uint32_t xoptions;
+  uint32_t emoption;  /* External match option */
   PCRE2_SIZE j, rlen, nsize, erroroffset;
   BOOL badutf = FALSE;


@@ -7252,11 +7257,25 @@

   if (timeitm)
     fprintf(outfile, "** Timing is not supported with replace: ignored\n");
-
+    
   if ((dat_datctl.control & CTL_ALTGLOBAL) != 0)
     fprintf(outfile, "** Altglobal is not supported with replace: ignored\n");


-  xoptions = (((dat_datctl.control & CTL_GLOBAL) == 0)? 0 :
+  /* Check for a test that does substitution after an initial external match. 
+  If this is set, we run the external match, but leave the interpretation of 
+  its output to pcre2_substitute(). */
+
+  emoption = ((dat_datctl.control2 & CTL2_SUBSTITUTE_MATCHED) == 0)? 0 :
+    PCRE2_SUBSTITUTE_MATCHED;
+     
+  if (emoption != 0)
+    {
+    PCRE2_MATCH(rc, compiled_code, pp, arg_ulen, dat_datctl.offset,
+      dat_datctl.options, match_data, use_dat_context);
+    }
+
+  xoptions = emoption |
+             (((dat_datctl.control & CTL_GLOBAL) == 0)? 0 :
                 PCRE2_SUBSTITUTE_GLOBAL) |
              (((dat_datctl.control2 & CTL2_SUBSTITUTE_EXTENDED) == 0)? 0 :
                 PCRE2_SUBSTITUTE_EXTENDED) |
@@ -7268,7 +7287,7 @@
                 PCRE2_SUBSTITUTE_UNKNOWN_UNSET) |
              (((dat_datctl.control2 & CTL2_SUBSTITUTE_UNSET_EMPTY) == 0)? 0 :
                 PCRE2_SUBSTITUTE_UNSET_EMPTY);
-                
+
   SETCASTPTR(r, rbuffer);  /* Sets r8, r16, or r32, as appropriate. */
   pr = dat_datctl.replacement;
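
Note on the xoptions hunk above: pcre2test translates its own control bits into pcre2_substitute() option bits with a chain of conditional expressions. The sketch below shows the same translation in table-driven form, which is a different structure from the one pcre2test uses and is offered only as an illustration. The numeric values are the ones that appear in this revision's diff (the CTL2_ bits after renumbering and the two PCRE2_SUBSTITUTE_ bits shown in pcre2.h.in); the DEMO_ names, the demo_ctl_map type, and demo_xoptions() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Values as they appear in the diff above, under hypothetical names. */
#define DEMO_CTL2_SUBSTITUTE_LITERAL   0x00000004u
#define DEMO_CTL2_SUBSTITUTE_MATCHED   0x00000008u
#define DEMO_PCRE2_SUBSTITUTE_LITERAL  0x00008000u
#define DEMO_PCRE2_SUBSTITUTE_MATCHED  0x00010000u

/* One row per control bit and the option bit it switches on. */
typedef struct {
  uint32_t ctl;  /* bit in the test program's control2 word   */
  uint32_t opt;  /* bit passed in the substitute options word */
} demo_ctl_map;

static const demo_ctl_map demo_map[] = {
  { DEMO_CTL2_SUBSTITUTE_LITERAL, DEMO_PCRE2_SUBSTITUTE_LITERAL },
  { DEMO_CTL2_SUBSTITUTE_MATCHED, DEMO_PCRE2_SUBSTITUTE_MATCHED }
};

/* OR together the option bits whose control bits are set. Control
bits that have no row in the table contribute nothing. */
static uint32_t demo_xoptions(uint32_t control2)
{
  uint32_t xoptions = 0;
  size_t i;
  for (i = 0; i < sizeof(demo_map) / sizeof(demo_map[0]); i++)
    if ((control2 & demo_map[i].ctl) != 0) xoptions |= demo_map[i].opt;
  return xoptions;
}
```

The table covers only the two options whose values are visible in this diff; a fuller version would need one row for each remaining substitute control bit.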



Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2    2019-12-26 15:10:26 UTC (rev 1195)
+++ code/trunk/testdata/testinput2    2019-12-27 13:35:17 UTC (rev 1196)
@@ -4640,7 +4640,14 @@


 /(aa)(BB)/substitute_extended,replace=\U$1\L$2\E$1..\U$1\l$2$1
     aaBB
+    
+/abcd/replace=wxyz,substitute_matched
+    abcd
+    pqrs 


+/abcd/g
+    >abcd1234abcd5678<\=replace=wxyz,substitute_matched
+
 /^(o(\1{72}{\"{\\{00000059079}\d*){74}}){19}/I


/((p(?'K/

Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2019-12-26 15:10:26 UTC (rev 1195)
+++ code/trunk/testdata/testoutput2    2019-12-27 13:35:17 UTC (rev 1196)
@@ -14859,7 +14859,17 @@
 /(aa)(BB)/substitute_extended,replace=\U$1\L$2\E$1..\U$1\l$2$1
     aaBB
  1: AAbbaa..AAbBaa
+    
+/abcd/replace=wxyz,substitute_matched
+    abcd
+ 1: wxyz
+    pqrs 
+ 0: pqrs


+/abcd/g
+    >abcd1234abcd5678<\=replace=wxyz,substitute_matched
+ 2: >wxyz1234wxyz5678<
+
 /^(o(\1{72}{\"{\\{00000059079}\d*){74}}){19}/I
 Capture group count = 2
 Max back reference = 1