[Pcre-svn] [1206] code/trunk: Implement PCRE2_SUBSTITUTE_RE…

Author: Subversion repository
Date:
To: pcre-svn
Subject: [Pcre-svn] [1206] code/trunk: Implement PCRE2_SUBSTITUTE_REPLACEMENT_ONLY.
Revision: 1206
          http://www.exim.org/viewvc/pcre2?view=rev&revision=1206
Author:   ph10
Date:     2020-01-22 17:50:12 +0000 (Wed, 22 Jan 2020)
Log Message:
-----------
Implement PCRE2_SUBSTITUTE_REPLACEMENT_ONLY.


Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/doc/html/pcre2_substitute.html
    code/trunk/doc/html/pcre2api.html
    code/trunk/doc/html/pcre2test.html
    code/trunk/doc/pcre2.txt
    code/trunk/doc/pcre2_substitute.3
    code/trunk/doc/pcre2api.3
    code/trunk/doc/pcre2test.1
    code/trunk/doc/pcre2test.txt
    code/trunk/src/pcre2.h.in
    code/trunk/src/pcre2_substitute.c
    code/trunk/src/pcre2test.c
    code/trunk/testdata/testinput2
    code/trunk/testdata/testoutput2


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2020-01-15 16:50:45 UTC (rev 1205)
+++ code/trunk/ChangeLog    2020-01-22 17:50:12 UTC (rev 1206)
@@ -11,14 +11,14 @@
 3. A JIT bug is fixed which allowed to read the fields of the compiled
 pattern before its existence is checked.


-4. Back in the PCRE1 day, capturing groups that contained recursive back
-references to themselves were made atomic (version 8.01, change 18) because
-after the end a repeated group, the captured substrings had their values from
-the final repetition, not from an earlier repetition that might be the
-destination of a backtrack. This feature was documented, and was carried over
-into PCRE2. However, it has now been realized that the major refactoring that
-was done for 10.30 has made this atomicizing unnecessary, and it is confusing
-when users are unaware of it, making some patterns appear not to be working as
+4. Back in the PCRE1 day, capturing groups that contained recursive back
+references to themselves were made atomic (version 8.01, change 18) because
+after the end a repeated group, the captured substrings had their values from
+the final repetition, not from an earlier repetition that might be the
+destination of a backtrack. This feature was documented, and was carried over
+into PCRE2. However, it has now been realized that the major refactoring that
+was done for 10.30 has made this atomicizing unnecessary, and it is confusing
+when users are unaware of it, making some patterns appear not to be working as
expected. Capture values of recursive back references in repeated groups are
now correctly backtracked, so this unnecessary restriction has been removed.

@@ -28,20 +28,22 @@

7. Added PCRE2_SUBSTITUTE_MATCHED.

-8. Added (?* and (?<* as synonms for (*napla: and (*naplb: to match another
-regex engine. The Perl regex folks are aware of this usage and have made a note
+8. Added (?* and (?<* as synonms for (*napla: and (*naplb: to match another
+regex engine. The Perl regex folks are aware of this usage and have made a note
about it.

-9. When an assertion is repeated, PCRE2 used to limit the maximum repetition to
+9. When an assertion is repeated, PCRE2 used to limit the maximum repetition to
1, believing that repeating an assertion is pointless. However, if a positive
-assertion contains capturing groups, repetition can be useful. In any case, an
-assertion could always be wrapped in a repeated group. The only restriction
-that is now imposed is that an unlimited maximum is changed to one more than
+assertion contains capturing groups, repetition can be useful. In any case, an
+assertion could always be wrapped in a repeated group. The only restriction
+that is now imposed is that an unlimited maximum is changed to one more than
the minimum.

10. Fix *THEN verbs in lookahead assertions in JIT.

+11. Added PCRE2_SUBSTITUTE_REPLACEMENT_ONLY.

+
Version 10.34 21-November-2019
------------------------------


Modified: code/trunk/doc/html/pcre2_substitute.html
===================================================================
--- code/trunk/doc/html/pcre2_substitute.html    2020-01-15 16:50:45 UTC (rev 1205)
+++ code/trunk/doc/html/pcre2_substitute.html    2020-01-22 17:50:12 UTC (rev 1206)
@@ -82,6 +82,7 @@
   PCRE2_SUBSTITUTE_LITERAL   The replacement string is literal
   PCRE2_SUBSTITUTE_MATCHED   Use pre-existing match data for 1st match
   PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  If overflow, compute needed length
+  PCRE2_SUBSTITUTE_REPLACEMENT_ONLY  Return only replacement string(s) 
   PCRE2_SUBSTITUTE_UNKNOWN_UNSET  Treat unknown group as unset
   PCRE2_SUBSTITUTE_UNSET_EMPTY  Simple unset insert = empty string
 </pre>
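
As a rough illustration of what the new option changes, here is a minimal Python sketch that models the behaviour with the standard `re` module. It is not PCRE2 code: the function name `substitute_model` and its `replacement_only` argument are hypothetical, the replacement uses Python's `\1` template syntax rather than PCRE2's `$1`, and the empty output on a failed match is an assumption of this model.

```python
import re

def substitute_model(pattern, subject, replacement, replacement_only=False):
    """Model of a single (non-global) substitution.

    Hypothetical helper, not part of PCRE2. Returns (output, count).
    """
    m = re.search(pattern, subject)
    if m is None:
        # Assumption of this model: with no match, "replacement only"
        # yields an empty string; otherwise the subject is copied unchanged.
        return ("" if replacement_only else subject), 0
    expanded = m.expand(replacement)  # expand group references in the template
    if replacement_only:
        # Only the expanded replacement is returned, not the text around it.
        return expanded, 1
    # Default: copy of the subject with the matched part replaced.
    return subject[:m.start()] + expanded + subject[m.end():], 1
```

With the pattern `a(b)c`, the subject `=abc=` and a template equivalent to `+$1$0$1+`, the default mode returns the whole modified subject, while the replacement-only mode returns just the expanded replacement.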


Modified: code/trunk/doc/html/pcre2api.html
===================================================================
--- code/trunk/doc/html/pcre2api.html    2020-01-15 16:50:45 UTC (rev 1205)
+++ code/trunk/doc/html/pcre2api.html    2020-01-22 17:50:12 UTC (rev 1206)
@@ -3305,10 +3305,11 @@
 This function optionally calls <b>pcre2_match()</b> and then makes a copy of the
 subject string in <i>outputbuffer</i>, replacing parts that were matched with
 the <i>replacement</i> string, whose length is supplied in <b>rlength</b>. This
-can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. The default
-is to perform just one replacement if the pattern matches, but there is an
-option that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
-for details).
+can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. There is an
+option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to return just the
+replacement string(s). The default action is to perform just one replacement if
+the pattern matches, but there is an option that requests multiple replacements
+(see PCRE2_SUBSTITUTE_GLOBAL below for details).
 </P>
 <P>
 If successful, <b>pcre2_substitute()</b> returns the number of substitutions
@@ -3349,12 +3350,21 @@
 having to repeat the match.
 </P>
 <P>
-The <i>code</i> argument is not used for the first substitution, but if
-PCRE2_SUBSTITUTE_GLOBAL is set, <b>pcre2_match()</b> will be called after the
-first substitution to check for further matches, and the contents of the
-<i>match_data</i> block will be changed.
+The <i>code</i> argument is not used for the first substitution when
+PCRE2_SUBSTITUTE_MATCHED is set, but if PCRE2_SUBSTITUTE_GLOBAL is also set,
+<b>pcre2_match()</b> will be called after the first substitution to check for
+further matches, and the contents of the <i>match_data</i> block will be
+changed.
 </P>
 <P>
+The default is to return a copy of the subject string with matched substrings 
+replaced. However, if PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set, only the 
+replacement substrings are returned. In the global case, multiple replacements 
+are concatenated in the output buffer. Substitution callouts (see
+<a href="#subcallouts">below)</a>
+can be used to separate them if necessary.
+</P>
+<P>
 The <i>outlengthptr</i> argument of <b>pcre2_substitute()</b> must point to a
 variable that contains the length, in code units, of the output buffer. If the
 function is successful, the value is updated to contain the length of the new
@@ -3560,7 +3570,7 @@
 obtained by calling the <b>pcre2_get_error_message()</b> function (see
 "Obtaining a textual error message"
 <a href="#geterrormessage">above).</a>
-</P>
+<a name="subcallouts"></a></P>
 <br><b>
 Substitution callouts
 </b><br>
@@ -3897,9 +3907,9 @@
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 27 December 2019
+Last updated: 22 January 2020
 <br>
-Copyright &copy; 1997-2019 University of Cambridge.
+Copyright &copy; 1997-2020 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
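
The concatenation in the global case, and the idea of using a per-substitution hook to recover the boundaries between replacements, can be modelled in a few lines of Python. This is only a sketch using the standard `re` module: `global_replacement_only` and `on_substitution` are hypothetical names, and the callback merely stands in for a PCRE2 substitution callout, whose real interface is described in the pcre2api documentation. Python's handling of empty matches may also differ from PCRE2's.

```python
import re

def global_replacement_only(pattern, subject, replacement, on_substitution=None):
    """Model of a global, replacement-only substitution.

    Hypothetical helper, not part of PCRE2. Returns (output, count).
    """
    out = []
    length = 0   # current length of the output built so far
    count = 0    # number of substitutions performed
    for m in re.finditer(pattern, subject):
        piece = m.expand(replacement)
        count += 1
        if on_substitution is not None:
            # Report where this replacement lands in the output buffer.
            on_substitution(count, length, length + len(piece))
        out.append(piece)
        length += len(piece)
    # Replacements are concatenated; unmatched subject text is not copied.
    return "".join(out), count

# Usage: record the output offsets so the concatenated result can be split.
offsets = []
result, n = global_replacement_only(
    r"(\w)=(\d)", "a=1, b=2, c=3", r"\1\2",
    on_substitution=lambda k, start, end: offsets.append((start, end)))
pieces = [result[s:e] for s, e in offsets]
```

Recording offsets rather than inserting a separator character keeps the output identical to what the option produces, and leaves the choice of delimiter to the caller.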


Modified: code/trunk/doc/html/pcre2test.html
===================================================================
--- code/trunk/doc/html/pcre2test.html    2020-01-15 16:50:45 UTC (rev 1205)
+++ code/trunk/doc/html/pcre2test.html    2020-01-22 17:50:12 UTC (rev 1206)
@@ -1050,25 +1050,27 @@
 processed with that pattern. These modifiers do not affect the compilation
 process.
 <pre>
-      aftertext                  show text after match
-      allaftertext               show text after captures
-      allcaptures                show all captures
-      allvector                  show the entire ovector
-      allusedtext                show all consulted text
-      altglobal                  alternative global matching
-  /g  global                     global matching
-      jitstack=&#60;n&#62;               set size of JIT stack
-      mark                       show mark values
-      replace=&#60;string&#62;           specify a replacement string
-      startchar                  show starting character when relevant
-      substitute_callout         use substitution callouts
-      substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
-      substitute_literal         use PCRE2_SUBSTITUTE_LITERAL 
-      substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
-      substitute_skip=&#60;n&#62;        skip substitution number n
-      substitute_stop=&#60;n&#62;        skip substitution number n and greater
-      substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
-      substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
+      aftertext                   show text after match
+      allaftertext                show text after captures
+      allcaptures                 show all captures
+      allvector                   show the entire ovector
+      allusedtext                 show all consulted text
+      altglobal                   alternative global matching
+  /g  global                      global matching
+      jitstack=&#60;n&#62;                set size of JIT stack
+      mark                        show mark values
+      replace=&#60;string&#62;            specify a replacement string
+      startchar                   show starting character when relevant
+      substitute_callout          use substitution callouts
+      substitute_extended         use PCRE2_SUBSTITUTE_EXTENDED
+      substitute_literal          use PCRE2_SUBSTITUTE_LITERAL
+      substitute_matched          use PCRE2_SUBSTITUTE_MATCHED  
+      substitute_overflow_length  use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+      substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY 
+      substitute_skip=&#60;n&#62;         skip substitution &#60;n&#62;
+      substitute_stop=&#60;n&#62;         skip substitution &#60;n&#62; and following
+      substitute_unknown_unset    use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
+      substitute_unset_empty      use PCRE2_SUBSTITUTE_UNSET_EMPTY
 </pre>
 These modifiers may not appear in a <b>#pattern</b> command. If you want them as
 defaults, set them in a <b>#subject</b> command.
@@ -1235,7 +1237,9 @@
       substitute_callout         use substitution callouts
       substitute_extedded        use PCRE2_SUBSTITUTE_EXTENDED
       substitute_literal         use PCRE2_SUBSTITUTE_LITERAL 
+      substitute_matched         use PCRE2_SUBSTITUTE_MATCHED 
       substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+      substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY 
       substitute_skip=&#60;n&#62;        skip substitution number n
       substitute_stop=&#60;n&#62;        skip substitution number n and greater
       substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
@@ -1397,9 +1401,10 @@
 </b><br>
 <P>
 If the <b>replace</b> modifier is set, the <b>pcre2_substitute()</b> function is
-called instead of one of the matching functions. Note that replacement strings
-cannot contain commas, because a comma signifies the end of a modifier. This is
-not thought to be an issue in a test program.
+called instead of one of the matching functions (or after one call of 
+<b>pcre2_match()</b> in the case of PCRE2_SUBSTITUTE_MATCHED). Note that
+replacement strings cannot contain commas, because a comma signifies the end of
+a modifier. This is not thought to be an issue in a test program.
 </P>
 <P>
 Unlike subject strings, <b>pcre2test</b> does not process replacement strings
@@ -1416,11 +1421,15 @@
   global                      PCRE2_SUBSTITUTE_GLOBAL
   substitute_extended         PCRE2_SUBSTITUTE_EXTENDED
   substitute_literal          PCRE2_SUBSTITUTE_LITERAL 
+  substitute_matched          PCRE2_SUBSTITUTE_MATCHED 
   substitute_overflow_length  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+  substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY 
   substitute_unknown_unset    PCRE2_SUBSTITUTE_UNKNOWN_UNSET
   substitute_unset_empty      PCRE2_SUBSTITUTE_UNSET_EMPTY
-
-</PRE>
+</pre>
+See the
+<a href="pcre2api.html"><b>pcre2api</b></a>
+documentation for details of these options.
 </P>
 <P>
 After a successful substitution, the modified string is output, preceded by the
@@ -2096,9 +2105,9 @@
 </P>
 <br><a name="SEC21" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 26 December 2019
+Last updated: 22 January 2020
 <br>
-Copyright &copy; 1997-2019 University of Cambridge.
+Copyright &copy; 1997-2020 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.


Modified: code/trunk/doc/pcre2.txt
===================================================================
--- code/trunk/doc/pcre2.txt    2020-01-15 16:50:45 UTC (rev 1205)
+++ code/trunk/doc/pcre2.txt    2020-01-22 17:50:12 UTC (rev 1206)
@@ -3196,10 +3196,12 @@
        This function optionally calls pcre2_match() and then makes a  copy  of
        the  subject  string in outputbuffer, replacing parts that were matched
        with the replacement string, whose length is supplied in rlength.  This
-       can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. The
-       default is to perform just one replacement if the pattern matches,  but
-       there  is an option that requests multiple replacements (see PCRE2_SUB-
-       STITUTE_GLOBAL below for details).
+       can  be  given  as  PCRE2_ZERO_TERMINATED for a zero-terminated string.
+       There is an option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to re-
+       turn  just  the replacement string(s). The default action is to perform
+       just one replacement if the pattern matches, but  there  is  an  option
+       that  requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
+       for details).


        If successful, pcre2_substitute() returns the number  of  substitutions
        that  were  carried out. This may be zero if no match was found, and is
@@ -3234,35 +3236,42 @@
        application to check for a match before choosing to substitute, without
        having to repeat the match.


-       The  code  argument  is  not  used  for  the first substitution, but if
-       PCRE2_SUBSTITUTE_GLOBAL is set, pcre2_match() will be called after  the
-       first  substitution  to  check for further matches, and the contents of
-       the match_data block will be changed.
+       The  code  argument  is  not  used  for  the  first  substitution  when
+       PCRE2_SUBSTITUTE_MATCHED is set, but if PCRE2_SUBSTITUTE_GLOBAL is also
+       set, pcre2_match() will be called after the first substitution to check
+       for further matches, and the contents of the match_data block  will  be
+       changed.


-       The outlengthptr argument of pcre2_substitute() must point to  a  vari-
-       able  that contains the length, in code units, of the output buffer. If
-       the function is successful, the value is updated to contain the  length
-       of  the  new  string, excluding the trailing zero that is automatically
+       The default is to return a copy of the subject string with matched sub-
+       strings replaced. However, if PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set,
+       only  the replacement substrings are returned. In the global case, mul-
+       tiple replacements are concatenated in the output buffer.  Substitution
+       callouts (see below) can be used to separate them if necessary.
+
+       The  outlengthptr  argument of pcre2_substitute() must point to a vari-
+       able that contains the length, in code units, of the output buffer.  If
+       the  function is successful, the value is updated to contain the length
+       of the new string, excluding the trailing zero  that  is  automatically
        added.


-       If the function is not successful, the value set via  outlengthptr  de-
-       pends  on  the  type  of  error.  For  syntax errors in the replacement
+       If  the  function is not successful, the value set via outlengthptr de-
+       pends on the type of  error.  For  syntax  errors  in  the  replacement
        string, the value is the offset in the replacement string where the er-
-       ror  was  detected.  For  other errors, the value is PCRE2_UNSET by de-
+       ror was detected. For other errors, the value  is  PCRE2_UNSET  by  de-
        fault. This includes the case of the output buffer being too small, un-
        less PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set (see below), in which case
-       the value is the minimum length needed, including space for the  trail-
+       the  value is the minimum length needed, including space for the trail-
        ing zero. Note that in order to compute the required length, pcre2_sub-
        stitute() has to simulate all the matching and copying, instead of giv-
        ing an error return as soon as the buffer overflows. Note also that the
        length is in code units, not bytes.


-       The replacement string, which is interpreted as a  UTF  string  in  UTF
-       mode,  is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option
+       The  replacement  string,  which  is interpreted as a UTF string in UTF
+       mode, is checked for UTF validity unless the PCRE2_NO_UTF_CHECK  option
        is set. If the PCRE2_SUBSTITUTE_LITERAL option is set, it is not inter-
        preted in any way. By default, however, a dollar character is an escape
-       character that can specify the insertion  of  characters  from  capture
-       groups  and  names  from (*MARK) or other control verbs in the pattern.
+       character  that  can  specify  the insertion of characters from capture
+       groups and names from (*MARK) or other control verbs  in  the  pattern.
        The following forms are always recognized:


          $$                  insert a dollar character
@@ -3269,18 +3278,18 @@
          $<n> or ${<n>}      insert the contents of group <n>
          $*MARK or ${*MARK}  insert a control verb name


-       Either a group number or a group name  can  be  given  for  <n>.  Curly
-       brackets  are  required only if the following character would be inter-
+       Either  a  group  number  or  a  group name can be given for <n>. Curly
+       brackets are required only if the following character would  be  inter-
        preted as part of the number or name. The number may be zero to include
-       the  entire  matched  string.   For  example,  if  the pattern a(b)c is
-       matched with "=abc=" and the replacement string "+$1$0$1+", the  result
+       the entire matched string.   For  example,  if  the  pattern  a(b)c  is
+       matched  with "=abc=" and the replacement string "+$1$0$1+", the result
        is "=+babcb+=".


-       $*MARK  inserts the name from the last encountered backtracking control
-       verb on the matching path that has a name. (*MARK) must always  include
-       a  name,  but  the  other  verbs  need not. For example, in the case of
+       $*MARK inserts the name from the last encountered backtracking  control
+       verb  on the matching path that has a name. (*MARK) must always include
+       a name, but the other verbs need not.  For  example,  in  the  case  of
        (*MARK:A)(*PRUNE) the name inserted is "A", but for (*MARK:A)(*PRUNE:B)
-       the  relevant  name is "B". This facility can be used to perform simple
+       the relevant name is "B". This facility can be used to  perform  simple
        simultaneous substitutions, as this pcre2test example shows:


          /(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
@@ -3288,15 +3297,15 @@
           2: pear orange


        PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject
-       string,  replacing every matching substring. If this option is not set,
-       only the first matching substring is replaced. The search  for  matches
-       takes  place in the original subject string (that is, previous replace-
-       ments do not affect it).  Iteration is  implemented  by  advancing  the
-       startoffset  value  for  each search, which is always passed the entire
+       string, replacing every matching substring. If this option is not  set,
+       only  the  first matching substring is replaced. The search for matches
+       takes place in the original subject string (that is, previous  replace-
+       ments  do  not  affect  it).  Iteration is implemented by advancing the
+       startoffset value for each search, which is always  passed  the  entire
        subject string. If an offset limit is set in the match context, search-
        ing stops when that limit is reached.


-       You  can  restrict  the effect of a global substitution to a portion of
+       You can restrict the effect of a global substitution to  a  portion  of
        the subject string by setting either or both of startoffset and an off-
        set limit. Here is a pcre2test example:


@@ -3304,87 +3313,87 @@
          ABC ABC ABC ABC\=offset=3,offset_limit=12
           2: ABC A!C A!C ABC


-       When  continuing  with  global substitutions after matching a substring
+       When continuing with global substitutions after  matching  a  substring
        with zero length, an attempt to find a non-empty match at the same off-
        set is performed.  If this is not successful, the offset is advanced by
        one character except when CRLF is a valid newline sequence and the next
-       two  characters are CR, LF. In this case, the offset is advanced by two
+       two characters are CR, LF. In this case, the offset is advanced by  two
        characters.


-       PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when  the  output
+       PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  changes  what happens when the output
        buffer is too small. The default action is to return PCRE2_ERROR_NOMEM-
-       ORY immediately. If this option  is  set,  however,  pcre2_substitute()
+       ORY  immediately.  If  this  option is set, however, pcre2_substitute()
        continues to go through the motions of matching and substituting (with-
-       out, of course, writing anything) in order to compute the size of  buf-
-       fer  that  is  needed.  This  value is passed back via the outlengthptr
-       variable, with  the  result  of  the  function  still  being  PCRE2_ER-
+       out,  of course, writing anything) in order to compute the size of buf-
+       fer that is needed. This value is  passed  back  via  the  outlengthptr
+       variable,  with  the  result  of  the  function  still  being PCRE2_ER-
        ROR_NOMEMORY.


-       Passing  a  buffer  size  of zero is a permitted way of finding out how
-       much memory is needed for given substitution. However, this  does  mean
+       Passing a buffer size of zero is a permitted way  of  finding  out  how
+       much  memory  is needed for given substitution. However, this does mean
        that the entire operation is carried out twice. Depending on the appli-
-       cation, it may be more efficient to allocate a large  buffer  and  free
-       the   excess   afterwards,   instead  of  using  PCRE2_SUBSTITUTE_OVER-
+       cation,  it  may  be more efficient to allocate a large buffer and free
+       the  excess  afterwards,  instead   of   using   PCRE2_SUBSTITUTE_OVER-
        FLOW_LENGTH.


        PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that
        do not appear in the pattern to be treated as unset groups. This option
-       should be used with care, because it means that a typo in a group  name
+       should  be used with care, because it means that a typo in a group name
        or number no longer causes the PCRE2_ERROR_NOSUBSTRING error.


        PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capture groups (including un-
-       known groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be  treated
-       as  empty  strings  when inserted as described above. If this option is
+       known  groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated
+       as empty strings when inserted as described above. If  this  option  is
        not set, an attempt to insert an unset group causes the PCRE2_ERROR_UN-
-       SET  error.  This  option  does not influence the extended substitution
+       SET error. This option does not  influence  the  extended  substitution
        syntax described below.


-       PCRE2_SUBSTITUTE_EXTENDED causes extra processing to be applied to  the
-       replacement  string.  Without this option, only the dollar character is
-       special, and only the group insertion forms  listed  above  are  valid.
+       PCRE2_SUBSTITUTE_EXTENDED  causes extra processing to be applied to the
+       replacement string. Without this option, only the dollar  character  is
+       special,  and  only  the  group insertion forms listed above are valid.
        When PCRE2_SUBSTITUTE_EXTENDED is set, two things change:


-       Firstly,  backslash in a replacement string is interpreted as an escape
+       Firstly, backslash in a replacement string is interpreted as an  escape
        character. The usual forms such as \n or \x{ddd} can be used to specify
-       particular  character codes, and backslash followed by any non-alphanu-
-       meric character quotes that character. Extended quoting  can  be  coded
+       particular character codes, and backslash followed by any  non-alphanu-
+       meric  character  quotes  that character. Extended quoting can be coded
        using \Q...\E, exactly as in pattern strings.


-       There  are  also four escape sequences for forcing the case of inserted
-       letters.  The insertion mechanism has three states:  no  case  forcing,
+       There are also four escape sequences for forcing the case  of  inserted
+       letters.   The  insertion  mechanism has three states: no case forcing,
        force upper case, and force lower case. The escape sequences change the
        current state: \U and \L change to upper or lower case forcing, respec-
-       tively,  and  \E (when not terminating a \Q quoted sequence) reverts to
-       no case forcing. The sequences \u and \l force the next  character  (if
-       it  is  a  letter)  to  upper or lower case, respectively, and then the
+       tively, and \E (when not terminating a \Q quoted sequence)  reverts  to
+       no  case  forcing. The sequences \u and \l force the next character (if
+       it is a letter) to upper or lower  case,  respectively,  and  then  the
        state automatically reverts to no case forcing. Case forcing applies to
-       all  inserted  characters, including those from capture groups and let-
+       all inserted  characters, including those from capture groups and  let-
        ters within \Q...\E quoted sequences.


        Note that case forcing sequences such as \U...\E do not nest. For exam-
-       ple,  the  result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final
-       \E has no effect. Note  also  that  the  PCRE2_ALT_BSUX  and  PCRE2_EX-
+       ple, the result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc";  the  final
+       \E  has  no  effect.  Note  also  that the PCRE2_ALT_BSUX and PCRE2_EX-
        TRA_ALT_BSUX options do not apply to replacement strings.


-       The  second  effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
-       flexibility to capture group substitution. The  syntax  is  similar  to
+       The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to  add  more
+       flexibility  to  capture  group  substitution. The syntax is similar to
        that used by Bash:


          ${<n>:-<string>}
          ${<n>:+<string1>:<string2>}


-       As  before,  <n> may be a group number or a name. The first form speci-
-       fies a default value. If group <n> is set, its value  is  inserted;  if
-       not,  <string>  is  expanded  and  the result inserted. The second form
-       specifies strings that are expanded and inserted when group <n> is  set
-       or  unset,  respectively. The first form is just a convenient shorthand
+       As before, <n> may be a group number or a name. The first  form  speci-
+       fies  a  default  value. If group <n> is set, its value is inserted; if
+       not, <string> is expanded and the  result  inserted.  The  second  form
+       specifies  strings that are expanded and inserted when group <n> is set
+       or unset, respectively. The first form is just a  convenient  shorthand
        for


          ${<n>:+${<n>}:<string>}


-       Backslash can be used to escape colons and closing  curly  brackets  in
-       the  replacement  strings.  A change of the case forcing state within a
-       replacement string remains  in  force  afterwards,  as  shown  in  this
+       Backslash  can  be  used to escape colons and closing curly brackets in
+       the replacement strings. A change of the case forcing  state  within  a
+       replacement  string  remains  in  force  afterwards,  as  shown in this
        pcre2test example:


          /(some)?(body)/substitute_extended,replace=${1:+\U:\L}HeLLo
@@ -3393,8 +3402,8 @@
              somebody
           1: HELLO


-       The  PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended
-       substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does  cause  un-
+       The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these  extended
+       substitutions.  However,  PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause un-
        known groups in the extended syntax forms to be treated as unset.


        If  PCRE2_SUBSTITUTE_LITERAL  is  set,  PCRE2_SUBSTITUTE_UNKNOWN_UNSET,
@@ -3403,8 +3412,8 @@


    Substitution errors


-       In  the  event of an error, pcre2_substitute() returns a negative error
-       code. Except for PCRE2_ERROR_NOMATCH (which is never returned),  errors
+       In the event of an error, pcre2_substitute() returns a  negative  error
+       code.  Except for PCRE2_ERROR_NOMATCH (which is never returned), errors
        from pcre2_match() are passed straight back.


        PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring inser-
@@ -3411,29 +3420,29 @@
        tion, unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.


        PCRE2_ERROR_UNSET is returned for an unset substring insertion (includ-
-       ing  an  unknown  substring when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set)
-       when the simple (non-extended) syntax is used and  PCRE2_SUBSTITUTE_UN-
+       ing an unknown substring when  PCRE2_SUBSTITUTE_UNKNOWN_UNSET  is  set)
+       when  the simple (non-extended) syntax is used and PCRE2_SUBSTITUTE_UN-
        SET_EMPTY is not set.


-       PCRE2_ERROR_NOMEMORY  is  returned  if  the  output  buffer  is not big
+       PCRE2_ERROR_NOMEMORY is returned  if  the  output  buffer  is  not  big
        enough. If the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size
-       of  buffer  that is needed is returned via outlengthptr. Note that this
+       of buffer that is needed is returned via outlengthptr. Note  that  this
        does not happen by default.


        PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the
        match_data argument is NULL.


-       PCRE2_ERROR_BADREPLACEMENT  is  used for miscellaneous syntax errors in
-       the replacement string, with more  particular  errors  being  PCRE2_ER-
+       PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax  errors  in
+       the  replacement  string,  with  more particular errors being PCRE2_ER-
        ROR_BADREPESCAPE (invalid escape sequence), PCRE2_ERROR_REPMISSINGBRACE
-       (closing curly bracket not found), PCRE2_ERROR_BADSUBSTITUTION  (syntax
-       error  in  extended group substitution), and PCRE2_ERROR_BADSUBSPATTERN
+       (closing  curly bracket not found), PCRE2_ERROR_BADSUBSTITUTION (syntax
+       error in extended group substitution),  and  PCRE2_ERROR_BADSUBSPATTERN
        (the pattern match ended before it started or the match started earlier
-       than  the  current  position  in the subject, which can happen if \K is
+       than the current position in the subject, which can  happen  if  \K  is
        used in an assertion).


        As for all PCRE2 errors, a text message that describes the error can be
-       obtained  by  calling  the pcre2_get_error_message() function (see "Ob-
+       obtained by calling the pcre2_get_error_message()  function  (see  "Ob-
        taining a textual error message" above).


    Substitution callouts
@@ -3442,15 +3451,15 @@
          int (*callout_function)(pcre2_substitute_callout_block *, void *),
          void *callout_data);


-       The pcre2_set_substitution_callout() function can be used to specify  a
-       callout  function for pcre2_substitute(). This information is passed in
+       The  pcre2_set_substitution_callout() function can be used to specify a
+       callout function for pcre2_substitute(). This information is passed  in
        a match context. The callout function is called after each substitution
        has been processed, but it can cause the replacement not to happen. The
-       callout function is not called for simulated substitutions that  happen
+       callout  function is not called for simulated substitutions that happen
        as a result of the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option.


        The first argument of the callout function is a pointer to a substitute
-       callout block structure, which contains the following fields, not  nec-
+       callout  block structure, which contains the following fields, not nec-
        essarily in this order:


          uint32_t    version;
@@ -3461,9 +3470,9 @@
          uint32_t    oveccount;
          PCRE2_SIZE  output_offsets[2];


-       The  version field contains the version number of the block format. The
-       current version is 0. The version number will  increase  in  future  if
-       more  fields are added, but the intention is never to remove any of the
+       The version field contains the version number of the block format.  The
+       current  version  is  0.  The version number will increase in future if
+       more fields are added, but the intention is never to remove any of  the
        existing fields.


        The subscount field is the number of the current match. It is 1 for the
@@ -3470,25 +3479,25 @@
        first callout, 2 for the second, and so on. The input and output point-
        ers are copies of the values passed to pcre2_substitute().


-       The ovector field points to the ovector, which contains the  result  of
+       The  ovector  field points to the ovector, which contains the result of
        the most recent match. The oveccount field contains the number of pairs
        that are set in the ovector, and is always greater than zero.


-       The output_offsets vector contains the offsets of  the  replacement  in
-       the  output  string. This has already been processed for dollar and (if
+       The  output_offsets  vector  contains the offsets of the replacement in
+       the output string. This has already been processed for dollar  and  (if
        requested) backslash substitutions as described above.


-       The second argument of the callout function  is  the  value  passed  as
-       callout_data  when  the  function was registered. The value returned by
+       The  second  argument  of  the  callout function is the value passed as
+       callout_data when the function was registered. The  value  returned  by
        the callout function is interpreted as follows:


-       If the value is zero, the replacement is accepted, and,  if  PCRE2_SUB-
-       STITUTE_GLOBAL  is set, processing continues with a search for the next
-       match. If the value is not zero, the current  replacement  is  not  ac-
-       cepted.  If  the  value is greater than zero, processing continues when
-       PCRE2_SUBSTITUTE_GLOBAL is set. Otherwise (the value is less than  zero
-       or  PCRE2_SUBSTITUTE_GLOBAL  is  not set), the the rest of the input is
-       copied to the output and the call to pcre2_substitute() exits,  return-
+       If  the  value is zero, the replacement is accepted, and, if PCRE2_SUB-
+       STITUTE_GLOBAL is set, processing continues with a search for the  next
+       match.  If  the  value  is not zero, the current replacement is not ac-
+       cepted. If the value is greater than zero,  processing  continues  when
+       PCRE2_SUBSTITUTE_GLOBAL  is set. Otherwise (the value is less than zero
+       or PCRE2_SUBSTITUTE_GLOBAL is not set), the rest  of  the  input  is
+       copied  to the output and the call to pcre2_substitute() exits, return-
        ing the number of matches so far.



@@ -3497,56 +3506,56 @@
        int pcre2_substring_nametable_scan(const pcre2_code *code,
          PCRE2_SPTR name, PCRE2_SPTR *first, PCRE2_SPTR *last);


-       When  a  pattern  is compiled with the PCRE2_DUPNAMES option, names for
-       capture groups are not required to be unique. Duplicate names  are  al-
-       ways  allowed for groups with the same number, created by using the (?|
+       When a pattern is compiled with the PCRE2_DUPNAMES  option,  names  for
+       capture  groups  are not required to be unique. Duplicate names are al-
+       ways allowed for groups with the same number, created by using the  (?|
        feature. Indeed, if such groups are named, they are required to use the
        same names.


-       Normally,  patterns  that  use duplicate names are such that in any one
-       match, only one of each set of identically-named  groups  participates.
+       Normally, patterns that use duplicate names are such that  in  any  one
+       match,  only  one of each set of identically-named groups participates.
        An example is shown in the pcre2pattern documentation.


-       When   duplicates   are   present,   pcre2_substring_copy_byname()  and
-       pcre2_substring_get_byname() return the first  substring  corresponding
-       to  the given name that is set. Only if none are set is PCRE2_ERROR_UN-
-       SET is returned. The  pcre2_substring_number_from_name()  function  re-
-       turns  the error PCRE2_ERROR_NOUNIQUESUBSTRING when there are duplicate
+       When  duplicates   are   present,   pcre2_substring_copy_byname()   and
+       pcre2_substring_get_byname()  return  the first substring corresponding
+       to the given name that is set. Only if none are set is  PCRE2_ERROR_UN-
+       SET  returned.  The  pcre2_substring_number_from_name()  function  re-
+       turns the error PCRE2_ERROR_NOUNIQUESUBSTRING when there are  duplicate
        names.


-       If you want to get full details of all captured substrings for a  given
-       name,  you  must use the pcre2_substring_nametable_scan() function. The
-       first argument is the compiled pattern, and the second is the name.  If
-       the  third  and fourth arguments are NULL, the function returns a group
+       If  you want to get full details of all captured substrings for a given
+       name, you must use the pcre2_substring_nametable_scan()  function.  The
+       first  argument is the compiled pattern, and the second is the name. If
+       the third and fourth arguments are NULL, the function returns  a  group
        number for a unique name, or PCRE2_ERROR_NOUNIQUESUBSTRING otherwise.


        When the third and fourth arguments are not NULL, they must be pointers
-       to  variables  that are updated by the function. After it has run, they
+       to variables that are updated by the function. After it has  run,  they
        point to the first and last entries in the name-to-number table for the
-       given  name,  and the function returns the length of each entry in code
-       units. In both cases, PCRE2_ERROR_NOSUBSTRING is returned if there  are
+       given name, and the function returns the length of each entry  in  code
+       units.  In both cases, PCRE2_ERROR_NOSUBSTRING is returned if there are
        no entries for the given name.


        The format of the name table is described above in the section entitled
-       Information about a pattern. Given all the  relevant  entries  for  the
-       name,  you  can  extract  each of their numbers, and hence the captured
+       Information  about  a  pattern.  Given all the relevant entries for the
+       name, you can extract each of their numbers,  and  hence  the  captured
        data.



FINDING ALL POSSIBLE MATCHES AT ONE POSITION

-       The traditional matching function uses a  similar  algorithm  to  Perl,
-       which  stops when it finds the first match at a given point in the sub-
+       The  traditional  matching  function  uses a similar algorithm to Perl,
+       which stops when it finds the first match at a given point in the  sub-
        ject. If you want to find all possible matches, or the longest possible
-       match  at  a  given  position,  consider using the alternative matching
-       function (see below) instead. If you cannot use the  alternative  func-
+       match at a given position,  consider  using  the  alternative  matching
+       function  (see  below) instead. If you cannot use the alternative func-
        tion, you can kludge it up by making use of the callout facility, which
        is described in the pcre2callout documentation.


        What you have to do is to insert a callout right at the end of the pat-
-       tern.   When your callout function is called, extract and save the cur-
-       rent matched substring. Then return 1, which  forces  pcre2_match()  to
-       backtrack  and  try other alternatives. Ultimately, when it runs out of
+       tern.  When your callout function is called, extract and save the  cur-
+       rent  matched  substring.  Then return 1, which forces pcre2_match() to
+       backtrack and try other alternatives. Ultimately, when it runs  out  of
        matches, pcre2_match() will yield PCRE2_ERROR_NOMATCH.



@@ -3558,26 +3567,26 @@
          pcre2_match_context *mcontext,
          int *workspace, PCRE2_SIZE wscount);


-       The function pcre2_dfa_match() is called  to  match  a  subject  string
-       against  a  compiled pattern, using a matching algorithm that scans the
+       The  function  pcre2_dfa_match()  is  called  to match a subject string
+       against a compiled pattern, using a matching algorithm that  scans  the
        subject string just once (not counting lookaround assertions), and does
-       not  backtrack.  This has different characteristics to the normal algo-
-       rithm, and is not compatible with Perl. Some of the features  of  PCRE2
-       patterns  are  not  supported.  Nevertheless, there are times when this
-       kind of matching can be useful. For a discussion of  the  two  matching
+       not backtrack.  This has different characteristics to the normal  algo-
+       rithm,  and  is not compatible with Perl. Some of the features of PCRE2
+       patterns are not supported.  Nevertheless, there are  times  when  this
+       kind  of  matching  can be useful. For a discussion of the two matching
        algorithms, and a list of features that pcre2_dfa_match() does not sup-
        port, see the pcre2matching documentation.


-       The arguments for the pcre2_dfa_match() function are the  same  as  for
+       The  arguments  for  the pcre2_dfa_match() function are the same as for
        pcre2_match(), plus two extras. The ovector within the match data block
        is used in a different way, and this is described below. The other com-
-       mon  arguments  are used in the same way as for pcre2_match(), so their
+       mon arguments are used in the same way as for pcre2_match(),  so  their
        description is not repeated here.


-       The two additional arguments provide workspace for  the  function.  The
-       workspace  vector  should  contain at least 20 elements. It is used for
+       The  two  additional  arguments provide workspace for the function. The
+       workspace vector should contain at least 20 elements. It  is  used  for
        keeping  track  of  multiple  paths  through  the  pattern  tree.  More
-       workspace  is needed for patterns and subjects where there are a lot of
+       workspace is needed for patterns and subjects where there are a lot  of
        potential matches.


        Here is an example of a simple call to pcre2_dfa_match():
@@ -3597,45 +3606,45 @@


    Option bits for pcre2_dfa_match()


-       The unused bits of the options argument for pcre2_dfa_match()  must  be
-       zero.   The   only   bits   that   may   be   set  are  PCRE2_ANCHORED,
-       PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL,  PCRE2_NO-
+       The  unused  bits of the options argument for pcre2_dfa_match() must be
+       zero.  The  only   bits   that   may   be   set   are   PCRE2_ANCHORED,
+       PCRE2_COPY_MATCHED_SUBJECT,  PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NO-
        TEOL,   PCRE2_NOTEMPTY,   PCRE2_NOTEMPTY_ATSTART,   PCRE2_NO_UTF_CHECK,
-       PCRE2_PARTIAL_HARD,   PCRE2_PARTIAL_SOFT,    PCRE2_DFA_SHORTEST,    and
-       PCRE2_DFA_RESTART.  All but the last four of these are exactly the same
+       PCRE2_PARTIAL_HARD,    PCRE2_PARTIAL_SOFT,    PCRE2_DFA_SHORTEST,   and
+       PCRE2_DFA_RESTART. All but the last four of these are exactly the  same
        as for pcre2_match(), so their description is not repeated here.


          PCRE2_PARTIAL_HARD
          PCRE2_PARTIAL_SOFT


-       These have the same general effect as they do  for  pcre2_match(),  but
-       the  details are slightly different. When PCRE2_PARTIAL_HARD is set for
-       pcre2_dfa_match(), it returns PCRE2_ERROR_PARTIAL if  the  end  of  the
+       These  have  the  same general effect as they do for pcre2_match(), but
+       the details are slightly different. When PCRE2_PARTIAL_HARD is set  for
+       pcre2_dfa_match(),  it  returns  PCRE2_ERROR_PARTIAL  if the end of the
        subject is reached and there is still at least one matching possibility
        that requires additional characters. This happens even if some complete
-       matches  have  already  been found. When PCRE2_PARTIAL_SOFT is set, the
-       return code PCRE2_ERROR_NOMATCH is converted  into  PCRE2_ERROR_PARTIAL
-       if  the  end  of  the  subject  is reached, there have been no complete
+       matches have already been found. When PCRE2_PARTIAL_SOFT  is  set,  the
+       return  code  PCRE2_ERROR_NOMATCH is converted into PCRE2_ERROR_PARTIAL
+       if the end of the subject is  reached,  there  have  been  no  complete
        matches, but there is still at least one matching possibility. The por-
-       tion  of  the  string that was inspected when the longest partial match
+       tion of the string that was inspected when the  longest  partial  match
        was found is set as the first matching string in both cases. There is a
-       more  detailed  discussion  of partial and multi-segment matching, with
+       more detailed discussion of partial and  multi-segment  matching,  with
        examples, in the pcre2partial documentation.


          PCRE2_DFA_SHORTEST


-       Setting the PCRE2_DFA_SHORTEST option causes the matching algorithm  to
+       Setting  the PCRE2_DFA_SHORTEST option causes the matching algorithm to
        stop as soon as it has found one match. Because of the way the alterna-
-       tive algorithm works, this is necessarily the shortest  possible  match
+       tive  algorithm  works, this is necessarily the shortest possible match
        at the first possible matching point in the subject string.


          PCRE2_DFA_RESTART


-       When  pcre2_dfa_match() returns a partial match, it is possible to call
+       When pcre2_dfa_match() returns a partial match, it is possible to  call
        it again, with additional subject characters, and have it continue with
        the same match. The PCRE2_DFA_RESTART option requests this action; when
-       it is set, the workspace and wscount options must  reference  the  same
-       vector  as  before  because data about the match so far is left in them
+       it  is  set,  the workspace and wscount options must reference the same
+       vector as before because data about the match so far is  left  in  them
        after a partial match. There is more discussion of this facility in the
        pcre2partial documentation.


@@ -3643,8 +3652,8 @@

        When pcre2_dfa_match() succeeds, it may have matched more than one sub-
        string in the subject. Note, however, that all the matches from one run
-       of  the  function  start  at the same point in the subject. The shorter
-       matches are all initial substrings of the longer matches. For  example,
+       of the function start at the same point in  the  subject.  The  shorter
+       matches  are all initial substrings of the longer matches. For example,
        if the pattern


          <.*>
@@ -3659,80 +3668,80 @@
          <something> <something else>
          <something>


-       On  success,  the  yield of the function is a number greater than zero,
-       which is the number of matched substrings.  The  offsets  of  the  sub-
-       strings  are returned in the ovector, and can be extracted by number in
-       the same way as for pcre2_match(), but the numbers bear no relation  to
-       any  capture groups that may exist in the pattern, because DFA matching
+       On success, the yield of the function is a number  greater  than  zero,
+       which  is  the  number  of  matched substrings. The offsets of the sub-
+       strings are returned in the ovector, and can be extracted by number  in
+       the  same way as for pcre2_match(), but the numbers bear no relation to
+       any capture groups that may exist in the pattern, because DFA  matching
        does not support capturing.


-       Calls to the convenience functions that extract substrings by name  re-
+       Calls  to the convenience functions that extract substrings by name re-
        turn the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used af-
-       ter a DFA match. The convenience functions that extract  substrings  by
+       ter  a  DFA match. The convenience functions that extract substrings by
        number never return PCRE2_ERROR_NOSUBSTRING.


-       The  matched  strings  are  stored  in  the ovector in reverse order of
-       length; that is, the longest matching string is first.  If  there  were
-       too  many matches to fit into the ovector, the yield of the function is
+       The matched strings are stored in  the  ovector  in  reverse  order  of
+       length;  that  is,  the longest matching string is first. If there were
+       too many matches to fit into the ovector, the yield of the function  is
        zero, and the vector is filled with the longest matches.


-       NOTE: PCRE2's "auto-possessification" optimization usually  applies  to
-       character  repeats at the end of a pattern (as well as internally). For
-       example, the pattern "a\d+" is compiled as if it were "a\d++". For  DFA
-       matching,  this means that only one possible match is found. If you re-
+       NOTE:  PCRE2's  "auto-possessification" optimization usually applies to
+       character repeats at the end of a pattern (as well as internally).  For
+       example,  the pattern "a\d+" is compiled as if it were "a\d++". For DFA
+       matching, this means that only one possible match is found. If you  re-
        ally do want multiple matches in such cases, either use an ungreedy re-
-       peat  such as "a\d+?" or set the PCRE2_NO_AUTO_POSSESS option when com-
+       peat such as "a\d+?" or set the PCRE2_NO_AUTO_POSSESS option when  com-
        piling.


    Error returns from pcre2_dfa_match()


        The pcre2_dfa_match() function returns a negative number when it fails.
-       Many  of  the  errors  are  the same as for pcre2_match(), as described
+       Many of the errors are the same  as  for  pcre2_match(),  as  described
        above.  There are in addition the following errors that are specific to
        pcre2_dfa_match():


          PCRE2_ERROR_DFA_UITEM


-       This  return  is  given  if pcre2_dfa_match() encounters an item in the
-       pattern that it does not support, for instance, the use of \C in a  UTF
+       This return is given if pcre2_dfa_match() encounters  an  item  in  the
+       pattern that it does not support, for instance, the use of  \C  in  UTF
        mode or a backreference.


          PCRE2_ERROR_DFA_UCOND


-       This  return  is given if pcre2_dfa_match() encounters a condition item
+       This return is given if pcre2_dfa_match() encounters a  condition  item
        that uses a backreference for the condition, or a test for recursion in
        a specific capture group. These are not supported.


          PCRE2_ERROR_DFA_UINVALID_UTF


-       This  return is given if pcre2_dfa_match() is called for a pattern that
-       was compiled with PCRE2_MATCH_INVALID_UTF. This is  not  supported  for
+       This return is given if pcre2_dfa_match() is called for a pattern  that
+       was  compiled  with  PCRE2_MATCH_INVALID_UTF. This is not supported for
        DFA matching.


          PCRE2_ERROR_DFA_WSSIZE


-       This  return  is  given  if  pcre2_dfa_match() runs out of space in the
+       This return is given if pcre2_dfa_match() runs  out  of  space  in  the
        workspace vector.


          PCRE2_ERROR_DFA_RECURSE


        When a recursion or subroutine call is processed, the matching function
-       calls  itself  recursively,  using  private  memory for the ovector and
-       workspace.  This error is given if the internal ovector  is  not  large
-       enough.  This  should  be  extremely  rare, as a vector of size 1000 is
+       calls itself recursively, using private  memory  for  the  ovector  and
+       workspace.   This  error  is given if the internal ovector is not large
+       enough. This should be extremely rare, as a  vector  of  size  1000  is
        used.


          PCRE2_ERROR_DFA_BADRESTART


-       When pcre2_dfa_match() is called  with  the  PCRE2_DFA_RESTART  option,
-       some  plausibility  checks  are  made on the contents of the workspace,
-       which should contain data about the previous partial match. If  any  of
+       When  pcre2_dfa_match()  is  called  with the PCRE2_DFA_RESTART option,
+       some plausibility checks are made on the  contents  of  the  workspace,
+       which  should  contain data about the previous partial match. If any of
        these checks fail, this error is given.



SEE ALSO

-       pcre2build(3),    pcre2callout(3),    pcre2demo(3),   pcre2matching(3),
+       pcre2build(3),   pcre2callout(3),    pcre2demo(3),    pcre2matching(3),
        pcre2partial(3), pcre2posix(3), pcre2sample(3), pcre2unicode(3).



@@ -3745,8 +3754,8 @@

REVISION

-       Last updated: 27 December 2019
-       Copyright (c) 1997-2019 University of Cambridge.
+       Last updated: 22 January 2020
+       Copyright (c) 1997-2020 University of Cambridge.
 ------------------------------------------------------------------------------



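Note on the pcre2.txt hunks above: the return-value rules for a substitution
callout can be restated as a small decision function. This is only a sketch of
the documented rules, not code from PCRE2. The names callout_outcome and
classify_callout_return are hypothetical, and the global parameter stands for
whether PCRE2_SUBSTITUTE_GLOBAL is set.

```c
#include <stdbool.h>

/* Hypothetical model type: not part of the PCRE2 API. */
typedef struct {
    bool replacement_accepted;  /* is the replacement kept in the output? */
    bool search_continues;      /* is a further match searched for? */
} callout_outcome;

/* Restates the documented interpretation of a substitution callout's
   return value `rc`. `global` models PCRE2_SUBSTITUTE_GLOBAL being set. */
callout_outcome classify_callout_return(int rc, bool global)
{
    callout_outcome o;
    /* Only a zero return accepts the current replacement. */
    o.replacement_accepted = (rc == 0);
    /* Processing goes on only in the global case, and never after a
       negative return. */
    o.search_continues = global && (rc >= 0);
    return o;
}
```

When search_continues is false, the documentation says the rest of the input
is copied to the output and pcre2_substitute() returns the number of matches
so far.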

Modified: code/trunk/doc/pcre2_substitute.3
===================================================================
--- code/trunk/doc/pcre2_substitute.3    2020-01-15 16:50:45 UTC (rev 1205)
+++ code/trunk/doc/pcre2_substitute.3    2020-01-22 17:50:12 UTC (rev 1206)
@@ -1,4 +1,4 @@
-.TH PCRE2_SUBSTITUTE 3 "05 January 2020" "PCRE2 10.35"
+.TH PCRE2_SUBSTITUTE 3 "22 January 2020" "PCRE2 10.35"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -73,6 +73,7 @@
   PCRE2_SUBSTITUTE_LITERAL   The replacement string is literal
   PCRE2_SUBSTITUTE_MATCHED   Use pre-existing match data for 1st match
   PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  If overflow, compute needed length
+  PCRE2_SUBSTITUTE_REPLACEMENT_ONLY  Return only replacement string(s) 
   PCRE2_SUBSTITUTE_UNKNOWN_UNSET  Treat unknown group as unset
   PCRE2_SUBSTITUTE_UNSET_EMPTY  Simple unset insert = empty string
 .sp


Modified: code/trunk/doc/pcre2api.3
===================================================================
--- code/trunk/doc/pcre2api.3    2020-01-15 16:50:45 UTC (rev 1205)
+++ code/trunk/doc/pcre2api.3    2020-01-22 17:50:12 UTC (rev 1206)
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "27 December 2019" "PCRE2 10.35"
+.TH PCRE2API 3 "22 January 2020" "PCRE2 10.35"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -3324,10 +3324,11 @@
 This function optionally calls \fBpcre2_match()\fP and then makes a copy of the
 subject string in \fIoutputbuffer\fP, replacing parts that were matched with
 the \fIreplacement\fP string, whose length is supplied in \fBrlength\fP. This
-can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. The default
-is to perform just one replacement if the pattern matches, but there is an
-option that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
-for details).
+can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. There is an
+option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to return just the
+replacement string(s). The default action is to perform just one replacement if
+the pattern matches, but there is an option that requests multiple replacements
+(see PCRE2_SUBSTITUTE_GLOBAL below for details).
 .P
 If successful, \fBpcre2_substitute()\fP returns the number of substitutions
 that were carried out. This may be zero if no match was found, and is never
@@ -3362,11 +3363,22 @@
 an application to check for a match before choosing to substitute, without
 having to repeat the match.
 .P
-The \fIcode\fP argument is not used for the first substitution, but if
-PCRE2_SUBSTITUTE_GLOBAL is set, \fBpcre2_match()\fP will be called after the
-first substitution to check for further matches, and the contents of the
-\fImatch_data\fP block will be changed.
+The \fIcode\fP argument is not used for the first substitution when
+PCRE2_SUBSTITUTE_MATCHED is set, but if PCRE2_SUBSTITUTE_GLOBAL is also set,
+\fBpcre2_match()\fP will be called after the first substitution to check for
+further matches, and the contents of the \fImatch_data\fP block will be
+changed.
 .P
+The default is to return a copy of the subject string with matched substrings 
+replaced. However, if PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set, only the 
+replacement substrings are returned. In the global case, multiple replacements 
+are concatenated in the output buffer. Substitution callouts (see
+.\" HTML <a href="#subcallouts">
+.\" </a>
+below)
+.\"
+can be used to separate them if necessary.
+.P
 The \fIoutlengthptr\fP argument of \fBpcre2_substitute()\fP must point to a
 variable that contains the length, in code units, of the output buffer. If the
 function is successful, the value is updated to contain the length of the new
@@ -3557,6 +3569,7 @@
 .\"
 .
 .
+.\" HTML <a name="subcallouts"></a>
 .SS "Substitution callouts"
 .rs
 .sp
@@ -3904,6 +3917,6 @@
 .rs
 .sp
 .nf
-Last updated: 27 December 2019
-Copyright (c) 1997-2019 University of Cambridge.
+Last updated: 22 January 2020
+Copyright (c) 1997-2020 University of Cambridge.
 .fi

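Note on the pcre2api.3 hunks above: the difference between the default output
and the PCRE2_SUBSTITUTE_REPLACEMENT_ONLY output can be illustrated with a toy
model. This is a sketch under stated assumptions and does not call PCRE2 at
all: strstr() on a literal needle stands in for pcre2_match(), the replacement
is copied verbatim, and model_substitute with all of its parameters is a
hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not PCRE2 code. Returns the number of substitutions,
   or -1 if `out` is too small or `needle` is empty. */
int model_substitute(const char *subject, const char *needle,
                     const char *replacement, bool global,
                     bool replacement_only, char *out, size_t outsize)
{
    size_t nlen = strlen(needle), rlen = strlen(replacement), used = 0;
    const char *pos = subject;
    int count = 0;

    if (nlen == 0 || outsize == 0) return -1;

    while (true) {
        /* Without `global`, only the first match is replaced. */
        const char *hit = (global || count == 0) ? strstr(pos, needle) : NULL;
        /* Unmatched text: up to the next hit, or the rest of the subject. */
        size_t gap = hit ? (size_t)(hit - pos) : strlen(pos);

        if (!replacement_only) {           /* default: keep unmatched text */
            if (used + gap >= outsize) return -1;
            memcpy(out + used, pos, gap);
            used += gap;
        }
        if (hit == NULL) break;

        if (used + rlen >= outsize) return -1;
        memcpy(out + used, replacement, rlen);  /* emit the replacement */
        used += rlen;
        count++;
        pos = hit + nlen;
    }
    out[used] = '\0';
    return count;
}
```

With subject "abc 123 abc", needle "abc" and replacement "X", the model yields
"X 123 X" in the default global case and "XX" when replacement_only is set,
which is the concatenation the new text describes. The real function would
rely on substitution callouts to separate the pieces if that is needed.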

Modified: code/trunk/doc/pcre2test.1
===================================================================
--- code/trunk/doc/pcre2test.1    2020-01-15 16:50:45 UTC (rev 1205)
+++ code/trunk/doc/pcre2test.1    2020-01-22 17:50:12 UTC (rev 1206)
@@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "26 December 2019" "PCRE 10.35"
+.TH PCRE2TEST 1 "22 January 2020" "PCRE 10.35"
 .SH NAME
 pcre2test - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -1011,25 +1011,27 @@
 processed with that pattern. These modifiers do not affect the compilation
 process.
 .sp
-      aftertext                  show text after match
-      allaftertext               show text after captures
-      allcaptures                show all captures
-      allvector                  show the entire ovector
-      allusedtext                show all consulted text
-      altglobal                  alternative global matching
-  /g  global                     global matching
-      jitstack=<n>               set size of JIT stack
-      mark                       show mark values
-      replace=<string>           specify a replacement string
-      startchar                  show starting character when relevant
-      substitute_callout         use substitution callouts
-      substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
-      substitute_literal         use PCRE2_SUBSTITUTE_LITERAL 
-      substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
-      substitute_skip=<n>        skip substitution number n
-      substitute_stop=<n>        skip substitution number n and greater
-      substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
-      substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
+      aftertext                   show text after match
+      allaftertext                show text after captures
+      allcaptures                 show all captures
+      allvector                   show the entire ovector
+      allusedtext                 show all consulted text
+      altglobal                   alternative global matching
+  /g  global                      global matching
+      jitstack=<n>                set size of JIT stack
+      mark                        show mark values
+      replace=<string>            specify a replacement string
+      startchar                   show starting character when relevant
+      substitute_callout          use substitution callouts
+      substitute_extended         use PCRE2_SUBSTITUTE_EXTENDED
+      substitute_literal          use PCRE2_SUBSTITUTE_LITERAL
+      substitute_matched          use PCRE2_SUBSTITUTE_MATCHED  
+      substitute_overflow_length  use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+      substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY 
+      substitute_skip=<n>         skip substitution <n>
+      substitute_stop=<n>         skip substitution <n> and following
+      substitute_unknown_unset    use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
+      substitute_unset_empty      use PCRE2_SUBSTITUTE_UNSET_EMPTY
 .sp
 These modifiers may not appear in a \fB#pattern\fP command. If you want them as
 defaults, set them in a \fB#subject\fP command.
@@ -1203,7 +1205,9 @@
       substitute_callout         use substitution callouts
       substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
       substitute_literal         use PCRE2_SUBSTITUTE_LITERAL 
+      substitute_matched         use PCRE2_SUBSTITUTE_MATCHED 
       substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+      substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY 
       substitute_skip=<n>        skip substitution number n
       substitute_stop=<n>        skip substitution number n and greater
       substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
@@ -1367,9 +1371,10 @@
 .rs
 .sp
 If the \fBreplace\fP modifier is set, the \fBpcre2_substitute()\fP function is
-called instead of one of the matching functions. Note that replacement strings
-cannot contain commas, because a comma signifies the end of a modifier. This is
-not thought to be an issue in a test program.
+called instead of one of the matching functions (or after one call of 
+\fBpcre2_match()\fP in the case of PCRE2_SUBSTITUTE_MATCHED). Note that
+replacement strings cannot contain commas, because a comma signifies the end of
+a modifier. This is not thought to be an issue in a test program.
 .P
 Unlike subject strings, \fBpcre2test\fP does not process replacement strings
 for escape sequences. In UTF mode, a replacement string is checked to see if it
@@ -1384,10 +1389,17 @@
   global                      PCRE2_SUBSTITUTE_GLOBAL
   substitute_extended         PCRE2_SUBSTITUTE_EXTENDED
   substitute_literal          PCRE2_SUBSTITUTE_LITERAL 
+  substitute_matched          PCRE2_SUBSTITUTE_MATCHED 
   substitute_overflow_length  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+  substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY 
   substitute_unknown_unset    PCRE2_SUBSTITUTE_UNKNOWN_UNSET
   substitute_unset_empty      PCRE2_SUBSTITUTE_UNSET_EMPTY
 .sp
+See the
+.\" HREF
+\fBpcre2api\fP
+.\"
+documentation for details of these options.
 .P
 After a successful substitution, the modified string is output, preceded by the
 number of replacements. This may be zero if there were no matches. Here is a
@@ -2076,6 +2088,6 @@
 .rs
 .sp
 .nf
-Last updated: 26 December 2019
-Copyright (c) 1997-2019 University of Cambridge.
+Last updated: 22 January 2020
+Copyright (c) 1997-2020 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcre2test.txt
===================================================================
--- code/trunk/doc/pcre2test.txt    2020-01-15 16:50:45 UTC (rev 1205)
+++ code/trunk/doc/pcre2test.txt    2020-01-22 17:50:12 UTC (rev 1206)
@@ -936,25 +936,27 @@
        ject  line  that is processed with that pattern. These modifiers do not
        affect the compilation process.


-             aftertext                  show text after match
-             allaftertext               show text after captures
-             allcaptures                show all captures
-             allvector                  show the entire ovector
-             allusedtext                show all consulted text
-             altglobal                  alternative global matching
-         /g  global                     global matching
-             jitstack=<n>               set size of JIT stack
-             mark                       show mark values
-             replace=<string>           specify a replacement string
-             startchar                  show starting character when relevant
-             substitute_callout         use substitution callouts
-             substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
-             substitute_literal         use PCRE2_SUBSTITUTE_LITERAL
-             substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
-             substitute_skip=<n>        skip substitution number n
-             substitute_stop=<n>        skip substitution number n and greater
-             substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
-             substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
+             aftertext                   show text after match
+             allaftertext                show text after captures
+             allcaptures                 show all captures
+             allvector                   show the entire ovector
+             allusedtext                 show all consulted text
+             altglobal                   alternative global matching
+         /g  global                      global matching
+             jitstack=<n>                set size of JIT stack
+             mark                        show mark values
+             replace=<string>            specify a replacement string
+             startchar                   show starting character when relevant
+             substitute_callout          use substitution callouts
+             substitute_extended         use PCRE2_SUBSTITUTE_EXTENDED
+             substitute_literal          use PCRE2_SUBSTITUTE_LITERAL
+             substitute_matched          use PCRE2_SUBSTITUTE_MATCHED
+             substitute_overflow_length  use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+             substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
+             substitute_skip=<n>         skip substitution <n>
+             substitute_stop=<n>         skip substitution <n> and following
+             substitute_unknown_unset    use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
+             substitute_unset_empty      use PCRE2_SUBSTITUTE_UNSET_EMPTY


        These modifiers may not appear in a #pattern command. If you want  them
        as defaults, set them in a #subject command.
@@ -1105,7 +1107,9 @@
              substitute_callout         use substitution callouts
              substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
              substitute_literal         use PCRE2_SUBSTITUTE_LITERAL
+             substitute_matched         use PCRE2_SUBSTITUTE_MATCHED
              substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+             substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
              substitute_skip=<n>        skip substitution number n
              substitute_stop=<n>        skip substitution number n and greater
              substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
@@ -1251,9 +1255,11 @@
    Testing the substitution function


        If  the  replace  modifier  is  set, the pcre2_substitute() function is
-       called instead of one of the matching functions. Note that  replacement
-       strings  cannot  contain commas, because a comma signifies the end of a
-       modifier. This is not thought to be an issue in a test program.
+       called instead of one of the matching functions (or after one  call  of
+       pcre2_match()  in  the case of PCRE2_SUBSTITUTE_MATCHED). Note that re-
+       placement strings cannot contain commas, because a comma signifies  the
+       end  of  a  modifier. This is not thought to be an issue in a test pro-
+       gram.


        Unlike subject strings, pcre2test does not process replacement  strings
        for  escape  sequences. In UTF mode, a replacement string is checked to
@@ -1268,10 +1274,13 @@
          global                      PCRE2_SUBSTITUTE_GLOBAL
          substitute_extended         PCRE2_SUBSTITUTE_EXTENDED
          substitute_literal          PCRE2_SUBSTITUTE_LITERAL
+         substitute_matched          PCRE2_SUBSTITUTE_MATCHED
          substitute_overflow_length  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+         substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
          substitute_unknown_unset    PCRE2_SUBSTITUTE_UNKNOWN_UNSET
          substitute_unset_empty      PCRE2_SUBSTITUTE_UNSET_EMPTY


+       See the pcre2api documentation for details of these options.


        After a successful substitution, the modified string  is  output,  pre-
        ceded  by the number of replacements. This may be zero if there were no
@@ -1905,5 +1914,5 @@


REVISION

-       Last updated: 26 December 2019
-       Copyright (c) 1997-2019 University of Cambridge.
+       Last updated: 22 January 2020
+       Copyright (c) 1997-2020 University of Cambridge.


Modified: code/trunk/src/pcre2.h.in
===================================================================
--- code/trunk/src/pcre2.h.in    2020-01-15 16:50:45 UTC (rev 1205)
+++ code/trunk/src/pcre2.h.in    2020-01-22 17:50:12 UTC (rev 1206)
@@ -5,7 +5,7 @@
 /* This is the public header file for the PCRE library, second API, to be
 #included by applications that call PCRE2 functions.


-           Copyright (c) 2016-2019 University of Cambridge
+           Copyright (c) 2016-2020 University of Cambridge


 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -183,6 +183,7 @@
 #define PCRE2_COPY_MATCHED_SUBJECT        0x00004000u
 #define PCRE2_SUBSTITUTE_LITERAL          0x00008000u  /* pcre2_substitute() only */
 #define PCRE2_SUBSTITUTE_MATCHED          0x00010000u  /* pcre2_substitute() only */
+#define PCRE2_SUBSTITUTE_REPLACEMENT_ONLY 0x00020000u  /* pcre2_substitute() only */


/* Options for pcre2_pattern_convert(). */


Modified: code/trunk/src/pcre2_substitute.c
===================================================================
--- code/trunk/src/pcre2_substitute.c    2020-01-15 16:50:45 UTC (rev 1205)
+++ code/trunk/src/pcre2_substitute.c    2020-01-22 17:50:12 UTC (rev 1206)
@@ -7,7 +7,7 @@


                        Written by Philip Hazel
      Original API code Copyright (c) 1997-2012 University of Cambridge
-          New API code Copyright (c) 2016-2019 University of Cambridge
+          New API code Copyright (c) 2016-2020 University of Cambridge


 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -50,8 +50,8 @@
 #define SUBSTITUTE_OPTIONS \
   (PCRE2_SUBSTITUTE_EXTENDED|PCRE2_SUBSTITUTE_GLOBAL| \
    PCRE2_SUBSTITUTE_LITERAL|PCRE2_SUBSTITUTE_MATCHED| \
-   PCRE2_SUBSTITUTE_OVERFLOW_LENGTH|PCRE2_SUBSTITUTE_UNKNOWN_UNSET| \
-   PCRE2_SUBSTITUTE_UNSET_EMPTY)
+   PCRE2_SUBSTITUTE_OVERFLOW_LENGTH|PCRE2_SUBSTITUTE_REPLACEMENT_ONLY| \
+   PCRE2_SUBSTITUTE_UNKNOWN_UNSET|PCRE2_SUBSTITUTE_UNSET_EMPTY)




@@ -195,6 +195,7 @@
length. */

 #define CHECKMEMCPY(from,length) \
+  { \
   if (!overflowed && lengthleft < length) \
     { \
     if ((suboptions & PCRE2_SUBSTITUTE_OVERFLOW_LENGTH) == 0) goto NOROOM; \
@@ -210,7 +211,8 @@
     memcpy(buffer + buff_offset, from, CU2BYTES(length)); \
     buff_offset += length; \
     lengthleft -= length; \
-    }
+    } \
+  }


/* Here's the function */

@@ -231,6 +233,7 @@
BOOL escaped_literal = FALSE;
BOOL overflowed = FALSE;
BOOL use_existing_match;
+BOOL replacement_only;
#ifdef SUPPORT_UNICODE
BOOL utf = (code->overall_options & PCRE2_UTF) != 0;
#endif
@@ -256,10 +259,11 @@
if ((options & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0)
return PCRE2_ERROR_BADOPTION;

-/* Check for using a match that has already happened. Note that the subject
+/* Check for using a match that has already happened. Note that the subject
pointer in the match data may be NULL after a no-match. */

use_existing_match = ((options & PCRE2_SUBSTITUTE_MATCHED) != 0);
+replacement_only = ((options & PCRE2_SUBSTITUTE_REPLACEMENT_ONLY) != 0);

if (use_existing_match)
{
@@ -312,7 +316,7 @@
suboptions = options & SUBSTITUTE_OPTIONS;
options &= ~SUBSTITUTE_OPTIONS;

-/* Copy up to the start offset */
+/* Error if the start match offset is greater than the length of the subject. */

if (start_offset > length)
{
@@ -320,8 +324,11 @@
rc = PCRE2_ERROR_BADOFFSET;
goto EXIT;
}
-CHECKMEMCPY(subject, start_offset);

+/* Copy up to the start offset, unless only the replacement is required. */
+
+if (!replacement_only) CHECKMEMCPY(subject, start_offset);
+
/* Loop for global substituting. If PCRE2_SUBSTITUTE_MATCHED is set, the first
match is taken from the match_data that was passed in. */

@@ -382,11 +389,11 @@
 #endif
       }


-    /* Copy what we have advanced past, reset the special global options, and
-    continue to the next match. */
+    /* Copy what we have advanced past (unless not required), reset the special
+    global options, and continue to the next match. */


     fraglength = start_offset - save_start;
-    CHECKMEMCPY(subject + save_start, fraglength);
+    if (!replacement_only) CHECKMEMCPY(subject + save_start, fraglength);
     goptions = 0;
     continue;
     }
@@ -430,12 +437,12 @@
     }
   subs++;


-  /* Copy the text leading up to the match, and remember where the insert
-  begins and how many ovector pairs are set. */
+  /* Copy the text leading up to the match (unless not required), and remember
+  where the insert begins and how many ovector pairs are set. */

   if (rc == 0) rc = ovector_count;
   fraglength = ovector[0] - start_offset;
-  CHECKMEMCPY(subject + start_offset, fraglength);
+  if (!replacement_only) CHECKMEMCPY(subject + start_offset, fraglength);
   scb.output_offsets[0] = buff_offset;
   scb.oveccount = rc;

@@ -882,7 +889,7 @@

       buff_offset -= newlength;
       lengthleft += newlength;
-      CHECKMEMCPY(subject + ovector[0], oldlength);
+      if (!replacement_only) CHECKMEMCPY(subject + ovector[0], oldlength);


       /* A negative return means do not do any more. */


@@ -903,12 +910,17 @@
start_offset = ovector[1];
} while ((suboptions & PCRE2_SUBSTITUTE_GLOBAL) != 0); /* Repeat "do" loop */

-/* Copy the rest of the subject. */
+/* Copy the rest of the subject unless not required, and terminate the output
+with a binary zero. */

-fraglength = length - start_offset;
-CHECKMEMCPY(subject + start_offset, fraglength);
+if (!replacement_only)
+  {
+  fraglength = length - start_offset;
+  CHECKMEMCPY(subject + start_offset, fraglength);
+  }
+
temp[0] = 0;
-CHECKMEMCPY(temp , 1);
+CHECKMEMCPY(temp, 1);

/* If overflowed is set it means the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set,
and matching has carried on after a full buffer, in order to compute the length

Modified: code/trunk/src/pcre2test.c
===================================================================
--- code/trunk/src/pcre2test.c    2020-01-15 16:50:45 UTC (rev 1205)
+++ code/trunk/src/pcre2test.c    2020-01-22 17:50:12 UTC (rev 1206)
@@ -11,7 +11,7 @@


                        Written by Philip Hazel
      Original code Copyright (c) 1997-2012 University of Cambridge
-    Rewritten code Copyright (c) 2016-2019 University of Cambridge
+    Rewritten code Copyright (c) 2016-2020 University of Cambridge


 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -505,12 +505,13 @@
 #define CTL2_SUBSTITUTE_LITERAL          0x00000004u
 #define CTL2_SUBSTITUTE_MATCHED          0x00000008u
 #define CTL2_SUBSTITUTE_OVERFLOW_LENGTH  0x00000010u
-#define CTL2_SUBSTITUTE_UNKNOWN_UNSET    0x00000020u
-#define CTL2_SUBSTITUTE_UNSET_EMPTY      0x00000040u
-#define CTL2_SUBJECT_LITERAL             0x00000080u
-#define CTL2_CALLOUT_NO_WHERE            0x00000100u
-#define CTL2_CALLOUT_EXTRA               0x00000200u
-#define CTL2_ALLVECTOR                   0x00000400u
+#define CTL2_SUBSTITUTE_REPLACEMENT_ONLY 0x00000020u
+#define CTL2_SUBSTITUTE_UNKNOWN_UNSET    0x00000040u
+#define CTL2_SUBSTITUTE_UNSET_EMPTY      0x00000080u
+#define CTL2_SUBJECT_LITERAL             0x00000100u
+#define CTL2_CALLOUT_NO_WHERE            0x00000200u
+#define CTL2_CALLOUT_EXTRA               0x00000400u
+#define CTL2_ALLVECTOR                   0x00000800u


 #define CTL2_NL_SET                      0x40000000u  /* Informational */
 #define CTL2_BSR_SET                     0x80000000u  /* Informational */
@@ -535,6 +536,7 @@
                     CTL2_SUBSTITUTE_LITERAL|\
                     CTL2_SUBSTITUTE_MATCHED|\
                     CTL2_SUBSTITUTE_OVERFLOW_LENGTH|\
+                    CTL2_SUBSTITUTE_REPLACEMENT_ONLY|\
                     CTL2_SUBSTITUTE_UNKNOWN_UNSET|\
                     CTL2_SUBSTITUTE_UNSET_EMPTY|\
                     CTL2_ALLVECTOR)
@@ -614,129 +616,130 @@
 } modstruct;


 static modstruct modlist[] = {
-  { "aftertext",                  MOD_PNDP, MOD_CTL, CTL_AFTERTEXT,              PO(control) },
-  { "allaftertext",               MOD_PNDP, MOD_CTL, CTL_ALLAFTERTEXT,           PO(control) },
-  { "allcaptures",                MOD_PND,  MOD_CTL, CTL_ALLCAPTURES,            PO(control) },
-  { "allow_empty_class",          MOD_PAT,  MOD_OPT, PCRE2_ALLOW_EMPTY_CLASS,    PO(options) },
-  { "allow_surrogate_escapes",    MOD_CTC,  MOD_OPT, PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES, CO(extra_options) },
-  { "allusedtext",                MOD_PNDP, MOD_CTL, CTL_ALLUSEDTEXT,            PO(control) },
-  { "allvector",                  MOD_PND,  MOD_CTL, CTL2_ALLVECTOR,             PO(control2) },
-  { "alt_bsux",                   MOD_PAT,  MOD_OPT, PCRE2_ALT_BSUX,             PO(options) },
-  { "alt_circumflex",             MOD_PAT,  MOD_OPT, PCRE2_ALT_CIRCUMFLEX,       PO(options) },
-  { "alt_verbnames",              MOD_PAT,  MOD_OPT, PCRE2_ALT_VERBNAMES,        PO(options) },
-  { "altglobal",                  MOD_PND,  MOD_CTL, CTL_ALTGLOBAL,              PO(control) },
-  { "anchored",                   MOD_PD,   MOD_OPT, PCRE2_ANCHORED,             PD(options) },
-  { "auto_callout",               MOD_PAT,  MOD_OPT, PCRE2_AUTO_CALLOUT,         PO(options) },
-  { "bad_escape_is_literal",      MOD_CTC,  MOD_OPT, PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL, CO(extra_options) },
-  { "bincode",                    MOD_PAT,  MOD_CTL, CTL_BINCODE,                PO(control) },
-  { "bsr",                        MOD_CTC,  MOD_BSR, 0,                          CO(bsr_convention) },
-  { "callout_capture",            MOD_DAT,  MOD_CTL, CTL_CALLOUT_CAPTURE,        DO(control) },
-  { "callout_data",               MOD_DAT,  MOD_INS, 0,                          DO(callout_data) },
-  { "callout_error",              MOD_DAT,  MOD_IN2, 0,                          DO(cerror) },
-  { "callout_extra",              MOD_DAT,  MOD_CTL, CTL2_CALLOUT_EXTRA,         DO(control2) },
-  { "callout_fail",               MOD_DAT,  MOD_IN2, 0,                          DO(cfail) },
-  { "callout_info",               MOD_PAT,  MOD_CTL, CTL_CALLOUT_INFO,           PO(control) },
-  { "callout_no_where",           MOD_DAT,  MOD_CTL, CTL2_CALLOUT_NO_WHERE,      DO(control2) },
-  { "callout_none",               MOD_DAT,  MOD_CTL, CTL_CALLOUT_NONE,           DO(control) },
-  { "caseless",                   MOD_PATP, MOD_OPT, PCRE2_CASELESS,             PO(options) },
-  { "convert",                    MOD_PAT,  MOD_CON, 0,                          PO(convert_type) },
-  { "convert_glob_escape",        MOD_PAT,  MOD_CHR, 0,                          PO(convert_glob_escape) },
-  { "convert_glob_separator",     MOD_PAT,  MOD_CHR, 0,                          PO(convert_glob_separator) },
-  { "convert_length",             MOD_PAT,  MOD_INT, 0,                          PO(convert_length) },
-  { "copy",                       MOD_DAT,  MOD_NN,  DO(copy_numbers),           DO(copy_names) },
-  { "copy_matched_subject",       MOD_DAT,  MOD_OPT, PCRE2_COPY_MATCHED_SUBJECT, DO(options) },
-  { "debug",                      MOD_PAT,  MOD_CTL, CTL_DEBUG,                  PO(control) },
-  { "depth_limit",                MOD_CTM,  MOD_INT, 0,                          MO(depth_limit) },
-  { "dfa",                        MOD_DAT,  MOD_CTL, CTL_DFA,                    DO(control) },
-  { "dfa_restart",                MOD_DAT,  MOD_OPT, PCRE2_DFA_RESTART,          DO(options) },
-  { "dfa_shortest",               MOD_DAT,  MOD_OPT, PCRE2_DFA_SHORTEST,         DO(options) },
-  { "dollar_endonly",             MOD_PAT,  MOD_OPT, PCRE2_DOLLAR_ENDONLY,       PO(options) },
-  { "dotall",                     MOD_PATP, MOD_OPT, PCRE2_DOTALL,               PO(options) },
-  { "dupnames",                   MOD_PATP, MOD_OPT, PCRE2_DUPNAMES,             PO(options) },
-  { "endanchored",                MOD_PD,   MOD_OPT, PCRE2_ENDANCHORED,          PD(options) },
-  { "escaped_cr_is_lf",           MOD_CTC,  MOD_OPT, PCRE2_EXTRA_ESCAPED_CR_IS_LF, CO(extra_options) },
-  { "expand",                     MOD_PAT,  MOD_CTL, CTL_EXPAND,                 PO(control) },
-  { "extended",                   MOD_PATP, MOD_OPT, PCRE2_EXTENDED,             PO(options) },
-  { "extended_more",              MOD_PATP, MOD_OPT, PCRE2_EXTENDED_MORE,        PO(options) },
-  { "extra_alt_bsux",             MOD_CTC,  MOD_OPT, PCRE2_EXTRA_ALT_BSUX,       CO(extra_options) },
-  { "find_limits",                MOD_DAT,  MOD_CTL, CTL_FINDLIMITS,             DO(control) },
-  { "firstline",                  MOD_PAT,  MOD_OPT, PCRE2_FIRSTLINE,            PO(options) },
-  { "framesize",                  MOD_PAT,  MOD_CTL, CTL_FRAMESIZE,              PO(control) },
-  { "fullbincode",                MOD_PAT,  MOD_CTL, CTL_FULLBINCODE,            PO(control) },
-  { "get",                        MOD_DAT,  MOD_NN,  DO(get_numbers),            DO(get_names) },
-  { "getall",                     MOD_DAT,  MOD_CTL, CTL_GETALL,                 DO(control) },
-  { "global",                     MOD_PNDP, MOD_CTL, CTL_GLOBAL,                 PO(control) },
-  { "heap_limit",                 MOD_CTM,  MOD_INT, 0,                          MO(heap_limit) },
-  { "hex",                        MOD_PAT,  MOD_CTL, CTL_HEXPAT,                 PO(control) },
-  { "info",                       MOD_PAT,  MOD_CTL, CTL_INFO,                   PO(control) },
-  { "jit",                        MOD_PAT,  MOD_IND, 7,                          PO(jit) },
-  { "jitfast",                    MOD_PAT,  MOD_CTL, CTL_JITFAST,                PO(control) },
-  { "jitstack",                   MOD_PNDP, MOD_INT, 0,                          PO(jitstack) },
-  { "jitverify",                  MOD_PAT,  MOD_CTL, CTL_JITVERIFY,              PO(control) },
-  { "literal",                    MOD_PAT,  MOD_OPT, PCRE2_LITERAL,              PO(options) },
-  { "locale",                     MOD_PAT,  MOD_STR, LOCALESIZE,                 PO(locale) },
-  { "mark",                       MOD_PNDP, MOD_CTL, CTL_MARK,                   PO(control) },
-  { "match_invalid_utf",          MOD_PAT,  MOD_OPT, PCRE2_MATCH_INVALID_UTF,    PO(options) },
-  { "match_limit",                MOD_CTM,  MOD_INT, 0,                          MO(match_limit) },
-  { "match_line",                 MOD_CTC,  MOD_OPT, PCRE2_EXTRA_MATCH_LINE,     CO(extra_options) },
-  { "match_unset_backref",        MOD_PAT,  MOD_OPT, PCRE2_MATCH_UNSET_BACKREF,  PO(options) },
-  { "match_word",                 MOD_CTC,  MOD_OPT, PCRE2_EXTRA_MATCH_WORD,     CO(extra_options) },
-  { "max_pattern_length",         MOD_CTC,  MOD_SIZ, 0,                          CO(max_pattern_length) },
-  { "memory",                     MOD_PD,   MOD_CTL, CTL_MEMORY,                 PD(control) },
-  { "multiline",                  MOD_PATP, MOD_OPT, PCRE2_MULTILINE,            PO(options) },
-  { "never_backslash_c",          MOD_PAT,  MOD_OPT, PCRE2_NEVER_BACKSLASH_C,    PO(options) },
-  { "never_ucp",                  MOD_PAT,  MOD_OPT, PCRE2_NEVER_UCP,            PO(options) },
-  { "never_utf",                  MOD_PAT,  MOD_OPT, PCRE2_NEVER_UTF,            PO(options) },
-  { "newline",                    MOD_CTC,  MOD_NL,  0,                          CO(newline_convention) },
-  { "no_auto_capture",            MOD_PAT,  MOD_OPT, PCRE2_NO_AUTO_CAPTURE,      PO(options) },
-  { "no_auto_possess",            MOD_PATP, MOD_OPT, PCRE2_NO_AUTO_POSSESS,      PO(options) },
-  { "no_dotstar_anchor",          MOD_PAT,  MOD_OPT, PCRE2_NO_DOTSTAR_ANCHOR,    PO(options) },
-  { "no_jit",                     MOD_DAT,  MOD_OPT, PCRE2_NO_JIT,               DO(options) },
-  { "no_start_optimize",          MOD_PATP, MOD_OPT, PCRE2_NO_START_OPTIMIZE,    PO(options) },
-  { "no_utf_check",               MOD_PD,   MOD_OPT, PCRE2_NO_UTF_CHECK,         PD(options) },
-  { "notbol",                     MOD_DAT,  MOD_OPT, PCRE2_NOTBOL,               DO(options) },
-  { "notempty",                   MOD_DAT,  MOD_OPT, PCRE2_NOTEMPTY,             DO(options) },
-  { "notempty_atstart",           MOD_DAT,  MOD_OPT, PCRE2_NOTEMPTY_ATSTART,     DO(options) },
-  { "noteol",                     MOD_DAT,  MOD_OPT, PCRE2_NOTEOL,               DO(options) },
-  { "null_context",               MOD_PD,   MOD_CTL, CTL_NULLCONTEXT,            PO(control) },
-  { "offset",                     MOD_DAT,  MOD_INT, 0,                          DO(offset) },
-  { "offset_limit",               MOD_CTM,  MOD_SIZ, 0,                          MO(offset_limit)},
-  { "ovector",                    MOD_DAT,  MOD_INT, 0,                          DO(oveccount) },
-  { "parens_nest_limit",          MOD_CTC,  MOD_INT, 0,                          CO(parens_nest_limit) },
-  { "partial_hard",               MOD_DAT,  MOD_OPT, PCRE2_PARTIAL_HARD,         DO(options) },
-  { "partial_soft",               MOD_DAT,  MOD_OPT, PCRE2_PARTIAL_SOFT,         DO(options) },
-  { "ph",                         MOD_DAT,  MOD_OPT, PCRE2_PARTIAL_HARD,         DO(options) },
-  { "posix",                      MOD_PAT,  MOD_CTL, CTL_POSIX,                  PO(control) },
-  { "posix_nosub",                MOD_PAT,  MOD_CTL, CTL_POSIX|CTL_POSIX_NOSUB,  PO(control) },
-  { "posix_startend",             MOD_DAT,  MOD_IN2, 0,                          DO(startend) },
-  { "ps",                         MOD_DAT,  MOD_OPT, PCRE2_PARTIAL_SOFT,         DO(options) },
-  { "push",                       MOD_PAT,  MOD_CTL, CTL_PUSH,                   PO(control) },
-  { "pushcopy",                   MOD_PAT,  MOD_CTL, CTL_PUSHCOPY,               PO(control) },
-  { "pushtablescopy",             MOD_PAT,  MOD_CTL, CTL_PUSHTABLESCOPY,         PO(control) },
-  { "recursion_limit",            MOD_CTM,  MOD_INT, 0,                          MO(depth_limit) },  /* Obsolete synonym */
-  { "regerror_buffsize",          MOD_PAT,  MOD_INT, 0,                          PO(regerror_buffsize) },
-  { "replace",                    MOD_PND,  MOD_STR, REPLACE_MODSIZE,            PO(replacement) },
-  { "stackguard",                 MOD_PAT,  MOD_INT, 0,                          PO(stackguard_test) },
-  { "startchar",                  MOD_PND,  MOD_CTL, CTL_STARTCHAR,              PO(control) },
-  { "startoffset",                MOD_DAT,  MOD_INT, 0,                          DO(offset) },
-  { "subject_literal",            MOD_PATP, MOD_CTL, CTL2_SUBJECT_LITERAL,       PO(control2) },
-  { "substitute_callout",         MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_CALLOUT,    PO(control2) },
-  { "substitute_extended",        MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_EXTENDED,   PO(control2) },
-  { "substitute_literal",         MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_LITERAL,    PO(control2) },
-  { "substitute_matched",         MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_MATCHED,    PO(control2) },
-  { "substitute_overflow_length", MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_OVERFLOW_LENGTH, PO(control2) },
-  { "substitute_skip",            MOD_PND,  MOD_INT, 0,                          PO(substitute_skip) },
-  { "substitute_stop",            MOD_PND,  MOD_INT, 0,                          PO(substitute_stop) },
-  { "substitute_unknown_unset",   MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_UNKNOWN_UNSET, PO(control2) },
-  { "substitute_unset_empty",     MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_UNSET_EMPTY, PO(control2) },
-  { "tables",                     MOD_PAT,  MOD_INT, 0,                          PO(tables_id) },
-  { "ucp",                        MOD_PATP, MOD_OPT, PCRE2_UCP,                  PO(options) },
-  { "ungreedy",                   MOD_PAT,  MOD_OPT, PCRE2_UNGREEDY,             PO(options) },
-  { "use_length",                 MOD_PAT,  MOD_CTL, CTL_USE_LENGTH,             PO(control) },
-  { "use_offset_limit",           MOD_PAT,  MOD_OPT, PCRE2_USE_OFFSET_LIMIT,     PO(options) },
-  { "utf",                        MOD_PATP, MOD_OPT, PCRE2_UTF,                  PO(options) },
-  { "utf8_input",                 MOD_PAT,  MOD_CTL, CTL_UTF8_INPUT,             PO(control) },
-  { "zero_terminate",             MOD_DAT,  MOD_CTL, CTL_ZERO_TERMINATE,         DO(control) }
+  { "aftertext",                   MOD_PNDP, MOD_CTL, CTL_AFTERTEXT,              PO(control) },
+  { "allaftertext",                MOD_PNDP, MOD_CTL, CTL_ALLAFTERTEXT,           PO(control) },
+  { "allcaptures",                 MOD_PND,  MOD_CTL, CTL_ALLCAPTURES,            PO(control) },
+  { "allow_empty_class",           MOD_PAT,  MOD_OPT, PCRE2_ALLOW_EMPTY_CLASS,    PO(options) },
+  { "allow_surrogate_escapes",     MOD_CTC,  MOD_OPT, PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES, CO(extra_options) },
+  { "allusedtext",                 MOD_PNDP, MOD_CTL, CTL_ALLUSEDTEXT,            PO(control) },
+  { "allvector",                   MOD_PND,  MOD_CTL, CTL2_ALLVECTOR,             PO(control2) },
+  { "alt_bsux",                    MOD_PAT,  MOD_OPT, PCRE2_ALT_BSUX,             PO(options) },
+  { "alt_circumflex",              MOD_PAT,  MOD_OPT, PCRE2_ALT_CIRCUMFLEX,       PO(options) },
+  { "alt_verbnames",               MOD_PAT,  MOD_OPT, PCRE2_ALT_VERBNAMES,        PO(options) },
+  { "altglobal",                   MOD_PND,  MOD_CTL, CTL_ALTGLOBAL,              PO(control) },
+  { "anchored",                    MOD_PD,   MOD_OPT, PCRE2_ANCHORED,             PD(options) },
+  { "auto_callout",                MOD_PAT,  MOD_OPT, PCRE2_AUTO_CALLOUT,         PO(options) },
+  { "bad_escape_is_literal",       MOD_CTC,  MOD_OPT, PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL, CO(extra_options) },
+  { "bincode",                     MOD_PAT,  MOD_CTL, CTL_BINCODE,                PO(control) },
+  { "bsr",                         MOD_CTC,  MOD_BSR, 0,                          CO(bsr_convention) },
+  { "callout_capture",             MOD_DAT,  MOD_CTL, CTL_CALLOUT_CAPTURE,        DO(control) },
+  { "callout_data",                MOD_DAT,  MOD_INS, 0,                          DO(callout_data) },
+  { "callout_error",               MOD_DAT,  MOD_IN2, 0,                          DO(cerror) },
+  { "callout_extra",               MOD_DAT,  MOD_CTL, CTL2_CALLOUT_EXTRA,         DO(control2) },
+  { "callout_fail",                MOD_DAT,  MOD_IN2, 0,                          DO(cfail) },
+  { "callout_info",                MOD_PAT,  MOD_CTL, CTL_CALLOUT_INFO,           PO(control) },
+  { "callout_no_where",            MOD_DAT,  MOD_CTL, CTL2_CALLOUT_NO_WHERE,      DO(control2) },
+  { "callout_none",                MOD_DAT,  MOD_CTL, CTL_CALLOUT_NONE,           DO(control) },
+  { "caseless",                    MOD_PATP, MOD_OPT, PCRE2_CASELESS,             PO(options) },
+  { "convert",                     MOD_PAT,  MOD_CON, 0,                          PO(convert_type) },
+  { "convert_glob_escape",         MOD_PAT,  MOD_CHR, 0,                          PO(convert_glob_escape) },
+  { "convert_glob_separator",      MOD_PAT,  MOD_CHR, 0,                          PO(convert_glob_separator) },
+  { "convert_length",              MOD_PAT,  MOD_INT, 0,                          PO(convert_length) },
+  { "copy",                        MOD_DAT,  MOD_NN,  DO(copy_numbers),           DO(copy_names) },
+  { "copy_matched_subject",        MOD_DAT,  MOD_OPT, PCRE2_COPY_MATCHED_SUBJECT, DO(options) },
+  { "debug",                       MOD_PAT,  MOD_CTL, CTL_DEBUG,                  PO(control) },
+  { "depth_limit",                 MOD_CTM,  MOD_INT, 0,                          MO(depth_limit) },
+  { "dfa",                         MOD_DAT,  MOD_CTL, CTL_DFA,                    DO(control) },
+  { "dfa_restart",                 MOD_DAT,  MOD_OPT, PCRE2_DFA_RESTART,          DO(options) },
+  { "dfa_shortest",                MOD_DAT,  MOD_OPT, PCRE2_DFA_SHORTEST,         DO(options) },
+  { "dollar_endonly",              MOD_PAT,  MOD_OPT, PCRE2_DOLLAR_ENDONLY,       PO(options) },
+  { "dotall",                      MOD_PATP, MOD_OPT, PCRE2_DOTALL,               PO(options) },
+  { "dupnames",                    MOD_PATP, MOD_OPT, PCRE2_DUPNAMES,             PO(options) },
+  { "endanchored",                 MOD_PD,   MOD_OPT, PCRE2_ENDANCHORED,          PD(options) },
+  { "escaped_cr_is_lf",            MOD_CTC,  MOD_OPT, PCRE2_EXTRA_ESCAPED_CR_IS_LF, CO(extra_options) },
+  { "expand",                      MOD_PAT,  MOD_CTL, CTL_EXPAND,                 PO(control) },
+  { "extended",                    MOD_PATP, MOD_OPT, PCRE2_EXTENDED,             PO(options) },
+  { "extended_more",               MOD_PATP, MOD_OPT, PCRE2_EXTENDED_MORE,        PO(options) },
+  { "extra_alt_bsux",              MOD_CTC,  MOD_OPT, PCRE2_EXTRA_ALT_BSUX,       CO(extra_options) },
+  { "find_limits",                 MOD_DAT,  MOD_CTL, CTL_FINDLIMITS,             DO(control) },
+  { "firstline",                   MOD_PAT,  MOD_OPT, PCRE2_FIRSTLINE,            PO(options) },
+  { "framesize",                   MOD_PAT,  MOD_CTL, CTL_FRAMESIZE,              PO(control) },
+  { "fullbincode",                 MOD_PAT,  MOD_CTL, CTL_FULLBINCODE,            PO(control) },
+  { "get",                         MOD_DAT,  MOD_NN,  DO(get_numbers),            DO(get_names) },
+  { "getall",                      MOD_DAT,  MOD_CTL, CTL_GETALL,                 DO(control) },
+  { "global",                      MOD_PNDP, MOD_CTL, CTL_GLOBAL,                 PO(control) },
+  { "heap_limit",                  MOD_CTM,  MOD_INT, 0,                          MO(heap_limit) },
+  { "hex",                         MOD_PAT,  MOD_CTL, CTL_HEXPAT,                 PO(control) },
+  { "info",                        MOD_PAT,  MOD_CTL, CTL_INFO,                   PO(control) },
+  { "jit",                         MOD_PAT,  MOD_IND, 7,                          PO(jit) },
+  { "jitfast",                     MOD_PAT,  MOD_CTL, CTL_JITFAST,                PO(control) },
+  { "jitstack",                    MOD_PNDP, MOD_INT, 0,                          PO(jitstack) },
+  { "jitverify",                   MOD_PAT,  MOD_CTL, CTL_JITVERIFY,              PO(control) },
+  { "literal",                     MOD_PAT,  MOD_OPT, PCRE2_LITERAL,              PO(options) },
+  { "locale",                      MOD_PAT,  MOD_STR, LOCALESIZE,                 PO(locale) },
+  { "mark",                        MOD_PNDP, MOD_CTL, CTL_MARK,                   PO(control) },
+  { "match_invalid_utf",           MOD_PAT,  MOD_OPT, PCRE2_MATCH_INVALID_UTF,    PO(options) },
+  { "match_limit",                 MOD_CTM,  MOD_INT, 0,                          MO(match_limit) },
+  { "match_line",                  MOD_CTC,  MOD_OPT, PCRE2_EXTRA_MATCH_LINE,     CO(extra_options) },
+  { "match_unset_backref",         MOD_PAT,  MOD_OPT, PCRE2_MATCH_UNSET_BACKREF,  PO(options) },
+  { "match_word",                  MOD_CTC,  MOD_OPT, PCRE2_EXTRA_MATCH_WORD,     CO(extra_options) },
+  { "max_pattern_length",          MOD_CTC,  MOD_SIZ, 0,                          CO(max_pattern_length) },
+  { "memory",                      MOD_PD,   MOD_CTL, CTL_MEMORY,                 PD(control) },
+  { "multiline",                   MOD_PATP, MOD_OPT, PCRE2_MULTILINE,            PO(options) },
+  { "never_backslash_c",           MOD_PAT,  MOD_OPT, PCRE2_NEVER_BACKSLASH_C,    PO(options) },
+  { "never_ucp",                   MOD_PAT,  MOD_OPT, PCRE2_NEVER_UCP,            PO(options) },
+  { "never_utf",                   MOD_PAT,  MOD_OPT, PCRE2_NEVER_UTF,            PO(options) },
+  { "newline",                     MOD_CTC,  MOD_NL,  0,                          CO(newline_convention) },
+  { "no_auto_capture",             MOD_PAT,  MOD_OPT, PCRE2_NO_AUTO_CAPTURE,      PO(options) },
+  { "no_auto_possess",             MOD_PATP, MOD_OPT, PCRE2_NO_AUTO_POSSESS,      PO(options) },
+  { "no_dotstar_anchor",           MOD_PAT,  MOD_OPT, PCRE2_NO_DOTSTAR_ANCHOR,    PO(options) },
+  { "no_jit",                      MOD_DAT,  MOD_OPT, PCRE2_NO_JIT,               DO(options) },
+  { "no_start_optimize",           MOD_PATP, MOD_OPT, PCRE2_NO_START_OPTIMIZE,    PO(options) },
+  { "no_utf_check",                MOD_PD,   MOD_OPT, PCRE2_NO_UTF_CHECK,         PD(options) },
+  { "notbol",                      MOD_DAT,  MOD_OPT, PCRE2_NOTBOL,               DO(options) },
+  { "notempty",                    MOD_DAT,  MOD_OPT, PCRE2_NOTEMPTY,             DO(options) },
+  { "notempty_atstart",            MOD_DAT,  MOD_OPT, PCRE2_NOTEMPTY_ATSTART,     DO(options) },
+  { "noteol",                      MOD_DAT,  MOD_OPT, PCRE2_NOTEOL,               DO(options) },
+  { "null_context",                MOD_PD,   MOD_CTL, CTL_NULLCONTEXT,            PO(control) },
+  { "offset",                      MOD_DAT,  MOD_INT, 0,                          DO(offset) },
+  { "offset_limit",                MOD_CTM,  MOD_SIZ, 0,                          MO(offset_limit)},
+  { "ovector",                     MOD_DAT,  MOD_INT, 0,                          DO(oveccount) },
+  { "parens_nest_limit",           MOD_CTC,  MOD_INT, 0,                          CO(parens_nest_limit) },
+  { "partial_hard",                MOD_DAT,  MOD_OPT, PCRE2_PARTIAL_HARD,         DO(options) },
+  { "partial_soft",                MOD_DAT,  MOD_OPT, PCRE2_PARTIAL_SOFT,         DO(options) },
+  { "ph",                          MOD_DAT,  MOD_OPT, PCRE2_PARTIAL_HARD,         DO(options) },
+  { "posix",                       MOD_PAT,  MOD_CTL, CTL_POSIX,                  PO(control) },
+  { "posix_nosub",                 MOD_PAT,  MOD_CTL, CTL_POSIX|CTL_POSIX_NOSUB,  PO(control) },
+  { "posix_startend",              MOD_DAT,  MOD_IN2, 0,                          DO(startend) },
+  { "ps",                          MOD_DAT,  MOD_OPT, PCRE2_PARTIAL_SOFT,         DO(options) },
+  { "push",                        MOD_PAT,  MOD_CTL, CTL_PUSH,                   PO(control) },
+  { "pushcopy",                    MOD_PAT,  MOD_CTL, CTL_PUSHCOPY,               PO(control) },
+  { "pushtablescopy",              MOD_PAT,  MOD_CTL, CTL_PUSHTABLESCOPY,         PO(control) },
+  { "recursion_limit",             MOD_CTM,  MOD_INT, 0,                          MO(depth_limit) },  /* Obsolete synonym */
+  { "regerror_buffsize",           MOD_PAT,  MOD_INT, 0,                          PO(regerror_buffsize) },
+  { "replace",                     MOD_PND,  MOD_STR, REPLACE_MODSIZE,            PO(replacement) },
+  { "stackguard",                  MOD_PAT,  MOD_INT, 0,                          PO(stackguard_test) },
+  { "startchar",                   MOD_PND,  MOD_CTL, CTL_STARTCHAR,              PO(control) },
+  { "startoffset",                 MOD_DAT,  MOD_INT, 0,                          DO(offset) },
+  { "subject_literal",             MOD_PATP, MOD_CTL, CTL2_SUBJECT_LITERAL,       PO(control2) },
+  { "substitute_callout",          MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_CALLOUT,    PO(control2) },
+  { "substitute_extended",         MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_EXTENDED,   PO(control2) },
+  { "substitute_literal",          MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_LITERAL,    PO(control2) },
+  { "substitute_matched",          MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_MATCHED,    PO(control2) },
+  { "substitute_overflow_length",  MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_OVERFLOW_LENGTH, PO(control2) },
+  { "substitute_replacement_only", MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_REPLACEMENT_ONLY, PO(control2) },
+  { "substitute_skip",             MOD_PND,  MOD_INT, 0,                          PO(substitute_skip) },
+  { "substitute_stop",             MOD_PND,  MOD_INT, 0,                          PO(substitute_stop) },
+  { "substitute_unknown_unset",    MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_UNKNOWN_UNSET, PO(control2) },
+  { "substitute_unset_empty",      MOD_PND,  MOD_CTL, CTL2_SUBSTITUTE_UNSET_EMPTY, PO(control2) },
+  { "tables",                      MOD_PAT,  MOD_INT, 0,                          PO(tables_id) },
+  { "ucp",                         MOD_PATP, MOD_OPT, PCRE2_UCP,                  PO(options) },
+  { "ungreedy",                    MOD_PAT,  MOD_OPT, PCRE2_UNGREEDY,             PO(options) },
+  { "use_length",                  MOD_PAT,  MOD_CTL, CTL_USE_LENGTH,             PO(control) },
+  { "use_offset_limit",            MOD_PAT,  MOD_OPT, PCRE2_USE_OFFSET_LIMIT,     PO(options) },
+  { "utf",                         MOD_PATP, MOD_OPT, PCRE2_UTF,                  PO(options) },
+  { "utf8_input",                  MOD_PAT,  MOD_CTL, CTL_UTF8_INPUT,             PO(control) },
+  { "zero_terminate",              MOD_DAT,  MOD_CTL, CTL_ZERO_TERMINATE,         DO(control) }
 };


 #define MODLISTCOUNT sizeof(modlist)/sizeof(modstruct)
@@ -4091,7 +4094,7 @@
 static void
 show_controls(uint32_t controls, uint32_t controls2, const char *before)
 {
-fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
+fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
   before,
   ((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
   ((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
@@ -4132,6 +4135,7 @@
   ((controls2 & CTL2_SUBSTITUTE_LITERAL) != 0)? " substitute_literal" : "",
   ((controls2 & CTL2_SUBSTITUTE_MATCHED) != 0)? " substitute_matched" : "",
   ((controls2 & CTL2_SUBSTITUTE_OVERFLOW_LENGTH) != 0)? " substitute_overflow_length" : "",
+  ((controls2 & CTL2_SUBSTITUTE_REPLACEMENT_ONLY) != 0)? " substitute_replacement_only" : "",
   ((controls2 & CTL2_SUBSTITUTE_UNKNOWN_UNSET) != 0)? " substitute_unknown_unset" : "",
   ((controls2 & CTL2_SUBSTITUTE_UNSET_EMPTY) != 0)? " substitute_unset_empty" : "",
   ((controls & CTL_USE_LENGTH) != 0)? " use_length" : "",
@@ -7257,17 +7261,17 @@

   if (timeitm)
     fprintf(outfile, "** Timing is not supported with replace: ignored\n");
-    
+
   if ((dat_datctl.control & CTL_ALTGLOBAL) != 0)
     fprintf(outfile, "** Altglobal is not supported with replace: ignored\n");


-  /* Check for a test that does substitution after an initial external match.
-  If this is set, we run the external match, but leave the interpretation of
+  /* Check for a test that does substitution after an initial external match.
+  If this is set, we run the external match, but leave the interpretation of
   its output to pcre2_substitute(). */

   emoption = ((dat_datctl.control2 & CTL2_SUBSTITUTE_MATCHED) == 0)? 0 :
     PCRE2_SUBSTITUTE_MATCHED;
-     
+
   if (emoption != 0)
     {
     PCRE2_MATCH(rc, compiled_code, pp, arg_ulen, dat_datctl.offset,
@@ -7283,6 +7287,8 @@
                 PCRE2_SUBSTITUTE_LITERAL) |
              (((dat_datctl.control2 & CTL2_SUBSTITUTE_OVERFLOW_LENGTH) == 0)? 0 :
                 PCRE2_SUBSTITUTE_OVERFLOW_LENGTH) |
+             (((dat_datctl.control2 & CTL2_SUBSTITUTE_REPLACEMENT_ONLY) == 0)? 0 :
+                PCRE2_SUBSTITUTE_REPLACEMENT_ONLY) |
              (((dat_datctl.control2 & CTL2_SUBSTITUTE_UNKNOWN_UNSET) == 0)? 0 :
                 PCRE2_SUBSTITUTE_UNKNOWN_UNSET) |
              (((dat_datctl.control2 & CTL2_SUBSTITUTE_UNSET_EMPTY) == 0)? 0 :


Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2    2020-01-15 16:50:45 UTC (rev 1205)
+++ code/trunk/testdata/testinput2    2020-01-22 17:50:12 UTC (rev 1206)
@@ -5793,4 +5793,17 @@
 /^((\1+)(?C)|\d)+133X$/
     111133X\=callout_capture


+/abc/replace=xyz,substitute_replacement_only
+    123abc456
+
+/a(?<ONE>b)c(?<TWO>d)e/g,replace=X$ONE+${TWO}Z,substitute_replacement_only
+    "abcde-abcde-"
+     
+/a(b)c|xyz/g,replace=<$0>,substitute_callout,substitute_replacement_only
+    abcdefabcpqr                
+    abxyzpqrabcxyz              
+    12abc34xyz99abc55\=substitute_stop=2                          
+    12abc34xyz99abc55\=substitute_skip=1
+    12abc34xyz99abc55\=substitute_skip=2
+
 # End of testinput2
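
Editorial illustration, not part of the commit: the new test cases expect substitute_replacement_only to return only the replacement strings and drop the unmatched parts of the subject (for example, /abc/replace=xyz on 123abc456 is expected to give xyz). The sketch below models that behaviour with plain literal matching. It is a toy under stated assumptions, not the code in pcre2_substitute.c: the function toy_substitute and all of its parameters are hypothetical names, and strstr() stands in for a real pattern match.

```c
#include <assert.h>
#include <string.h>

/* Toy model only (hypothetical helper, not PCRE2 code). Replaces literal
occurrences of needle in subject with repl. If global is zero, only the first
occurrence is replaced. If replacement_only is nonzero, the unmatched parts of
the subject are not copied to the output, so the result consists of the
replacement strings alone. Returns the number of replacements made, or -1 if
the needle is empty or the output buffer is too small. */

static int
toy_substitute(const char *subject, const char *needle, const char *repl,
  int global, int replacement_only, char *out, size_t outsize)
{
size_t nlen, rlen, used = 0;
const char *p = subject;
const char *hit;
int count = 0;

assert(subject != NULL && needle != NULL && repl != NULL && out != NULL);
nlen = strlen(needle);
rlen = strlen(repl);
if (nlen == 0) return -1;      /* Empty needles are not handled here */

while ((count == 0 || global) && (hit = strstr(p, needle)) != NULL)
  {
  size_t gap = (size_t)(hit - p);
  if (!replacement_only)       /* Copy the text between matches */
    {
    if (used + gap >= outsize) return -1;
    memcpy(out + used, p, gap);
    used += gap;
    }
  if (used + rlen >= outsize) return -1;
  memcpy(out + used, repl, rlen);
  used += rlen;
  p = hit + nlen;
  count++;
  }

if (!replacement_only)         /* Copy the text after the last match */
  {
  size_t tail = strlen(p);
  if (used + tail >= outsize) return -1;
  memcpy(out + used, p, tail);
  used += tail;
  }

if (used >= outsize) return -1;
out[used] = '\0';
return count;
}
```

Under these assumptions, calling toy_substitute("123abc456", "abc", "xyz", 0, 1, buf, sizeof buf) leaves xyz in buf, while the same call with replacement_only set to 0 leaves 123xyz456.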


Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2020-01-15 16:50:45 UTC (rev 1205)
+++ code/trunk/testdata/testoutput2    2020-01-22 17:50:12 UTC (rev 1206)
@@ -17503,6 +17503,39 @@
  1: 11
  2: 11


+/abc/replace=xyz,substitute_replacement_only
+    123abc456
+ 1: xyz
+
+/a(?<ONE>b)c(?<TWO>d)e/g,replace=X$ONE+${TWO}Z,substitute_replacement_only
+    "abcde-abcde-"
+ 2: Xb+dZXb+dZ
+     
+/a(b)c|xyz/g,replace=<$0>,substitute_callout,substitute_replacement_only
+    abcdefabcpqr                
+ 1(2) Old 0 3 "abc" New 0 5 "<abc>"
+ 2(2) Old 6 9 "abc" New 5 10 "<abc>"
+ 2: <abc><abc>
+    abxyzpqrabcxyz              
+ 1(1) Old 2 5 "xyz" New 0 5 "<xyz>"
+ 2(2) Old 8 11 "abc" New 5 10 "<abc>"
+ 3(1) Old 11 14 "xyz" New 10 15 "<xyz>"
+ 3: <xyz><abc><xyz>
+    12abc34xyz99abc55\=substitute_stop=2                          
+ 1(2) Old 2 5 "abc" New 0 5 "<abc>"
+ 2(1) Old 7 10 "xyz" New 5 10 "<xyz> STOPPED"
+ 2: <abc>
+    12abc34xyz99abc55\=substitute_skip=1
+ 1(2) Old 2 5 "abc" New 0 5 "<abc> SKIPPED"
+ 2(1) Old 7 10 "xyz" New 0 5 "<xyz>"
+ 3(2) Old 12 15 "abc" New 5 10 "<abc>"
+ 3: <xyz><abc>
+    12abc34xyz99abc55\=substitute_skip=2
+ 1(2) Old 2 5 "abc" New 0 5 "<abc>"
+ 2(1) Old 7 10 "xyz" New 5 10 "<xyz> SKIPPED"
+ 3(2) Old 12 15 "abc" New 5 10 "<abc>"
+ 3: <abc><abc>
+
 # End of testinput2
 Error -70: PCRE2_ERROR_BADDATA (unknown error number)
 Error -62: bad serialized data