[Pcre-svn] [1028] code/trunk: Implement PCRE2_COPY_MATCHED

Autore: Subversion repository
Data:
To: pcre-svn
Oggetto: [Pcre-svn] [1028] code/trunk: Implement PCRE2_COPY_MATCHED_SUBJECT.

Revision: 1028

          http://www.exim.org/viewvc/pcre2?view=rev&revision=1028
Author:   ph10
Date:     2018-10-17 09:33:38 +0100 (Wed, 17 Oct 2018)
Log Message:
-----------
Implement PCRE2_COPY_MATCHED_SUBJECT.

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/doc/html/pcre2_dfa_match.html
    code/trunk/doc/html/pcre2_match.html
    code/trunk/doc/html/pcre2_match_data_free.html
    code/trunk/doc/html/pcre2api.html
    code/trunk/doc/html/pcre2jit.html
    code/trunk/doc/pcre2.txt
    code/trunk/doc/pcre2_dfa_match.3
    code/trunk/doc/pcre2_match.3
    code/trunk/doc/pcre2_match_data_free.3
    code/trunk/doc/pcre2api.3
    code/trunk/doc/pcre2jit.3
    code/trunk/src/pcre2.h.in
    code/trunk/src/pcre2_dfa_match.c
    code/trunk/src/pcre2_internal.h
    code/trunk/src/pcre2_intmodedep.h
    code/trunk/src/pcre2_jit_match.c
    code/trunk/src/pcre2_match.c
    code/trunk/src/pcre2_match_data.c
    code/trunk/src/pcre2test.c
    code/trunk/testdata/testinput17
    code/trunk/testdata/testinput2
    code/trunk/testdata/testinput6
    code/trunk/testdata/testoutput17
    code/trunk/testdata/testoutput2
    code/trunk/testdata/testoutput6

Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/ChangeLog    2018-10-17 08:33:38 UTC (rev 1028)
@@ -37,7 +37,11 @@
 ranges such as a-z in EBCDIC environments. The original code probably never 
 worked, though there were no bug reports.

+10. Implement PCRE2_COPY_MATCHED_SUBJECT for pcre2_match() (including JIT via
+pcre2_match()) and pcre2_dfa_match(), but *not* the pcre2_jit_match() fast
+path.

+
Version 10.32 10-September-2018
-------------------------------

Modified: code/trunk/doc/html/pcre2_dfa_match.html
===================================================================
--- code/trunk/doc/html/pcre2_dfa_match.html    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/doc/html/pcre2_dfa_match.html    2018-10-17 08:33:38 UTC (rev 1028)
@@ -51,6 +51,8 @@
 characters. The options are:
 <pre>
   PCRE2_ANCHORED          Match only at the first position
+  PCRE2_COPY_MATCHED_SUBJECT
+                          On success, make a private subject copy  
   PCRE2_ENDANCHORED       Pattern can match only at end of subject
   PCRE2_NOTBOL            Subject is not the beginning of a line
   PCRE2_NOTEOL            Subject is not the end of a line

Modified: code/trunk/doc/html/pcre2_match.html
===================================================================
--- code/trunk/doc/html/pcre2_match.html    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/doc/html/pcre2_match.html    2018-10-17 08:33:38 UTC (rev 1028)
@@ -55,11 +55,13 @@
   Change the backtracking depth limit
   Set custom memory management specifically for the match
 </pre>
-The <i>length</i> and <i>startoffset</i> values are code
-units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a
-subject that is terminated by a binary zero code unit. The options are:
+The <i>length</i> and <i>startoffset</i> values are code units, not characters.
+The length may be given as PCRE2_ZERO_TERMINATED for a subject that is
+terminated by a binary zero code unit. The options are:
 <pre>
   PCRE2_ANCHORED          Match only at the first position
+  PCRE2_COPY_MATCHED_SUBJECT
+                          On success, make a private subject copy   
   PCRE2_ENDANCHORED       Pattern can match only at end of subject
   PCRE2_NOTBOL            Subject string is not the beginning of a line
   PCRE2_NOTEOL            Subject string is not the end of a line

Modified: code/trunk/doc/html/pcre2_match_data_free.html
===================================================================
--- code/trunk/doc/html/pcre2_match_data_free.html    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/doc/html/pcre2_match_data_free.html    2018-10-17 08:33:38 UTC (rev 1028)
@@ -31,6 +31,11 @@
 with which it was created, or <b>free()</b> if that was not set.
 </P>
 <P>
+If the PCRE2_COPY_MATCHED_SUBJECT was used for a successful match using this 
+match data block, the copy of the subject that was remembered with the block is
+also freed.
+</P>
+<P>
 There is a complete description of the PCRE2 native API in the
 <a href="pcre2api.html"><b>pcre2api</b></a>
 page and a description of the POSIX API in the

Modified: code/trunk/doc/html/pcre2api.html
===================================================================
--- code/trunk/doc/html/pcre2api.html    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/doc/html/pcre2api.html    2018-10-17 08:33:38 UTC (rev 1028)
@@ -1305,10 +1305,13 @@
 NOTE: When one of the matching functions is called, pointers to the compiled
 pattern and the subject string are set in the match data block so that they can
 be referenced by the substring extraction functions. After running a match, you
-must not free a compiled pattern (or a subject string) until after all
+must not free a compiled pattern or a subject string until after all
 operations on the
 <a href="#matchdatablock">match data block</a>
-have taken place.
+have taken place, unless, in the case of the subject string, you have used the 
+PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled
+"Option bits for <b>pcre2_match()</b>"
+<a href="#matchoptions>">below.</a>
 </P>
 <P>
 The <i>options</i> argument for <b>pcre2_compile()</b> contains various bit
@@ -2419,7 +2422,10 @@
 and the subject string are set in the match data block so that they can be
 referenced by the extraction functions. After running a match, you must not
 free a compiled pattern or a subject string until after all operations on the
-match data block (for that match) have taken place.
+match data block (for that match) have taken place, unless, in the case of the
+subject string, you have used the PCRE2_COPY_MATCHED_SUBJECT option, which is
+described in the section entitled "Option bits for <b>pcre2_match()</b>"
+<a href="#matchoptions>">below.</a>
 </P>
 <P>
 When a match data block itself is no longer needed, it should be freed by
@@ -2531,10 +2537,10 @@
 </b><br>
 <P>
 The unused bits of the <i>options</i> argument for <b>pcre2_match()</b> must be
-zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDANCHORED,
-PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
-PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT.
-Their action is described below.
+zero. The only bits that may be set are PCRE2_ANCHORED, 
+PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL,
+PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK,
+PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is described below.
 </P>
 <P>
 Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not supported by
@@ -2550,6 +2556,22 @@
 matching time. Note that setting the option at match time disables JIT
 matching.
 <pre>
+  PCRE2_COPY_MATCHED_SUBJECT
+</pre>
+By default, a pointer to the subject is remembered in the match data block so 
+that, after a successful match, it can be referenced by the substring 
+extraction functions. This means that the subject's memory must not be freed
+until all such operations are complete. For some applications where the
+lifetime of the subject string is not guaranteed, it may be necessary to make a
+copy of the subject string, but it is wasteful to do this unless the match is
+successful. After a successful match, if PCRE2_COPY_MATCHED_SUBJECT is set, the
+subject is copied and the new pointer is remembered in the match data block
+instead of the original subject pointer. The memory allocator that was used for
+the match block itself is used. The copy is automatically freed when
+<b>pcre2_match_data_free()</b> is called to free the match data block. It is also
+automatically freed if the match data block is re-used for another match
+operation.
+<pre>
   PCRE2_ENDANCHORED
 </pre>
 If the PCRE2_ENDANCHORED option is set, any string that <b>pcre2_match()</b>
@@ -2954,7 +2976,8 @@
 If a pattern contains many nested backtracking points, heap memory is used to
 remember them. This error is given when the memory allocation function (default
 or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given
-if the amount of memory needed exceeds the heap limit.
+if the amount of memory needed exceeds the heap limit. PCRE2_ERROR_NOMEMORY is 
+also returned if PCRE2_COPY_MATCHED_SUBJECT is set and memory allocation fails.
 <pre>
   PCRE2_ERROR_NULL
 </pre>
@@ -3584,11 +3607,12 @@
 </b><br>
 <P>
 The unused bits of the <i>options</i> argument for <b>pcre2_dfa_match()</b> must
-be zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDANCHORED,
-PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
-PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST,
-and PCRE2_DFA_RESTART. All but the last four of these are exactly the same as
-for <b>pcre2_match()</b>, so their description is not repeated here.
+be zero. The only bits that may be set are PCRE2_ANCHORED,
+PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL,
+PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD,
+PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but the last
+four of these are exactly the same as for <b>pcre2_match()</b>, so their
+description is not repeated here.
 <pre>
   PCRE2_PARTIAL_HARD
   PCRE2_PARTIAL_SOFT
@@ -3732,7 +3756,7 @@
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 21 September 2018
+Last updated: 16 October 2018
 <br>
 Copyright &copy; 1997-2018 University of Cambridge.
 <br>

Modified: code/trunk/doc/html/pcre2jit.html
===================================================================
--- code/trunk/doc/html/pcre2jit.html    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/doc/html/pcre2jit.html    2018-10-17 08:33:38 UTC (rev 1028)
@@ -147,9 +147,10 @@
 <br><a name="SEC4" href="#TOC1">UNSUPPORTED OPTIONS AND PATTERN ITEMS</a><br>
 <P>
 The <b>pcre2_match()</b> options that are supported for JIT matching are
-PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
-PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. The
-PCRE2_ANCHORED option is not supported at match time.
+PCRE2_COPY_MATCHED_SUBJECT, PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY,
+PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and
+PCRE2_PARTIAL_SOFT. The PCRE2_ANCHORED and PCRE2_ENDANCHORED options are not
+supported at match time.
 </P>
 <P>
 If the PCRE2_NO_JIT option is passed to <b>pcre2_match()</b> it disables the
@@ -402,10 +403,13 @@
 </P>
 <P>
 The fast path function is called <b>pcre2_jit_match()</b>, and it takes exactly
-the same arguments as <b>pcre2_match()</b>. The return values are also the same,
-plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is
-requested that was not compiled. Unsupported option bits (for example,
-PCRE2_ANCHORED) are ignored, as is the PCRE2_NO_JIT option.
+the same arguments as <b>pcre2_match()</b>. However, the subject string must be
+specified with a length; PCRE2_ZERO_TERMINATED is not supported. Unsupported
+option bits (for example, PCRE2_ANCHORED, PCRE2_ENDANCHORED and
+PCRE2_COPY_MATCHED_SUBJECT) are ignored, as is the PCRE2_NO_JIT option. The
+return values are also the same as for <b>pcre2_match()</b>, plus
+PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is requested
+that was not compiled.
 </P>
 <P>
 When you call <b>pcre2_match()</b>, as well as testing for invalid options, a
@@ -434,7 +438,7 @@
 </P>
 <br><a name="SEC13" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 28 June 2018
+Last updated: 16 October 2018
 <br>
 Copyright &copy; 1997-2018 University of Cambridge.
 <br>

Modified: code/trunk/doc/pcre2.txt
===================================================================
--- code/trunk/doc/pcre2.txt    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/doc/pcre2.txt    2018-10-17 08:33:38 UTC (rev 1028)
@@ -1302,9 +1302,11 @@
        NOTE: When one of the matching functions is  called,  pointers  to  the
        compiled pattern and the subject string are set in the match data block
        so that they can be referenced by the substring  extraction  functions.
-       After  running a match, you must not free a compiled pattern (or a sub-
-       ject string) until after all operations on the match  data  block  have
-       taken place.
+       After  running  a match, you must not free a compiled pattern or a sub-
+       ject string until after all operations on the  match  data  block  have
+       taken  place,  unless, in the case of the subject string, you have used
+       the PCRE2_COPY_MATCHED_SUBJECT option, which is described in  the  sec-
+       tion entitled "Option bits for pcre2_match()" below.

        The  options argument for pcre2_compile() contains various bit settings
        that affect the compilation. It  should  be  zero  if  no  options  are
@@ -2388,7 +2390,9 @@
        they can be referenced by the extraction  functions.  After  running  a
        match,  you  must not free a compiled pattern or a subject string until
        after all operations on the match data  block  (for  that  match)  have
-       taken place.
+       taken  place,  unless, in the case of the subject string, you have used
+       the PCRE2_COPY_MATCHED_SUBJECT option, which is described in  the  sec-
+       tion entitled "Option bits for pcre2_match()" below.

        When  a match data block itself is no longer needed, it should be freed
        by calling pcre2_match_data_free(). If this function is called  with  a
@@ -2488,25 +2492,43 @@
    Option bits for pcre2_match()

        The unused bits of the options argument for pcre2_match() must be zero.
-       The  only  bits  that may be set are PCRE2_ANCHORED, PCRE2_ENDANCHORED,
-       PCRE2_NOTBOL,  PCRE2_NOTEOL,  PCRE2_NOTEMPTY,   PCRE2_NOTEMPTY_ATSTART,
-       PCRE2_NO_JIT,  PCRE2_NO_UTF_CHECK,  PCRE2_PARTIAL_HARD,  and PCRE2_PAR-
-       TIAL_SOFT.  Their action is described below.
+       The    only    bits    that    may    be    set   are   PCRE2_ANCHORED,
+       PCRE2_COPY_MATCHED_SUBJECT,      PCRE2_ENDANCHORED,       PCRE2_NOTBOL,
+       PCRE2_NOTEOL,   PCRE2_NOTEMPTY,  PCRE2_NOTEMPTY_ATSTART,  PCRE2_NO_JIT,
+       PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and  PCRE2_PARTIAL_SOFT.  Their
+       action is described below.

-       Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is  not  sup-
-       ported  by  the just-in-time (JIT) compiler. If it is set, JIT matching
-       is disabled and the interpretive code in pcre2_match()  is  run.  Apart
-       from  PCRE2_NO_JIT (obviously), the remaining options are supported for
+       Setting  PCRE2_ANCHORED  or PCRE2_ENDANCHORED at match time is not sup-
+       ported by the just-in-time (JIT) compiler. If it is set,  JIT  matching
+       is  disabled  and  the interpretive code in pcre2_match() is run. Apart
+       from PCRE2_NO_JIT (obviously), the remaining options are supported  for
        JIT matching.

          PCRE2_ANCHORED

        The PCRE2_ANCHORED option limits pcre2_match() to matching at the first
-       matching  position.  If  a pattern was compiled with PCRE2_ANCHORED, or
-       turned out to be anchored by virtue of its contents, it cannot be  made
-       unachored  at matching time. Note that setting the option at match time
+       matching position. If a pattern was compiled  with  PCRE2_ANCHORED,  or
+       turned  out to be anchored by virtue of its contents, it cannot be made
+       unachored at matching time. Note that setting the option at match  time
        disables JIT matching.

+         PCRE2_COPY_MATCHED_SUBJECT
+
+       By  default,  a  pointer to the subject is remembered in the match data
+       block so that, after a successful match, it can be  referenced  by  the
+       substring  extraction  functions.  This means that the subject's memory
+       must not be freed until all such  operations  are  complete.  For  some
+       applications  where  the  lifetime of the subject string is not guaran-
+       teed, it may be necessary to make a copy of the subject string, but  it
+       is wasteful to do this unless the match is successful. After a success-
+       ful match, if PCRE2_COPY_MATCHED_SUBJECT is set, the subject is  copied
+       and  the  new  pointer is remembered in the match data block instead of
+       the original subject pointer. The memory allocator that  was  used  for
+       the  match  block  itself is used. The copy is automatically freed when
+       pcre2_match_data_free() is called to free the match data block.  It  is
+       also automatically freed if the match data block is re-used for another
+       match operation.
+
          PCRE2_ENDANCHORED

        If the PCRE2_ENDANCHORED option is set, any string  that  pcre2_match()
@@ -2881,7 +2903,8 @@
        used to remember them. This error is given when the  memory  allocation
        function  (default  or  custom)  fails.  Note  that  a different error,
        PCRE2_ERROR_HEAPLIMIT, is given if the amount of memory needed  exceeds
-       the heap limit.
+       the    heap   limit.   PCRE2_ERROR_NOMEMORY   is   also   returned   if
+       PCRE2_COPY_MATCHED_SUBJECT is set and memory allocation fails.

          PCRE2_ERROR_NULL

@@ -2889,12 +2912,12 @@

          PCRE2_ERROR_RECURSELOOP

-       This  error  is  returned  when  pcre2_match() detects a recursion loop
-       within the pattern. Specifically, it means that either the  whole  pat-
+       This error is returned when  pcre2_match()  detects  a  recursion  loop
+       within  the  pattern. Specifically, it means that either the whole pat-
        tern or a subpattern has been called recursively for the second time at
-       the same position in the subject  string.  Some  simple  patterns  that
-       might  do  this are detected and faulted at compile time, but more com-
-       plicated cases, in particular mutual recursions between  two  different
+       the  same  position  in  the  subject string. Some simple patterns that
+       might do this are detected and faulted at compile time, but  more  com-
+       plicated  cases,  in particular mutual recursions between two different
        subpatterns, cannot be detected until matching is attempted.

@@ -2903,20 +2926,20 @@
        int pcre2_get_error_message(int errorcode, PCRE2_UCHAR *buffer,
          PCRE2_SIZE bufflen);

-       A  text  message  for  an  error code from any PCRE2 function (compile,
-       match, or auxiliary) can be obtained  by  calling  pcre2_get_error_mes-
-       sage().  The  code  is passed as the first argument, with the remaining
-       two arguments specifying a code unit buffer  and  its  length  in  code
-       units,  into  which the text message is placed. The message is returned
-       in code units of the appropriate width for the library  that  is  being
+       A text message for an error code  from  any  PCRE2  function  (compile,
+       match,  or  auxiliary)  can be obtained by calling pcre2_get_error_mes-
+       sage(). The code is passed as the first argument,  with  the  remaining
+       two  arguments  specifying  a  code  unit buffer and its length in code
+       units, into which the text message is placed. The message  is  returned
+       in  code  units  of the appropriate width for the library that is being
        used.

-       The  returned message is terminated with a trailing zero, and the func-
-       tion returns the number of code  units  used,  excluding  the  trailing
+       The returned message is terminated with a trailing zero, and the  func-
+       tion  returns  the  number  of  code units used, excluding the trailing
        zero.  If  the  error  number  is  unknown,  the  negative  error  code
-       PCRE2_ERROR_BADDATA is returned. If the buffer is too small,  the  mes-
-       sage  is  truncated  (but still with a trailing zero), and the negative
-       error code PCRE2_ERROR_NOMEMORY is returned.  None of the messages  are
+       PCRE2_ERROR_BADDATA  is  returned. If the buffer is too small, the mes-
+       sage is truncated (but still with a trailing zero),  and  the  negative
+       error  code PCRE2_ERROR_NOMEMORY is returned.  None of the messages are
        very long; a buffer size of 120 code units is ample.

@@ -2935,39 +2958,39 @@

        void pcre2_substring_free(PCRE2_UCHAR *buffer);

-       Captured  substrings  can  be accessed directly by using the ovector as
+       Captured substrings can be accessed directly by using  the  ovector  as
        described above.  For convenience, auxiliary functions are provided for
-       extracting   captured  substrings  as  new,  separate,  zero-terminated
+       extracting  captured  substrings  as  new,  separate,   zero-terminated
        strings. A substring that contains a binary zero is correctly extracted
-       and  has  a  further  zero  added on the end, but the result is not, of
+       and has a further zero added on the end, but  the  result  is  not,  of
        course, a C string.

        The functions in this section identify substrings by number. The number
        zero refers to the entire matched substring, with higher numbers refer-
-       ring to substrings captured by parenthesized groups.  After  a  partial
-       match,  only  substring  zero  is  available. An attempt to extract any
-       other substring gives the error PCRE2_ERROR_PARTIAL. The  next  section
+       ring  to  substrings  captured by parenthesized groups. After a partial
+       match, only substring zero is available.  An  attempt  to  extract  any
+       other  substring  gives the error PCRE2_ERROR_PARTIAL. The next section
        describes similar functions for extracting captured substrings by name.

-       If  a  pattern uses the \K escape sequence within a positive assertion,
+       If a pattern uses the \K escape sequence within a  positive  assertion,
        the reported start of a successful match can be greater than the end of
-       the  match.   For  example,  if the pattern (?=ab\K) is matched against
-       "ab", the start and end offset values for the match are  2  and  0.  In
-       this  situation,  calling  these functions with a zero substring number
+       the match.  For example, if the pattern  (?=ab\K)  is  matched  against
+       "ab",  the  start  and  end offset values for the match are 2 and 0. In
+       this situation, calling these functions with a  zero  substring  number
        extracts a zero-length empty string.

-       You can find the length in code units of a captured  substring  without
-       extracting  it  by calling pcre2_substring_length_bynumber(). The first
-       argument is a pointer to the match data block, the second is the  group
-       number,  and the third is a pointer to a variable into which the length
-       is placed. If you just want to know whether or not  the  substring  has
+       You  can  find the length in code units of a captured substring without
+       extracting it by calling pcre2_substring_length_bynumber().  The  first
+       argument  is a pointer to the match data block, the second is the group
+       number, and the third is a pointer to a variable into which the  length
+       is  placed.  If  you just want to know whether or not the substring has
        been captured, you can pass the third argument as NULL.

-       The  pcre2_substring_copy_bynumber()  function  copies  a captured sub-
-       string into a supplied buffer,  whereas  pcre2_substring_get_bynumber()
-       copies  it  into  new memory, obtained using the same memory allocation
-       function that was used for the match data block. The  first  two  argu-
-       ments  of  these  functions are a pointer to the match data block and a
+       The pcre2_substring_copy_bynumber() function  copies  a  captured  sub-
+       string  into  a supplied buffer, whereas pcre2_substring_get_bynumber()
+       copies it into new memory, obtained using the  same  memory  allocation
+       function  that  was  used for the match data block. The first two argu-
+       ments of these functions are a pointer to the match data  block  and  a
        capturing group number.

        The final arguments of pcre2_substring_copy_bynumber() are a pointer to
@@ -2976,25 +2999,25 @@
        for the extracted substring, excluding the terminating zero.

        For pcre2_substring_get_bynumber() the third and fourth arguments point
-       to variables that are updated with a pointer to the new memory and  the
-       number  of  code units that comprise the substring, again excluding the
-       terminating zero. When the substring is no longer  needed,  the  memory
+       to  variables that are updated with a pointer to the new memory and the
+       number of code units that comprise the substring, again  excluding  the
+       terminating  zero.  When  the substring is no longer needed, the memory
        should be freed by calling pcre2_substring_free().

-       The  return  value  from  all these functions is zero for success, or a
-       negative error code. If the pattern match  failed,  the  match  failure
-       code  is  returned.   If  a  substring number greater than zero is used
-       after a partial match, PCRE2_ERROR_PARTIAL is returned. Other  possible
+       The return value from all these functions is zero  for  success,  or  a
+       negative  error  code.  If  the pattern match failed, the match failure
+       code is returned.  If a substring number  greater  than  zero  is  used
+       after  a partial match, PCRE2_ERROR_PARTIAL is returned. Other possible
        error codes are:

          PCRE2_ERROR_NOMEMORY

-       The  buffer  was  too small for pcre2_substring_copy_bynumber(), or the
+       The buffer was too small for  pcre2_substring_copy_bynumber(),  or  the
        attempt to get memory failed for pcre2_substring_get_bynumber().

          PCRE2_ERROR_NOSUBSTRING

-       There is no substring with that number in the  pattern,  that  is,  the
+       There  is  no  substring  with that number in the pattern, that is, the
        number is greater than the number of capturing parentheses.

          PCRE2_ERROR_UNAVAILABLE
@@ -3005,8 +3028,8 @@

          PCRE2_ERROR_UNSET

-       The  substring  did  not  participate in the match. For example, if the
-       pattern is (abc)|(def) and the subject is "def", and the  ovector  con-
+       The substring did not participate in the match.  For  example,  if  the
+       pattern  is  (abc)|(def) and the subject is "def", and the ovector con-
        tains at least two capturing slots, substring number 1 is unset.

@@ -3017,32 +3040,32 @@

        void pcre2_substring_list_free(PCRE2_SPTR *list);

-       The  pcre2_substring_list_get()  function  extracts  all available sub-
-       strings and builds a list of pointers to  them.  It  also  (optionally)
-       builds  a  second  list  that  contains  their lengths (in code units),
+       The pcre2_substring_list_get() function  extracts  all  available  sub-
+       strings  and  builds  a  list of pointers to them. It also (optionally)
+       builds a second list that  contains  their  lengths  (in  code  units),
        excluding a terminating zero that is added to each of them. All this is
        done in a single block of memory that is obtained using the same memory
        allocation function that was used to get the match data block.

-       This function must be called only after a successful match.  If  called
+       This  function  must be called only after a successful match. If called
        after a partial match, the error code PCRE2_ERROR_PARTIAL is returned.

-       The  address of the memory block is returned via listptr, which is also
+       The address of the memory block is returned via listptr, which is  also
        the start of the list of string pointers. The end of the list is marked
-       by  a  NULL pointer. The address of the list of lengths is returned via
-       lengthsptr. If your strings do not contain binary zeros and you do  not
+       by a NULL pointer. The address of the list of lengths is  returned  via
+       lengthsptr.  If your strings do not contain binary zeros and you do not
        therefore need the lengths, you may supply NULL as the lengthsptr argu-
-       ment to disable the creation of a list of lengths.  The  yield  of  the
-       function  is zero if all went well, or PCRE2_ERROR_NOMEMORY if the mem-
-       ory block could not be obtained. When the list is no longer needed,  it
+       ment  to  disable  the  creation of a list of lengths. The yield of the
+       function is zero if all went well, or PCRE2_ERROR_NOMEMORY if the  mem-
+       ory  block could not be obtained. When the list is no longer needed, it
        should be freed by calling pcre2_substring_list_free().

        If this function encounters a substring that is unset, which can happen
-       when capturing subpattern number n+1 matches some part of the  subject,
-       but  subpattern n has not been used at all, it returns an empty string.
-       This can be distinguished  from  a  genuine  zero-length  substring  by
+       when  capturing subpattern number n+1 matches some part of the subject,
+       but subpattern n has not been used at all, it returns an empty  string.
+       This  can  be  distinguished  from  a  genuine zero-length substring by
        inspecting  the  appropriate  offset  in  the  ovector,  which  contain
-       PCRE2_UNSET  for   unset   substrings,   or   by   calling   pcre2_sub-
+       PCRE2_UNSET   for   unset   substrings,   or   by   calling  pcre2_sub-
        string_length_bynumber().

@@ -3062,39 +3085,39 @@

        void pcre2_substring_free(PCRE2_UCHAR *buffer);

-       To  extract a substring by name, you first have to find associated num-
+       To extract a substring by name, you first have to find associated  num-
        ber.  For example, for this pattern:

          (a+)b(?<xxx>\d+)...

        the number of the subpattern called "xxx" is 2. If the name is known to
-       be  unique  (PCRE2_DUPNAMES  was not set), you can find the number from
+       be unique (PCRE2_DUPNAMES was not set), you can find  the  number  from
        the name by calling pcre2_substring_number_from_name(). The first argu-
-       ment  is the compiled pattern, and the second is the name. The yield of
+       ment is the compiled pattern, and the second is the name. The yield  of
        the function is the subpattern number, PCRE2_ERROR_NOSUBSTRING if there
-       is  no  subpattern  of  that  name, or PCRE2_ERROR_NOUNIQUESUBSTRING if
-       there is more than one subpattern of that name. Given the  number,  you
-       can  extract the substring directly from the ovector, or use one of the
+       is no subpattern of  that  name,  or  PCRE2_ERROR_NOUNIQUESUBSTRING  if
+       there  is  more than one subpattern of that name. Given the number, you
+       can extract the substring directly from the ovector, or use one of  the
        "bynumber" functions described above.

-       For convenience, there are also "byname" functions that  correspond  to
-       the  "bynumber"  functions,  the  only difference being that the second
-       argument is a name instead of a number. If PCRE2_DUPNAMES  is  set  and
+       For  convenience,  there are also "byname" functions that correspond to
+       the "bynumber" functions, the only difference  being  that  the  second
+       argument  is  a  name instead of a number. If PCRE2_DUPNAMES is set and
        there are duplicate names, these functions scan all the groups with the
        given name, and return the first named string that is set.

-       If there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING  is
-       returned.  If  all  groups  with the name have numbers that are greater
-       than the number of slots in  the  ovector,  PCRE2_ERROR_UNAVAILABLE  is
-       returned.  If  there  is at least one group with a slot in the ovector,
+       If  there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING is
+       returned. If all groups with the name have  numbers  that  are  greater
+       than  the  number  of  slots in the ovector, PCRE2_ERROR_UNAVAILABLE is
+       returned. If there is at least one group with a slot  in  the  ovector,
        but no group is found to be set, PCRE2_ERROR_UNSET is returned.

        Warning: If the pattern uses the (?| feature to set up multiple subpat-
-       terns  with  the  same number, as described in the section on duplicate
-       subpattern numbers in the pcre2pattern page, you cannot  use  names  to
-       distinguish  the  different subpatterns, because names are not included
-       in the compiled code. The matching process uses only numbers. For  this
-       reason,  the  use of different names for subpatterns of the same number
+       terns with the same number, as described in the  section  on  duplicate
+       subpattern  numbers  in  the pcre2pattern page, you cannot use names to
+       distinguish the different subpatterns, because names are  not  included
+       in  the compiled code. The matching process uses only numbers. For this
+       reason, the use of different names for subpatterns of the  same  number
        causes an error at compile time.

@@ -3107,54 +3130,54 @@
          PCRE2_SIZE rlength, PCRE2_UCHAR *outputbuffer,
          PCRE2_SIZE *outlengthptr);

-       This function calls pcre2_match() and then makes a copy of the  subject
-       string  in  outputbuffer, replacing one or more parts that were matched
+       This  function calls pcre2_match() and then makes a copy of the subject
+       string in outputbuffer, replacing one or more parts that  were  matched
        with the replacement string, whose length is supplied in rlength.  This
-       can  be  given  as  PCRE2_ZERO_TERMINATED for a zero-terminated string.
-       The default is to perform just one replacement, but there is an  option
-       that  requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
+       can be given as PCRE2_ZERO_TERMINATED  for  a  zero-terminated  string.
+       The  default is to perform just one replacement, but there is an option
+       that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL  below
        for details).

-       Matches in which a \K item in a lookahead in  the  pattern  causes  the
-       match  to  end  before it starts are not supported, and give rise to an
+       Matches  in  which  a  \K item in a lookahead in the pattern causes the
+       match to end before it starts are not supported, and give  rise  to  an
        error return. For global replacements, matches in which \K in a lookbe-
-       hind  causes the match to start earlier than the point that was reached
+       hind causes the match to start earlier than the point that was  reached
        in the previous iteration are also not supported.

-       The first seven arguments of pcre2_substitute() are  the  same  as  for
+       The  first  seven  arguments  of pcre2_substitute() are the same as for
        pcre2_match(), except that the partial matching options are not permit-
-       ted, and match_data may be passed as NULL, in which case a  match  data
-       block  is obtained and freed within this function, using memory manage-
-       ment functions from the match context, if provided, or else those  that
+       ted,  and  match_data may be passed as NULL, in which case a match data
+       block is obtained and freed within this function, using memory  manage-
+       ment  functions from the match context, if provided, or else those that
        were used to allocate memory for the compiled code.

-       If  an  external  match_data block is provided, its contents afterwards
-       are those set by the final call to pcre2_match(). For  global  changes,
-       this  will  have ended in a matching error. The contents of the ovector
+       If an external match_data block is provided,  its  contents  afterwards
+       are  those  set by the final call to pcre2_match(). For global changes,
+       this will have ended in a matching error. The contents of  the  ovector
        within the match data block may or may not have been changed.

-       The outlengthptr argument must point to a variable  that  contains  the
-       length,  in  code  units, of the output buffer. If the function is suc-
-       cessful, the value is updated to contain the length of the new  string,
+       The  outlengthptr  argument  must point to a variable that contains the
+       length, in code units, of the output buffer. If the  function  is  suc-
+       cessful,  the value is updated to contain the length of the new string,
        excluding the trailing zero that is automatically added.

-       If  the  function  is  not  successful,  the value set via outlengthptr
-       depends on the type of error. For  syntax  errors  in  the  replacement
-       string,  the  value  is  the offset in the replacement string where the
-       error was detected. For other  errors,  the  value  is  PCRE2_UNSET  by
-       default.  This  includes the case of the output buffer being too small,
-       unless PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set (see  below),  in  which
-       case  the  value  is the minimum length needed, including space for the
-       trailing zero. Note that in  order  to  compute  the  required  length,
-       pcre2_substitute()  has  to  simulate  all  the  matching  and copying,
+       If the function is not  successful,  the  value  set  via  outlengthptr
+       depends  on  the  type  of  error. For syntax errors in the replacement
+       string, the value is the offset in the  replacement  string  where  the
+       error  was  detected.  For  other  errors,  the value is PCRE2_UNSET by
+       default. This includes the case of the output buffer being  too  small,
+       unless  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  is  set (see below), in which
+       case the value is the minimum length needed, including  space  for  the
+       trailing  zero.  Note  that  in  order  to compute the required length,
+       pcre2_substitute() has  to  simulate  all  the  matching  and  copying,
        instead of giving an error return as soon as the buffer overflows. Note
        also that the length is in code units, not bytes.

-       In  the replacement string, which is interpreted as a UTF string in UTF
-       mode, and is checked for UTF  validity  unless  the  PCRE2_NO_UTF_CHECK
+       In the replacement string, which is interpreted as a UTF string in  UTF
+       mode,  and  is  checked  for UTF validity unless the PCRE2_NO_UTF_CHECK
        option is set, a dollar character is an escape character that can spec-
-       ify the insertion of characters from capturing  groups  or  names  from
-       (*MARK)  or other control verbs in the pattern. The following forms are
+       ify  the  insertion  of  characters from capturing groups or names from
+       (*MARK) or other control verbs in the pattern. The following forms  are
        always recognized:

          $$                  insert a dollar character
@@ -3161,19 +3184,19 @@
          $<n> or ${<n>}      insert the contents of group <n>
          $*MARK or ${*MARK}  insert a control verb name

-       Either a group number or a group name  can  be  given  for  <n>.  Curly
-       brackets  are  required only if the following character would be inter-
+       Either  a  group  number  or  a  group name can be given for <n>. Curly
+       brackets are required only if the following character would  be  inter-
        preted as part of the number or name. The number may be zero to include
-       the  entire  matched  string.   For  example,  if  the pattern a(b)c is
-       matched with "=abc=" and the replacement string "+$1$0$1+", the  result
+       the entire matched string.   For  example,  if  the  pattern  a(b)c  is
+       matched  with "=abc=" and the replacement string "+$1$0$1+", the result
        is "=+babcb+=".

        $*MARK inserts the name from the last encountered (*ACCEPT), (*COMMIT),
-       (*MARK), (*PRUNE), or (*THEN) on the matching path  that  has  a  name.
-       (*MARK)  must  always include a name, but the other verbs need not. For
+       (*MARK),  (*PRUNE),  or  (*THEN)  on the matching path that has a name.
+       (*MARK) must always include a name, but the other verbs need  not.  For
        example, in the case of (*MARK:A)(*PRUNE) the name inserted is "A", but
-       for  (*MARK:A)(*PRUNE:B) the relevant name is "B". This facility can be
-       used to perform simple simultaneous substitutions,  as  this  pcre2test
+       for (*MARK:A)(*PRUNE:B) the relevant name is "B". This facility can  be
+       used  to  perform  simple simultaneous substitutions, as this pcre2test
        example shows:

          /(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
@@ -3180,19 +3203,19 @@
              apple lemon
           2: pear orange

-       As  well as the usual options for pcre2_match(), a number of additional
+       As well as the usual options for pcre2_match(), a number of  additional
        options can be set in the options argument of pcre2_substitute().

        PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject
-       string,  replacing every matching substring. If this option is not set,
-       only the first matching substring is replaced. The search  for  matches
-       takes  place in the original subject string (that is, previous replace-
-       ments do not affect it).  Iteration is  implemented  by  advancing  the
-       startoffset  value  for  each search, which is always passed the entire
+       string, replacing every matching substring. If this option is not  set,
+       only  the  first matching substring is replaced. The search for matches
+       takes place in the original subject string (that is, previous  replace-
+       ments  do  not  affect  it).  Iteration is implemented by advancing the
+       startoffset value for each search, which is always  passed  the  entire
        subject string. If an offset limit is set in the match context, search-
        ing stops when that limit is reached.

-       You  can  restrict  the effect of a global substitution to a portion of
+       You can restrict the effect of a global substitution to  a  portion  of
        the subject string by setting either or both of startoffset and an off-
        set limit. Here is a pcre2test example:

@@ -3200,87 +3223,87 @@
          ABC ABC ABC ABC\=offset=3,offset_limit=12
           2: ABC A!C A!C ABC

-       When  continuing  with  global substitutions after matching a substring
+       When continuing with global substitutions after  matching  a  substring
        with zero length, an attempt to find a non-empty match at the same off-
        set is performed.  If this is not successful, the offset is advanced by
        one character except when CRLF is a valid newline sequence and the next
-       two  characters are CR, LF. In this case, the offset is advanced by two
+       two characters are CR, LF. In this case, the offset is advanced by  two
        characters.

-       PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when  the  output
+       PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  changes  what happens when the output
        buffer is too small. The default action is to return PCRE2_ERROR_NOMEM-
-       ORY immediately. If this option  is  set,  however,  pcre2_substitute()
+       ORY  immediately.  If  this  option is set, however, pcre2_substitute()
        continues to go through the motions of matching and substituting (with-
-       out, of course, writing anything) in order to compute the size of  buf-
-       fer  that  is  needed.  This  value is passed back via the outlengthptr
-       variable,   with   the   result   of   the   function    still    being
+       out,  of course, writing anything) in order to compute the size of buf-
+       fer that is needed. This value is  passed  back  via  the  outlengthptr
+       variable,    with    the   result   of   the   function   still   being
        PCRE2_ERROR_NOMEMORY.

-       Passing  a  buffer  size  of zero is a permitted way of finding out how
-       much memory is needed for given substitution. However, this  does  mean
+       Passing a buffer size of zero is a permitted way  of  finding  out  how
+       much  memory  is needed for given substitution. However, this does mean
        that the entire operation is carried out twice. Depending on the appli-
-       cation, it may be more efficient to allocate a large  buffer  and  free
-       the   excess   afterwards,   instead  of  using  PCRE2_SUBSTITUTE_OVER-
+       cation,  it  may  be more efficient to allocate a large buffer and free
+       the  excess  afterwards,  instead   of   using   PCRE2_SUBSTITUTE_OVER-
        FLOW_LENGTH.

-       PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references  to  capturing  groups
-       that  do  not appear in the pattern to be treated as unset groups. This
-       option should be used with care, because it means  that  a  typo  in  a
-       group  name  or  number  no  longer  causes the PCRE2_ERROR_NOSUBSTRING
+       PCRE2_SUBSTITUTE_UNKNOWN_UNSET  causes  references  to capturing groups
+       that do not appear in the pattern to be treated as unset  groups.  This
+       option  should  be  used  with  care, because it means that a typo in a
+       group name or  number  no  longer  causes  the  PCRE2_ERROR_NOSUBSTRING
        error.

-       PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capturing  groups  (including
+       PCRE2_SUBSTITUTE_UNSET_EMPTY  causes  unset capturing groups (including
        unknown  groups  when  PCRE2_SUBSTITUTE_UNKNOWN_UNSET  is  set)  to  be
-       treated as empty strings when inserted  as  described  above.  If  this
-       option  is  not  set,  an  attempt  to insert an unset group causes the
-       PCRE2_ERROR_UNSET error. This option does not  influence  the  extended
+       treated  as  empty  strings  when  inserted as described above. If this
+       option is not set, an attempt to  insert  an  unset  group  causes  the
+       PCRE2_ERROR_UNSET  error.  This  option does not influence the extended
        substitution syntax described below.

-       PCRE2_SUBSTITUTE_EXTENDED  causes extra processing to be applied to the
-       replacement string. Without this option, only the dollar  character  is
-       special,  and  only  the  group insertion forms listed above are valid.
+       PCRE2_SUBSTITUTE_EXTENDED causes extra processing to be applied to  the
+       replacement  string.  Without this option, only the dollar character is
+       special, and only the group insertion forms  listed  above  are  valid.
        When PCRE2_SUBSTITUTE_EXTENDED is set, two things change:

-       Firstly, backslash in a replacement string is interpreted as an  escape
+       Firstly,  backslash in a replacement string is interpreted as an escape
        character. The usual forms such as \n or \x{ddd} can be used to specify
-       particular character codes, and backslash followed by any  non-alphanu-
-       meric  character  quotes  that character. Extended quoting can be coded
+       particular  character codes, and backslash followed by any non-alphanu-
+       meric character quotes that character. Extended quoting  can  be  coded
        using \Q...\E, exactly as in pattern strings.

-       There are also four escape sequences for forcing the case  of  inserted
-       letters.   The  insertion  mechanism has three states: no case forcing,
+       There  are  also four escape sequences for forcing the case of inserted
+       letters.  The insertion mechanism has three states:  no  case  forcing,
        force upper case, and force lower case. The escape sequences change the
        current state: \U and \L change to upper or lower case forcing, respec-
-       tively, and \E (when not terminating a \Q quoted sequence)  reverts  to
-       no  case  forcing. The sequences \u and \l force the next character (if
-       it is a letter) to upper or lower  case,  respectively,  and  then  the
+       tively,  and  \E (when not terminating a \Q quoted sequence) reverts to
+       no case forcing. The sequences \u and \l force the next  character  (if
+       it  is  a  letter)  to  upper or lower case, respectively, and then the
        state automatically reverts to no case forcing. Case forcing applies to
        all inserted  characters, including those from captured groups and let-
        ters within \Q...\E quoted sequences.

        Note that case forcing sequences such as \U...\E do not nest. For exam-
-       ple, the result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc";  the  final
+       ple,  the  result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final
        \E has no effect.

-       The  second  effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
-       flexibility to group substitution. The syntax is similar to  that  used
+       The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to  add  more
+       flexibility  to  group substitution. The syntax is similar to that used
        by Bash:

          ${<n>:-<string>}
          ${<n>:+<string1>:<string2>}

-       As  before,  <n> may be a group number or a name. The first form speci-
-       fies a default value. If group <n> is set, its value  is  inserted;  if
-       not,  <string>  is  expanded  and  the result inserted. The second form
-       specifies strings that are expanded and inserted when group <n> is  set
-       or  unset,  respectively. The first form is just a convenient shorthand
+       As before, <n> may be a group number or a name. The first  form  speci-
+       fies  a  default  value. If group <n> is set, its value is inserted; if
+       not, <string> is expanded and the  result  inserted.  The  second  form
+       specifies  strings that are expanded and inserted when group <n> is set
+       or unset, respectively. The first form is just a  convenient  shorthand
        for

          ${<n>:+${<n>}:<string>}

-       Backslash can be used to escape colons and closing  curly  brackets  in
-       the  replacement  strings.  A change of the case forcing state within a
-       replacement string remains  in  force  afterwards,  as  shown  in  this
+       Backslash  can  be  used to escape colons and closing curly brackets in
+       the replacement strings. A change of the case forcing  state  within  a
+       replacement  string  remains  in  force  afterwards,  as  shown in this
        pcre2test example:

          /(some)?(body)/substitute_extended,replace=${1:+\U:\L}HeLLo
@@ -3289,16 +3312,16 @@
              somebody
           1: HELLO

-       The  PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended
-       substitutions.  However,  PCRE2_SUBSTITUTE_UNKNOWN_UNSET   does   cause
+       The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these  extended
+       substitutions.   However,   PCRE2_SUBSTITUTE_UNKNOWN_UNSET  does  cause
        unknown groups in the extended syntax forms to be treated as unset.

-       If  successful,  pcre2_substitute()  returns the number of replacements
+       If successful, pcre2_substitute() returns the  number  of  replacements
        that were made. This may be zero if no matches were found, and is never
        greater than 1 unless PCRE2_SUBSTITUTE_GLOBAL is set.

        In the event of an error, a negative error code is returned. Except for
-       PCRE2_ERROR_NOMATCH   (which   is   never   returned),   errors    from
+       PCRE2_ERROR_NOMATCH    (which   is   never   returned),   errors   from
        pcre2_match() are passed straight back.

        PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring inser-
@@ -3305,26 +3328,26 @@
        tion, unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.

        PCRE2_ERROR_UNSET is returned for an unset substring insertion (includ-
-       ing  an  unknown  substring when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set)
+       ing an unknown substring when  PCRE2_SUBSTITUTE_UNKNOWN_UNSET  is  set)
        when  the  simple  (non-extended)  syntax  is  used  and  PCRE2_SUBSTI-
        TUTE_UNSET_EMPTY is not set.

-       PCRE2_ERROR_NOMEMORY  is  returned  if  the  output  buffer  is not big
+       PCRE2_ERROR_NOMEMORY is returned  if  the  output  buffer  is  not  big
        enough. If the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size
-       of  buffer  that is needed is returned via outlengthptr. Note that this
+       of buffer that is needed is returned via outlengthptr. Note  that  this
        does not happen by default.

-       PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax  errors  in
+       PCRE2_ERROR_BADREPLACEMENT  is  used for miscellaneous syntax errors in
        the   replacement   string,   with   more   particular   errors   being
-       PCRE2_ERROR_BADREPESCAPE (invalid  escape  sequence),  PCRE2_ERROR_REP-
-       MISSINGBRACE  (closing curly bracket not found), PCRE2_ERROR_BADSUBSTI-
+       PCRE2_ERROR_BADREPESCAPE  (invalid  escape  sequence), PCRE2_ERROR_REP-
+       MISSINGBRACE (closing curly bracket not found),  PCRE2_ERROR_BADSUBSTI-
        TUTION   (syntax   error   in   extended   group   substitution),   and
-       PCRE2_ERROR_BADSUBSPATTERN  (the  pattern match ended before it started
-       or the match started earlier than the current position in the  subject,
+       PCRE2_ERROR_BADSUBSPATTERN (the pattern match ended before  it  started
+       or  the match started earlier than the current position in the subject,
        which can happen if \K is used in an assertion).

        As for all PCRE2 errors, a text message that describes the error can be
-       obtained  by  calling  the  pcre2_get_error_message()   function   (see
+       obtained   by   calling  the  pcre2_get_error_message()  function  (see
        "Obtaining a textual error message" above).

    Substitution callouts
@@ -3333,15 +3356,15 @@
          void (*callout_function)(pcre2_substitute_callout_block *, void *),
          void *callout_data);

-       The  pcre2_set_substitution_callout() function can be used to specify a
-       callout function for pcre2_substitute(). This information is passed  in
-       a  match  context.  The callout function is called after each substitu-
-       tion. It is not called for simulated substitutions  that  happen  as  a
-       result  of the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option. A callout func-
+       The pcre2_set_substitution_callout() function can be used to specify  a
+       callout  function for pcre2_substitute(). This information is passed in
+       a match context. The callout function is called  after  each  substitu-
+       tion.  It  is  not  called for simulated substitutions that happen as a
+       result of the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option. A callout  func-
        tion should not return any value.

        The first argument of the callout function is a pointer to a substitute
-       callout  block structure, which contains the following fields, not nec-
+       callout block structure, which contains the following fields, not  nec-
        essarily in this order:

          uint32_t    version;
@@ -3348,16 +3371,16 @@
          PCRE2_SIZE  input_offsets[2];
          PCRE2_SIZE  output_offsets[2];

-       The version field contains the version number of the block format.  The
-       current  version  is  0.  The version number will increase in future if
-       more fields are added, but the intention is never to remove any of  the
+       The  version field contains the version number of the block format. The
+       current version is 0. The version number will  increase  in  future  if
+       more  fields are added, but the intention is never to remove any of the
        existing fields.

-       The  input_offsets  vector  contains the code unit offsets in the input
+       The input_offsets vector contains the code unit offsets  in  the  input
        string of the matched substring, and the output_offsets vector contains
        the offsets of the replacement in the output string.

-       The  second  argument  of  the  callout function is the value passed as
+       The second argument of the callout function  is  the  value  passed  as
        callout_data when the function was registered.

@@ -3366,56 +3389,56 @@
        int pcre2_substring_nametable_scan(const pcre2_code *code,
          PCRE2_SPTR name, PCRE2_SPTR *first, PCRE2_SPTR *last);

-       When a pattern is compiled with the PCRE2_DUPNAMES  option,  names  for
-       subpatterns  are  not required to be unique. Duplicate names are always
-       allowed for subpatterns with the same number, created by using the  (?|
-       feature.  Indeed,  if  such subpatterns are named, they are required to
+       When  a  pattern  is compiled with the PCRE2_DUPNAMES option, names for
+       subpatterns are not required to be unique. Duplicate names  are  always
+       allowed  for subpatterns with the same number, created by using the (?|
+       feature. Indeed, if such subpatterns are named, they  are  required  to
        use the same names.

        Normally, patterns with duplicate names are such that in any one match,
-       only  one of the named subpatterns participates. An example is shown in
+       only one of the named subpatterns participates. An example is shown  in
        the pcre2pattern documentation.

-       When  duplicates   are   present,   pcre2_substring_copy_byname()   and
-       pcre2_substring_get_byname()  return  the first substring corresponding
-       to  the  given  name  that  is  set.  Only   if   none   are   set   is
-       PCRE2_ERROR_UNSET  is  returned. The pcre2_substring_number_from_name()
+       When   duplicates   are   present,   pcre2_substring_copy_byname()  and
+       pcre2_substring_get_byname() return the first  substring  corresponding
+       to   the   given   name   that   is  set.  Only  if  none  are  set  is
+       PCRE2_ERROR_UNSET is returned.  The  pcre2_substring_number_from_name()
        function returns the error PCRE2_ERROR_NOUNIQUESUBSTRING when there are
        duplicate names.

-       If  you want to get full details of all captured substrings for a given
-       name, you must use the pcre2_substring_nametable_scan()  function.  The
-       first  argument is the compiled pattern, and the second is the name. If
-       the third and fourth arguments are NULL, the function returns  a  group
+       If you want to get full details of all captured substrings for a  given
+       name,  you  must use the pcre2_substring_nametable_scan() function. The
+       first argument is the compiled pattern, and the second is the name.  If
+       the  third  and fourth arguments are NULL, the function returns a group
        number for a unique name, or PCRE2_ERROR_NOUNIQUESUBSTRING otherwise.

        When the third and fourth arguments are not NULL, they must be pointers
-       to variables that are updated by the function. After it has  run,  they
+       to  variables  that are updated by the function. After it has run, they
        point to the first and last entries in the name-to-number table for the
-       given name, and the function returns the length of each entry  in  code
-       units.  In both cases, PCRE2_ERROR_NOSUBSTRING is returned if there are
+       given  name,  and the function returns the length of each entry in code
+       units. In both cases, PCRE2_ERROR_NOSUBSTRING is returned if there  are
        no entries for the given name.

        The format of the name table is described above in the section entitled
-       Information  about  a  pattern.  Given all the relevant entries for the
-       name, you can extract each of their numbers,  and  hence  the  captured
+       Information about a pattern. Given all the  relevant  entries  for  the
+       name,  you  can  extract  each of their numbers, and hence the captured
        data.

FINDING ALL POSSIBLE MATCHES AT ONE POSITION

-       The  traditional  matching  function  uses a similar algorithm to Perl,
-       which stops when it finds the first match at a given point in the  sub-
+       The traditional matching function uses a  similar  algorithm  to  Perl,
+       which  stops when it finds the first match at a given point in the sub-
        ject. If you want to find all possible matches, or the longest possible
-       match at a given position,  consider  using  the  alternative  matching
-       function  (see  below) instead. If you cannot use the alternative func-
+       match  at  a  given  position,  consider using the alternative matching
+       function (see below) instead. If you cannot use the  alternative  func-
        tion, you can kludge it up by making use of the callout facility, which
        is described in the pcre2callout documentation.

        What you have to do is to insert a callout right at the end of the pat-
-       tern.  When your callout function is called, extract and save the  cur-
-       rent  matched  substring.  Then return 1, which forces pcre2_match() to
-       backtrack and try other alternatives. Ultimately, when it runs  out  of
+       tern.   When your callout function is called, extract and save the cur-
+       rent matched substring. Then return 1, which  forces  pcre2_match()  to
+       backtrack  and  try other alternatives. Ultimately, when it runs out of
        matches, pcre2_match() will yield PCRE2_ERROR_NOMATCH.

@@ -3427,26 +3450,26 @@
          pcre2_match_context *mcontext,
          int *workspace, PCRE2_SIZE wscount);

-       The  function  pcre2_dfa_match()  is  called  to match a subject string
-       against a compiled pattern, using a matching algorithm that  scans  the
+       The function pcre2_dfa_match() is called  to  match  a  subject  string
+       against  a  compiled pattern, using a matching algorithm that scans the
        subject string just once (not counting lookaround assertions), and does
-       not backtrack.  This has different characteristics to the normal  algo-
-       rithm,  and  is not compatible with Perl. Some of the features of PCRE2
-       patterns are not supported.  Nevertheless, there are  times  when  this
-       kind  of  matching  can be useful. For a discussion of the two matching
+       not  backtrack.  This has different characteristics to the normal algo-
+       rithm, and is not compatible with Perl. Some of the features  of  PCRE2
+       patterns  are  not  supported.  Nevertheless, there are times when this
+       kind of matching can be useful. For a discussion of  the  two  matching
        algorithms, and a list of features that pcre2_dfa_match() does not sup-
        port, see the pcre2matching documentation.

-       The  arguments  for  the pcre2_dfa_match() function are the same as for
+       The arguments for the pcre2_dfa_match() function are the  same  as  for
        pcre2_match(), plus two extras. The ovector within the match data block
        is used in a different way, and this is described below. The other com-
-       mon arguments are used in the same way as for pcre2_match(),  so  their
+       mon  arguments  are used in the same way as for pcre2_match(), so their
        description is not repeated here.

-       The  two  additional  arguments provide workspace for the function. The
-       workspace vector should contain at least 20 elements. It  is  used  for
+       The two additional arguments provide workspace for  the  function.  The
+       workspace  vector  should  contain at least 20 elements. It is used for
        keeping  track  of  multiple  paths  through  the  pattern  tree.  More
-       workspace is needed for patterns and subjects where there are a lot  of
+       workspace  is needed for patterns and subjects where there are a lot of
        potential matches.

        Here is an example of a simple call to pcre2_dfa_match():
@@ -3466,13 +3489,14 @@

    Option bits for pcre_dfa_match()

-       The  unused  bits of the options argument for pcre2_dfa_match() must be
-       zero. The only bits that may be set  are  PCRE2_ANCHORED,  PCRE2_ENDAN-
-       CHORED,        PCRE2_NOTBOL,        PCRE2_NOTEOL,       PCRE2_NOTEMPTY,
-       PCRE2_NOTEMPTY_ATSTART,     PCRE2_NO_UTF_CHECK,     PCRE2_PARTIAL_HARD,
-       PCRE2_PARTIAL_SOFT,  PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but
-       the last four of these are exactly the same as  for  pcre2_match(),  so
-       their description is not repeated here.
+       The unused bits of the options argument for pcre2_dfa_match()  must  be
+       zero.   The   only   bits   that   may   be   set  are  PCRE2_ANCHORED,
+       PCRE2_COPY_MATCHED_SUBJECT,      PCRE2_ENDANCHORED,       PCRE2_NOTBOL,
+       PCRE2_NOTEOL,          PCRE2_NOTEMPTY,          PCRE2_NOTEMPTY_ATSTART,
+       PCRE2_NO_UTF_CHECK,       PCRE2_PARTIAL_HARD,       PCRE2_PARTIAL_SOFT,
+       PCRE2_DFA_SHORTEST,  and  PCRE2_DFA_RESTART.  All  but the last four of
+       these are exactly the same as for pcre2_match(), so  their  description
+       is not repeated here.

          PCRE2_PARTIAL_HARD
          PCRE2_PARTIAL_SOFT
@@ -3607,7 +3631,7 @@

REVISION

-       Last updated: 21 September 2018
+       Last updated: 16 October 2018
        Copyright (c) 1997-2018 University of Cambridge.
 ------------------------------------------------------------------------------

@@ -4924,15 +4948,16 @@
UNSUPPORTED OPTIONS AND PATTERN ITEMS

        The pcre2_match() options that  are  supported  for  JIT  matching  are
-       PCRE2_NOTBOL,   PCRE2_NOTEOL,  PCRE2_NOTEMPTY,  PCRE2_NOTEMPTY_ATSTART,
-       PCRE2_NO_UTF_CHECK,  PCRE2_PARTIAL_HARD,  and  PCRE2_PARTIAL_SOFT.  The
-       PCRE2_ANCHORED option is not supported at match time.
+       PCRE2_COPY_MATCHED_SUBJECT, PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY,
+       PCRE2_NOTEMPTY_ATSTART,  PCRE2_NO_UTF_CHECK,  PCRE2_PARTIAL_HARD,   and
+       PCRE2_PARTIAL_SOFT.  The  PCRE2_ANCHORED  and PCRE2_ENDANCHORED options
+       are not supported at match time.

-       If  the  PCRE2_NO_JIT option is passed to pcre2_match() it disables the
+       If the PCRE2_NO_JIT option is passed to pcre2_match() it  disables  the
        use of JIT, forcing matching by the interpreter code.

-       The only unsupported pattern items are \C (match a  single  data  unit)
-       when  running in a UTF mode, and a callout immediately before an asser-
+       The  only  unsupported  pattern items are \C (match a single data unit)
+       when running in a UTF mode, and a callout immediately before an  asser-
        tion condition in a conditional group.

@@ -4939,14 +4964,14 @@
RETURN VALUES FROM JIT MATCHING

        When a pattern is matched using JIT matching, the return values are the
-       same  as  those  given by the interpretive pcre2_match() code, with the
-       addition of one new error code: PCRE2_ERROR_JIT_STACKLIMIT. This  means
-       that  the memory used for the JIT stack was insufficient. See "Control-
+       same as those given by the interpretive pcre2_match()  code,  with  the
+       addition  of one new error code: PCRE2_ERROR_JIT_STACKLIMIT. This means
+       that the memory used for the JIT stack was insufficient. See  "Control-
        ling the JIT stack" below for a discussion of JIT stack usage.

-       The error code PCRE2_ERROR_MATCHLIMIT is returned by the  JIT  code  if
-       searching  a  very large pattern tree goes on for too long, as it is in
-       the same circumstance when JIT is not used, but the details of  exactly
+       The  error  code  PCRE2_ERROR_MATCHLIMIT is returned by the JIT code if
+       searching a very large pattern tree goes on for too long, as it  is  in
+       the  same circumstance when JIT is not used, but the details of exactly
        what is counted are not the same. The PCRE2_ERROR_DEPTHLIMIT error code
        is never returned when JIT matching is used.

@@ -4954,25 +4979,25 @@
CONTROLLING THE JIT STACK

        When the compiled JIT code runs, it needs a block of memory to use as a
-       stack.   By  default, it uses 32KiB on the machine stack. However, some
-       large  or  complicated  patterns  need  more  than  this.   The   error
-       PCRE2_ERROR_JIT_STACKLIMIT  is  given  when  there is not enough stack.
-       Three functions are provided for managing blocks of memory for  use  as
-       JIT  stacks. There is further discussion about the use of JIT stacks in
+       stack.  By default, it uses 32KiB on the machine stack.  However,  some
+       large   or   complicated  patterns  need  more  than  this.  The  error
+       PCRE2_ERROR_JIT_STACKLIMIT is given when there  is  not  enough  stack.
+       Three  functions  are provided for managing blocks of memory for use as
+       JIT stacks. There is further discussion about the use of JIT stacks  in
        the section entitled "JIT stack FAQ" below.

-       The pcre2_jit_stack_create() function creates a JIT  stack.  Its  argu-
-       ments  are  a starting size, a maximum size, and a general context (for
-       memory allocation functions, or NULL for standard  memory  allocation).
+       The  pcre2_jit_stack_create()  function  creates a JIT stack. Its argu-
+       ments are a starting size, a maximum size, and a general  context  (for
+       memory  allocation  functions, or NULL for standard memory allocation).
        It returns a pointer to an opaque structure of type pcre2_jit_stack, or
-       NULL if there is an error. The pcre2_jit_stack_free() function is  used
+       NULL  if there is an error. The pcre2_jit_stack_free() function is used
        to free a stack that is no longer needed. If its argument is NULL, this
-       function returns immediately, without doing anything. (For the  techni-
-       cally  minded: the address space is allocated by mmap or VirtualAlloc.)
-       A maximum stack size of 512KiB to 1MiB should be more than  enough  for
+       function  returns immediately, without doing anything. (For the techni-
+       cally minded: the address space is allocated by mmap or  VirtualAlloc.)
+       A  maximum  stack size of 512KiB to 1MiB should be more than enough for
        any pattern.

-       The  pcre2_jit_stack_assign()  function  specifies which stack JIT code
+       The pcre2_jit_stack_assign() function specifies which  stack  JIT  code
        should use. Its arguments are as follows:

          pcre2_match_context  *mcontext
@@ -4982,7 +5007,7 @@
        The first argument is a pointer to a match context. When this is subse-
        quently passed to a matching function, its information determines which
        JIT stack is used. If this argument is NULL, the function returns imme-
-       diately,  without  doing anything. There are three cases for the values
+       diately, without doing anything. There are three cases for  the  values
        of the other two options:

          (1) If callback is NULL and data is NULL, an internal 32KiB block
@@ -5000,34 +5025,34 @@
              return value must be a valid JIT stack, the result of calling
              pcre2_jit_stack_create().

-       A callback function is obeyed whenever JIT code is about to be run;  it
+       A  callback function is obeyed whenever JIT code is about to be run; it
        is not obeyed when pcre2_match() is called with options that are incom-
-       patible for JIT matching. A callback function can therefore be used  to
-       determine  whether  a  match  operation  was  executed by JIT or by the
+       patible  for JIT matching. A callback function can therefore be used to
+       determine whether a match operation was  executed  by  JIT  or  by  the
        interpreter.

        You may safely use the same JIT stack for more than one pattern (either
-       by  assigning  directly  or  by  callback), as long as the patterns are
+       by assigning directly or by callback), as  long  as  the  patterns  are
        matched sequentially in the same thread. Currently, the only way to set
-       up  non-sequential matches in one thread is to use callouts: if a call-
-       out function starts another match, that match must use a different  JIT
+       up non-sequential matches in one thread is to use callouts: if a  call-
+       out  function starts another match, that match must use a different JIT
        stack to the one used for currently suspended match(es).

-       In  a multithread application, if you do not specify a JIT stack, or if
-       you assign or pass back NULL from  a  callback,  that  is  thread-safe,
-       because  each  thread has its own machine stack. However, if you assign
-       or pass back a non-NULL JIT stack, this must be a different  stack  for
+       In a multithread application, if you do not specify a JIT stack, or  if
+       you  assign  or  pass  back  NULL from a callback, that is thread-safe,
+       because each thread has its own machine stack. However, if  you  assign
+       or  pass  back a non-NULL JIT stack, this must be a different stack for
        each thread so that the application is thread-safe.

-       Strictly  speaking,  even more is allowed. You can assign the same non-
-       NULL stack to a match context that is used by any number  of  patterns,
-       as  long  as  they are not used for matching by multiple threads at the
-       same time. For example, you could use the same stack  in  all  compiled
-       patterns,  with  a global mutex in the callback to wait until the stack
+       Strictly speaking, even more is allowed. You can assign the  same  non-
+       NULL  stack  to a match context that is used by any number of patterns,
+       as long as they are not used for matching by multiple  threads  at  the
+       same  time.  For  example, you could use the same stack in all compiled
+       patterns, with a global mutex in the callback to wait until  the  stack
        is available for use. However, this is an inefficient solution, and not
        recommended.

-       This  is a suggestion for how a multithreaded program that needs to set
+       This is a suggestion for how a multithreaded program that needs to  set
        up non-default JIT stacks might operate:

          During thread initalization
@@ -5039,7 +5064,7 @@
          Use a one-line callback function
            return thread_local_var

-       All the functions described in this section do nothing if  JIT  is  not
+       All  the  functions  described in this section do nothing if JIT is not
        available.

@@ -5048,20 +5073,20 @@
        (1) Why do we need JIT stacks?

        PCRE2 (and JIT) is a recursive, depth-first engine, so it needs a stack
-       where the local data of the current node is pushed before checking  its
+       where  the local data of the current node is pushed before checking its
        child nodes.  Allocating real machine stack on some platforms is diffi-
        cult. For example, the stack chain needs to be updated every time if we
-       extend  the  stack  on  PowerPC.  Although it is possible, its updating
+       extend the stack on PowerPC.  Although it  is  possible,  its  updating
        time overhead decreases performance. So we do the recursion in memory.

        (2) Why don't we simply allocate blocks of memory with malloc()?

-       Modern operating systems have a  nice  feature:  they  can  reserve  an
+       Modern  operating  systems  have  a  nice  feature: they can reserve an
        address space instead of allocating memory. We can safely allocate mem-
-       ory pages inside this address space, so the stack  could  grow  without
+       ory  pages  inside  this address space, so the stack could grow without
        moving memory data (this is important because of pointers). Thus we can
        allocate 1MiB address space, and use only a single memory page (usually
-       4KiB)  if that is enough. However, we can still grow up to 1MiB anytime
+       4KiB) if that is enough. However, we can still grow up to 1MiB  anytime
        if needed.

        (3) Who "owns" a JIT stack?
@@ -5069,8 +5094,8 @@
        The owner of the stack is the user program, not the JIT studied pattern
        or anything else. The user program must ensure that if a stack is being
        used by pcre2_match(), (that is, it is assigned to a match context that
-       is  passed  to  the  pattern currently running), that stack must not be
-       used by any other threads (to avoid overwriting the same memory  area).
+       is passed to the pattern currently running), that  stack  must  not  be
+       used  by any other threads (to avoid overwriting the same memory area).
        The best practice for multithreaded programs is to allocate a stack for
        each thread, and return this stack through the JIT callback function.

@@ -5078,36 +5103,36 @@

        You can free a JIT stack at any time, as long as it will not be used by
        pcre2_match() again. When you assign the stack to a match context, only
-       a pointer is set. There is no reference counting or  any  other  magic.
+       a  pointer  is  set. There is no reference counting or any other magic.
        You can free compiled patterns, contexts, and stacks in any order, any-
-       time. Just do not call pcre2_match() with a match context  pointing  to
+       time.  Just  do not call pcre2_match() with a match context pointing to
        an already freed stack, as that will cause SEGFAULT. (Also, do not free
-       a stack currently used by pcre2_match() in  another  thread).  You  can
-       also  replace the stack in a context at any time when it is not in use.
+       a  stack  currently  used  by pcre2_match() in another thread). You can
+       also replace the stack in a context at any time when it is not in  use.
        You should free the previous stack before assigning a replacement.

-       (5) Should I allocate/free a  stack  every  time  before/after  calling
+       (5)  Should  I  allocate/free  a  stack every time before/after calling
        pcre2_match()?

-       No,  because  this  is  too  costly in terms of resources. However, you
-       could implement some clever idea which release the stack if it  is  not
-       used  in  let's  say  two minutes. The JIT callback can help to achieve
+       No, because this is too costly in  terms  of  resources.  However,  you
+       could  implement  some clever idea which release the stack if it is not
+       used in let's say two minutes. The JIT callback  can  help  to  achieve
        this without keeping a list of patterns.

-       (6) OK, the stack is for long term memory allocation. But what  happens
-       if  a  pattern causes stack overflow with a stack of 1MiB? Is that 1MiB
+       (6)  OK, the stack is for long term memory allocation. But what happens
+       if a pattern causes stack overflow with a stack of 1MiB? Is  that  1MiB
        kept until the stack is freed?

-       Especially on embedded sytems, it might be a good idea to release  mem-
-       ory  sometimes  without  freeing the stack. There is no API for this at
-       the moment.  Probably a function call which returns with the  currently
-       allocated  memory for any stack and another which allows releasing mem-
+       Especially  on embedded sytems, it might be a good idea to release mem-
+       ory sometimes without freeing the stack. There is no API  for  this  at
+       the  moment.  Probably a function call which returns with the currently
+       allocated memory for any stack and another which allows releasing  mem-
        ory (shrinking the stack) would be a good idea if someone needs this.

        (7) This is too much of a headache. Isn't there any better solution for
        JIT stack handling?

-       No,  thanks to Windows. If POSIX threads were used everywhere, we could
+       No, thanks to Windows. If POSIX threads were used everywhere, we  could
        throw out this complicated API.

@@ -5116,10 +5141,10 @@
        void pcre2_jit_free_unused_memory(pcre2_general_context *gcontext);

        The JIT executable allocator does not free all memory when it is possi-
-       ble.   It expects new allocations, and keeps some free memory around to
-       improve allocation speed. However, in low memory conditions,  it  might
-       be  better to free all possible memory. You can cause this to happen by
-       calling pcre2_jit_free_unused_memory(). Its argument is a general  con-
+       ble.  It expects new allocations, and keeps some free memory around  to
+       improve  allocation  speed. However, in low memory conditions, it might
+       be better to free all possible memory. You can cause this to happen  by
+       calling  pcre2_jit_free_unused_memory(). Its argument is a general con-
        text, for custom memory management, or NULL for standard memory manage-
        ment.

@@ -5126,8 +5151,8 @@

EXAMPLE CODE

-       This is a single-threaded example that specifies a  JIT  stack  without
-       using  a  callback.  A real program should include error checking after
+       This  is  a  single-threaded example that specifies a JIT stack without
+       using a callback. A real program should include  error  checking  after
        all the function calls.

          int rc;
@@ -5155,29 +5180,31 @@
 JIT FAST PATH API

        Because the API described above falls back to interpreted matching when
-       JIT  is  not  available, it is convenient for programs that are written
+       JIT is not available, it is convenient for programs  that  are  written
        for  general  use  in  many  environments.  However,  calling  JIT  via
        pcre2_match() does have a performance impact. Programs that are written
-       for use where JIT is known to be available, and  which  need  the  best
-       possible  performance,  can  instead  use a "fast path" API to call JIT
-       matching directly instead of calling pcre2_match() (obviously only  for
+       for  use  where  JIT  is known to be available, and which need the best
+       possible performance, can instead use a "fast path"  API  to  call  JIT
+       matching  directly instead of calling pcre2_match() (obviously only for
        patterns that have been successfully processed by pcre2_jit_compile()).

-       The  fast  path  function  is  called  pcre2_jit_match(),  and it takes
-       exactly the same arguments as pcre2_match(). The return values are also
-       the same, plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or
-       complete) is requested that was not compiled. Unsupported  option  bits
-       (for  example,  PCRE2_ANCHORED)  are  ignored,  as  is the PCRE2_NO_JIT
-       option.
+       The fast path  function  is  called  pcre2_jit_match(),  and  it  takes
+       exactly  the  same  arguments  as  pcre2_match().  However, the subject
+       string must be specified with a length;  PCRE2_ZERO_TERMINATED  is  not
+       supported.   Unsupported  option  bits  (for  example,  PCRE2_ANCHORED,
+       PCRE2_ENDANCHORED and PCRE2_COPY_MATCHED_SUBJECT) are  ignored,  as  is
+       the  PCRE2_NO_JIT  option.  The  return values are also the same as for
+       pcre2_match(), plus PCRE2_ERROR_JIT_BADOPTION if a matching mode  (par-
+       tial or complete) is requested that was not compiled.

-       When you call pcre2_match(), as well as testing for invalid options,  a
+       When  you call pcre2_match(), as well as testing for invalid options, a
        number of other sanity checks are performed on the arguments. For exam-
        ple, if the subject pointer is NULL, an immediate error is given. Also,
-       unless  PCRE2_NO_UTF_CHECK  is  set, a UTF subject string is tested for
-       validity. In the interests of speed, these checks do not happen on  the
+       unless PCRE2_NO_UTF_CHECK is set, a UTF subject string  is  tested  for
+       validity.  In the interests of speed, these checks do not happen on the
        JIT fast path, and if invalid data is passed, the result is undefined.

-       Bypassing  the  sanity  checks  and the pcre2_match() wrapping can give
+       Bypassing the sanity checks and the  pcre2_match()  wrapping  can  give
        speedups of more than 10%.

@@ -5195,7 +5222,7 @@

REVISION

-       Last updated: 28 June 2018
+       Last updated: 16 October 2018
        Copyright (c) 1997-2018 University of Cambridge.
 ------------------------------------------------------------------------------

Modified: code/trunk/doc/pcre2_dfa_match.3
===================================================================
--- code/trunk/doc/pcre2_dfa_match.3    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/doc/pcre2_dfa_match.3    2018-10-17 08:33:38 UTC (rev 1028)
@@ -1,4 +1,4 @@
-.TH PCRE2_DFA_MATCH 3 "26 April 2018" "PCRE2 10.32"
+.TH PCRE2_DFA_MATCH 3 "16 October 2018" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -39,6 +39,8 @@
 characters. The options are:
 .sp
   PCRE2_ANCHORED          Match only at the first position
+  PCRE2_COPY_MATCHED_SUBJECT
+                          On success, make a private subject copy  
   PCRE2_ENDANCHORED       Pattern can match only at end of subject
   PCRE2_NOTBOL            Subject is not the beginning of a line
   PCRE2_NOTEOL            Subject is not the end of a line

Modified: code/trunk/doc/pcre2_match.3
===================================================================
--- code/trunk/doc/pcre2_match.3    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/doc/pcre2_match.3    2018-10-17 08:33:38 UTC (rev 1028)
@@ -1,4 +1,4 @@
-.TH PCRE2_MATCH 3 "14 November 2017" "PCRE2 10.31"
+.TH PCRE2_MATCH 3 "16 October 2018" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -43,11 +43,13 @@
   Change the backtracking depth limit
   Set custom memory management specifically for the match
 .sp
-The \fIlength\fP and \fIstartoffset\fP values are code
-units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a
-subject that is terminated by a binary zero code unit. The options are:
+The \fIlength\fP and \fIstartoffset\fP values are code units, not characters.
+The length may be given as PCRE2_ZERO_TERMINATED for a subject that is
+terminated by a binary zero code unit. The options are:
 .sp
   PCRE2_ANCHORED          Match only at the first position
+  PCRE2_COPY_MATCHED_SUBJECT
+                          On success, make a private subject copy   
   PCRE2_ENDANCHORED       Pattern can match only at end of subject
   PCRE2_NOTBOL            Subject string is not the beginning of a line
   PCRE2_NOTEOL            Subject string is not the end of a line

Modified: code/trunk/doc/pcre2_match_data_free.3
===================================================================
--- code/trunk/doc/pcre2_match_data_free.3    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/doc/pcre2_match_data_free.3    2018-10-17 08:33:38 UTC (rev 1028)
@@ -1,4 +1,4 @@
-.TH PCRE2_MATCH_DATA_FREE 3 "28 June 2018" "PCRE2 10.32"
+.TH PCRE2_MATCH_DATA_FREE 3 "16 October 2018" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -18,6 +18,10 @@
 using the memory freeing function from the general context or compiled pattern
 with which it was created, or \fBfree()\fP if that was not set.
 .P
+If the PCRE2_COPY_MATCHED_SUBJECT was used for a successful match using this 
+match data block, the copy of the subject that was remembered with the block is
+also freed.
+.P
 There is a complete description of the PCRE2 native API in the
 .\" HREF
 \fBpcre2api\fP

Modified: code/trunk/doc/pcre2api.3
===================================================================
--- code/trunk/doc/pcre2api.3    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/doc/pcre2api.3    2018-10-17 08:33:38 UTC (rev 1028)
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "21 September 2018" "PCRE2 10.33"
+.TH PCRE2API 3 "16 October 2018" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -1237,13 +1237,19 @@
 NOTE: When one of the matching functions is called, pointers to the compiled
 pattern and the subject string are set in the match data block so that they can
 be referenced by the substring extraction functions. After running a match, you
-must not free a compiled pattern (or a subject string) until after all
+must not free a compiled pattern or a subject string until after all
 operations on the
 .\" HTML <a href="#matchdatablock">
 .\" </a>
 match data block
 .\"
-have taken place.
+have taken place, unless, in the case of the subject string, you have used the 
+PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled
+"Option bits for \fBpcre2_match()\fP"
+.\" HTML <a href="#matchoptions>">
+.\" </a>
+below.
+.\"
 .P
 The \fIoptions\fP argument for \fBpcre2_compile()\fP contains various bit
 settings that affect the compilation. It should be zero if no options are
@@ -2390,7 +2396,13 @@
 and the subject string are set in the match data block so that they can be
 referenced by the extraction functions. After running a match, you must not
 free a compiled pattern or a subject string until after all operations on the
-match data block (for that match) have taken place.
+match data block (for that match) have taken place, unless, in the case of the
+subject string, you have used the PCRE2_COPY_MATCHED_SUBJECT option, which is
+described in the section entitled "Option bits for \fBpcre2_match()\fP"
+.\" HTML <a href="#matchoptions>">
+.\" </a>
+below.
+.\"
 .P
 When a match data block itself is no longer needed, it should be freed by
 calling \fBpcre2_match_data_free()\fP. If this function is called with a NULL
@@ -2507,10 +2519,10 @@
 .rs
 .sp
 The unused bits of the \fIoptions\fP argument for \fBpcre2_match()\fP must be
-zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDANCHORED,
-PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
-PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT.
-Their action is described below.
+zero. The only bits that may be set are PCRE2_ANCHORED, 
+PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL,
+PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK,
+PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is described below.
 .P
 Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not supported by
 the just-in-time (JIT) compiler. If it is set, JIT matching is disabled and the
@@ -2525,6 +2537,22 @@
 matching time. Note that setting the option at match time disables JIT
 matching.
 .sp
+  PCRE2_COPY_MATCHED_SUBJECT
+.sp
+By default, a pointer to the subject is remembered in the match data block so 
+that, after a successful match, it can be referenced by the substring 
+extraction functions. This means that the subject's memory must not be freed
+until all such operations are complete. For some applications where the
+lifetime of the subject string is not guaranteed, it may be necessary to make a
+copy of the subject string, but it is wasteful to do this unless the match is
+successful. After a successful match, if PCRE2_COPY_MATCHED_SUBJECT is set, the
+subject is copied and the new pointer is remembered in the match data block
+instead of the original subject pointer. The memory allocator that was used for
+the match block itself is used. The copy is automatically freed when
+\fBpcre2_match_data_free()\fP is called to free the match data block. It is also
+automatically freed if the match data block is re-used for another match
+operation.
+.sp
   PCRE2_ENDANCHORED
 .sp
 If the PCRE2_ENDANCHORED option is set, any string that \fBpcre2_match()\fP
@@ -2961,7 +2989,8 @@
 If a pattern contains many nested backtracking points, heap memory is used to
 remember them. This error is given when the memory allocation function (default
 or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given
-if the amount of memory needed exceeds the heap limit.
+if the amount of memory needed exceeds the heap limit. PCRE2_ERROR_NOMEMORY is 
+also returned if PCRE2_COPY_MATCHED_SUBJECT is set and memory allocation fails.
 .sp
   PCRE2_ERROR_NULL
 .sp
@@ -3579,11 +3608,12 @@
 .rs
 .sp
 The unused bits of the \fIoptions\fP argument for \fBpcre2_dfa_match()\fP must
-be zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDANCHORED,
-PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
-PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST,
-and PCRE2_DFA_RESTART. All but the last four of these are exactly the same as
-for \fBpcre2_match()\fP, so their description is not repeated here.
+be zero. The only bits that may be set are PCRE2_ANCHORED,
+PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL,
+PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD,
+PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but the last
+four of these are exactly the same as for \fBpcre2_match()\fP, so their
+description is not repeated here.
 .sp
   PCRE2_PARTIAL_HARD
   PCRE2_PARTIAL_SOFT
@@ -3737,6 +3767,6 @@
 .rs
 .sp
 .nf
-Last updated: 21 September 2018
+Last updated: 16 October 2018
 Copyright (c) 1997-2018 University of Cambridge.
 .fi

Modified: code/trunk/doc/pcre2jit.3
===================================================================
--- code/trunk/doc/pcre2jit.3    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/doc/pcre2jit.3    2018-10-17 08:33:38 UTC (rev 1028)
@@ -1,4 +1,4 @@
-.TH PCRE2JIT 3 "28 June 2018" "PCRE2 10.32"
+.TH PCRE2JIT 3 "16 October 2018" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 JUST-IN-TIME COMPILER SUPPORT"
@@ -124,9 +124,10 @@
 .rs
 .sp
 The \fBpcre2_match()\fP options that are supported for JIT matching are
-PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
-PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. The
-PCRE2_ANCHORED option is not supported at match time.
+PCRE2_COPY_MATCHED_SUBJECT, PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY,
+PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and
+PCRE2_PARTIAL_SOFT. The PCRE2_ANCHORED and PCRE2_ENDANCHORED options are not
+supported at match time.
 .P
 If the PCRE2_NO_JIT option is passed to \fBpcre2_match()\fP it disables the
 use of JIT, forcing matching by the interpreter code.
@@ -376,10 +377,13 @@
 processed by \fBpcre2_jit_compile()\fP).
 .P
 The fast path function is called \fBpcre2_jit_match()\fP, and it takes exactly
-the same arguments as \fBpcre2_match()\fP. The return values are also the same,
-plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is
-requested that was not compiled. Unsupported option bits (for example,
-PCRE2_ANCHORED) are ignored, as is the PCRE2_NO_JIT option.
+the same arguments as \fBpcre2_match()\fP. However, the subject string must be
+specified with a length; PCRE2_ZERO_TERMINATED is not supported. Unsupported
+option bits (for example, PCRE2_ANCHORED, PCRE2_ENDANCHORED and
+PCRE2_COPY_MATCHED_SUBJECT) are ignored, as is the PCRE2_NO_JIT option. The
+return values are also the same as for \fBpcre2_match()\fP, plus
+PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is requested
+that was not compiled.
 .P
 When you call \fBpcre2_match()\fP, as well as testing for invalid options, a
 number of other sanity checks are performed on the arguments. For example, if
@@ -412,6 +416,6 @@
 .rs
 .sp
 .nf
-Last updated: 28 June 2018
+Last updated: 16 October 2018
 Copyright (c) 1997-2018 University of Cambridge.
 .fi

Modified: code/trunk/src/pcre2.h.in
===================================================================
--- code/trunk/src/pcre2.h.in    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/src/pcre2.h.in    2018-10-17 08:33:38 UTC (rev 1028)
@@ -167,37 +167,28 @@
 #define PCRE2_JIT_PARTIAL_HARD    0x00000004u
 #define PCRE2_JIT_INVALID_UTF     0x00000100u

-/* These are for pcre2_match(), pcre2_dfa_match(), and pcre2_jit_match(). Note
-that PCRE2_ANCHORED and PCRE2_NO_UTF_CHECK can also be passed to these
-functions (though pcre2_jit_match() ignores the latter since it bypasses all
-sanity checks). */
+/* These are for pcre2_match(), pcre2_dfa_match(), pcre2_jit_match(), and
+pcre2_substitute(). Some are allowed only for one of the functions, and in
+these cases it is noted below. Note that PCRE2_ANCHORED, PCRE2_ENDANCHORED and
+PCRE2_NO_UTF_CHECK can also be passed to these functions (though
+pcre2_jit_match() ignores the latter since it bypasses all sanity checks). */

-#define PCRE2_NOTBOL              0x00000001u
-#define PCRE2_NOTEOL              0x00000002u
-#define PCRE2_NOTEMPTY            0x00000004u  /* ) These two must be kept */
-#define PCRE2_NOTEMPTY_ATSTART    0x00000008u  /* ) adjacent to each other. */
-#define PCRE2_PARTIAL_SOFT        0x00000010u
-#define PCRE2_PARTIAL_HARD        0x00000020u
+#define PCRE2_NOTBOL                      0x00000001u
+#define PCRE2_NOTEOL                      0x00000002u
+#define PCRE2_NOTEMPTY                    0x00000004u  /* ) These two must be kept */
+#define PCRE2_NOTEMPTY_ATSTART            0x00000008u  /* ) adjacent to each other. */
+#define PCRE2_PARTIAL_SOFT                0x00000010u
+#define PCRE2_PARTIAL_HARD                0x00000020u
+#define PCRE2_DFA_RESTART                 0x00000040u  /* pcre2_dfa_match() only */
+#define PCRE2_DFA_SHORTEST                0x00000080u  /* pcre2_dfa_match() only */
+#define PCRE2_SUBSTITUTE_GLOBAL           0x00000100u  /* pcre2_substitute() only */
+#define PCRE2_SUBSTITUTE_EXTENDED         0x00000200u  /* pcre2_substitute() only */
+#define PCRE2_SUBSTITUTE_UNSET_EMPTY      0x00000400u  /* pcre2_substitute() only */
+#define PCRE2_SUBSTITUTE_UNKNOWN_UNSET    0x00000800u  /* pcre2_substitute() only */
+#define PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  0x00001000u  /* pcre2_substitute() only */
+#define PCRE2_NO_JIT                      0x00002000u  /* Not for pcre2_dfa_match() */
+#define PCRE2_COPY_MATCHED_SUBJECT        0x00004000u

-/* These are additional options for pcre2_dfa_match(). */
-
-#define PCRE2_DFA_RESTART         0x00000040u
-#define PCRE2_DFA_SHORTEST        0x00000080u
-
-/* These are additional options for pcre2_substitute(), which passes any others
-through to pcre2_match(). */
-
-#define PCRE2_SUBSTITUTE_GLOBAL           0x00000100u
-#define PCRE2_SUBSTITUTE_EXTENDED         0x00000200u
-#define PCRE2_SUBSTITUTE_UNSET_EMPTY      0x00000400u
-#define PCRE2_SUBSTITUTE_UNKNOWN_UNSET    0x00000800u
-#define PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  0x00001000u
-
-/* A further option for pcre2_match(), not allowed for pcre2_dfa_match(),
-ignored for pcre2_jit_match(). */
-
-#define PCRE2_NO_JIT              0x00002000u
-
 /* Options for pcre2_pattern_convert(). */

 #define PCRE2_CONVERT_UTF                    0x00000001u

Modified: code/trunk/src/pcre2_dfa_match.c
===================================================================
--- code/trunk/src/pcre2_dfa_match.c    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/src/pcre2_dfa_match.c    2018-10-17 08:33:38 UTC (rev 1028)
@@ -85,7 +85,8 @@
 #define PUBLIC_DFA_MATCH_OPTIONS \
   (PCRE2_ANCHORED|PCRE2_ENDANCHORED|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY| \
    PCRE2_NOTEMPTY_ATSTART|PCRE2_NO_UTF_CHECK|PCRE2_PARTIAL_HARD| \
-   PCRE2_PARTIAL_SOFT|PCRE2_DFA_SHORTEST|PCRE2_DFA_RESTART)
+   PCRE2_PARTIAL_SOFT|PCRE2_DFA_SHORTEST|PCRE2_DFA_RESTART| \
+   PCRE2_COPY_MATCHED_SUBJECT)

/*************************************************
@@ -3228,6 +3229,8 @@
pcre2_match_context *mcontext, int *workspace, PCRE2_SIZE wscount)
{
int rc;
+int was_zero_terminated = 0;
+
const pcre2_real_code *re = (const pcre2_real_code *)code;

PCRE2_SPTR start_match;
@@ -3267,7 +3270,11 @@
/* A length equal to PCRE2_ZERO_TERMINATED implies a zero-terminated
subject string. */

-if (length == PCRE2_ZERO_TERMINATED) length = PRIV(strlen)(subject);
+if (length == PCRE2_ZERO_TERMINATED)
+ {
+ length = PRIV(strlen)(subject);
+ was_zero_terminated = 1;
+ }

/* Plausibility checks */

@@ -3520,10 +3527,21 @@
     }
   }

+/* If the match data block was previously used with PCRE2_COPY_MATCHED_SUBJECT,
+free the memory that was obtained. */
+
+if ((match_data->flags & PCRE2_MD_COPIED_SUBJECT) != 0)
+  {
+  match_data->memctl.free((void *)match_data->subject, 
+    match_data->memctl.memory_data);
+  match_data->flags &= ~PCRE2_MD_COPIED_SUBJECT;
+  }    
+
 /* Fill in fields that are always returned in the match data. */

match_data->code = re;
match_data->subject = subject;
+match_data->flags = 0;
match_data->mark = NULL;
match_data->matchedby = PCRE2_MATCHEDBY_DFA_INTERPRETER;

@@ -3818,6 +3836,17 @@
     match_data->rightchar = (PCRE2_SIZE)( mb->last_used_ptr - subject);
     match_data->startchar = (PCRE2_SIZE)(start_match - subject);
     match_data->rc = rc;
+
+    if (rc >= 0 &&(options & PCRE2_COPY_MATCHED_SUBJECT) != 0)
+      {
+      length = CU2BYTES(length + was_zero_terminated);
+      match_data->subject = match_data->memctl.malloc(length,
+        match_data->memctl.memory_data);
+      if (match_data->subject == NULL) return PCRE2_ERROR_NOMEMORY;
+      memcpy((void *)match_data->subject, subject, length);
+      match_data->flags |= PCRE2_MD_COPIED_SUBJECT;
+      }
+ 
     goto EXIT;
     }

Modified: code/trunk/src/pcre2_internal.h
===================================================================
--- code/trunk/src/pcre2_internal.h    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/src/pcre2_internal.h    2018-10-17 08:33:38 UTC (rev 1028)
@@ -534,7 +534,11 @@
 enum { PCRE2_MATCHEDBY_INTERPRETER,     /* pcre2_match() */
        PCRE2_MATCHEDBY_DFA_INTERPRETER, /* pcre2_dfa_match() */
        PCRE2_MATCHEDBY_JIT };           /* pcre2_jit_match() */
+       
+/* Values for the flags field in a match data block. */

+#define PCRE2_MD_COPIED_SUBJECT 0x01u
+
/* Magic number to provide a small check against being handed junk. */

#define MAGIC_NUMBER 0x50435245UL /* 'PCRE' */

Modified: code/trunk/src/pcre2_intmodedep.h
===================================================================
--- code/trunk/src/pcre2_intmodedep.h    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/src/pcre2_intmodedep.h    2018-10-17 08:33:38 UTC (rev 1028)
@@ -658,7 +658,8 @@
   PCRE2_SIZE       leftchar;      /* Offset to leftmost code unit */
   PCRE2_SIZE       rightchar;     /* Offset to rightmost code unit */
   PCRE2_SIZE       startchar;     /* Offset to starting code unit */
-  uint16_t         matchedby;     /* Type of match (normal, JIT, DFA) */
+  uint8_t          matchedby;     /* Type of match (normal, JIT, DFA) */
+  uint8_t          flags;         /* Various flags */
   uint16_t         oveccount;     /* Number of pairs */
   int              rc;            /* The return code from the match */
   PCRE2_SIZE       ovector[131072]; /* Must be last in the structure */

Modified: code/trunk/src/pcre2_jit_match.c
===================================================================
--- code/trunk/src/pcre2_jit_match.c    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/src/pcre2_jit_match.c    2018-10-17 08:33:38 UTC (rev 1028)
@@ -7,7 +7,7 @@

                        Written by Philip Hazel
      Original API code Copyright (c) 1997-2012 University of Cambridge
-         New API code Copyright (c) 2016 University of Cambridge
+          New API code Copyright (c) 2016-2018 University of Cambridge

-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@@ -174,6 +174,7 @@
rc = 0;
match_data->code = re;
match_data->subject = subject;
+match_data->flags = 0;
match_data->rc = rc;
match_data->startchar = arguments.startchar_ptr - subject;
match_data->leftchar = 0;

Modified: code/trunk/src/pcre2_match.c
===================================================================
--- code/trunk/src/pcre2_match.c    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/src/pcre2_match.c    2018-10-17 08:33:38 UTC (rev 1028)
@@ -69,11 +69,12 @@
 #define PUBLIC_MATCH_OPTIONS \
   (PCRE2_ANCHORED|PCRE2_ENDANCHORED|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY| \
    PCRE2_NOTEMPTY_ATSTART|PCRE2_NO_UTF_CHECK|PCRE2_PARTIAL_HARD| \
-   PCRE2_PARTIAL_SOFT|PCRE2_NO_JIT)
+   PCRE2_PARTIAL_SOFT|PCRE2_NO_JIT|PCRE2_COPY_MATCHED_SUBJECT)

 #define PUBLIC_JIT_MATCH_OPTIONS \
    (PCRE2_NO_UTF_CHECK|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY|\
-    PCRE2_NOTEMPTY_ATSTART|PCRE2_PARTIAL_SOFT|PCRE2_PARTIAL_HARD)
+    PCRE2_NOTEMPTY_ATSTART|PCRE2_PARTIAL_SOFT|PCRE2_PARTIAL_HARD|\
+    PCRE2_COPY_MATCHED_SUBJECT)

 /* Non-error returns from and within the match() function. Error returns are
 externally defined PCRE2_ERROR_xxx codes, which are all negative. */
@@ -5014,7 +5015,7 @@
     must record a backtracking point and also set up a chained frame. */

     case OP_ONCE:
-    case OP_SCRIPT_RUN: 
+    case OP_SCRIPT_RUN:
     case OP_SBRA:
     Lframe_type = GF_NOCAPTURE | Fop;

@@ -5526,14 +5527,14 @@
       case OP_ASSERT_NOT:
       case OP_ASSERTBACK_NOT:
       RRETURN(MATCH_MATCH);
-      
-      /* At the end of a script run, apply the script-checking rules. This code 
-      will never by exercised if Unicode support it not compiled, because in 
+
+      /* At the end of a script run, apply the script-checking rules. This code
+      will never by exercised if Unicode support it not compiled, because in
       that environment script runs cause an error at compile time. */
-      
+
       case OP_SCRIPT_RUN:
       if (!PRIV(script_run)(P->eptr, Feptr, utf)) RRETURN(MATCH_NOMATCH);
-      break;  
+      break;

       /* Whole-pattern recursion is coded as a recurse into group 0, so it
       won't be picked up here. Instead, we catch it when the OP_END is reached.
@@ -6009,10 +6010,11 @@
   pcre2_match_context *mcontext)
 {
 int rc;
+int was_zero_terminated = 0;
 const uint8_t *start_bits = NULL;
-
 const pcre2_real_code *re = (const pcre2_real_code *)code;

+
BOOL anchored;
BOOL firstline;
BOOL has_first_cu = FALSE;
@@ -6052,7 +6054,11 @@
/* A length equal to PCRE2_ZERO_TERMINATED implies a zero-terminated
subject string. */

-if (length == PCRE2_ZERO_TERMINATED) length = PRIV(strlen)(subject);
+if (length == PCRE2_ZERO_TERMINATED)
+ {
+ length = PRIV(strlen)(subject);
+ was_zero_terminated = 1;
+ }
end_subject = subject + length;

 /* Plausibility checks */
@@ -6166,7 +6172,17 @@
 if (mcontext != NULL && mcontext->offset_limit != PCRE2_UNSET &&
      (re->overall_options & PCRE2_USE_OFFSET_LIMIT) == 0)
   return PCRE2_ERROR_BADOFFSETLIMIT;
+  
+/* If the match data block was previously used with PCRE2_COPY_MATCHED_SUBJECT,
+free the memory that was obtained. */

+if ((match_data->flags & PCRE2_MD_COPIED_SUBJECT) != 0)
+  {
+  match_data->memctl.free((void *)match_data->subject, 
+    match_data->memctl.memory_data);
+  match_data->flags &= ~PCRE2_MD_COPIED_SUBJECT;
+  }    
+
 /* If the pattern was successfully studied with JIT support, run the JIT
 executable instead of the rest of this function. Most options must be set at
 compile time for the JIT code to be usable. Fallback to the normal code path if
@@ -6178,7 +6194,19 @@
   {
   rc = pcre2_jit_match(code, subject, length, start_offset, options,
     match_data, mcontext);
-  if (rc != PCRE2_ERROR_JIT_BADOPTION) return rc;
+  if (rc != PCRE2_ERROR_JIT_BADOPTION) 
+    {
+    if (rc >= 0 && (options & PCRE2_COPY_MATCHED_SUBJECT) != 0)
+      {
+      length = CU2BYTES(length + was_zero_terminated);
+      match_data->subject = match_data->memctl.malloc(length,
+        match_data->memctl.memory_data);
+      if (match_data->subject == NULL) return PCRE2_ERROR_NOMEMORY;
+      memcpy((void *)match_data->subject, subject, length);
+      match_data->flags |= PCRE2_MD_COPIED_SUBJECT;
+      }
+    return rc;
+    } 
   }
 #endif

@@ -6819,12 +6847,14 @@

match_data->code = re;
match_data->subject = subject;
+match_data->flags = 0;
match_data->mark = mb->mark;
match_data->matchedby = PCRE2_MATCHEDBY_INTERPRETER;

/* Handle a fully successful match. Set the return code to the number of
captured strings, or 0 if there were too many to fit into the ovector, and then
-set the remaining returned values before returning. */
+set the remaining returned values before returning. Make a copy of the subject
+string if requested. */

 if (rc == MATCH_MATCH)
   {
@@ -6834,6 +6864,17 @@
   match_data->leftchar = mb->start_used_ptr - subject;
   match_data->rightchar = ((mb->last_used_ptr > mb->end_match_ptr)?
     mb->last_used_ptr : mb->end_match_ptr) - subject;
+    
+  if ((options & PCRE2_COPY_MATCHED_SUBJECT) != 0)
+    {
+    length = CU2BYTES(length + was_zero_terminated);
+    match_data->subject = match_data->memctl.malloc(length,
+      match_data->memctl.memory_data);
+    if (match_data->subject == NULL) return PCRE2_ERROR_NOMEMORY;
+    memcpy((void *)match_data->subject, subject, length);
+    match_data->flags |= PCRE2_MD_COPIED_SUBJECT;
+    }
+     
   return match_data->rc;
   }

Modified: code/trunk/src/pcre2_match_data.c
===================================================================
--- code/trunk/src/pcre2_match_data.c    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/src/pcre2_match_data.c    2018-10-17 08:33:38 UTC (rev 1028)
@@ -7,7 +7,7 @@

                        Written by Philip Hazel
      Original API code Copyright (c) 1997-2012 University of Cambridge
-          New API code Copyright (c) 2016-2017 University of Cambridge
+          New API code Copyright (c) 2016-2018 University of Cambridge

-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@@ -63,6 +63,7 @@
(pcre2_memctl *)gcontext);
if (yield == NULL) return NULL;
yield->oveccount = oveccount;
+yield->flags = 0;
return yield;
}

@@ -93,7 +94,12 @@
 pcre2_match_data_free(pcre2_match_data *match_data)
 {
 if (match_data != NULL)
+  {
+  if ((match_data->flags & PCRE2_MD_COPIED_SUBJECT) != 0)
+    match_data->memctl.free((void *)match_data->subject, 
+      match_data->memctl.memory_data);
   match_data->memctl.free(match_data, match_data->memctl.memory_data);
+  } 
 }

Modified: code/trunk/src/pcre2test.c
===================================================================
--- code/trunk/src/pcre2test.c    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/src/pcre2test.c    2018-10-17 08:33:38 UTC (rev 1028)
@@ -620,6 +620,7 @@
   { "convert_glob_separator",     MOD_PAT,  MOD_CHR, 0,                          PO(convert_glob_separator) },
   { "convert_length",             MOD_PAT,  MOD_INT, 0,                          PO(convert_length) },
   { "copy",                       MOD_DAT,  MOD_NN,  DO(copy_numbers),           DO(copy_names) },
+  { "copy_matched_subject",       MOD_DAT,  MOD_OPT, PCRE2_COPY_MATCHED_SUBJECT, DO(options) },
   { "debug",                      MOD_PAT,  MOD_CTL, CTL_DEBUG,                  PO(control) },
   { "depth_limit",                MOD_CTM,  MOD_INT, 0,                          MO(depth_limit) },
   { "dfa",                        MOD_DAT,  MOD_CTL, CTL_DFA,                    DO(control) },
@@ -4180,7 +4181,7 @@
   ((options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) != 0)? " bad_escape_is_literal" : "",
   ((options & PCRE2_EXTRA_MATCH_WORD) != 0)? " match_word" : "",
   ((options & PCRE2_EXTRA_MATCH_LINE) != 0)? " match_line" : "",
-  ((options & PCRE2_EXTRA_ESCAPED_CR_IS_LF) != 0)? " escaped_cr_is_lf" : "", 
+  ((options & PCRE2_EXTRA_ESCAPED_CR_IS_LF) != 0)? " escaped_cr_is_lf" : "",
   after);
 }

@@ -4196,11 +4197,13 @@
 static void
 show_match_options(uint32_t options)
 {
-fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s",
+fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s",
   ((options & PCRE2_ANCHORED) != 0)? " anchored" : "",
+  ((options & PCRE2_COPY_MATCHED_SUBJECT) != 0)? " copy_matched_subject" : "",
   ((options & PCRE2_DFA_RESTART) != 0)? " dfa_restart" : "",
   ((options & PCRE2_DFA_SHORTEST) != 0)? " dfa_shortest" : "",
   ((options & PCRE2_ENDANCHORED) != 0)? " endanchored" : "",
+  ((options & PCRE2_NO_JIT) != 0)? " no_jit" : "",
   ((options & PCRE2_NO_UTF_CHECK) != 0)? " no_utf_check" : "",
   ((options & PCRE2_NOTBOL) != 0)? " notbol" : "",
   ((options & PCRE2_NOTEMPTY) != 0)? " notempty" : "",
@@ -7442,6 +7445,25 @@
         }
       }

+    /* If PCRE2_COPY_MATCHED_SUBJECT was set, check that things are as they
+    should be, but not for fast JIT, where it isn't supported. */
+
+    if ((dat_datctl.options & PCRE2_COPY_MATCHED_SUBJECT) != 0 &&
+        (pat_patctl.control & CTL_JITFAST) == 0)
+      {
+      if ((FLD(match_data, flags) & PCRE2_MD_COPIED_SUBJECT) == 0)
+        fprintf(outfile,
+          "** PCRE2 error: flag not set after copy_matched_subject\n");
+
+      if (CASTFLD(void *, match_data, subject) == pp)
+        fprintf(outfile,
+          "** PCRE2 error: copy_matched_subject has not copied\n");
+
+      if (memcmp(CASTFLD(void *, match_data, subject), pp, ulen) != 0)
+        fprintf(outfile,
+          "** PCRE2 error: copy_matched_subject mismatch\n");
+      }
+
     /* If this is not the first time round a global loop, check that the
     returned string has changed. If it has not, check for an empty string match
     at different starting offset from the previous match. This is a failed test

Modified: code/trunk/testdata/testinput17
===================================================================
--- code/trunk/testdata/testinput17    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/testdata/testinput17    2018-10-17 08:33:38 UTC (rev 1028)
@@ -299,9 +299,9 @@
 # ----

 /[aC]/mg,firstline,newline=lf
-match\nmatch
+    match\nmatch

 /[aCz]/mg,firstline,newline=lf
-match\nmatch
+    match\nmatch

# End of testinput17

Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/testdata/testinput2    2018-10-17 08:33:38 UTC (rev 1028)
@@ -5531,4 +5531,11 @@

/(?(*script_run:xxx)zzz)/

+/foobar/
+    the foobar thing\=copy_matched_subject
+    the foobar thing\=copy_matched_subject,zero_terminate
+
+/foobar/g
+    the foobar thing foobar again\=copy_matched_subject
+
 # End of testinput2

Modified: code/trunk/testdata/testinput6
===================================================================
--- code/trunk/testdata/testinput6    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/testdata/testinput6    2018-10-17 08:33:38 UTC (rev 1028)
@@ -4955,4 +4955,11 @@
 \= Expect no match
     \na

+/foobar/
+    the foobar thing\=copy_matched_subject
+    the foobar thing\=copy_matched_subject,zero_terminate
+
+/foobar/g
+    the foobar thing foobar again\=copy_matched_subject
+
 # End of testinput6

Modified: code/trunk/testdata/testoutput17
===================================================================
--- code/trunk/testdata/testoutput17    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/testdata/testoutput17    2018-10-17 08:33:38 UTC (rev 1028)
@@ -543,11 +543,11 @@
 # ----

 /[aC]/mg,firstline,newline=lf
-match\nmatch
+    match\nmatch
  0: a (JIT)

 /[aCz]/mg,firstline,newline=lf
-match\nmatch
+    match\nmatch
  0: a (JIT)

# End of testinput17

Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/testdata/testoutput2    2018-10-17 08:33:38 UTC (rev 1028)
@@ -16821,6 +16821,17 @@
 /(?(*script_run:xxx)zzz)/
 Failed: error 128 at offset 14: assertion expected after (?( or (?(?C)

+/foobar/
+    the foobar thing\=copy_matched_subject
+ 0: foobar
+    the foobar thing\=copy_matched_subject,zero_terminate
+ 0: foobar
+
+/foobar/g
+    the foobar thing foobar again\=copy_matched_subject
+ 0: foobar
+ 0: foobar
+
 # End of testinput2
 Error -70: PCRE2_ERROR_BADDATA (unknown error number)
 Error -62: bad serialized data

Modified: code/trunk/testdata/testoutput6
===================================================================
--- code/trunk/testdata/testoutput6    2018-10-15 11:01:24 UTC (rev 1027)
+++ code/trunk/testdata/testoutput6    2018-10-17 08:33:38 UTC (rev 1028)
@@ -7783,4 +7783,15 @@
     \na
 No match

+/foobar/
+    the foobar thing\=copy_matched_subject
+ 0: foobar
+    the foobar thing\=copy_matched_subject,zero_terminate
+ 0: foobar
+
+/foobar/g
+    the foobar thing foobar again\=copy_matched_subject
+ 0: foobar
+ 0: foobar
+
 # End of testinput6

Questo messaggio è parte di questo thread:
	il thread completo ordinato per data

[Pcre-svn] [1028] code/trunk: Implement PCRE2_COPY_MATCHED_S…