[Pcre-svn] [1308] code/trunk: Documentation update

Author: Subversion repository
To: pcre-svn
Revision: 1308
          http://www.exim.org/viewvc/pcre2?view=rev&revision=1308
Author:   ph10
Date:     2021-04-28 16:37:48 +0100 (Wed, 28 Apr 2021)
Log Message:
-----------
Documentation update


Modified Paths:
--------------
    code/trunk/NON-AUTOTOOLS-BUILD
    code/trunk/doc/html/NON-AUTOTOOLS-BUILD.txt
    code/trunk/doc/html/pcre2.html
    code/trunk/doc/html/pcre2test.html
    code/trunk/doc/pcre2.txt
    code/trunk/doc/pcre2test.txt


Modified: code/trunk/NON-AUTOTOOLS-BUILD
===================================================================
--- code/trunk/NON-AUTOTOOLS-BUILD    2021-04-28 14:21:38 UTC (rev 1307)
+++ code/trunk/NON-AUTOTOOLS-BUILD    2021-04-28 15:37:48 UTC (rev 1308)
@@ -40,7 +40,11 @@


The following are generic instructions for building the PCRE2 C library "by
hand". If you are going to use CMake, this section does not apply to you; you
-can skip ahead to the CMake section.
+can skip ahead to the CMake section. Note that the settings concerned with
+8-bit, 16-bit, and 32-bit code units relate to the type of data string that
+PCRE2 processes. They are NOT referring to the underlying operating system bit
+width. You do not have to do anything special to compile in a 64-bit
+environment, for example.

  (1) Copy or rename the file src/config.h.generic as src/config.h, and edit the
      macro settings that it contains to whatever is appropriate for your
@@ -86,11 +90,11 @@
      The tables in src/pcre2_chartables.c are defaults. The caller of PCRE2 can
      specify alternative tables at run time.


- (4) For an 8-bit library, compile the following source files from the src
-     directory, setting -DPCRE2_CODE_UNIT_WIDTH=8 as a compiler option. Also
-     set -DHAVE_CONFIG_H if you have set up src/config.h with your
-     configuration, or else use other -D settings to change the configuration
-     as required.
+ (4) For a library that supports 8-bit code units in the character strings that 
+     it processes, compile the following source files from the src directory,
+     setting -DPCRE2_CODE_UNIT_WIDTH=8 as a compiler option. Also set
+     -DHAVE_CONFIG_H if you have set up src/config.h with your configuration,
+     or else use other -D settings to change the configuration as required.


        pcre2_auto_possess.c
        pcre2_chartables.c
@@ -142,9 +146,9 @@
      If your system has static and shared libraries, you may have to do this
      once for each type.


- (6) If you want to build a 16-bit library or 32-bit library (as well as, or
-     instead of the 8-bit library) just supply 16 or 32 as the value of
-     -DPCRE2_CODE_UNIT_WIDTH when you are compiling.
+ (6) If you want to build a library that supports 16-bit or 32-bit code units,
+     (as well as, or instead of the 8-bit library) just supply 16 or 32 as the
+     value of -DPCRE2_CODE_UNIT_WIDTH when you are compiling.


  (7) If you want to build the POSIX wrapper functions (which apply only to the
      8-bit library), ensure that you have the src/pcre2posix.h file and then
@@ -401,6 +405,6 @@
 z/OS file formats. The port provides an API for LE languages such as COBOL and
 for the z/OS and z/VM versions of the Rexx languages.


-==============================
-Last Updated: 14 November 2018
-==============================
+===========================
+Last Updated: 28 April 2021
+===========================
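
As an illustration of the distinction this change draws (it is not part of the commit itself), the short Python sketch below counts how many 8-bit, 16-bit, and 32-bit code units the same character string needs, and separately reports the pointer width of the running platform. The names code_unit_counts and platform_bits are hypothetical and chosen only for this example; nothing here calls PCRE2.

```python
import struct


def code_unit_counts(text):
    """Return how many code units `text` needs at each code unit width."""
    return {
        8: len(text.encode("utf-8")),            # one code unit = 1 byte
        16: len(text.encode("utf-16-le")) // 2,  # one code unit = 2 bytes
        32: len(text.encode("utf-32-le")) // 4,  # one code unit = 4 bytes
    }


# Pointer width of the running interpreter. This describes the platform,
# and has no bearing on the code unit counts computed above.
platform_bits = struct.calcsize("P") * 8

if __name__ == "__main__":
    # "a", the euro sign, and an emoji outside the Basic Multilingual Plane.
    sample = "a\u20ac\U0001F600"
    print("code units per width:", code_unit_counts(sample))
    print("platform pointer width:", platform_bits)
```

The counts depend only on the string and the chosen code unit width, which is the sense in which -DPCRE2_CODE_UNIT_WIDTH describes the data being processed rather than the operating system.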

Modified: code/trunk/doc/html/NON-AUTOTOOLS-BUILD.txt
===================================================================
--- code/trunk/doc/html/NON-AUTOTOOLS-BUILD.txt    2021-04-28 14:21:38 UTC (rev 1307)
+++ code/trunk/doc/html/NON-AUTOTOOLS-BUILD.txt    2021-04-28 15:37:48 UTC (rev 1308)
@@ -40,7 +40,11 @@


The following are generic instructions for building the PCRE2 C library "by
hand". If you are going to use CMake, this section does not apply to you; you
-can skip ahead to the CMake section.
+can skip ahead to the CMake section. Note that the settings concerned with
+8-bit, 16-bit, and 32-bit code units relate to the type of data string that
+PCRE2 processes. They are NOT referring to the underlying operating system bit
+width. You do not have to do anything special to compile in a 64-bit
+environment, for example.

  (1) Copy or rename the file src/config.h.generic as src/config.h, and edit the
      macro settings that it contains to whatever is appropriate for your
@@ -86,11 +90,11 @@
      The tables in src/pcre2_chartables.c are defaults. The caller of PCRE2 can
      specify alternative tables at run time.


- (4) For an 8-bit library, compile the following source files from the src
-     directory, setting -DPCRE2_CODE_UNIT_WIDTH=8 as a compiler option. Also
-     set -DHAVE_CONFIG_H if you have set up src/config.h with your
-     configuration, or else use other -D settings to change the configuration
-     as required.
+ (4) For a library that supports 8-bit code units in the character strings that 
+     it processes, compile the following source files from the src directory,
+     setting -DPCRE2_CODE_UNIT_WIDTH=8 as a compiler option. Also set
+     -DHAVE_CONFIG_H if you have set up src/config.h with your configuration,
+     or else use other -D settings to change the configuration as required.


        pcre2_auto_possess.c
        pcre2_chartables.c
@@ -142,9 +146,9 @@
      If your system has static and shared libraries, you may have to do this
      once for each type.


- (6) If you want to build a 16-bit library or 32-bit library (as well as, or
-     instead of the 8-bit library) just supply 16 or 32 as the value of
-     -DPCRE2_CODE_UNIT_WIDTH when you are compiling.
+ (6) If you want to build a library that supports 16-bit or 32-bit code units,
+     (as well as, or instead of the 8-bit library) just supply 16 or 32 as the
+     value of -DPCRE2_CODE_UNIT_WIDTH when you are compiling.


  (7) If you want to build the POSIX wrapper functions (which apply only to the
      8-bit library), ensure that you have the src/pcre2posix.h file and then
@@ -401,6 +405,6 @@
 z/OS file formats. The port provides an API for LE languages such as COBOL and
 for the z/OS and z/VM versions of the Rexx languages.


-==============================
-Last Updated: 14 November 2018
-==============================
+===========================
+Last Updated: 28 April 2021
+===========================

Modified: code/trunk/doc/html/pcre2.html
===================================================================
--- code/trunk/doc/html/pcre2.html    2021-04-28 14:21:38 UTC (rev 1307)
+++ code/trunk/doc/html/pcre2.html    2021-04-28 15:37:48 UTC (rev 1308)
@@ -38,8 +38,14 @@
 that give better ECMAScript (aka JavaScript) compatibility.
 </P>
 <P>
-The source code for PCRE2 can be compiled to support 8-bit, 16-bit, or 32-bit
-code units, which means that up to three separate libraries may be installed.
+The source code for PCRE2 can be compiled to support strings of 8-bit, 16-bit,
+or 32-bit code units, which means that up to three separate libraries may be
+installed, one for each code unit size. The size of code unit is not related to
+the bit size of the underlying hardware. In a 64-bit environment that also
+supports 32-bit applications, versions of PCRE2 that are compiled in both
+64-bit and 32-bit modes may be needed.
+</P>
+<P>
 The original work to extend PCRE to 16-bit and 32-bit code units was done by
 Zoltan Herczeg and Christian Persch, respectively. In all three cases, strings
 can be interpreted either as one character per code unit, or as UTF-encoded
@@ -198,9 +204,9 @@
 </P>
 <br><a name="SEC5" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 17 September 2018
+Last updated: 28 April 2021
 <br>
-Copyright &copy; 1997-2018 University of Cambridge.
+Copyright &copy; 1997-2021 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.


Modified: code/trunk/doc/html/pcre2test.html
===================================================================
--- code/trunk/doc/html/pcre2test.html    2021-04-28 14:21:38 UTC (rev 1307)
+++ code/trunk/doc/html/pcre2test.html    2021-04-28 15:37:48 UTC (rev 1308)
@@ -1213,7 +1213,7 @@
 The following modifiers affect the matching process or request additional
 information. Some of them may also be specified on a pattern line (see above),
 in which case they apply to every subject line that is matched against that
-pattern.
+pattern, but can be overridden by modifiers on the subject.
 <pre>
       aftertext                  show text after match
       allaftertext               show text after captures
@@ -1421,6 +1421,11 @@
 a modifier. This is not thought to be an issue in a test program.
 </P>
 <P>
+Specifying a completely empty replacement string disables this modifier. 
+However, it is possible to specify an empty replacement by providing a buffer 
+length, as described below, for an otherwise empty replacement.
+</P>
+<P>
 Unlike subject strings, <b>pcre2test</b> does not process replacement strings
 for escape sequences. In UTF mode, a replacement string is checked to see if it
 is a valid UTF-8 string. If so, it is correctly converted to a UTF string of
@@ -2119,9 +2124,9 @@
 </P>
 <br><a name="SEC21" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 14 September 2020
+Last updated: 28 April 2021
 <br>
-Copyright &copy; 1997-2020 University of Cambridge.
+Copyright &copy; 1997-2021 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.


Modified: code/trunk/doc/pcre2.txt
===================================================================
--- code/trunk/doc/pcre2.txt    2021-04-28 14:21:38 UTC (rev 1307)
+++ code/trunk/doc/pcre2.txt    2021-04-28 15:37:48 UTC (rev 1308)
@@ -34,93 +34,98 @@
        requesting  some  minor  changes that give better ECMAScript (aka Java-
        Script) compatibility.


-       The source code for PCRE2 can be compiled to support 8-bit, 16-bit,  or
-       32-bit  code units, which means that up to three separate libraries may
-       be installed.  The original work to extend PCRE to  16-bit  and  32-bit
-       code  units  was  done  by Zoltan Herczeg and Christian Persch, respec-
-       tively. In all three cases, strings can be interpreted  either  as  one
-       character  per  code  unit, or as UTF-encoded Unicode, with support for
-       Unicode general category properties. Unicode  support  is  optional  at
-       build  time  (but  is  the default). However, processing strings as UTF
-       code units must be enabled explicitly at run time. The version of  Uni-
-       code in use can be discovered by running
+       The source code for PCRE2 can be compiled to support strings of  8-bit,
+       16-bit, or 32-bit code units, which means that up to three separate li-
+       braries may be installed, one for each code unit size. The size of code
+       unit  is  not  related to the bit size of the underlying hardware. In a
+       64-bit environment that also supports 32-bit applications, versions  of
+       PCRE2 that are compiled in both 64-bit and 32-bit modes may be needed.


+       The  original  work  to extend PCRE to 16-bit and 32-bit code units was
+       done by Zoltan Herczeg and Christian Persch, respectively. In all three
+       cases,  strings  can  be  interpreted  either as one character per code
+       unit, or as UTF-encoded Unicode, with support for Unicode general cate-
+       gory  properties. Unicode support is optional at build time (but is the
+       default). However, processing strings as UTF code units must be enabled
+       explicitly at run time. The version of Unicode in use can be discovered
+       by running
+
          pcre2test -C


-       The  three  libraries  contain  identical sets of functions, with names
-       ending in _8,  _16,  or  _32,  respectively  (for  example,  pcre2_com-
-       pile_8()).  However,  by defining PCRE2_CODE_UNIT_WIDTH to be 8, 16, or
-       32, a program that uses just one code unit width can be  written  using
+       The three libraries contain identical sets  of  functions,  with  names
+       ending  in  _8,  _16,  or  _32,  respectively  (for example, pcre2_com-
+       pile_8()). However, by defining PCRE2_CODE_UNIT_WIDTH to be 8,  16,  or
+       32,  a  program that uses just one code unit width can be written using
        generic names such as pcre2_compile(), and the documentation is written
        assuming that this is the case.


        In addition to the Perl-compatible matching function, PCRE2 contains an
-       alternative  function that matches the same compiled patterns in a dif-
+       alternative function that matches the same compiled patterns in a  dif-
        ferent way. In certain circumstances, the alternative function has some
-       advantages.   For  a discussion of the two matching algorithms, see the
+       advantages.  For a discussion of the two matching algorithms,  see  the
        pcre2matching page.


-       Details of exactly which Perl regular expression features are  and  are
-       not  supported  by  PCRE2  are  given  in  separate  documents. See the
-       pcre2pattern and pcre2compat pages. There is a syntax  summary  in  the
+       Details  of  exactly which Perl regular expression features are and are
+       not supported by  PCRE2  are  given  in  separate  documents.  See  the
+       pcre2pattern  and  pcre2compat  pages. There is a syntax summary in the
        pcre2syntax page.


-       Some  features  of PCRE2 can be included, excluded, or changed when the
-       library is built. The pcre2_config() function makes it possible  for  a
-       client  to  discover  which  features are available. The features them-
+       Some features of PCRE2 can be included, excluded, or changed  when  the
+       library  is  built. The pcre2_config() function makes it possible for a
+       client to discover which features are  available.  The  features  them-
        selves are described in the pcre2build page. Documentation about build-
-       ing  PCRE2 for various operating systems can be found in the README and
+       ing PCRE2 for various operating systems can be found in the README  and
        NON-AUTOTOOLS_BUILD files in the source distribution.


-       The libraries contains a number of undocumented internal functions  and
-       data  tables  that  are  used by more than one of the exported external
-       functions, but which are not intended  for  use  by  external  callers.
-       Their  names  all begin with "_pcre2", which hopefully will not provoke
+       The  libraries contains a number of undocumented internal functions and
+       data tables that are used by more than one  of  the  exported  external
+       functions,  but  which  are  not  intended for use by external callers.
+       Their names all begin with "_pcre2", which hopefully will  not  provoke
        any name clashes. In some environments, it is possible to control which
-       external  symbols  are  exported when a shared library is built, and in
+       external symbols are exported when a shared library is  built,  and  in
        these cases the undocumented symbols are not exported.



SECURITY CONSIDERATIONS

-       If you are using PCRE2 in a non-UTF application that permits  users  to
-       supply  arbitrary  patterns  for  compilation, you should be aware of a
+       If  you  are using PCRE2 in a non-UTF application that permits users to
+       supply arbitrary patterns for compilation, you should  be  aware  of  a
        feature that allows users to turn on UTF support from within a pattern.
-       For  example, an 8-bit pattern that begins with "(*UTF)" turns on UTF-8
-       mode, which interprets patterns and subjects as strings of  UTF-8  code
+       For example, an 8-bit pattern that begins with "(*UTF)" turns on  UTF-8
+       mode,  which  interprets patterns and subjects as strings of UTF-8 code
        units instead of individual 8-bit characters. This causes both the pat-
-       tern and any data against which it is matched to be checked  for  UTF-8
-       validity.  If the data string is very long, such a check might use suf-
-       ficiently many resources as to cause your application to  lose  perfor-
+       tern  and  any data against which it is matched to be checked for UTF-8
+       validity. If the data string is very long, such a check might use  suf-
+       ficiently  many  resources as to cause your application to lose perfor-
        mance.


-       One  way  of guarding against this possibility is to use the pcre2_pat-
-       tern_info() function  to  check  the  compiled  pattern's  options  for
-       PCRE2_UTF.  Alternatively,  you can set the PCRE2_NEVER_UTF option when
-       calling pcre2_compile(). This causes a compile time error if  the  pat-
+       One way of guarding against this possibility is to use  the  pcre2_pat-
+       tern_info()  function  to  check  the  compiled  pattern's  options for
+       PCRE2_UTF. Alternatively, you can set the PCRE2_NEVER_UTF  option  when
+       calling  pcre2_compile().  This causes a compile time error if the pat-
        tern contains a UTF-setting sequence.


-       The  use  of Unicode properties for character types such as \d can also
-       be enabled from within the pattern, by specifying "(*UCP)".  This  fea-
+       The use of Unicode properties for character types such as \d  can  also
+       be  enabled  from within the pattern, by specifying "(*UCP)". This fea-
        ture can be disallowed by setting the PCRE2_NEVER_UCP option.


-       If  your  application  is one that supports UTF, be aware that validity
-       checking can take time. If the same data string is to be  matched  many
-       times,  you  can  use  the PCRE2_NO_UTF_CHECK option for the second and
+       If your application is one that supports UTF, be  aware  that  validity
+       checking  can  take time. If the same data string is to be matched many
+       times, you can use the PCRE2_NO_UTF_CHECK option  for  the  second  and
        subsequent matches to avoid running redundant checks.


        The use of the \C escape sequence in a UTF-8 or UTF-16 pattern can lead
-       to  problems,  because  it  may leave the current matching point in the
-       middle of a multi-code-unit character. The PCRE2_NEVER_BACKSLASH_C  op-
+       to problems, because it may leave the current  matching  point  in  the
+       middle  of a multi-code-unit character. The PCRE2_NEVER_BACKSLASH_C op-
        tion can be used by an application to lock out the use of \C, causing a
-       compile-time error if it is encountered. It is also possible  to  build
+       compile-time  error  if it is encountered. It is also possible to build
        PCRE2 with the use of \C permanently disabled.


-       Another  way  that  performance can be hit is by running a pattern that
-       has a very large search tree against a string that  will  never  match.
-       Nested  unlimited repeats in a pattern are a common example. PCRE2 pro-
-       vides some protection against  this:  see  the  pcre2_set_match_limit()
-       function  in  the  pcre2api  page.  There  is a similar function called
+       Another way that performance can be hit is by running  a  pattern  that
+       has  a  very  large search tree against a string that will never match.
+       Nested unlimited repeats in a pattern are a common example. PCRE2  pro-
+       vides  some  protection  against  this: see the pcre2_set_match_limit()
+       function in the pcre2api page.  There  is  a  similar  function  called
        pcre2_set_depth_limit() that can be used to restrict the amount of mem-
        ory that is used.


@@ -127,14 +132,14 @@

USER DOCUMENTATION

-       The  user  documentation for PCRE2 comprises a number of different sec-
-       tions. In the "man" format, each of these is a separate "man page".  In
-       the  HTML  format, each is a separate page, linked from the index page.
-       In the plain  text  format,  the  descriptions  of  the  pcre2grep  and
+       The user documentation for PCRE2 comprises a number of  different  sec-
+       tions.  In the "man" format, each of these is a separate "man page". In
+       the HTML format, each is a separate page, linked from the  index  page.
+       In  the  plain  text  format,  the  descriptions  of  the pcre2grep and
        pcre2test programs are in files called pcre2grep.txt and pcre2test.txt,
-       respectively. The remaining sections, except for the pcre2demo  section
-       (which  is a program listing), and the short pages for individual func-
-       tions, are concatenated in pcre2.txt, for ease of searching.  The  sec-
+       respectively.  The remaining sections, except for the pcre2demo section
+       (which is a program listing), and the short pages for individual  func-
+       tions,  are  concatenated in pcre2.txt, for ease of searching. The sec-
        tions are as follows:


          pcre2              this document
@@ -160,7 +165,7 @@
          pcre2test          description of the pcre2test command
          pcre2unicode       discussion of Unicode and UTF support


-       In  the  "man"  and HTML formats, there is also a short page for each C
+       In the "man" and HTML formats, there is also a short page  for  each  C
        library function, listing its arguments and results.



@@ -170,15 +175,15 @@
        University Computing Service
        Cambridge, England.


-       Putting an actual email address here is a spam magnet. If you  want  to
-       email  me,  use  my two initials, followed by the two digits 10, at the
+       Putting  an  actual email address here is a spam magnet. If you want to
+       email me, use my two initials, followed by the two digits  10,  at  the
        domain cam.ac.uk.



REVISION

-       Last updated: 17 September 2018
-       Copyright (c) 1997-2018 University of Cambridge.
+       Last updated: 28 April 2021
+       Copyright (c) 1997-2021 University of Cambridge.
 ------------------------------------------------------------------------------




Modified: code/trunk/doc/pcre2test.txt
===================================================================
--- code/trunk/doc/pcre2test.txt    2021-04-28 14:21:38 UTC (rev 1307)
+++ code/trunk/doc/pcre2test.txt    2021-04-28 15:37:48 UTC (rev 1308)
@@ -1084,7 +1084,8 @@
        The  following  modifiers  affect the matching process or request addi-
        tional information. Some of them may also be  specified  on  a  pattern
        line  (see  above), in which case they apply to every subject line that
-       is matched against that pattern.
+       is matched against that pattern, but can be overridden by modifiers  on
+       the subject.


              aftertext                  show text after match
              allaftertext               show text after captures
@@ -1132,29 +1133,29 @@
              zero_terminate             pass the subject as zero-terminated


        The effects of these modifiers are described in the following sections.
-       When  matching  via the POSIX wrapper API, the aftertext, allaftertext,
-       and ovector subject modifiers work as described below. All other  modi-
+       When matching via the POSIX wrapper API, the  aftertext,  allaftertext,
+       and  ovector subject modifiers work as described below. All other modi-
        fiers are either ignored, with a warning message, or cause an error.


    Showing more text


-       The  aftertext modifier requests that as well as outputting the part of
+       The aftertext modifier requests that as well as outputting the part  of
        the subject string that matched the entire pattern, pcre2test should in
        addition output the remainder of the subject string. This is useful for
        tests where the subject contains multiple copies of the same substring.
-       The  allaftertext  modifier  requests the same action for captured sub-
+       The allaftertext modifier requests the same action  for  captured  sub-
        strings as well as the main matched substring. In each case the remain-
        der is output on the following line with a plus character following the
        capture number.


-       The allusedtext modifier requests that all the text that was  consulted
-       during  a  successful pattern match by the interpreter should be shown,
-       for both full and partial matches. This feature is  not  supported  for
-       JIT  matching,  and if requested with JIT it is ignored (with a warning
-       message). Setting this modifier affects the output if there is a  look-
-       behind  at  the start of a match, or, for a complete match, a lookahead
+       The  allusedtext modifier requests that all the text that was consulted
+       during a successful pattern match by the interpreter should  be  shown,
+       for  both  full  and partial matches. This feature is not supported for
+       JIT matching, and if requested with JIT it is ignored (with  a  warning
+       message).  Setting this modifier affects the output if there is a look-
+       behind at the start of a match, or, for a complete match,  a  lookahead
        at the end, or if \K is used in the pattern. Characters that precede or
-       follow  the start and end of the actual match are indicated in the out-
+       follow the start and end of the actual match are indicated in the  out-
        put by '<' or '>' characters underneath them.  Here is an example:


            re> /(?<=pqr)abc(?=xyz)/
@@ -1165,16 +1166,16 @@
          Partial match: pqrabcxy
                         <<<


-       The first, complete match shows that the matched string is "abc",  with
-       the  preceding  and  following strings "pqr" and "xyz" having been con-
-       sulted during the match (when processing the assertions).  The  partial
+       The  first, complete match shows that the matched string is "abc", with
+       the preceding and following strings "pqr" and "xyz"  having  been  con-
+       sulted  during  the match (when processing the assertions). The partial
        match can indicate only the preceding string.


-       The  startchar  modifier  requests  that the starting character for the
-       match be indicated, if it is different to  the  start  of  the  matched
+       The startchar modifier requests that the  starting  character  for  the
+       match  be  indicated,  if  it  is different to the start of the matched
        string. The only time when this occurs is when \K has been processed as
        part of the match. In this situation, the output for the matched string
-       is  displayed  from  the  starting  character instead of from the match
+       is displayed from the starting character  instead  of  from  the  match
        point, with circumflex characters under the earlier characters. For ex-
        ample:


@@ -1183,7 +1184,7 @@
           0: abcxyz
              ^^^


-       Unlike  allusedtext, the startchar modifier can be used with JIT.  How-
+       Unlike allusedtext, the startchar modifier can be used with JIT.   How-
        ever, these two modifiers are mutually exclusive.


    Showing the value of all capture groups
@@ -1191,9 +1192,9 @@
        The allcaptures modifier requests that the values of all potential cap-
        tured parentheses be output after a match. By default, only those up to
        the highest one actually used in the match are output (corresponding to
-       the  return  code from pcre2_match()). Groups that did not take part in
-       the match are output as "<unset>". This modifier is  not  relevant  for
-       DFA  matching (which does no capturing) and does not apply when replace
+       the return code from pcre2_match()). Groups that did not take  part  in
+       the  match  are  output as "<unset>". This modifier is not relevant for
+       DFA matching (which does no capturing) and does not apply when  replace
        is specified; it is ignored, with a warning message, if present.


    Showing the entire ovector, for all outcomes
@@ -1200,53 +1201,53 @@


        The allvector modifier requests that the entire ovector be shown, what-
        ever the outcome of the match. Compare allcaptures, which shows only up
-       to the maximum number of capture groups for the pattern, and then  only
-       for  a successful complete non-DFA match. This modifier, which acts af-
-       ter any match result, and also for DFA matching, provides  a  means  of
-       checking  that there are no unexpected modifications to ovector fields.
-       Before each match attempt, the ovector is filled with a special  value,
-       and  if  this  is  found  in  both  elements of a capturing pair, "<un-
-       changed>" is output. After a successful  match,  this  applies  to  all
-       groups  after the maximum capture group for the pattern. In other cases
-       it applies to the entire ovector. After a partial match, the first  two
-       elements  are  the only ones that should be set. After a DFA match, the
-       amount of ovector that is used depends on the number  of  matches  that
+       to  the maximum number of capture groups for the pattern, and then only
+       for a successful complete non-DFA match. This modifier, which acts  af-
+       ter  any  match  result, and also for DFA matching, provides a means of
+       checking that there are no unexpected modifications to ovector  fields.
+       Before  each match attempt, the ovector is filled with a special value,
+       and if this is found in  both  elements  of  a  capturing  pair,  "<un-
+       changed>"  is  output.  After  a  successful match, this applies to all
+       groups after the maximum capture group for the pattern. In other  cases
+       it  applies to the entire ovector. After a partial match, the first two
+       elements are the only ones that should be set. After a DFA  match,  the
+       amount  of  ovector  that is used depends on the number of matches that
        were found.


    Testing pattern callouts


-       A  callout function is supplied when pcre2test calls the library match-
-       ing functions, unless callout_none is specified. Its behaviour  can  be
-       controlled  by  various  modifiers  listed above whose names begin with
-       callout_. Details are given in the section entitled  "Callouts"  below.
-       Testing  callouts  from  pcre2_substitute()  is  decribed separately in
+       A callout function is supplied when pcre2test calls the library  match-
+       ing  functions,  unless callout_none is specified. Its behaviour can be
+       controlled by various modifiers listed above  whose  names  begin  with
+       callout_.  Details  are given in the section entitled "Callouts" below.
+       Testing callouts from  pcre2_substitute()  is  decribed  separately  in
        "Testing the substitution function" below.


    Finding all matches in a string


        Searching for all possible matches within a subject can be requested by
-       the  global  or altglobal modifier. After finding a match, the matching
-       function is called again to search the remainder of  the  subject.  The
-       difference  between  global  and  altglobal is that the former uses the
-       start_offset argument to pcre2_match() or  pcre2_dfa_match()  to  start
-       searching  at  a new point within the entire string (which is what Perl
+       the global or altglobal modifier. After finding a match,  the  matching
+       function  is  called  again to search the remainder of the subject. The
+       difference between global and altglobal is that  the  former  uses  the
+       start_offset  argument  to  pcre2_match() or pcre2_dfa_match() to start
+       searching at a new point within the entire string (which is  what  Perl
        does), whereas the latter passes over a shortened subject. This makes a
        difference to the matching process if the pattern begins with a lookbe-
        hind assertion (including \b or \B).


-       If an empty string  is  matched,  the  next  match  is  done  with  the
+       If  an  empty  string  is  matched,  the  next  match  is done with the
        PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
        for another, non-empty, match at the same point in the subject. If this
-       match  fails, the start offset is advanced, and the normal match is re-
-       tried. This imitates the way Perl handles such cases when using the  /g
-       modifier  or  the  split()  function. Normally, the start offset is ad-
-       vanced by one character, but if the newline convention recognizes  CRLF
-       as  a  newline,  and the current character is CR followed by LF, an ad-
+       match fails, the start offset is advanced, and the normal match is  re-
+       tried.  This imitates the way Perl handles such cases when using the /g
+       modifier or the split() function. Normally, the  start  offset  is  ad-
+       vanced  by one character, but if the newline convention recognizes CRLF
+       as a newline, and the current character is CR followed by  LF,  an  ad-
        vance of two characters occurs.


    Testing substring extraction functions


-       The copy  and  get  modifiers  can  be  used  to  test  the  pcre2_sub-
+       The  copy  and  get  modifiers  can  be  used  to  test  the pcre2_sub-
        string_copy_xxx() and pcre2_substring_get_xxx() functions.  They can be
        given more than once, and each can specify a capture group name or num-
        ber, for example:
@@ -1253,29 +1254,34 @@


           abcd\=copy=1,copy=3,get=G1


-       If  the  #subject command is used to set default copy and/or get lists,
-       these can be unset by specifying a negative number to cancel  all  num-
+       If the #subject command is used to set default copy and/or  get  lists,
+       these  can  be unset by specifying a negative number to cancel all num-
        bered groups and an empty name to cancel all named groups.


-       The  getall  modifier  tests pcre2_substring_list_get(), which extracts
+       The getall modifier tests  pcre2_substring_list_get(),  which  extracts
        all captured substrings.


-       If the subject line is successfully matched, the  substrings  extracted
-       by  the  convenience  functions  are  output  with C, G, or L after the
-       string number instead of a colon. This is in  addition  to  the  normal
-       full  list.  The string length (that is, the return from the extraction
+       If  the  subject line is successfully matched, the substrings extracted
+       by the convenience functions are output with  C,  G,  or  L  after  the
+       string  number  instead  of  a colon. This is in addition to the normal
+       full list. The string length (that is, the return from  the  extraction
        function) is given in parentheses after each substring, followed by the
        name when the extraction was by name.


    Testing the substitution function


-       If  the  replace  modifier  is  set, the pcre2_substitute() function is
-       called instead of one of the matching functions (or after one  call  of
-       pcre2_match()  in  the case of PCRE2_SUBSTITUTE_MATCHED). Note that re-
-       placement strings cannot contain commas, because a comma signifies  the
-       end  of  a  modifier. This is not thought to be an issue in a test pro-
+       If the replace modifier is  set,  the  pcre2_substitute()  function  is
+       called  instead  of one of the matching functions (or after one call of
+       pcre2_match() in the case of PCRE2_SUBSTITUTE_MATCHED). Note  that  re-
+       placement  strings cannot contain commas, because a comma signifies the
+       end of a modifier. This is not thought to be an issue in  a  test  pro-
        gram.


+       Specifying  a  completely  empty replacement string disables this modi-
+       fier.  However, it is possible to specify an empty replacement by  pro-
+       viding  a buffer length, as described below, for an otherwise empty re-
+       placement.
+
        Unlike subject strings, pcre2test does not process replacement  strings
        for  escape  sequences. In UTF mode, a replacement string is checked to
        see if it is a valid UTF-8 string. If so, it is correctly converted  to
@@ -1929,5 +1935,5 @@


REVISION

-       Last updated: 14 September 2020
-       Copyright (c) 1997-2020 University of Cambridge.
+       Last updated: 28 April 2021
+       Copyright (c) 1997-2021 University of Cambridge.
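
The start-offset advance described under "Finding all matches in a string" above can be sketched in C as follows. This is only an illustration, not code from pcre2test or the PCRE2 library: the function name advance_after_empty_match and its parameters are hypothetical, and it assumes an 8-bit, non-UTF subject in which one character is one code unit. It covers only the step taken after the retry with PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED has failed.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of PCRE2): given the offset at which an
 * empty string was matched and the anchored non-empty retry failed,
 * return the offset at which the next normal match should start.
 *
 * Assumes 8-bit, non-UTF data, so "one character" is one code unit.
 * crlf_is_newline is nonzero when the newline convention in force
 * recognizes CRLF as a newline.
 */
size_t advance_after_empty_match(const char *subject, size_t length,
                                 size_t offset, int crlf_is_newline)
{
    /* At or past the end of the subject there is nothing to step over. */
    if (offset >= length)
        return offset;

    /* Step over CR LF as a unit when CRLF counts as a newline. */
    if (crlf_is_newline && offset + 1 < length &&
        subject[offset] == '\r' && subject[offset + 1] == '\n')
        return offset + 2;

    /* Otherwise advance by a single character. */
    return offset + 1;
}
```

A caller looping over matches would stop when the returned offset equals the one passed in, since that means the end of the subject has been reached. In UTF mode the single-character step would need to skip a whole multi-unit character, which this sketch does not attempt.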