[Pcre-svn] [605] code/trunk: Add pcre2_code_copy_with

Author: Subversion repository
Date:
To: pcre-svn
Subject: [Pcre-svn] [605] code/trunk: Add pcre2_code_copy_with_tables().

Revision: 605

          http://www.exim.org/viewvc/pcre2?view=rev&revision=605
Author:   ph10
Date:     2016-11-22 15:37:02 +0000 (Tue, 22 Nov 2016)
Log Message:
-----------
Add pcre2_code_copy_with_tables().

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/Makefile.am
    code/trunk/doc/html/NON-AUTOTOOLS-BUILD.txt
    code/trunk/doc/html/README.txt
    code/trunk/doc/html/index.html
    code/trunk/doc/html/pcre2_code_copy.html
    code/trunk/doc/html/pcre2_set_max_pattern_length.html
    code/trunk/doc/html/pcre2api.html
    code/trunk/doc/html/pcre2build.html
    code/trunk/doc/html/pcre2callout.html
    code/trunk/doc/html/pcre2compat.html
    code/trunk/doc/html/pcre2grep.html
    code/trunk/doc/html/pcre2limits.html
    code/trunk/doc/html/pcre2pattern.html
    code/trunk/doc/html/pcre2syntax.html
    code/trunk/doc/html/pcre2test.html
    code/trunk/doc/index.html.src
    code/trunk/doc/pcre2.txt
    code/trunk/doc/pcre2_code_copy.3
    code/trunk/doc/pcre2api.3
    code/trunk/doc/pcre2grep.txt
    code/trunk/doc/pcre2test.txt
    code/trunk/src/pcre2.h
    code/trunk/src/pcre2.h.in
    code/trunk/src/pcre2_compile.c
    code/trunk/src/pcre2test.c
    code/trunk/testdata/testinput20
    code/trunk/testdata/testoutput20

Added Paths:
-----------
    code/trunk/doc/html/pcre2_code_copy_with_tables.html
    code/trunk/doc/pcre2_code_copy_with_tables.3

Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/ChangeLog    2016-11-22 15:37:02 UTC (rev 605)
@@ -181,7 +181,10 @@

27. In pcre2test, give some offset information for errors in hex patterns.

+28. Implemented pcre2_code_copy_with_tables(), and added pushtablescopy to
+pcre2test for testing it.

+
Version 10.22 29-July-2016
--------------------------

@@ -250,7 +253,7 @@
gcc's -Wconversion (which still throws up a lot).

15. Implemented pcre2_code_copy(), and added pushcopy and #popcopy to pcre2test
-for testing it.
+for testing it.

16. Change 66 for 10.21 introduced the use of snprintf() in PCRE2's version of
regerror(). When the error buffer is too small, my version of snprintf() puts a

Modified: code/trunk/Makefile.am
===================================================================
--- code/trunk/Makefile.am    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/Makefile.am    2016-11-22 15:37:02 UTC (rev 605)
@@ -25,6 +25,7 @@
   doc/html/pcre2.html \
   doc/html/pcre2_callout_enumerate.html \
   doc/html/pcre2_code_copy.html \
+  doc/html/pcre2_code_copy_with_tables.html \
   doc/html/pcre2_code_free.html \
   doc/html/pcre2_compile.html \
   doc/html/pcre2_compile_context_copy.html \
@@ -107,6 +108,7 @@
   doc/pcre2.3 \
   doc/pcre2_callout_enumerate.3 \
   doc/pcre2_code_copy.3 \
+  doc/pcre2_code_copy_with_tables.3 \
   doc/pcre2_code_free.3 \
   doc/pcre2_compile.3 \
   doc/pcre2_compile_context_copy.3 \

Modified: code/trunk/doc/html/NON-AUTOTOOLS-BUILD.txt
===================================================================
--- code/trunk/doc/html/NON-AUTOTOOLS-BUILD.txt    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/doc/html/NON-AUTOTOOLS-BUILD.txt    2016-11-22 15:37:02 UTC (rev 605)
@@ -174,7 +174,11 @@

 (11) If you want to use the pcre2grep command, compile and link
      src/pcre2grep.c; it uses only the basic 8-bit PCRE2 library (it does not
-     need the pcre2posix library).
+     need the pcre2posix library). If you have built the PCRE2 library with JIT
+     support by defining SUPPORT_JIT in src/config.h, you can also define
+     SUPPORT_PCRE2GREP_JIT, which causes pcre2grep to make use of JIT (unless
+     it is run with --no-jit). If you define SUPPORT_PCRE2GREP_JIT without
+     defining SUPPORT_JIT, pcre2grep does not try to make use of JIT.

STACK SIZE IN WINDOWS ENVIRONMENTS
@@ -389,4 +393,4 @@
recommended download site.

=============================
-Last Updated: 16 July 2015
+Last Updated: 13 October 2016

Modified: code/trunk/doc/html/README.txt
===================================================================
--- code/trunk/doc/html/README.txt    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/doc/html/README.txt    2016-11-22 15:37:02 UTC (rev 605)
@@ -44,7 +44,7 @@

The distribution does contain a set of C wrapper functions for the 8-bit
library that are based on the POSIX regular expression API (see the pcre2posix
-man page). These can be found in a library called libpcre2posix. Note that this
+man page). These can be found in a library called libpcre2-posix. Note that this
just provides a POSIX calling interface to PCRE2; the regular expressions
themselves still follow Perl syntax and semantics. The POSIX API is restricted,
and does not give full access to all of PCRE2's facilities.
@@ -58,8 +58,8 @@
If you are using the POSIX interface to PCRE2 and there is already a POSIX
regex library installed on your system, as well as worrying about the regex.h
header file (as mentioned above), you must also take care when linking programs
-to ensure that they link with PCRE2's libpcre2posix library. Otherwise they may
-pick up the POSIX functions of the same name from the other library.
+to ensure that they link with PCRE2's libpcre2-posix library. Otherwise they
+may pick up the POSIX functions of the same name from the other library.

One way of avoiding this confusion is to compile PCRE2 with the addition of
-Dregcomp=PCRE2regcomp (and similarly for the other POSIX functions) to the
@@ -204,13 +204,6 @@
--enable-newline-is-crlf, --enable-newline-is-anycrlf, or
--enable-newline-is-any to the "configure" command, respectively.

- If you specify --enable-newline-is-cr or --enable-newline-is-crlf, some of
- the standard tests will fail, because the lines in the test files end with
- LF. Even if the files are edited to change the line endings, there are likely
- to be some failures. With --enable-newline-is-anycrlf or
- --enable-newline-is-any, many tests should succeed, but there may be some
- failures.
-
. By default, the sequence \R in a pattern matches any Unicode line ending
sequence. This is independent of the option specifying what PCRE2 considers
to be the end of a line (see above). However, the caller of PCRE2 can
@@ -253,13 +246,13 @@
sizes in the pcre2stack man page.

. In the 8-bit library, the default maximum compiled pattern size is around
- 64K. You can increase this by adding --with-link-size=3 to the "configure"
- command. PCRE2 then uses three bytes instead of two for offsets to different
- parts of the compiled pattern. In the 16-bit library, --with-link-size=3 is
- the same as --with-link-size=4, which (in both libraries) uses four-byte
- offsets. Increasing the internal link size reduces performance in the 8-bit
- and 16-bit libraries. In the 32-bit library, the link size setting is
- ignored, as 4-byte offsets are always used.
+ 64K bytes. You can increase this by adding --with-link-size=3 to the
+ "configure" command. PCRE2 then uses three bytes instead of two for offsets
+ to different parts of the compiled pattern. In the 16-bit library,
+ --with-link-size=3 is the same as --with-link-size=4, which (in both
+ libraries) uses four-byte offsets. Increasing the internal link size reduces
+ performance in the 8-bit and 16-bit libraries. In the 32-bit library, the
+ link size setting is ignored, as 4-byte offsets are always used.

. You can build PCRE2 so that its internal match() function that is called from
pcre2_match() does not call itself recursively. Instead, it uses memory
@@ -339,12 +332,23 @@

Of course, the relevant libraries must be installed on your system.

-. The default size (in bytes) of the internal buffer used by pcre2grep can be
- set by, for example:
+. The default starting size (in bytes) of the internal buffer used by pcre2grep
+ can be set by, for example:

--with-pcre2grep-bufsize=51200

- The value must be a plain integer. The default is 20480.
+ The value must be a plain integer. The default is 20480. The amount of memory
+ used by pcre2grep is actually three times this number, to allow for "before"
+ and "after" lines. If very long lines are encountered, the buffer is
+ automatically enlarged, up to a fixed maximum size.
+
+. The default maximum size of pcre2grep's internal buffer can be set by, for
+ example:
+
+ --with-pcre2grep-max-bufsize=2097152
+
+ The default is either 1048576 or the value of --with-pcre2grep-bufsize,
+ whichever is the larger.

. It is possible to compile pcre2test so that it links with the libreadline
or libedit libraries, by specifying, respectively,
@@ -368,6 +372,22 @@
If you get error messages about missing functions tgetstr, tgetent, tputs,
tgetflag, or tgoto, this is the problem, and linking with the ncurses library
should fix it.
+
+. There is a special option called --enable-fuzz-support for use by people who
+ want to run fuzzing tests on PCRE2. At present this applies only to the 8-bit
+ library. If set, it causes an extra library called libpcre2-fuzzsupport.a to
+ be built, but not installed. This contains a single function called
+ LLVMFuzzerTestOneInput() whose arguments are a pointer to a string and the
+ length of the string. When called, this function tries to compile the string
+ as a pattern, and if that succeeds, to match it. This is done both with no
+ options and with some random options bits that are generated from the string.
+ Setting --enable-fuzz-support also causes a binary called pcre2fuzzcheck to
+ be created. This is normally run under valgrind or used when PCRE2 is
+ compiled with address sanitizing enabled. It calls the fuzzing function and
+ outputs information about what it is doing. The input strings are specified by
+ arguments: if an argument starts with "=" the rest of it is a literal input
+ string. Otherwise, it is assumed to be a file name, and the contents of the
+ file are the test string.

The "configure" script builds the following files for the basic C library:

@@ -543,7 +563,7 @@

Testing PCRE2
-------------
+-------------

 To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
 There is another script called RunGrepTest that tests the pcre2grep command.
@@ -757,6 +777,7 @@
   src/pcre2_xclass.c       )

   src/pcre2_printint.c     debugging function that is used by pcre2test,
+  src/pcre2_fuzzsupport.c  function for (optional) fuzzing support

   src/config.h.in          template for config.h, when built by "configure"
   src/pcre2.h.in           template for pcre2.h when built by "configure"
@@ -814,7 +835,7 @@
   libpcre2-8.pc.in         template for libpcre2-8.pc for pkg-config
   libpcre2-16.pc.in        template for libpcre2-16.pc for pkg-config
   libpcre2-32.pc.in        template for libpcre2-32.pc for pkg-config
-  libpcre2posix.pc.in      template for libpcre2posix.pc for pkg-config
+  libpcre2-posix.pc.in     template for libpcre2-posix.pc for pkg-config
   ltmain.sh                file used to build a libtool script
   missing                  ) common stub for a few missing GNU programs while
                            )   installing, generated by automake
@@ -845,4 +866,4 @@
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 01 April 2016
+Last updated: 01 November 2016
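
The pcre2grep buffer arithmetic described in the README changes above can be sketched in C. This is only an illustration of the two stated rules (memory used is three times the configured size; the default maximum is 1048576 or the configured size, whichever is larger). The function names are hypothetical and are not taken from the pcre2grep source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: total bytes pcre2grep is described as using for a
   given --with-pcre2grep-bufsize value (room for "before" and "after" lines). */
size_t grep_buffer_bytes(size_t bufsize)
{
  assert(bufsize <= SIZE_MAX / 3);   /* guard against overflow in 3 * bufsize */
  return 3 * bufsize;
}

/* Hypothetical helper: default maximum buffer size when
   --with-pcre2grep-max-bufsize is not given. */
size_t grep_default_max(size_t bufsize)
{
  const size_t floor_max = 1048576;  /* 1M, the documented default */
  return (bufsize > floor_max) ? bufsize : floor_max;
}
```

With the default of 20480, the first function yields 61440 bytes and the second yields 1048576.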

Modified: code/trunk/doc/html/index.html
===================================================================
--- code/trunk/doc/html/index.html    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/doc/html/index.html    2016-11-22 15:37:02 UTC (rev 605)
@@ -94,6 +94,9 @@
 <tr><td><a href="pcre2_code_copy.html">pcre2_code_copy</a></td>
     <td>&nbsp;&nbsp;Copy a compiled pattern</td></tr>

+<tr><td><a href="pcre2_code_copy_with_tables.html">pcre2_code_copy_with_tables</a></td>
+    <td>&nbsp;&nbsp;Copy a compiled pattern and its character tables</td></tr>
+
 <tr><td><a href="pcre2_code_free.html">pcre2_code_free</a></td>
     <td>&nbsp;&nbsp;Free a compiled pattern</td></tr>

Modified: code/trunk/doc/html/pcre2_code_copy.html
===================================================================
--- code/trunk/doc/html/pcre2_code_copy.html    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/doc/html/pcre2_code_copy.html    2016-11-22 15:37:02 UTC (rev 605)
@@ -28,8 +28,9 @@
 This function makes a copy of the memory used for a compiled pattern, excluding
 any memory used by the JIT compiler. Without a subsequent call to
 <b>pcre2_jit_compile()</b>, the copy can be used only for non-JIT matching. The
-yield of the function is NULL if <i>code</i> is NULL or if sufficient memory
-cannot be obtained.
+pointer to the character tables is copied, not the tables themselves (see
+<b>pcre2_code_copy_with_tables()</b>). The yield of the function is NULL if
+<i>code</i> is NULL or if sufficient memory cannot be obtained.
 </P>
 <P>
 There is a complete description of the PCRE2 native API in the

Added: code/trunk/doc/html/pcre2_code_copy_with_tables.html
===================================================================
--- code/trunk/doc/html/pcre2_code_copy_with_tables.html                            (rev 0)
+++ code/trunk/doc/html/pcre2_code_copy_with_tables.html    2016-11-22 15:37:02 UTC (rev 605)
@@ -0,0 +1,44 @@
+<html>
+<head>
+<title>pcre2_code_copy_with_tables specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre2_code_copy_with_tables man page</h1>
+<p>
+Return to the <a href="index.html">PCRE2 index page</a>.
+</p>
+<p>
+This page is part of the PCRE2 HTML documentation. It was generated
+automatically from the original man page. If there is any nonsense in it,
+please consult the man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre2.h&#62;</b>
+</P>
+<P>
+<b>pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *<i>code</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function makes a copy of the memory used for a compiled pattern, excluding
+any memory used by the JIT compiler. Without a subsequent call to
+<b>pcre2_jit_compile()</b>, the copy can be used only for non-JIT matching. 
+Unlike <b>pcre2_code_copy()</b>, a separate copy of the character tables is also
+made, with the new code pointing to it. This memory will be automatically freed
+when <b>pcre2_code_free()</b> is called. The yield of the function is NULL if
+<i>code</i> is NULL or if sufficient memory cannot be obtained.
+</P>
+<P>
+There is a complete description of the PCRE2 native API in the
+<a href="pcre2api.html"><b>pcre2api</b></a>
+page and a description of the POSIX API in the
+<a href="pcre2posix.html"><b>pcre2posix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE2 index page</a>.
+</p>

Modified: code/trunk/doc/html/pcre2_set_max_pattern_length.html
===================================================================
--- code/trunk/doc/html/pcre2_set_max_pattern_length.html    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/doc/html/pcre2_set_max_pattern_length.html    2016-11-22 15:37:02 UTC (rev 605)
@@ -26,8 +26,11 @@
 DESCRIPTION
 </b><br>
 <P>
-This function sets, in a compile context, the maximum length (in code units) of
-the pattern that can be compiled. The result is always zero.
+This function sets, in a compile context, the maximum text length (in code
+units) of the pattern that can be compiled. The result is always zero. If a
+longer pattern is passed to <b>pcre2_compile()</b> there is an immediate error
+return. The default is effectively unlimited, being the largest value a
+PCRE2_SIZE variable can hold.
 </P>
 <P>
 There is a complete description of the PCRE2 native API in the

Modified: code/trunk/doc/html/pcre2api.html
===================================================================
--- code/trunk/doc/html/pcre2api.html    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/doc/html/pcre2api.html    2016-11-22 15:37:02 UTC (rev 605)
@@ -294,6 +294,9 @@
 <b>pcre2_code *pcre2_code_copy(const pcre2_code *<i>code</i>);</b>
 <br>
 <br>
+<b>pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *<i>code</i>);</b>
+<br>
+<br>
 <b>int pcre2_get_error_message(int <i>errorcode</i>, PCRE2_UCHAR *<i>buffer</i>,</b>
 <b>  PCRE2_SIZE <i>bufflen</i>);</b>
 <br>
@@ -567,8 +570,9 @@
 (perhaps waiting to see if the pattern is used often enough) similar logic is
 required. JIT compilation updates a pointer within the compiled code block, so
 a thread must gain unique write access to the pointer before calling
-<b>pcre2_jit_compile()</b>. Alternatively, <b>pcre2_code_copy()</b> can be used
-to obtain a private copy of the compiled code.
+<b>pcre2_jit_compile()</b>. Alternatively, <b>pcre2_code_copy()</b> or 
+<b>pcre2_code_copy_with_tables()</b> can be used to obtain a private copy of the
+compiled code.
 </P>
 <br><b>
 Context blocks
@@ -736,7 +740,8 @@
 <br>
 This parameter adjusts the limit, set when PCRE2 is built (default 250), on the
 depth of parenthesis nesting in a pattern. This limit stops rogue patterns
-using up too much system stack when being compiled.
+using up too much system stack when being compiled. The limit applies to 
+parentheses of all kinds, not just capturing parentheses.
 <b>int pcre2_set_compile_recursion_guard(pcre2_compile_context *<i>ccontext</i>,</b>
 <b>  int (*<i>guard_function</i>)(uint32_t, void *), void *<i>user_data</i>);</b>
 <br>
@@ -1058,6 +1063,9 @@
 <br>
 <br>
 <b>pcre2_code *pcre2_code_copy(const pcre2_code *<i>code</i>);</b>
+<br>
+<br>
+<b>pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *<i>code</i>);</b>
 </P>
 <P>
 The <b>pcre2_compile()</b> function compiles a pattern into an internal form.
@@ -1079,11 +1087,24 @@
 <a href="#jitcompiling">below),</a>
 the JIT information cannot be copied (because it is position-dependent).
 The new copy can initially be used only for non-JIT matching, though it can be
-passed to <b>pcre2_jit_compile()</b> if required. The <b>pcre2_code_copy()</b>
-function provides a way for individual threads in a multithreaded application
-to acquire a private copy of shared compiled code.
+passed to <b>pcre2_jit_compile()</b> if required. 
 </P>
 <P>
+The <b>pcre2_code_copy()</b> function provides a way for individual threads in a
+multithreaded application to acquire a private copy of shared compiled code. 
+However, it does not make a copy of the character tables used by the compiled 
+pattern; the new pattern code points to the same tables as the original code.
+(See
+<a href="#jitcompiling">"Locale Support"</a>
+below for details of these character tables.) In many applications the same
+tables are used throughout, so this behaviour is appropriate. Nevertheless, 
+there are occasions when a copy of a compiled pattern and the relevant tables 
+are needed. The <b>pcre2_code_copy_with_tables()</b> function provides this facility. 
+Copies of both the code and the tables are made, with the new code pointing to 
+the new tables. The memory for the new tables is automatically freed when
+<b>pcre2_code_free()</b> is called for the new copy of the compiled code.
+</P>
+<P>
 NOTE: When one of the matching functions is called, pointers to the compiled
 pattern and the subject string are set in the match data block so that they can
 be referenced by the substring extraction functions. After running a match, you
@@ -1119,9 +1140,16 @@
 error code and an offset (number of code units) within the pattern,
 respectively, when <b>pcre2_compile()</b> returns NULL because a compilation
 error has occurred. The values are not defined when compilation is successful
-and <b>pcre2_compile()</b> returns a non-NULL value.
+and <b>pcre2_compile()</b> returns a non-NULL value. 
 </P>
 <P>
+The value returned in <i>erroroffset</i> is an indication of where in the
+pattern the error occurred. It is not necessarily the furthest point in the
+pattern that was read. For example, after the error "lookbehind assertion is
+not fixed length", the error offset points to the start of the failing
+assertion.
+</P>
+<P>
 The <b>pcre2_get_error_message()</b> function (see "Obtaining a textual error
 message"
 <a href="#geterrormessage">below)</a>
@@ -1215,8 +1243,8 @@
   PCRE2_AUTO_CALLOUT
 </pre>
 If this bit is set, <b>pcre2_compile()</b> automatically inserts callout items,
-all with number 255, before each pattern item. For discussion of the callout
-facility, see the
+all with number 255, before each pattern item, except immediately before or
+after a callout in the pattern. For discussion of the callout facility, see the
 <a href="pcre2callout.html"><b>pcre2callout</b></a>
 documentation.
 <pre>
@@ -3235,7 +3263,7 @@
 </P>
 <br><a name="SEC41" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 17 June 2016
+Last updated: 22 November 2016
 <br>
 Copyright &copy; 1997-2016 University of Cambridge.
 <br>

Modified: code/trunk/doc/html/pcre2build.html
===================================================================
--- code/trunk/doc/html/pcre2build.html    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/doc/html/pcre2build.html    2016-11-22 15:37:02 UTC (rev 605)
@@ -34,9 +34,10 @@
 <li><a name="TOC19" href="#SEC19">INCLUDING DEBUGGING CODE</a>
 <li><a name="TOC20" href="#SEC20">DEBUGGING WITH VALGRIND SUPPORT</a>
 <li><a name="TOC21" href="#SEC21">CODE COVERAGE REPORTING</a>
-<li><a name="TOC22" href="#SEC22">SEE ALSO</a>
-<li><a name="TOC23" href="#SEC23">AUTHOR</a>
-<li><a name="TOC24" href="#SEC24">REVISION</a>
+<li><a name="TOC22" href="#SEC22">SUPPORT FOR FUZZERS</a>
+<li><a name="TOC23" href="#SEC23">SEE ALSO</a>
+<li><a name="TOC24" href="#SEC24">AUTHOR</a>
+<li><a name="TOC25" href="#SEC25">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">BUILDING PCRE2</a><br>
 <P>
@@ -376,16 +377,19 @@
 <P>
 <b>pcre2grep</b> uses an internal buffer to hold a "window" on the file it is
 scanning, in order to be able to output "before" and "after" lines when it
-finds a match. The size of the buffer is controlled by a parameter whose
-default value is 20K. The buffer itself is three times this size, but because
-of the way it is used for holding "before" lines, the longest line that is
-guaranteed to be processable is the parameter size. You can change the default
-parameter value by adding, for example,
+finds a match. The starting size of the buffer is controlled by a parameter
+whose default value is 20K. The buffer itself is three times this size, but
+because of the way it is used for holding "before" lines, the longest line that
+is guaranteed to be processable is the parameter size. If a longer line is 
+encountered, <b>pcre2grep</b> automatically expands the buffer, up to a
+specified maximum size, whose default is 1M or the starting size, whichever is
+the larger. You can change the default parameter values by adding, for example,
 <pre>
-  --with-pcre2grep-bufsize=50K
+  --with-pcre2grep-bufsize=51200
+  --with-pcre2grep-max-bufsize=2097152 
 </pre>
-to the <b>configure</b> command. The caller of \fPpcre2grep\fP can override this
-value by using --buffer-size on the command line.
+to the <b>configure</b> command. The caller of <b>pcre2grep</b> can override 
+these values by using --buffer-size and --max-buffer-size on the command line.
 </P>
 <br><a name="SEC18" href="#TOC1">PCRE2TEST OPTION FOR LIBREADLINE SUPPORT</a><br>
 <P>
@@ -497,11 +501,32 @@
 information about code coverage, see the <b>gcov</b> and <b>lcov</b>
 documentation.
 </P>
-<br><a name="SEC22" href="#TOC1">SEE ALSO</a><br>
+<br><a name="SEC22" href="#TOC1">SUPPORT FOR FUZZERS</a><br>
 <P>
+There is a special option for use by people who want to run fuzzing tests on
+PCRE2:
+<pre>
+  --enable-fuzz-support
+</pre>
+At present this applies only to the 8-bit library. If set, it causes an extra
+library called libpcre2-fuzzsupport.a to be built, but not installed. This
+contains a single function called LLVMFuzzerTestOneInput() whose arguments are
+a pointer to a string and the length of the string. When called, this function
+tries to compile the string as a pattern, and if that succeeds, to match it.
+This is done both with no options and with some random options bits that are
+generated from the string. Setting --enable-fuzz-support also causes a binary
+called <b>pcre2fuzzcheck</b> to be created. This is normally run under valgrind
+or used when PCRE2 is compiled with address sanitizing enabled. It calls the
+fuzzing function and outputs information about what it is doing. The input strings
+are specified by arguments: if an argument starts with "=" the rest of it is a
+literal input string. Otherwise, it is assumed to be a file name, and the
+contents of the file are the test string.
+</P>
+<br><a name="SEC23" href="#TOC1">SEE ALSO</a><br>
+<P>
 <b>pcre2api</b>(3), <b>pcre2-config</b>(3).
 </P>
-<br><a name="SEC23" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC24" href="#TOC1">AUTHOR</a><br>
 <P>
 Philip Hazel
 <br>
@@ -510,9 +535,9 @@
 Cambridge, England.
 <br>
 </P>
-<br><a name="SEC24" href="#TOC1">REVISION</a><br>
+<br><a name="SEC25" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 01 April 2016
+Last updated: 01 November 2016
 <br>
 Copyright &copy; 1997-2016 University of Cambridge.
 <br>
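
The argument convention described for pcre2fuzzcheck in the fuzzing section above (a leading "=" means a literal string, anything else is a file name) can be sketched as follows. This is a stand-alone illustration, not code from pcre2fuzzcheck. The name load_fuzz_input is hypothetical, and the buffer it returns is meant to be handed to a function with the pointer-and-length shape of LLVMFuzzerTestOneInput().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: turn one command-line argument into a test string.
   Returns a malloc()ed buffer (caller frees) and sets *len, or NULL on error. */
unsigned char *load_fuzz_input(const char *arg, size_t *len)
{
  unsigned char *buf;
  size_t cap = 4096, used = 0;
  FILE *f;

  assert(arg != NULL && len != NULL);

  if (arg[0] == '=') {               /* literal input follows the "=" */
    *len = strlen(arg + 1);
    buf = malloc(*len + 1);
    if (buf == NULL) return NULL;
    memcpy(buf, arg + 1, *len + 1);
    return buf;
  }

  f = fopen(arg, "rb");              /* otherwise treat it as a file name */
  if (f == NULL) return NULL;
  buf = malloc(cap);
  if (buf == NULL) { fclose(f); return NULL; }

  for (;;) {
    used += fread(buf + used, 1, cap - used, f);
    if (used < cap) break;           /* short read: end of file or error */
    unsigned char *bigger = realloc(buf, cap * 2);
    if (bigger == NULL) { free(buf); fclose(f); return NULL; }
    buf = bigger;
    cap *= 2;
  }
  fclose(f);
  *len = used;
  return buf;
}
```

A driver would then loop over its arguments, call this helper for each one, and pass the buffer and length to the fuzzing function.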

Modified: code/trunk/doc/html/pcre2callout.html
===================================================================
--- code/trunk/doc/html/pcre2callout.html    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/doc/html/pcre2callout.html    2016-11-22 15:37:02 UTC (rev 605)
@@ -57,11 +57,20 @@
 </pre>
 If the PCRE2_AUTO_CALLOUT option bit is set when a pattern is compiled, PCRE2
 automatically inserts callouts, all with number 255, before each item in the
-pattern. For example, if PCRE2_AUTO_CALLOUT is used with the pattern
+pattern except for immediately before or after a callout item in the pattern.
+For example, if PCRE2_AUTO_CALLOUT is used with the pattern
 <pre>
+  A(?C3)B
+</pre>
+it is processed as if it were
+<pre>
+  (?C255)A(?C3)B(?C255)   
+</pre>
+Here is a more complicated example:
+<pre>
   A(\d{2}|--)
 </pre>
-it is processed as if it were
+With PCRE2_AUTO_CALLOUT, this pattern is processed as if it were
 <br>
 <br>
 (?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
@@ -107,10 +116,10 @@
   No match
 </pre>
 This indicates that when matching [bc] fails, there is no backtracking into a+
-and therefore the callouts that would be taken for the backtracks do not occur.
-You can disable the auto-possessify feature by passing PCRE2_NO_AUTO_POSSESS to
-<b>pcre2_compile()</b>, or starting the pattern with (*NO_AUTO_POSSESS). In this
-case, the output changes to this:
+(because it is being treated as a++) and therefore the callouts that would be
+taken for the backtracks do not occur. You can disable the auto-possessify
+feature by passing PCRE2_NO_AUTO_POSSESS to <b>pcre2_compile()</b>, or starting
+the pattern with (*NO_AUTO_POSSESS). In this case, the output changes to this:
 <pre>
   ---&#62;aaaa
    +0 ^        a+
@@ -235,8 +244,8 @@
 <P>
 For a numerical callout, <i>callout_string</i> is NULL, and <i>callout_number</i>
 contains the number of the callout, in the range 0-255. This is the number
-that follows (?C for manual callouts; it is 255 for automatically generated
-callouts.
+that follows (?C for callouts that are part of the pattern; it is 255 for
+automatically generated callouts.
 </P>
 <br><b>
 Fields for string callouts
@@ -310,10 +319,15 @@
 </P>
 <P>
 The <i>next_item_length</i> field contains the length of the next item to be
-matched in the pattern string. When the callout immediately precedes an
-alternation bar, a closing parenthesis, or the end of the pattern, the length
-is zero. When the callout precedes an opening parenthesis, the length is that
-of the entire subpattern.
+processed in the pattern string. When the callout is at the end of the pattern,
+the length is zero. When the callout precedes an opening parenthesis, the
+length includes meta characters that follow the parenthesis. For example, in a
+callout before an assertion such as (?=ab) the length is 3. For an
+alternation bar or a closing parenthesis, the length is one, unless a closing
+parenthesis is followed by a quantifier, in which case its length is included.
+(This changed in release 10.23. In earlier releases, before an opening
+parenthesis the length was that of the entire subpattern, and before an
+alternation bar or a closing parenthesis the length was zero.)
 </P>
 <P>
 The <i>pattern_position</i> and <i>next_item_length</i> fields are intended to
@@ -399,9 +413,9 @@
 </P>
 <br><a name="SEC8" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 23 March 2015
+Last updated: 29 September 2016
 <br>
-Copyright &copy; 1997-2015 University of Cambridge.
+Copyright &copy; 1997-2016 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
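
The revised PCRE2_AUTO_CALLOUT rule in the pcre2callout changes above (insert (?C255) before each item and at the end, except immediately before or after a callout already in the pattern) can be modelled with a toy C function. This is a deliberately simplified sketch and not PCRE2's implementation: it understands only single literal characters and (?C...) items, and the name toy_auto_callout is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model: writes the pattern with automatic callouts into out.
   Returns 0 on success, -1 if out is too small or a callout is unterminated. */
int toy_auto_callout(const char *pat, char *out, size_t outsize)
{
  static const char AUTO[] = "(?C255)";
  const size_t autolen = sizeof AUTO - 1;
  size_t o = 0;
  int prev_was_callout = 0;

  assert(pat != NULL && out != NULL);

  while (*pat != '\0') {
    if (strncmp(pat, "(?C", 3) == 0) {
      /* A callout written in the pattern: copy it through unchanged. */
      const char *close = strchr(pat, ')');
      size_t n;
      if (close == NULL) return -1;
      n = (size_t)(close - pat) + 1;
      if (o + n >= outsize) return -1;
      memcpy(out + o, pat, n);
      o += n;
      pat += n;
      prev_was_callout = 1;
    } else {
      /* An ordinary item: precede it with an automatic callout unless the
         previous item was itself a callout. */
      if (!prev_was_callout) {
        if (o + autolen >= outsize) return -1;
        memcpy(out + o, AUTO, autolen);
        o += autolen;
      }
      if (o + 1 >= outsize) return -1;
      out[o++] = *pat++;
      prev_was_callout = 0;
    }
  }

  /* Final automatic callout at the end of the pattern, with the same exception. */
  if (!prev_was_callout) {
    if (o + autolen >= outsize) return -1;
    memcpy(out + o, AUTO, autolen);
    o += autolen;
  }
  out[o] = '\0';
  return 0;
}
```

For the input A(?C3)B this produces (?C255)A(?C3)B(?C255), matching the first example in the documentation change.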

Modified: code/trunk/doc/html/pcre2compat.html
===================================================================
--- code/trunk/doc/html/pcre2compat.html    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/doc/html/pcre2compat.html    2016-11-22 15:37:02 UTC (rev 605)
@@ -107,7 +107,7 @@
 one that is backtracked onto acts. For example, in the pattern
 A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but a failure in C
 triggers (*PRUNE). Perl's behaviour is more complex; in many cases it is the
-same as PCRE2, but there are examples where it differs.
+same as PCRE2, but there are cases where it differs.
 </P>
 <P>
 11. Most backtracking verbs in assertions have their normal actions. They are
@@ -123,7 +123,7 @@
 13. PCRE2's handling of duplicate subpattern numbers and duplicate subpattern
 names is not as general as Perl's. This is a consequence of the fact that PCRE2
 works internally just with numbers, using an external table to translate
-between numbers and names. In particular, a pattern such as (?|(?&#60;a&#62;A)|(?&#60;b)B),
+between numbers and names. In particular, a pattern such as (?|(?&#60;a&#62;A)|(?&#60;b&#62;B),
 where the two capturing parentheses have the same number but different names,
 is not supported, and causes an error at compile time. If it were allowed, it
 would not be possible to distinguish which parentheses matched, because both
@@ -131,10 +131,11 @@
 an error is given at compile time.
 </P>
 <P>
-14. Perl recognizes comments in some places that PCRE2 does not, for example,
-between the ( and ? at the start of a subpattern. If the /x modifier is set,
-Perl allows white space between ( and ? (though current Perls warn that this is
-deprecated) but PCRE2 never does, even if the PCRE2_EXTENDED option is set.
+14. Perl used to recognize comments in some places that PCRE2 does not, for
+example, between the ( and ? at the start of a subpattern. If the /x modifier
+is set, Perl allowed white space between ( and ? though the latest Perls give 
+an error (for a while it was just deprecated). There may still be some cases 
+where Perl behaves differently.
 </P>
 <P>
 15. Perl, when in warning mode, gives warnings for character classes such as
@@ -158,45 +159,50 @@
 <br>
 (a) Although lookbehind assertions in PCRE2 must match fixed length strings,
 each alternative branch of a lookbehind assertion can match a different length
-of string. Perl requires them all to have the same length.
+of string. Perl requires them all to have the same length. 
 <br>
 <br>
-(b) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the $
+(b) From PCRE2 10.23, back references to groups of fixed length are supported
+in lookbehinds, provided that there is no possibility of referencing a
+non-unique number or name. Perl does not support backreferences in lookbehinds.
+<br>
+<br>
+(c) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the $
 meta-character matches only at the very end of the string.
 <br>
 <br>
-(c) A backslash followed by a letter with no special meaning is faulted. (Perl
+(d) A backslash followed by a letter with no special meaning is faulted. (Perl
 can be made to issue a warning.)
 <br>
 <br>
-(d) If PCRE2_UNGREEDY is set, the greediness of the repetition quantifiers is
+(e) If PCRE2_UNGREEDY is set, the greediness of the repetition quantifiers is
 inverted, that is, by default they are not greedy, but if followed by a
 question mark they are.
 <br>
 <br>
-(e) PCRE2_ANCHORED can be used at matching time to force a pattern to be tried
+(f) PCRE2_ANCHORED can be used at matching time to force a pattern to be tried
 only at the first matching position in the subject string.
 <br>
 <br>
-(f) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, and
+(g) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, and
 PCRE2_NO_AUTO_CAPTURE options have no Perl equivalents.
 <br>
 <br>
-(g) The \R escape sequence can be restricted to match only CR, LF, or CRLF
+(h) The \R escape sequence can be restricted to match only CR, LF, or CRLF
 by the PCRE2_BSR_ANYCRLF option.
 <br>
 <br>
-(h) The callout facility is PCRE2-specific.
+(i) The callout facility is PCRE2-specific.
 <br>
 <br>
-(i) The partial matching facility is PCRE2-specific.
+(j) The partial matching facility is PCRE2-specific.
 <br>
 <br>
-(j) The alternative matching function (<b>pcre2_dfa_match()</b> matches in a
+(k) The alternative matching function (<b>pcre2_dfa_match()</b>) matches in a
 different way and is not Perl-compatible.
 <br>
 <br>
-(k) PCRE2 recognizes some special sequences such as (*CR) at the start of
+(l) PCRE2 recognizes some special sequences such as (*CR) at the start of
 a pattern that set overall options that cannot be changed within the pattern.
 </P>
 <br><b>
@@ -214,9 +220,9 @@
 REVISION
 </b><br>
 <P>
-Last updated: 15 March 2015
+Last updated: 18 October 2016
 <br>
-Copyright &copy; 1997-2015 University of Cambridge.
+Copyright &copy; 1997-2016 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
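
Editorial sketch (not part of the commit): item (b) in the pcre2compat.html changes above says that PCRE2 accepts back references to fixed-length groups inside lookbehinds. Python's re module is a different engine from PCRE2, but it also accepts fixed-length group references in lookbehinds, so it is used here purely as a stand-in to show the idea. The function name is hypothetical, and the pattern uses a plain \w+ followed by \b rather than a possessive quantifier.

```python
import re

# Words of at least two characters whose last character equals the first.
# (?<=\1) looks back one character and compares it with capture group 1.
SAME_ENDS = re.compile(r"\b(\w)\w+(?<=\1)\b")

def words_with_same_ends(text):
    # Hypothetical helper: return every whole word that begins and ends
    # with the same character.
    return [m.group(0) for m in SAME_ENDS.finditer(text)]

print(words_with_same_ends("level noon abc aa a radar test"))
```

The group must be closed before the lookbehind that refers to it, which matches the documented requirement that the referenced subpattern be of fixed length.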

Modified: code/trunk/doc/html/pcre2grep.html
===================================================================
--- code/trunk/doc/html/pcre2grep.html    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/doc/html/pcre2grep.html    2016-11-22 15:37:02 UTC (rev 605)
@@ -80,13 +80,21 @@
 </P>
 <P>
 The amount of memory used for buffering files that are being scanned is
-controlled by a parameter that can be set by the <b>--buffer-size</b> option.
-The default value for this parameter is specified when <b>pcre2grep</b> is
-built, with the default default being 20K. A block of memory three times this
-size is used (to allow for buffering "before" and "after" lines). An error
-occurs if a line overflows the buffer.
+controlled by parameters that can be set by the <b>--buffer-size</b> and
+<b>--max-buffer-size</b> options. The first of these sets the size of buffer
+that is obtained at the start of processing. If an input file contains very
+long lines, a larger buffer may be needed; this is handled by automatically
+extending the buffer, up to the limit specified by <b>--max-buffer-size</b>. The
+default values for these parameters are specified when <b>pcre2grep</b> is
+built, with the default defaults being 20K and 1M respectively. An error occurs
+if a line is too long and the buffer can no longer be expanded.
 </P>
 <P>
+The block of memory that is actually used is three times the "buffer size", to
+allow for buffering "before" and "after" lines. If the buffer size is too 
+small, fewer than requested "before" and "after" lines may be output.
+</P>
+<P>
 Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater.
 BUFSIZ is defined in <b>&#60;stdio.h&#62;</b>. When there is more than one pattern
 (specified by the use of <b>-e</b> and/or <b>-f</b>), each pattern is applied to
@@ -155,12 +163,13 @@
 </P>
 <P>
 <b>-A</b> <i>number</i>, <b>--after-context=</b><i>number</i>
-Output <i>number</i> lines of context after each matching line. If file names
-and/or line numbers are being output, a hyphen separator is used instead of a
-colon for the context lines. A line containing "--" is output between each
-group of lines, unless they are in fact contiguous in the input file. The value
-of <i>number</i> is expected to be relatively small. However, <b>pcre2grep</b>
-guarantees to have up to 8K of following text available for context output.
+Output up to <i>number</i> lines of context after each matching line. Fewer
+lines are output if the next match or the end of the file is reached, or if the
+processing buffer size has been set too small. If file names and/or line
+numbers are being output, a hyphen separator is used instead of a colon for the
+context lines. A line containing "--" is output between each group of lines,
+unless they are in fact contiguous in the input file. The value of <i>number</i>
+is expected to be relatively small. When <b>-c</b> is used, <b>-A</b> is ignored.
 </P>
 <P>
 <b>-a</b>, <b>--text</b>
@@ -169,12 +178,14 @@
 </P>
 <P>
 <b>-B</b> <i>number</i>, <b>--before-context=</b><i>number</i>
-Output <i>number</i> lines of context before each matching line. If file names
-and/or line numbers are being output, a hyphen separator is used instead of a
-colon for the context lines. A line containing "--" is output between each
-group of lines, unless they are in fact contiguous in the input file. The value
-of <i>number</i> is expected to be relatively small. However, <b>pcre2grep</b>
-guarantees to have up to 8K of preceding text available for context output.
+Output up to <i>number</i> lines of context before each matching line. Fewer 
+lines are output if the previous match or the start of the file is within 
+<i>number</i> lines, or if the processing buffer size has been set too small. If
+file names and/or line numbers are being output, a hyphen separator is used
+instead of a colon for the context lines. A line containing "--" is output
+between each group of lines, unless they are in fact contiguous in the input
+file. The value of <i>number</i> is expected to be relatively small. When
+<b>-c</b> is used, <b>-B</b> is ignored.
 </P>
 <P>
 <b>--binary-files=</b><i>word</i>
@@ -191,8 +202,9 @@
 </P>
 <P>
 <b>--buffer-size=</b><i>number</i>
-Set the parameter that controls how much memory is used for buffering files
-that are being scanned.
+Set the parameter that controls how much memory is obtained at the start of 
+processing for buffering files that are being scanned. See also 
+<b>--max-buffer-size</b> below.
 </P>
 <P>
 <b>-C</b> <i>number</i>, <b>--context=</b><i>number</i>
@@ -202,14 +214,16 @@
 <P>
 <b>-c</b>, <b>--count</b>
 Do not output lines from the files that are being scanned; instead output the
-number of matches (or non-matches if <b>-v</b> is used) that would otherwise
-have caused lines to be shown. By default, this count is the same as the number
-of suppressed lines, but if the <b>-M</b> (multiline) option is used (without
-<b>-v</b>), there may be more suppressed lines than the number of matches.
+number of lines that would have been shown, either because they matched, or, if
+<b>-v</b> is set, because they failed to match. By default, this count is
+exactly the same as the number of lines that would have been output, but if the
+<b>-M</b> (multiline) option is used (without <b>-v</b>), there may be more
+suppressed lines than the count (that is, the number of matches).
 <br>
 <br>
 If no lines are selected, the number zero is output. If several files are
-being scanned, a count is output for each of them. However, if the
+being scanned, a count is output for each of them and the <b>-t</b> option can 
+be used to cause a total to be output at the end. However, if the
 <b>--files-with-matches</b> option is also used, only those files whose counts
 are greater than zero are listed. When <b>-c</b> is used, the <b>-A</b>,
 <b>-B</b>, and <b>-C</b> options are ignored.
@@ -232,11 +246,12 @@
 <br>
 <br>
 The colour that is used can be specified by setting the environment variable
-PCRE2GREP_COLOUR or PCRE2GREP_COLOR. The value of this variable should be a
-string of two numbers, separated by a semicolon. They are copied directly into
-the control string for setting colour on a terminal, so it is your
-responsibility to ensure that they make sense. If neither of the environment
-variables is set, the default is "1;31", which gives red.
+PCRE2GREP_COLOUR or PCRE2GREP_COLOR. If neither of these is set,
+<b>pcre2grep</b> looks for GREP_COLOUR or GREP_COLOR. The value of the variable
+should be a string of two numbers, separated by a semicolon. They are copied
+directly into the control string for setting colour on a terminal, so it is
+your responsibility to ensure that they make sense. If none of these
+environment variables is set, the default is "1;31", which gives red.
 </P>
 <P>
 <b>-D</b> <i>action</i>, <b>--devices=</b><i>action</i>
@@ -321,24 +336,24 @@
 </P>
 <P>
 <b>-f</b> <i>filename</i>, <b>--file=</b><i>filename</i>
-Read patterns from the file, one per line, and match them against
-each line of input. What constitutes a newline when reading the file is the
-operating system's default. The <b>--newline</b> option has no effect on this
-option. Trailing white space is removed from each line, and blank lines are
-ignored. An empty file contains no patterns and therefore matches nothing. See
-also the comments about multiple patterns versus a single pattern with
-alternatives in the description of <b>-e</b> above.
+Read patterns from the file, one per line, and match them against each line of
+input. What constitutes a newline when reading the file is the operating
+system's default. The <b>--newline</b> option has no effect on this option.
+Trailing white space is removed from each line, and blank lines are ignored. An
+empty file contains no patterns and therefore matches nothing. See also the
+comments about multiple patterns versus a single pattern with alternatives in
+the description of <b>-e</b> above.
 <br>
 <br>
-If this option is given more than once, all the specified files are
-read. A data line is output if any of the patterns match it. A file name can
-be given as "-" to refer to the standard input. When <b>-f</b> is used, patterns
+If this option is given more than once, all the specified files are read. A
+data line is output if any of the patterns match it. A file name can be given
+as "-" to refer to the standard input. When <b>-f</b> is used, patterns
 specified on the command line using <b>-e</b> may also be present; they are
 tested before the file's patterns. However, no other pattern is taken from the
 command line; all arguments are treated as the names of paths to be searched.
 </P>
 <P>
-<b>--file-list</b>=<i>filename</i>
+<b>--file-list</b>=<i>filename</i> 
 Read a list of files and/or directories that are to be scanned from the given
 file, one per line. Trailing white space is removed from each line, and blank
 lines are ignored. These paths are processed before any that are listed on the
@@ -502,24 +517,26 @@
 when the PCRE2 library is compiled, with the default default being 10 million.
 </P>
 <P>
+<b>--max-buffer-size=</b><i>number</i>
+This limits the expansion of the processing buffer, whose initial size can be 
+set by <b>--buffer-size</b>. The maximum buffer size is silently forced to be no 
+smaller than the starting buffer size.
+</P>
+<P>
 <b>-M</b>, <b>--multiline</b>
-Allow patterns to match more than one line. When this option is given, patterns
-may usefully contain literal newline characters and internal occurrences of ^
-and $ characters. The output for a successful match may consist of more than
-one line. The first is the line in which the match started, and the last is the
-line in which the match ended. If the matched string ends with a newline
-sequence the output ends at the end of that line.
+Allow patterns to match more than one line. When this option is set, the PCRE2
+library is called in "multiline" mode. This allows a matched string to extend
+past the end of a line and continue on one or more subsequent lines. Patterns
+used with <b>-M</b> may usefully contain literal newline characters and internal
+occurrences of ^ and $ characters. The output for a successful match may
+consist of more than one line. The first line is the line in which the match
+started, and the last line is the line in which the match ended. If the matched
+string ends with a newline sequence, the output ends at the end of that line.
+If <b>-v</b> is set, none of the lines in a multi-line match are output. Once a
+match has been handled, scanning restarts at the beginning of the line after
+the one in which the match ended.
 <br>
 <br>
-When this option is set, the PCRE2 library is called in "multiline" mode. This
-allows a matched string to extend past the end of a line and continue on one or
-more subsequent lines. However, <b>pcre2grep</b> still processes the input line
-by line. Once a match has been handled, scanning restarts at the beginning of
-the next line, just as it does when <b>-M</b> is not present. This means that it
-is possible for the second or subsequent lines in a multiline match to be
-output again as part of another match.
-<br>
-<br>
 The newline sequence that separates multiple lines must be matched as part of
 the pattern. For example, to find the phrase "regular expression" in a file
 where "regular" might be at the end of a line and "expression" at the start of
@@ -533,11 +550,8 @@
 <br>
 <br>
 There is a limit to the number of lines that can be matched, imposed by the way
-that <b>pcre2grep</b> buffers the input file as it scans it. However,
-<b>pcre2grep</b> ensures that at least 8K characters or the rest of the file
-(whichever is the shorter) are available for forward matching, and similarly
-the previous 8K characters (or all the previous characters, if fewer than 8K)
-are guaranteed to be available for lookbehind assertions. The <b>-M</b> option
+that <b>pcre2grep</b> buffers the input file as it scans it. With a sufficiently
+large processing buffer, this should not be a problem, but the <b>-M</b> option
 does not work when input is read line by line (see <b>--line-buffered</b>).
 </P>
 <P>
@@ -585,12 +599,13 @@
 Show only the part of the line that matched a pattern instead of the whole
 line. In this mode, no context is shown. That is, the <b>-A</b>, <b>-B</b>, and
 <b>-C</b> options are ignored. If there is more than one match in a line, each
-of them is shown separately. If <b>-o</b> is combined with <b>-v</b> (invert the
-sense of the match to find non-matching lines), no output is generated, but the
-return code is set appropriately. If the matched portion of the line is empty,
-nothing is output unless the file name or line number are being printed, in
-which case they are shown on an otherwise empty line. This option is mutually
-exclusive with <b>--file-offsets</b> and <b>--line-offsets</b>.
+of them is shown separately, on a separate line of output. If <b>-o</b> is
+combined with <b>-v</b> (invert the sense of the match to find non-matching
+lines), no output is generated, but the return code is set appropriately. If
+the matched portion of the line is empty, nothing is output unless the file
+name or line number are being printed, in which case they are shown on an
+otherwise empty line. This option is mutually exclusive with
+<b>--file-offsets</b> and <b>--line-offsets</b>.
 </P>
 <P>
 <b>-o</b><i>number</i>, <b>--only-matching</b>=<i>number</i>
@@ -604,10 +619,11 @@
 match, nothing is output unless the file name or line number are being output.
 <br>
 <br>
-If this option is given multiple times, multiple substrings are output, in the
-order the options are given. For example, -o3 -o1 -o3 causes the substrings
-matched by capturing parentheses 3 and 1 and then 3 again to be output. By
-default, there is no separator (but see the next option).
+If this option is given multiple times, multiple substrings are output for each 
+match, in the order the options are given, and all on one line. For example,
+-o3 -o1 -o3 causes the substrings matched by capturing parentheses 3 and 1 and
+then 3 again to be output. By default, there is no separator (but see the next
+option).
 </P>
 <P>
 <b>--om-separator</b>=<i>text</i>
@@ -638,6 +654,18 @@
 found in other files.
 </P>
 <P>
+<b>-t</b>, <b>--total-count</b>
+This option is useful when scanning more than one file. If used on its own,
+<b>-t</b> suppresses all output except for a grand total number of matching
+lines (or non-matching lines if <b>-v</b> is used) in all the files. If <b>-t</b>
+is used with <b>-c</b>, a grand total is output except when the previous output
+is just one line. In other words, it is not output when just one file's count
+is listed. If file names are being output, the grand total is preceded by
+"TOTAL:". Otherwise, it appears as just another number. The <b>-t</b> option is
+ignored when used with <b>-L</b> (list files without matches), because the grand
+total would always be zero.
+</P>
+<P>
 <b>-u</b>, <b>--utf-8</b>
 Operate in UTF-8 mode. This option is available only if PCRE2 has been compiled
 with UTF-8 support. All patterns (including those for any <b>--exclude</b> and
@@ -665,11 +693,12 @@
 <P>
 <b>-x</b>, <b>--line-regex</b>, <b>--line-regexp</b>
 Force the patterns to be anchored (each must start matching at the beginning of
-a line) and in addition, require them to match entire lines. This is equivalent
-to having ^ and $ characters at the start and end of each alternative top-level
-branch in every pattern. This option applies only to the patterns that are
-matched against the contents of files; it does not apply to patterns specified
-by any of the <b>--include</b> or <b>--exclude</b> options.
+a line) and in addition, require them to match entire lines. In multiline mode 
+the match may be more than one line. This is equivalent to having \A and \Z
+characters at the start and end of each alternative top-level branch in every
+pattern. This option applies only to the patterns that are matched against the
+contents of files; it does not apply to patterns specified by any of the
+<b>--include</b> or <b>--exclude</b> options.
 </P>
 <br><a name="SEC6" href="#TOC1">ENVIRONMENT VARIABLES</a><br>
 <P>
@@ -831,7 +860,7 @@
 </P>
 <br><a name="SEC15" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 19 June 2016
+Last updated: 31 October 2016
 <br>
 Copyright &copy; 1997-2016 University of Cambridge.
 <br>
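
Editorial sketch (not part of the commit): the buffer rules described in the pcre2grep.html changes above can be modelled in a few lines of Python. The function name is hypothetical, and reading "20K" and "1M" as 20*1024 and 1024*1024 bytes is an assumption; the pcre2grep source may compute these differently. How the buffer grows between the two limits is not modelled.

```python
def effective_buffer_sizes(buffer_size=20 * 1024, max_buffer_size=1024 * 1024):
    # Hypothetical helper modelling the documented rules only.
    # The maximum is silently forced to be no smaller than the starting size.
    maximum = max(max_buffer_size, buffer_size)
    # The block actually obtained is three times the "buffer size", to leave
    # room for "before" and "after" context lines.
    initial_block = 3 * buffer_size
    return buffer_size, maximum, initial_block

print(effective_buffer_sizes())
print(effective_buffer_sizes(buffer_size=4096, max_buffer_size=1024))
```

The second call shows the clamping case: a requested maximum below the starting size is raised to the starting size.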

Modified: code/trunk/doc/html/pcre2limits.html
===================================================================
--- code/trunk/doc/html/pcre2limits.html    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/doc/html/pcre2limits.html    2016-11-22 15:37:02 UTC (rev 605)
@@ -61,23 +61,24 @@
 There is no limit to the number of parenthesized subpatterns, but there can be
 no more than 65535 capturing subpatterns. There is, however, a limit to the
 depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
-order to limit the amount of system stack used at compile time. The limit can
-be specified when PCRE2 is built; the default is 250.
+order to limit the amount of system stack used at compile time. The default
+limit can be specified when PCRE2 is built; the default default is 250. An 
+application can change this limit by calling pcre2_set_parens_nest_limit() to 
+set the limit in a compile context.
 </P>
 <P>
-There is a limit to the number of forward references to subsequent subpatterns
-of around 200,000. Repeated forward references with fixed upper limits, for
-example, (?2){0,100} when subpattern number 2 is to the right, are included in
-the count. There is no limit to the number of backward references.
-</P>
-<P>
 The maximum length of name for a named subpattern is 32 code units, and the
 maximum number of named subpatterns is 10000.
 </P>
 <P>
 The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
-is 255 for the 8-bit library and 65535 for the 16-bit and 32-bit libraries.
+is 255 code units for the 8-bit library and 65535 code units for the 16-bit and
+32-bit libraries.
 </P>
+<P>
+The maximum length of a string argument to a callout is the largest number a 
+32-bit unsigned integer can hold.
+</P>
 <br><b>
 AUTHOR
 </b><br>
@@ -93,9 +94,9 @@
 REVISION
 </b><br>
 <P>
-Last updated: 05 November 2015
+Last updated: 26 October 2016
 <br>
-Copyright &copy; 1997-2015 University of Cambridge.
+Copyright &copy; 1997-2016 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.

Modified: code/trunk/doc/html/pcre2pattern.html
===================================================================
--- code/trunk/doc/html/pcre2pattern.html    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/doc/html/pcre2pattern.html    2016-11-22 15:37:02 UTC (rev 605)
@@ -379,8 +379,7 @@
 40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A (A is 41, Z is 5A),
 but \c{ becomes hex 3B ({ is 7B), and \c; becomes hex 7B (; is 3B). If the
 code unit following \c has a value less than 32 or greater than 126, a
-compile-time error occurs. This locks out non-printable ASCII characters in all
-modes.
+compile-time error occurs.
 </P>
 <P>
 When PCRE2 is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t
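
Editorial sketch (not part of the commit): the ASCII rule for \c in the hunk above (lower case letters are converted to upper case, then bit 6, hex 40, is inverted, and code units below 32 or above 126 are rejected) can be checked with a short Python function. The function name is hypothetical and the code only models the documented arithmetic; it is not taken from the PCRE2 source.

```python
def control_escape(ch):
    # Hypothetical helper: the code value produced by \c followed by ch
    # in an ASCII environment, following the documented rule.
    code = ord(ch)
    if code < 32 or code > 126:
        raise ValueError("compile-time error: invalid character after \\c")
    if "a" <= ch <= "z":
        code = ord(ch.upper())  # lower case letters become upper case first
    return code ^ 0x40          # invert bit 6 (hex 40)

for ch in "AZz{;?":
    print(ch, hex(control_escape(ch)))
```

The same arithmetic gives 127 (hex 7F, DEL) for \c?, which agrees with the ASCII value quoted in the EBCDIC discussion.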
@@ -387,24 +386,24 @@
 generate the appropriate EBCDIC code values. The \c escape is processed
 as specified for Perl in the <b>perlebcdic</b> document. The only characters
 that are allowed after \c are A-Z, a-z, or one of @, [, \, ], ^, _, or ?. Any
-other character provokes a compile-time error. The sequence \@ encodes
-character code 0; the letters (in either case) encode characters 1-26 (hex 01
-to hex 1A); [, \, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and
-\? becomes either 255 (hex FF) or 95 (hex 5F).
+other character provokes a compile-time error. The sequence \c@ encodes
+character code 0; after \c the letters (in either case) encode characters 1-26
+(hex 01 to hex 1A); [, \, ], ^, and _ encode characters 27-31 (hex 1B to hex
+1F), and \c? becomes either 255 (hex FF) or 95 (hex 5F).
 </P>
 <P>
-Thus, apart from \?, these escapes generate the same character code values as
+Thus, apart from \c?, these escapes generate the same character code values as
 they do in an ASCII environment, though the meanings of the values mostly
-differ. For example, \G always generates code value 7, which is BEL in ASCII
+differ. For example, \cG always generates code value 7, which is BEL in ASCII
 but DEL in EBCDIC.
 </P>
 <P>
-The sequence \? generates DEL (127, hex 7F) in an ASCII environment, but
+The sequence \c? generates DEL (127, hex 7F) in an ASCII environment, but
 because 127 is not a control character in EBCDIC, Perl makes it generate the
 APC character. Unfortunately, there are several variants of EBCDIC. In most of
 them the APC character has the value 255 (hex FF), but in the one Perl calls
 POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
-values, PCRE2 makes \? generate 95; otherwise it generates 255.
+values, PCRE2 makes \c? generate 95; otherwise it generates 255.
 </P>
 <P>
 After \0 up to two further octal digits are read. If there are fewer than two
@@ -526,9 +525,9 @@
 Absolute and relative back references
 </b><br>
 <P>
-The sequence \g followed by an unsigned or a negative number, optionally
-enclosed in braces, is an absolute or relative back reference. A named back
-reference can be coded as \g{name}. Back references are discussed
+The sequence \g followed by a signed or unsigned number, optionally enclosed
+in braces, is an absolute or relative back reference. A named back reference
+can be coded as \g{name}. Back references are discussed
 <a href="#backreferences">later,</a>
 following the discussion of
 <a href="#subpattern">parenthesized subpatterns.</a>
@@ -1326,15 +1325,34 @@
 class such as [^a] always matches one of these characters.
 </P>
 <P>
+The character escape sequences \d, \D, \h, \H, \p, \P, \s, \S, \v,
+\V, \w, and \W may appear in a character class, and add the characters that
+they match to the class. For example, [\dABCDEF] matches any hexadecimal
+digit. In UTF modes, the PCRE2_UCP option affects the meanings of \d, \s, \w
+and their upper case partners, just as it does when they appear outside a
+character class, as described in the section entitled
+<a href="#genericchartypes">"Generic character types"</a>
+above. The escape sequence \b has a different meaning inside a character
+class; it matches the backspace character. The sequences \B, \N, \R, and \X
+are not special inside a character class. Like any other unrecognized escape
+sequences, they cause an error.
+</P>
+<P>
 The minus (hyphen) character can be used to specify a range of characters in a
 character class. For example, [d-m] matches any letter between d and m,
 inclusive. If a minus character is required in a class, it must be escaped with
 a backslash or appear in a position where it cannot be interpreted as
-indicating a range, typically as the first or last character in the class, or
-immediately after a range. For example, [b-d-z] matches letters in the range b
-to d, a hyphen character, or z.
+indicating a range, typically as the first or last character in the class,
+or immediately after a range. For example, [b-d-z] matches letters in the range
+b to d, a hyphen character, or z.
 </P>
 <P>
+Perl treats a hyphen as a literal if it appears before a POSIX class (see
+below) or a character type escape such as \d, but gives a warning in its 
+warning mode, as this is most likely a user error. As PCRE2 has no facility for
+warning, an error is given in these cases.
+</P>
+<P>
 It is not possible to have the literal character "]" as the end character of a
 range. A pattern such as [W-]46] is interpreted as a class of two characters
 ("W" and "-") followed by a literal string "46]", so it would match "W46]" or
@@ -1344,12 +1362,6 @@
 "]" can also be used to end a range.
 </P>
 <P>
-An error is generated if a POSIX character class (see below) or an escape
-sequence other than one that defines a single character appears at a point
-where a range ending character is expected. For example, [z-\xff] is valid,
-but [A-\d] and [A-[:digit:]] are not.
-</P>
-<P>
 Ranges normally include all code points between the start and end characters,
 inclusive. They can also be used for code points specified numerically, for
 example [\000-\037]. Ranges can include any characters that are valid for the
@@ -1372,19 +1384,6 @@
 characters in both cases.
 </P>
 <P>
-The character escape sequences \d, \D, \h, \H, \p, \P, \s, \S, \v,
-\V, \w, and \W may appear in a character class, and add the characters that
-they match to the class. For example, [\dABCDEF] matches any hexadecimal
-digit. In UTF modes, the PCRE2_UCP option affects the meanings of \d, \s, \w
-and their upper case partners, just as it does when they appear outside a
-character class, as described in the section entitled
-<a href="#genericchartypes">"Generic character types"</a>
-above. The escape sequence \b has a different meaning inside a character
-class; it matches the backspace character. The sequences \B, \N, \R, and \X
-are not special inside a character class. Like any other unrecognized escape
-sequences, they cause an error.
-</P>
-<P>
 A circumflex can conveniently be used with the upper case character types to
 specify a more restricted set of characters than the matching lower case type.
 For example, the class [^\W_] matches any letter or digit, but not underscore,
@@ -1552,13 +1551,8 @@
 <P>
 When one of these option changes occurs at top level (that is, not inside
 subpattern parentheses), the change applies to the remainder of the pattern
-that follows. If the change is placed right at the start of a pattern, PCRE2
-extracts it into the global options (and it will therefore show up in data
-extracted by the <b>pcre2_pattern_info()</b> function).
-</P>
-<P>
-An option change within a subpattern (see below for a description of
-subpatterns) affects only that part of the subpattern that follows it, so
+that follows. An option change within a subpattern (see below for a description
+of subpatterns) affects only that part of the subpattern that follows it, so
 <pre>
   (a(?i)b)c
 </pre>
@@ -2093,9 +2087,9 @@
 </P>
 <P>
 Another way of avoiding the ambiguity inherent in the use of digits following a
-backslash is to use the \g escape sequence. This escape must be followed by an
-unsigned number or a negative number, optionally enclosed in braces. These
-examples are all identical:
+backslash is to use the \g escape sequence. This escape must be followed by a 
+signed or unsigned number, optionally enclosed in braces. These examples are
+all identical:
 <pre>
   (ring), \1
   (ring), \g1
@@ -2103,8 +2097,7 @@
 </pre>
 An unsigned number specifies an absolute reference without the ambiguity that
 is present in the older syntax. It is also useful when literal digits follow
-the reference. A negative number is a relative reference. Consider this
-example:
+the reference. A signed number is a relative reference. Consider this example:
 <pre>
   (abc(def)ghi)\g{-1}
 </pre>
@@ -2115,6 +2108,11 @@
 joining together fragments that contain references within themselves.
 </P>
 <P>
+The sequence \g{+1} is a reference to the next capturing subpattern. This kind 
+of forward reference can be useful in patterns that repeat. Perl does not 
+support the use of + in this way.
+</P>
+<P>
 A back reference matches whatever actually matched the capturing subpattern in
 the current subject string, rather than anything matching the subpattern
 itself (see
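
Editorial sketch (not part of the commit): the relative forms \g{-n} and \g{+n} discussed in the hunk above can be turned into absolute group numbers by counting the capturing parentheses opened to the left of the reference. The function below is a hypothetical illustration of that arithmetic, not PCRE2 code; treating a zero offset as an error and extending +1 to larger positive offsets are assumptions.

```python
def resolve_relative_reference(offset, groups_opened):
    # Hypothetical helper. groups_opened is the number of capturing
    # parentheses opened to the left of the \g{...} item.
    if offset < 0:
        number = groups_opened + offset + 1  # \g{-1}: most recently started group
    elif offset > 0:
        number = groups_opened + offset      # \g{+1}: next group to be opened
    else:
        raise ValueError("a relative reference needs a non-zero signed number")
    if number < 1:
        raise ValueError("reference to a non-existent subpattern")
    return number

# In (abc(def)ghi)\g{-1}, two groups are open to the left of the reference.
print(resolve_relative_reference(-1, 2))
print(resolve_relative_reference(-2, 2))
print(resolve_relative_reference(+1, 2))
```

With two groups opened, -1 resolves to group 2 and -2 to group 1, while +1 points at group 3, the next capturing subpattern.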
@@ -2214,6 +2212,14 @@
 always, does do capturing in negative assertions.)
 </P>
 <P>
+WARNING: If a positive assertion containing one or more capturing subpatterns 
+succeeds, but failure to match later in the pattern causes backtracking over 
+this assertion, the captures within the assertion are reset only if no higher 
+numbered captures are already set. This is, unfortunately, a fundamental 
+limitation of the current implementation; it may get removed in a future 
+reworking.
+</P>
+<P>
 For compatibility with Perl, most assertion subpatterns may be repeated; though
 it makes no sense to assert the same thing several times, the side effect of
 capturing parentheses may occasionally be useful. However, an assertion that
@@ -2310,20 +2316,33 @@
 assertion fails.
 </P>
 <P>
-In a UTF mode, PCRE2 does not allow the \C escape (which matches a single code
-unit even in a UTF mode) to appear in lookbehind assertions, because it makes
-it impossible to calculate the length of the lookbehind. The \X and \R
-escapes, which can match different numbers of code units, are also not
-permitted.
+In UTF-8 and UTF-16 modes, PCRE2 does not allow the \C escape (which matches a
+single code unit even in a UTF mode) to appear in lookbehind assertions,
+because it makes it impossible to calculate the length of the lookbehind. The
+\X and \R escapes, which can match different numbers of code units, are never
+permitted in lookbehinds.
 </P>
 <P>
 <a href="#subpatternsassubroutines">"Subroutine"</a>
 calls (see below) such as (?2) or (?&X) are permitted in lookbehinds, as long
-as the subpattern matches a fixed-length string.
-<a href="#recursion">Recursion,</a>
-however, is not supported.
+as the subpattern matches a fixed-length string. However,
+<a href="#recursion">recursion,</a>
+that is, a "subroutine" call into a group that is already active,
+is not supported.
 </P>
 <P>
+Perl does not support back references in lookbehinds. PCRE2 does support them,
+but only if certain conditions are met. The PCRE2_MATCH_UNSET_BACKREF option
+must not be set, there must be no use of (?| in the pattern (it creates
+duplicate subpattern numbers), and if the back reference is by name, the name
+must be unique. Of course, the referenced subpattern must itself be of fixed
+length. The following pattern matches words containing at least two characters
+that begin and end with the same character:
+<pre>
+   \b(\w)\w++(?&#60;=\1)
+</PRE>
+</P>
+<P>
 Possessive quantifiers can be used in conjunction with lookbehind assertions to
 specify efficient matching of fixed-length strings at the end of subject
 strings. Consider a simple pattern such as
@@ -2459,7 +2478,9 @@
 <P>
 Perl uses the syntax (?(&#60;name&#62;)...) or (?('name')...) to test for a used
 subpattern by name. For compatibility with earlier versions of PCRE1, which had
-this facility before Perl, the syntax (?(name)...) is also recognized.
+this facility before Perl, the syntax (?(name)...) is also recognized. Note, 
+however, that undelimited names consisting of the letter R followed by digits
+are ambiguous (see the following section).
 </P>
 <P>
 Rewriting the above example to use a named subpattern gives this:
@@ -2474,30 +2495,52 @@
 Checking for pattern recursion
 </b><br>
 <P>
-If the condition is the string (R), and there is no subpattern with the name R,
-the condition is true if a recursive call to the whole pattern or any
-subpattern has been made. If digits or a name preceded by ampersand follow the
-letter R, for example:
+"Recursion" in this sense refers to any subroutine-like call from one part of
+the pattern to another, whether or not it is actually recursive. See the
+sections entitled
+<a href="#recursion">"Recursive patterns"</a>
+and
+<a href="#subpatternsassubroutines">"Subpatterns as subroutines"</a>
+below for details of recursion and subpattern calls.
+</P>
+<P>
+If a condition is the string (R), and there is no subpattern with the name R,
+the condition is true if matching is currently in a recursion or subroutine
+call to the whole pattern or any subpattern. If digits follow the letter R, and
+there is no subpattern with that name, the condition is true if the most recent
+call is into a subpattern with the given number, which must exist somewhere in 
+the overall pattern. This is a contrived example that is equivalent to a+b:
 <pre>
-  (?(R3)...) or (?(R&name)...)
+  ((?(R1)a+|(?1)b))
 </pre>
-the condition is true if the most recent recursion is into a subpattern whose
-number or name is given. This condition does not check the entire recursion
-stack. If the name used in a condition of this kind is a duplicate, the test is
-applied to all subpatterns of the same name, and is true if any one of them is
-the most recent recursion.
+However, in both cases, if there is a subpattern with a matching name, the
+condition tests for its being set, as described in the section above, instead
+of testing for recursion. For example, creating a group with the name R1 by
+adding (?&#60;R1&#62;) to the above pattern completely changes its meaning.
 </P>
 <P>
+If a name preceded by ampersand follows the letter R, for example:
+<pre>
+  (?(R&name)...)
+</pre>
+the condition is true if the most recent recursion is into a subpattern of that 
+name (which must exist within the pattern).
+</P>
+<P>
+This condition does not check the entire recursion stack. It tests only the 
+current level. If the name used in a condition of this kind is a duplicate, the
+test is applied to all subpatterns of the same name, and is true if any one of
+them is the most recent recursion.
+</P>
+<P>
 At "top level", all these recursion test conditions are false.
-<a href="#recursion">The syntax for recursive patterns</a>
-is described below.
 <a name="subdefine"></a></P>
 <br><b>
 Defining subpatterns for use by reference only
 </b><br>
 <P>
-If the condition is the string (DEFINE), and there is no subpattern with the
-name DEFINE, the condition is always false. In this case, there may be only one
+If the condition is the string (DEFINE), the condition is always false, even if
+there is a group with the name DEFINE. In this case, there may be only one
 alternative in the subpattern. It is always skipped if control reaches this
 point in the pattern; the idea of DEFINE is that it can be used to define
 subroutines that can be referenced from elsewhere. (The use of
@@ -2965,14 +3008,24 @@
 By default, for compatibility with Perl, a name is any sequence of characters
 that does not include a closing parenthesis. The name is not processed in
 any way, and it is not possible to include a closing parenthesis in the name.
-However, if the PCRE2_ALT_VERBNAMES option is set, normal backslash processing
-is applied to verb names and only an unescaped closing parenthesis terminates
-the name. A closing parenthesis can be included in a name either as \) or
-between \Q and \E. If the PCRE2_EXTENDED option is set, unescaped whitespace
-in verb names is skipped and #-comments are recognized, exactly as in the rest
-of the pattern.
+This can be changed by setting the PCRE2_ALT_VERBNAMES option, but the result 
+is no longer Perl-compatible. 
 </P>
 <P>
+When PCRE2_ALT_VERBNAMES is set, backslash processing is applied to verb names
+and only an unescaped closing parenthesis terminates the name. However, the 
+only backslash items that are permitted are \Q, \E, and sequences such as 
+\x{100} that define character code points. Character type escapes such as \d 
+are faulted.
+</P>
+<P>
+A closing parenthesis can be included in a name either as \) or between \Q
+and \E. In addition to backslash processing, if the PCRE2_EXTENDED option is
+also set, unescaped whitespace in verb names is skipped, and #-comments are
+recognized, exactly as in the rest of the pattern. PCRE2_EXTENDED does not 
+affect verb names unless PCRE2_ALT_VERBNAMES is also set.
+</P>
+<P>
 The maximum length of a name is 255 in the 8-bit library and 65535 in the
 16-bit and 32-bit libraries. If the name is empty, that is, if the closing
 parenthesis immediately follows the colon, the effect is as if the colon were
@@ -3393,7 +3446,7 @@
 </P>
 <br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 20 June 2016
+Last updated: 23 October 2016
 <br>
 Copyright &copy; 1997-2016 University of Cambridge.
 <br>
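
The PCRE2_ALT_VERBNAMES paragraphs in the pcre2pattern.html hunks above say that only an unescaped closing parenthesis terminates a verb name, and that a parenthesis can be protected either as \) or between \Q and \E. The following is a minimal sketch of that termination rule alone, not PCRE2's own code: verb_name_end() is a hypothetical helper, it does not fault disallowed escapes such as \d, and it ignores the PCRE2_EXTENDED whitespace and #-comment handling.

```c
#include <stddef.h>

/* Hypothetical helper: return the offset of the ')' that terminates a verb
 * name under the PCRE2_ALT_VERBNAMES rule, or -1 if there is none.
 * s points at the first character after the colon in (*VERB:NAME). */
long verb_name_end(const char *s, size_t len)
{
    int quoted = 0;                       /* non-zero while inside \Q...\E */
    size_t i;

    for (i = 0; i < len; i++) {
        if (quoted) {
            /* Inside \Q...\E everything is literal except the closing \E. */
            if (s[i] == '\\' && i + 1 < len && s[i + 1] == 'E') {
                quoted = 0;
                i++;                      /* skip the 'E' */
            }
            continue;
        }
        if (s[i] == '\\' && i + 1 < len) {
            if (s[i + 1] == 'Q')
                quoted = 1;               /* start of a quoted run */
            i++;                          /* skip the escaped character */
            continue;
        }
        if (s[i] == ')')
            return (long)i;               /* unescaped: the name ends here */
    }
    return -1;
}
```

For the name text `A\)B)` the function skips the escaped parenthesis and reports offset 4; for `\Qa)b\E)` it reports offset 7.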

Modified: code/trunk/doc/html/pcre2syntax.html
===================================================================
--- code/trunk/doc/html/pcre2syntax.html    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/doc/html/pcre2syntax.html    2016-11-22 15:37:02 UTC (rev 605)
@@ -492,6 +492,9 @@
   \n              reference by number (can be ambiguous)
   \gn             reference by number
   \g{n}           reference by number
+  \g+n            relative reference by number (PCRE2 extension)
+  \g-n            relative reference by number
+  \g{+n}          relative reference by number (PCRE2 extension) 
   \g{-n}          relative reference by number
   \k&#60;name&#62;        reference by name (Perl)
   \k'name'        reference by name (Perl)
@@ -530,14 +533,17 @@
   (?(-n)              relative reference condition
   (?(&#60;name&#62;)          named reference condition (Perl)
   (?('name')          named reference condition (Perl)
-  (?(name)            named reference condition (PCRE2)
+  (?(name)            named reference condition (PCRE2, deprecated)
   (?(R)               overall recursion condition
-  (?(Rn)              specific group recursion condition
-  (?(R&name)          specific recursion condition
+  (?(Rn)              specific numbered group recursion condition
+  (?(R&name)          specific named group recursion condition
   (?(DEFINE)          define subpattern for reference
   (?(VERSION[&#62;]=n.m)  test PCRE2 version
   (?(assert)          assertion condition
-</PRE>
+</pre>
+Note the ambiguity of (?(R) and (?(Rn), which might be named reference 
+conditions or recursion tests. Such a condition is interpreted as a reference
+condition if the relevant named group exists.
 </P>
 <br><a name="SEC23" href="#TOC1">BACKTRACKING CONTROL</a><br>
 <P>
@@ -589,9 +595,9 @@
 </P>
 <br><a name="SEC27" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 16 October 2015
+Last updated: 28 September 2016
 <br>
-Copyright &copy; 1997-2015 University of Cambridge.
+Copyright &copy; 1997-2016 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
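
The pcre2syntax.html hunk above adds \g+n and \g{+n} beside the existing \g-n and \g{-n} relative references. As an illustration of how a relative number maps to an absolute group number, here is a small sketch. It assumes the usual counting rule (\g{-1} is the most recently opened capture group before the reference, \g{+1} is the next group to be opened); resolve_relative_ref() is a hypothetical name and is not part of the PCRE2 API.

```c
/* Hypothetical helper: convert a relative group reference to an absolute
 * group number.
 *   groups_opened  number of capture groups whose opening parenthesis
 *                  precedes the reference in the pattern
 *   rel            the signed relative number (-n or +n, never 0)
 * Returns the absolute group number, or -1 if the reference is invalid.
 * A forward result may still name a group that does not exist; checking
 * that requires the total group count, which is not known here. */
int resolve_relative_ref(int groups_opened, int rel)
{
    if (rel == 0)
        return -1;                        /* \g{+0} and \g{-0} make no sense */
    if (rel < 0) {
        int absolute = groups_opened + rel + 1;
        return (absolute >= 1) ? absolute : -1;
    }
    return groups_opened + rel;           /* forward reference */
}
```

With three groups already opened, \g{-1} resolves to group 3, \g{-3} to group 1, and \g{+1} to group 4.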

Modified: code/trunk/doc/html/pcre2test.html
===================================================================
--- code/trunk/doc/html/pcre2test.html    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/doc/html/pcre2test.html    2016-11-22 15:37:02 UTC (rev 605)
@@ -615,6 +615,7 @@
       pushcopy                  push a copy onto the stack
       stackguard=&#60;number&#62;       test the stackguard feature
       tables=[0|1|2]            select internal tables
+      use_length                do not zero-terminate the pattern 
       utf8_input                treat input as UTF-8 
 </pre>
 The effects of these modifiers are described in the following sections.
@@ -698,6 +699,18 @@
 default values).
 </P>
 <br><b>
+Specifying the pattern's length
+</b><br>
+<P>
+By default, patterns are passed to the compiling functions as zero-terminated
+strings. When using the POSIX wrapper API, there is no other option. However,
+when using PCRE2's native API, patterns can be passed by length instead of
+being zero-terminated. The <b>use_length</b> modifier causes this to happen. 
+Using a length happens automatically (whether or not <b>use_length</b> is set)
+when <b>hex</b> is set, because patterns specified in hexadecimal may contain
+binary zeros.
+</P>
+<br><b>
 Specifying pattern characters in hexadecimal
 </b><br>
 <P>
@@ -720,10 +733,10 @@
 mutually exclusive.
 </P>
 <P>
-By default, <b>pcre2test</b> passes patterns as zero-terminated strings to
-<b>pcre2_compile()</b>, giving the length as PCRE2_ZERO_TERMINATED. However, for
-patterns specified with the <b>hex</b> modifier, the actual length of the
-pattern is passed.
+The POSIX API cannot be used with patterns specified in hexadecimal because 
+they may contain binary zeros, which conflicts with <b>regcomp()</b>'s 
+requirement for a zero-terminated string. Such patterns are always passed to 
+<b>pcre2_compile()</b> as a string with a length, not as zero-terminated.
 </P>
 <br><b>
 Specifying wide characters in 16-bit and 32-bit modes
@@ -1753,7 +1766,7 @@
 </P>
 <br><a name="SEC21" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 02 August 2016
+Last updated: 04 November 2016
 <br>
 Copyright &copy; 1997-2016 University of Cambridge.
 <br>
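
The pcre2test.html hunks above explain that patterns given in hexadecimal are always passed with an explicit length because they may contain binary zeros. The sketch below shows why a zero-terminated view is not enough. It assumes the hex text is made of pairs of hex digits with optional white space between them; decode_hex_pattern() is a hypothetical helper, not pcre2test's own parser, and it does not handle any other syntax the hex modifier may accept.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: decode pairs of hex digits into out[].
 * Returns the number of bytes written, or -1 on malformed input or if
 * the output buffer is too small. */
long decode_hex_pattern(const char *text, unsigned char *out, size_t outsize)
{
    size_t n = 0;

    while (*text != '\0') {
        int hi, lo;

        if (isspace((unsigned char)*text)) {
            text++;
            continue;
        }
        hi = hex_value((unsigned char)text[0]);
        /* Only look at text[1] when text[0] was a digit, so the scan never
         * runs past the terminating zero of the input text. */
        lo = (hi < 0) ? -1 : hex_value((unsigned char)text[1]);
        if (hi < 0 || lo < 0 || n >= outsize)
            return -1;
        out[n++] = (unsigned char)(hi * 16 + lo);
        text += 2;
    }
    return (long)n;
}
```

Decoding "61 00 62" yields three code units, but strlen() on the decoded buffer stops at the embedded zero and reports 1, which is why such a pattern has to be passed with its length rather than as a zero-terminated string.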

Modified: code/trunk/doc/index.html.src
===================================================================
--- code/trunk/doc/index.html.src    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/doc/index.html.src    2016-11-22 15:37:02 UTC (rev 605)
@@ -94,6 +94,9 @@
 <tr><td><a href="pcre2_code_copy.html">pcre2_code_copy</a></td>
     <td>&nbsp;&nbsp;Copy a compiled pattern</td></tr>

+<tr><td><a href="pcre2_code_copy_with_tables.html">pcre2_code_copy_with_tables</a></td>
+    <td>&nbsp;&nbsp;Copy a compiled pattern and its character tables</td></tr>
+
 <tr><td><a href="pcre2_code_free.html">pcre2_code_free</a></td>
     <td>&nbsp;&nbsp;Free a compiled pattern</td></tr>

Modified: code/trunk/doc/pcre2.txt
===================================================================
--- code/trunk/doc/pcre2.txt    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/doc/pcre2.txt    2016-11-22 15:37:02 UTC (rev 605)
@@ -379,6 +379,8 @@

        pcre2_code *pcre2_code_copy(const pcre2_code *code);

+       pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *code);
+
        int pcre2_get_error_message(int errorcode, PCRE2_UCHAR *buffer,
          PCRE2_SIZE bufflen);

@@ -626,8 +628,8 @@
        similar logic is required. JIT compilation updates a pointer within the
        compiled code block, so a thread must gain unique write access  to  the
        pointer     before    calling    pcre2_jit_compile().    Alternatively,
-       pcre2_code_copy() can be used to obtain a private copy of the  compiled
-       code.
+       pcre2_code_copy()  or  pcre2_code_copy_with_tables()  can  be  used  to
+       obtain a private copy of the compiled code.

    Context blocks

@@ -789,7 +791,9 @@

        This parameter adjusts the limit, set when PCRE2 is built (default 250),
        on the depth of parenthesis nesting in  a  pattern.  This  limit  stops
-       rogue patterns using up too much system stack when being compiled.
+       rogue  patterns using up too much system stack when being compiled. The
+       limit applies to parentheses of all kinds, not just capturing parenthe-
+       ses.

        int pcre2_set_compile_recursion_guard(pcre2_compile_context *ccontext,
          int (*guard_function)(uint32_t, void *), void *user_data);
@@ -1102,6 +1106,8 @@

        pcre2_code *pcre2_code_copy(const pcre2_code *code);

+       pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *code);
+
        The pcre2_compile() function compiles a pattern into an internal  form.
        The  pattern  is  defined  by a pointer to a string of code units and a
        length. If the pattern is zero-terminated, the length can be  specified
@@ -1120,54 +1126,71 @@
        However,  if  the  code  has  been  processed  by the JIT compiler (see
        below), the JIT information cannot be copied (because it  is  position-
        dependent).  The new copy can initially be used only for non-JIT match-
-       ing, though it can be passed to pcre2_jit_compile()  if  required.  The
-       pcre2_code_copy()  function  provides a way for individual threads in a
-       multithreaded application to acquire a private copy of shared  compiled
-       code.
+       ing, though it can be passed to pcre2_jit_compile() if required.

-       NOTE:  When  one  of  the matching functions is called, pointers to the
+       The pcre2_code_copy() function provides a way for individual threads in
+       a  multithreaded  application  to acquire a private copy of shared com-
+       piled code.  However, it does not make a copy of the  character  tables
+       used  by  the compiled pattern; the new pattern code points to the same
+       tables as the original code.  (See "Locale Support" below  for  details
+       of  these  character  tables.) In many applications the same tables are
+       used throughout, so this behaviour is appropriate. Nevertheless,  there
+       are occasions when a copy of a compiled pattern and the relevant tables
+       are needed. The pcre2_code_copy_with_tables() function provides this facility.
+       Copies  of  both  the  code  and the tables are made, with the new code
+       pointing to the new tables. The memory for the new tables is  automati-
+       cally  freed  when  pcre2_code_free() is called for the new copy of the
+       compiled code.
+
+       NOTE: When one of the matching functions is  called,  pointers  to  the
        compiled pattern and the subject string are set in the match data block
-       so  that  they can be referenced by the substring extraction functions.
-       After running a match, you must not free a compiled pattern (or a  sub-
-       ject  string)  until  after all operations on the match data block have
+       so that they can be referenced by the substring  extraction  functions.
+       After  running a match, you must not free a compiled pattern (or a sub-
+       ject string) until after all operations on the match  data  block  have
        taken place.

-       The options argument for pcre2_compile() contains various bit  settings
-       that  affect  the  compilation.  It  should  be  zero if no options are
-       required. The available options are described below. Some of  them  (in
-       particular,  those  that  are  compatible with Perl, but some others as
-       well) can also be set and  unset  from  within  the  pattern  (see  the
+       The  options argument for pcre2_compile() contains various bit settings
+       that affect the compilation. It  should  be  zero  if  no  options  are
+       required.  The  available options are described below. Some of them (in
+       particular, those that are compatible with Perl,  but  some  others  as
+       well)  can  also  be  set  and  unset  from within the pattern (see the
        detailed description in the pcre2pattern documentation).

-       For  those options that can be different in different parts of the pat-
-       tern, the contents of the options argument specifies their settings  at
-       the  start  of  compilation.  The PCRE2_ANCHORED and PCRE2_NO_UTF_CHECK
+       For those options that can be different in different parts of the  pat-
+       tern,  the contents of the options argument specifies their settings at
+       the start of compilation.  The  PCRE2_ANCHORED  and  PCRE2_NO_UTF_CHECK
        options can be set at the time of matching as well as at compile time.

-       Other, less frequently required compile-time parameters  (for  example,
+       Other,  less  frequently required compile-time parameters (for example,
        the newline setting) can be provided in a compile context (as described
        above).

        If errorcode or erroroffset is NULL, pcre2_compile() returns NULL imme-
-       diately.  Otherwise,  the  variables to which these point are set to an
-       error code and an offset (number of code  units)  within  the  pattern,
-       respectively,  when  pcre2_compile() returns NULL because a compilation
+       diately. Otherwise, the variables to which these point are  set  to  an
+       error  code  and  an  offset (number of code units) within the pattern,
+       respectively, when pcre2_compile() returns NULL because  a  compilation
        error has occurred. The values are not defined when compilation is suc-
        cessful and pcre2_compile() returns a non-NULL value.

-       The  pcre2_get_error_message() function (see "Obtaining a textual error
-       message" below) provides a textual message for each error code.  Compi-
+       The value returned in erroroffset is an indication of where in the pat-
+       tern  the  error  occurred. It is not necessarily the furthest point in
+       the pattern that was read. For example,  after  the  error  "lookbehind
+       assertion is not fixed length", the error offset points to the start of
+       the failing assertion.
+
+       The pcre2_get_error_message() function (see "Obtaining a textual  error
+       message"  below) provides a textual message for each error code. Compi-
        lation errors have positive error codes; UTF formatting error codes are
-       negative. For an invalid UTF-8 or UTF-16 string, the offset is that  of
+       negative.  For an invalid UTF-8 or UTF-16 string, the offset is that of
        the first code unit of the failing character.

-       Some  errors are not detected until the whole pattern has been scanned;
-       in these cases, the offset passed back is the length  of  the  pattern.
-       Note  that  the  offset is in code units, not characters, even in a UTF
+       Some errors are not detected until the whole pattern has been  scanned;
+       in  these  cases,  the offset passed back is the length of the pattern.
+       Note that the offset is in code units, not characters, even  in  a  UTF
        mode. It may sometimes point into the middle of a UTF-8 or UTF-16 char-
        acter.

-       This  code  fragment shows a typical straightforward call to pcre2_com-
+       This code fragment shows a typical straightforward call  to  pcre2_com-
        pile():

          pcre2_code *re;
@@ -1181,28 +1204,28 @@
            &erroffset,             /* for error offset */
            NULL);                  /* no compile context */

-       The following names for option bits are defined in the  pcre2.h  header
+       The  following  names for option bits are defined in the pcre2.h header
        file:

          PCRE2_ANCHORED

        If this bit is set, the pattern is forced to be "anchored", that is, it
-       is constrained to match only at the first matching point in the  string
-       that  is being searched (the "subject string"). This effect can also be
-       achieved by appropriate constructs in the pattern itself, which is  the
+       is  constrained to match only at the first matching point in the string
+       that is being searched (the "subject string"). This effect can also  be
+       achieved  by appropriate constructs in the pattern itself, which is the
        only way to do it in Perl.

          PCRE2_ALLOW_EMPTY_CLASS

-       By  default, for compatibility with Perl, a closing square bracket that
-       immediately follows an opening one is treated as a data  character  for
-       the  class.  When  PCRE2_ALLOW_EMPTY_CLASS  is  set,  it terminates the
+       By default, for compatibility with Perl, a closing square bracket  that
+       immediately  follows  an opening one is treated as a data character for
+       the class. When  PCRE2_ALLOW_EMPTY_CLASS  is  set,  it  terminates  the
        class, which therefore contains no characters and so can never match.

          PCRE2_ALT_BSUX

-       This option requests alternative handling of  three  escape  sequences,
-       which  makes  PCRE2's  behaviour more like ECMAscript (aka JavaScript).
+       This  option  requests  alternative handling of three escape sequences,
+       which makes PCRE2's behaviour more like  ECMAscript  (aka  JavaScript).
        When it is set:

        (1) \U matches an upper case "U" character; by default \U causes a com-
@@ -1209,13 +1232,13 @@
        pile time error (Perl uses \U to upper case subsequent characters).

        (2) \u matches a lower case "u" character unless it is followed by four
-       hexadecimal digits, in which case the hexadecimal  number  defines  the
-       code  point  to match. By default, \u causes a compile time error (Perl
+       hexadecimal  digits,  in  which case the hexadecimal number defines the
+       code point to match. By default, \u causes a compile time  error  (Perl
        uses it to upper case the following character).

-       (3) \x matches a lower case "x" character unless it is followed by  two
-       hexadecimal  digits,  in  which case the hexadecimal number defines the
-       code point to match. By default, as in Perl, a  hexadecimal  number  is
+       (3)  \x matches a lower case "x" character unless it is followed by two
+       hexadecimal digits, in which case the hexadecimal  number  defines  the
+       code  point  to  match. By default, as in Perl, a hexadecimal number is
        always expected after \x, but it may have zero, one, or two digits (so,
        for example, \xz matches a binary zero character followed by z).

@@ -1222,30 +1245,31 @@
          PCRE2_ALT_CIRCUMFLEX

        In  multiline  mode  (when  PCRE2_MULTILINE  is  set),  the  circumflex
-       metacharacter  matches at the start of the subject (unless PCRE2_NOTBOL
-       is set), and also after any internal  newline.  However,  it  does  not
+       metacharacter matches at the start of the subject (unless  PCRE2_NOTBOL
+       is  set),  and  also  after  any internal newline. However, it does not
        match after a newline at the end of the subject, for compatibility with
-       Perl. If you want a multiline circumflex also to match after  a  termi-
+       Perl.  If  you want a multiline circumflex also to match after a termi-
        nating newline, you must set PCRE2_ALT_CIRCUMFLEX.

          PCRE2_ALT_VERBNAMES

-       By  default, for compatibility with Perl, the name in any verb sequence
-       such as (*MARK:NAME) is  any  sequence  of  characters  that  does  not
-       include  a  closing  parenthesis. The name is not processed in any way,
-       and it is not possible to include a closing parenthesis  in  the  name.
-       However,  if  the  PCRE2_ALT_VERBNAMES  option is set, normal backslash
-       processing is applied to verb  names  and  only  an  unescaped  closing
-       parenthesis  terminates the name. A closing parenthesis can be included
-       in a name either as \) or between \Q  and  \E.  If  the  PCRE2_EXTENDED
+       By default, for compatibility with Perl, the name in any verb  sequence
+       such  as  (*MARK:NAME)  is  any  sequence  of  characters that does not
+       include a closing parenthesis. The name is not processed  in  any  way,
+       and  it  is  not possible to include a closing parenthesis in the name.
+       However, if the PCRE2_ALT_VERBNAMES option  is  set,  normal  backslash
+       processing  is  applied  to  verb  names  and only an unescaped closing
+       parenthesis terminates the name. A closing parenthesis can be  included
+       in  a  name  either  as  \) or between \Q and \E. If the PCRE2_EXTENDED
        option is set, unescaped whitespace in verb names is skipped and #-com-
        ments are recognized, exactly as in the rest of the pattern.

          PCRE2_AUTO_CALLOUT

-       If this bit  is  set,  pcre2_compile()  automatically  inserts  callout
-       items, all with number 255, before each pattern item. For discussion of
-       the callout facility, see the pcre2callout documentation.
+       If  this  bit  is  set,  pcre2_compile()  automatically inserts callout
+       items, all with number 255, before each pattern  item,  except  immedi-
+       ately  before  or after a callout in the pattern. For discussion of the
+       callout facility, see the pcre2callout documentation.

          PCRE2_CASELESS

@@ -3151,7 +3175,7 @@

REVISION

-       Last updated: 17 June 2016
+       Last updated: 22 November 2016
        Copyright (c) 1997-2016 University of Cambridge.
 ------------------------------------------------------------------------------

@@ -3506,16 +3530,21 @@

        pcre2grep  uses an internal buffer to hold a "window" on the file it is
        scanning, in order to be able to output "before" and "after" lines when
-       it  finds  a match. The size of the buffer is controlled by a parameter
-       whose default value is 20K. The buffer itself is three times this size,
-       but because of the way it is used for holding "before" lines, the long-
-       est line that is guaranteed to be processable is  the  parameter  size.
-       You can change the default parameter value by adding, for example,
+       it  finds  a  match. The starting size of the buffer is controlled by a
+       parameter whose default value is 20K. The buffer itself is three  times
+       this  size,  but  because  of  the  way it is used for holding "before"
+       lines, the longest line that is guaranteed to  be  processable  is  the
+       parameter  size.  If  a longer line is encountered, pcre2grep automati-
+       cally expands the buffer, up to a specified maximum size, whose default
+       is 1M or the starting size, whichever is the larger. You can change the
+       default parameter values by adding, for example,

-         --with-pcre2grep-bufsize=50K
+         --with-pcre2grep-bufsize=51200
+         --with-pcre2grep-max-bufsize=2097152

-       to  the  configure  command.  The caller of pcre2grep can override this
-       value by using --buffer-size on the command line.
+       to the configure command. The caller of pcre2grep  can  override  these
+       values  by  using  --buffer-size  and  --max-buffer-size on the command
+       line.

 PCRE2TEST OPTION FOR LIBREADLINE SUPPORT
@@ -3630,6 +3659,29 @@
        mentation.

+SUPPORT FOR FUZZERS
+
+       There  is  a  special  option for use by people who want to run fuzzing
+       tests on PCRE2:
+
+         --enable-fuzz-support
+
+       At present this applies only to the 8-bit library. If set, it causes an
+       extra  library  called  libpcre2-fuzzsupport.a  to  be  built,  but not
+       installed. This contains a single function called  LLVMFuzzerTestOneIn-
+       put()  whose  arguments are a pointer to a string and the length of the
+       string. When called, this function tries to compile  the  string  as  a
+       pattern,  and if that succeeds, to match it.  This is done both with no
+       options and with some random option bits that  are  generated  from  the
+       string.  Setting  --enable-fuzz-support  also  causes  a  binary called
+       pcre2fuzzcheck to be created. This is normally run  under  valgrind  or
+       used  when  PCRE2 is compiled with address sanitizing enabled. It calls
+       the fuzzing function and outputs information about what it is doing. The
+       input  strings  are  specified by arguments: if an argument starts with
+       "=" the rest of it is a literal input string. Otherwise, it is  assumed
+       to be a file name, and the contents of the file are the test string.
+
+
 SEE ALSO

        pcre2api(3), pcre2-config(3).
@@ -3644,7 +3696,7 @@

REVISION

-       Last updated: 01 April 2016
+       Last updated: 01 November 2016
        Copyright (c) 1997-2016 University of Cambridge.
 ------------------------------------------------------------------------------

@@ -3689,37 +3741,46 @@

        If the PCRE2_AUTO_CALLOUT option bit is set when a pattern is compiled,
        PCRE2 automatically inserts callouts, all with number 255, before  each
-       item  in  the  pattern. For example, if PCRE2_AUTO_CALLOUT is used with
+       item  in  the  pattern except for immediately before or after a callout
+       item in the pattern.  For example, if PCRE2_AUTO_CALLOUT is  used  with
        the pattern

-         A(\d{2}|--)
+         A(?C3)B

        it is processed as if it were

+         (?C255)A(?C3)B(?C255)
+
+       Here is a more complicated example:
+
+         A(\d{2}|--)
+
+       With PCRE2_AUTO_CALLOUT, this pattern is processed as if it were
+
        (?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)

-       Notice that there is a callout before and after  each  parenthesis  and
+       Notice  that  there  is a callout before and after each parenthesis and
        alternation bar. If the pattern contains a conditional group whose con-
-       dition is an assertion, an automatic callout  is  inserted  immediately
-       before  the  condition. Such a callout may also be inserted explicitly,
+       dition  is  an  assertion, an automatic callout is inserted immediately
+       before the condition. Such a callout may also be  inserted  explicitly,
        for example:

          (?(?C9)(?=a)ab|de)  (?(?C%text%)(?!=d)ab|de)

-       This applies only to assertion conditions (because they are  themselves
+       This  applies only to assertion conditions (because they are themselves
        independent groups).

-       Callouts  can  be useful for tracking the progress of pattern matching.
+       Callouts can be useful for tracking the progress of  pattern  matching.
        The pcre2test program has a pattern qualifier (/auto_callout) that sets
-       automatic  callouts.   When  any  callouts are present, the output from
-       pcre2test indicates how the pattern is being matched.  This  is  useful
-       information  when  you are trying to optimize the performance of a par-
+       automatic callouts.  When any callouts are  present,  the  output  from
+       pcre2test  indicates  how  the pattern is being matched. This is useful
+       information when you are trying to optimize the performance of  a  par-
        ticular pattern.

MISSING CALLOUTS

-       You should be aware that, because of optimizations  in  the  way  PCRE2
+       You  should  be  aware  that, because of optimizations in the way PCRE2
        compiles and matches patterns, callouts sometimes do not happen exactly
        as you might expect.

@@ -3726,8 +3787,8 @@
    Auto-possessification

        At compile time, PCRE2 "auto-possessifies" repeated items when it knows
-       that  what follows cannot be part of the repeat. For example, a+[bc] is
-       compiled as if it were a++[bc]. The pcre2test output when this  pattern
+       that what follows cannot be part of the repeat. For example, a+[bc]  is
+       compiled  as if it were a++[bc]. The pcre2test output when this pattern
        is compiled with PCRE2_ANCHORED and PCRE2_AUTO_CALLOUT and then applied
        to the string "aaaa" is:

@@ -3736,11 +3797,12 @@
           +2 ^   ^    [bc]
          No match

-       This indicates that when matching [bc] fails, there is no  backtracking
-       into  a+  and  therefore the callouts that would be taken for the back-
-       tracks do not occur.  You can disable the  auto-possessify  feature  by
-       passing  PCRE2_NO_AUTO_POSSESS to pcre2_compile(), or starting the pat-
-       tern with (*NO_AUTO_POSSESS). In this case, the output changes to this:
+       This  indicates that when matching [bc] fails, there is no backtracking
+       into a+ (because it is being treated as a++) and therefore the callouts
+       that  would  be  taken for the backtracks do not occur. You can disable
+       the  auto-possessify  feature  by  passing   PCRE2_NO_AUTO_POSSESS   to
+       pcre2_compile(),  or  starting  the pattern with (*NO_AUTO_POSSESS). In
+       this case, the output changes to this:

          --->aaaa
           +0 ^        a+
@@ -3859,8 +3921,8 @@

        For  a  numerical  callout,  callout_string is NULL, and callout_number
        contains the number of the callout, in the range  0-255.  This  is  the
-       number  that  follows  (?C for manual callouts; it is 255 for automati-
-       cally generated callouts.
+       number that follows (?C for callouts that are part of the  pattern;  it
+       is 255 for automatically generated callouts.

    Fields for string callouts

@@ -3921,10 +3983,16 @@
        the next item to be matched.

        The next_item_length field contains the length of the next item  to  be
-       matched in the pattern string. When the callout immediately precedes an
-       alternation bar, a closing parenthesis, or the end of the pattern,  the
-       length  is  zero. When the callout precedes an opening parenthesis, the
-       length is that of the entire subpattern.
+       processed  in the pattern string. When the callout is at the end of the
+       pattern, the length is zero.  When  the  callout  precedes  an  opening
+       parenthesis, the length includes meta characters that follow the paren-
+       thesis. For example, in a callout before an assertion  such  as  (?=ab)
+       the  length  is  3.  For an alternation bar or a closing  parenthesis,
+       the length is one, unless a closing parenthesis is followed by a  quan-
+       tifier, in which case its length is included.  (This changed in release
+       10.23. In earlier releases, before an opening  parenthesis  the  length
+       was  that  of the entire subpattern, and before an alternation bar or a
+       closing parenthesis the length was zero.)

        The pattern_position and next_item_length fields are intended  to  help
        in  distinguishing between different automatic callouts, which all have
@@ -4008,8 +4076,8 @@

REVISION

-       Last updated: 23 March 2015
-       Copyright (c) 1997-2015 University of Cambridge.
+       Last updated: 29 September 2016
+       Copyright (c) 1997-2016 University of Cambridge.
 ------------------------------------------------------------------------------

@@ -4103,7 +4171,7 @@
        first one that is backtracked onto acts. For example,  in  the  pattern
        A(*COMMIT)B(*PRUNE)C  a  failure in B triggers (*COMMIT), but a failure
        in C triggers (*PRUNE). Perl's behaviour is more complex; in many cases
-       it is the same as PCRE2, but there are examples where it differs.
+       it is the same as PCRE2, but there are cases where it differs.

        11.  Most  backtracking  verbs in assertions have their normal actions.
        They are not confined to the assertion.
@@ -4117,7 +4185,7 @@
        pattern names is not as general as Perl's. This is a consequence of the
        fact that PCRE2 works internally just with numbers, using  an  external
        table to translate between numbers and names. In particular, a  pattern
-       such  as  (?|(?<a>A)|(?<b)B),  where the two capturing parentheses have
+       such  as  (?|(?<a>A)|(?<b>B)), where the two capturing parentheses have
        the same number but different names, is not supported,  and  causes  an
        error  at compile time. If it were allowed, it would not be possible to
        distinguish which parentheses matched, because both names map  to  cap-
@@ -4124,11 +4192,11 @@
        turing subpattern number 1. To avoid this confusing situation, an error
        is given at compile time.

-       14. Perl recognizes comments in some places that PCRE2  does  not,  for
-       example,  between  the  ( and ? at the start of a subpattern. If the /x
-       modifier is set, Perl allows white space between ( and ?  (though  cur-
-       rent  Perls warn that this is deprecated) but PCRE2 never does, even if
-       the PCRE2_EXTENDED option is set.
+       14. Perl used to recognize comments in some places that PCRE2 does not,
+       for  example,  between the ( and ? at the start of a subpattern. If the
+       /x modifier is set, Perl allowed white space between ( and ? though the
+       latest  Perls give an error (for a while it was just deprecated). There
+       may still be some cases where Perl behaves differently.

        15. Perl, when in warning mode, gives warnings  for  character  classes
        such  as  [A-\d] or [a-[:digit:]]. It then treats the hyphens as liter-
@@ -4152,34 +4220,39 @@
        different  length  of  string.  Perl requires them all to have the same
        length.

-       (b) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set,  the
+       (b) From PCRE2 10.23, back references to groups  of  fixed  length  are
+       supported in lookbehinds, provided that there is no possibility of ref-
+       erencing a non-unique number or name. Perl does not support  backrefer-
+       ences in lookbehinds.
+
+       (c)  If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the
        $ meta-character matches only at the very end of the string.

-       (c)  A  backslash  followed  by  a  letter  with  no special meaning is
+       (d) A backslash followed  by  a  letter  with  no  special  meaning  is
        faulted. (Perl can be made to issue a warning.)

-       (d) If PCRE2_UNGREEDY is set, the greediness of the repetition  quanti-
+       (e)  If PCRE2_UNGREEDY is set, the greediness of the repetition quanti-
        fiers is inverted, that is, by default they are not greedy, but if fol-
        lowed by a question mark they are.

-       (e) PCRE2_ANCHORED can be used at matching time to force a  pattern  to
+       (f)  PCRE2_ANCHORED  can be used at matching time to force a pattern to
        be tried only at the first matching position in the subject string.

-       (f)      The      PCRE2_NOTBOL,      PCRE2_NOTEOL,      PCRE2_NOTEMPTY,
-       PCRE2_NOTEMPTY_ATSTART, and PCRE2_NO_AUTO_CAPTURE options have no  Perl
+       (g)      The      PCRE2_NOTBOL,      PCRE2_NOTEOL,      PCRE2_NOTEMPTY,
+       PCRE2_NOTEMPTY_ATSTART,  and PCRE2_NO_AUTO_CAPTURE options have no Perl
        equivalents.

-       (g)  The  \R escape sequence can be restricted to match only CR, LF, or
+       (h) The \R escape sequence can be restricted to match only CR,  LF,  or
        CRLF by the PCRE2_BSR_ANYCRLF option.

-       (h) The callout facility is PCRE2-specific.
+       (i) The callout facility is PCRE2-specific.

-       (i) The partial matching facility is PCRE2-specific.
+       (j) The partial matching facility is PCRE2-specific.

-       (j) The alternative matching function (pcre2_dfa_match() matches  in  a
+       (k) The  alternative matching function (pcre2_dfa_match()) matches in a
        different way and is not Perl-compatible.

-       (k)  PCRE2 recognizes some special sequences such as (*CR) at the start
+       (l) PCRE2 recognizes some special sequences such as (*CR) at the  start
        of a pattern that set overall options that cannot be changed within the
        pattern.

@@ -4193,8 +4266,8 @@

REVISION

-       Last updated: 15 March 2015
-       Copyright (c) 1997-2015 University of Cambridge.
+       Last updated: 18 October 2016
+       Copyright (c) 1997-2016 University of Cambridge.
 ------------------------------------------------------------------------------

@@ -4642,23 +4715,22 @@
        can be no more than 65535 capturing subpatterns. There is,  however,  a
        limit  to  the  depth  of  nesting  of parenthesized subpatterns of all
        kinds. This is imposed in order to limit the  amount  of  system  stack
-       used  at  compile time. The limit can be specified when PCRE2 is built;
-       the default is 250.
+       used  at compile time. The default limit can be specified when PCRE2 is
+       built; if it is not, the default is 250. An application can change this
+       limit by calling pcre2_set_parens_nest_limit() to set the  limit  in  a
+       compile context.

-       There is a limit to the number of forward references to subsequent sub-
-       patterns  of  around  200,000.  Repeated  forward references with fixed
-       upper limits, for example, (?2){0,100} when subpattern number 2  is  to
-       the  right,  are included in the count. There is no limit to the number
-       of backward references.
-
        The maximum length of name for a named subpattern is 32 code units, and
        the maximum number of named subpatterns is 10000.

        The  maximum  length  of  a  name  in  a (*MARK), (*PRUNE), (*SKIP), or
-       (*THEN) verb is 255 for the 8-bit library and 65535 for the 16-bit  and
-       32-bit libraries.
+       (*THEN) verb is 255 code units for the 8-bit  library  and  65535  code
+       units for the 16-bit and 32-bit libraries.

+       The  maximum  length  of  a string argument to a callout is the largest
+       number a 32-bit unsigned integer can hold.

+
AUTHOR

        Philip Hazel
@@ -4668,8 +4740,8 @@

REVISION

-       Last updated: 05 November 2015
-       Copyright (c) 1997-2015 University of Cambridge.
+       Last updated: 26 October 2016
+       Copyright (c) 1997-2016 University of Cambridge.
 ------------------------------------------------------------------------------

@@ -5644,29 +5716,29 @@
        character (hex 40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A
        (A  is  41, Z is 5A), but \c{ becomes hex 3B ({ is 7B), and \c; becomes
        hex 7B (; is 3B). If the code unit following \c has a value  less  than
-       32  or  greater  than  126, a compile-time error occurs. This locks out
-       non-printable ASCII characters in all modes.
+       32 or greater than 126, a compile-time error occurs.
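
The ASCII rule described above can be sketched in a few lines of C. This is
only an illustration: ctrl_escape_ascii() is a hypothetical helper, not part
of the PCRE2 API, and it assumes the program itself is compiled in an ASCII
environment.

```c
#include <stdint.h>

/* Hypothetical helper (not a PCRE2 function). Returns the code value that
   \c followed by x yields in an ASCII environment, or -1 for the cases
   where the text says a compile-time error occurs. */
int ctrl_escape_ascii(uint32_t x)
{
    if (x < 32 || x > 126) return -1;           /* outside 32..126: error   */
    if (x >= 'a' && x <= 'z') x -= 'a' - 'A';   /* lower case -> upper case */
    return (int)(x ^ 0x40);                     /* invert bit 6 (hex 40)    */
}
```

With this sketch, ctrl_escape_ascii('A') gives hex 01, ctrl_escape_ascii('{')
gives hex 3B, and ctrl_escape_ascii(';') gives hex 7B, matching the values
quoted in the text.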

-       When PCRE2 is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t  gen-
+       When  PCRE2 is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t gen-
        erate the appropriate EBCDIC code values. The \c escape is processed as
        specified for Perl in the perlebcdic document. The only characters that
-       are  allowed  after  \c are A-Z, a-z, or one of @, [, \, ], ^, _, or ?.
-       Any other character provokes a  compile-time  error.  The  sequence  \@
-       encodes  character  code 0; the letters (in either case) encode charac-
-       ters 1-26 (hex 01 to hex 1A); [, \, ], ^, and _ encode characters 27-31
-       (hex 1B to hex 1F), and \? becomes either 255 (hex FF) or 95 (hex 5F).
+       are allowed after \c are A-Z, a-z, or one of @, [, \, ], ^,  _,  or  ?.
+       Any  other  character  provokes  a compile-time error. The sequence \c@
+       encodes character code 0; after \c the letters (in either case)  encode
+       characters 1-26 (hex 01 to hex 1A); [, \, ], ^, and _ encode characters
+       27-31 (hex 1B to hex 1F), and \c? becomes either 255  (hex  FF)  or  95
+       (hex 5F).

-       Thus,  apart  from  \?,  these escapes generate the same character code
+       Thus,  apart  from  \c?, these escapes generate the same character code
        values as they do in an ASCII environment, though the meanings  of  the
-       values  mostly  differ.  For example, \G always generates code value 7,
+       values  mostly  differ. For example, \cG always generates code value 7,
        which is BEL in ASCII but DEL in EBCDIC.

-       The sequence \? generates DEL (127, hex 7F) in  an  ASCII  environment,
+       The sequence \c? generates DEL (127, hex 7F) in an  ASCII  environment,
        but  because  127  is  not a control character in EBCDIC, Perl makes it
        generate the APC character. Unfortunately, there are  several  variants
        of  EBCDIC.  In  most  of them the APC character has the value 255 (hex
        FF), but in the one Perl calls POSIX-BC its value is 95  (hex  5F).  If
-       certain  other characters have POSIX-BC values, PCRE2 makes \? generate
+       certain other characters have POSIX-BC values, PCRE2 makes \c? generate
        95; otherwise it generates 255.

        After \0 up to two further octal digits are read. If  there  are  fewer
@@ -5776,10 +5848,10 @@

    Absolute and relative back references

-       The sequence \g followed by an unsigned or a negative  number,  option-
-       ally  enclosed  in braces, is an absolute or relative back reference. A
-       named back reference can be coded as \g{name}. Back references are dis-
-       cussed later, following the discussion of parenthesized subpatterns.
+       The sequence \g followed by a signed  or  unsigned  number,  optionally
+       enclosed  in braces, is an absolute or relative back reference. A named
+       back reference can be coded as \g{name}. Back references are  discussed
+       later, following the discussion of parenthesized subpatterns.

    Absolute and relative subroutine calls

@@ -6404,6 +6476,18 @@
        PCRE2_MULTILINE  options  is  used. A class such as [^a] always matches
        one of these characters.

+       The character escape sequences \d, \D, \h, \H, \p, \P, \s, \S, \v,  \V,
+       \w, and \W may appear in a character class, and add the characters that
+       they match to the class. For example, [\dABCDEF] matches any  hexadeci-
+       mal  digit.  In UTF modes, the PCRE2_UCP option affects the meanings of
+       \d, \s, \w and their upper case partners, just as  it  does  when  they
+       appear  outside a character class, as described in the section entitled
+       "Generic character types" above. The escape sequence \b has a different
+       meaning  inside  a character class; it matches the backspace character.
+       The sequences \B, \N, \R, and \X are not  special  inside  a  character
+       class.  Like  any  other  unrecognized  escape sequences, they cause an
+       error.
+
        The minus (hyphen) character can be used to specify a range of  charac-
        ters  in  a  character  class.  For  example,  [d-m] matches any letter
        between d and m, inclusive. If a  minus  character  is  required  in  a
@@ -6413,20 +6497,20 @@
        example, [b-d-z] matches letters in the range b to d, a hyphen  charac-
        ter, or z.

+       Perl  treats  a  hyphen as a literal if it appears before a POSIX class
+       (see  below)  or a character type escape such as \d, but gives a warn-
+       ing  in its warning mode, as this is most likely a user error. As PCRE2
+       has no facility for warning, an error is given in these cases.
+
        It is not possible to have the literal character "]" as the end charac-
-       ter of a range. A pattern such as [W-]46] is interpreted as a class  of
-       two  characters ("W" and "-") followed by a literal string "46]", so it
-       would match "W46]" or "-46]". However, if the "]"  is  escaped  with  a
-       backslash  it is interpreted as the end of range, so [W-\]46] is inter-
-       preted as a class containing a range followed by two other  characters.
-       The  octal or hexadecimal representation of "]" can also be used to end
+       ter  of a range. A pattern such as [W-]46] is interpreted as a class of
+       two characters ("W" and "-") followed by a literal string "46]", so  it
+       would  match  "W46]"  or  "-46]". However, if the "]" is escaped with a
+       backslash it is interpreted as the end of range, so [W-\]46] is  inter-
+       preted  as a class containing a range followed by two other characters.
+       The octal or hexadecimal representation of "]" can also be used to  end
        a range.

-       An error is generated if a POSIX character  class  (see  below)  or  an
-       escape  sequence other than one that defines a single character appears
-       at a point where a range ending character  is  expected.  For  example,
-       [z-\xff] is valid, but [A-\d] and [A-[:digit:]] are not.
-
        Ranges normally include all code points between the start and end char-
        acters, inclusive. They can also be  used  for  code  points  specified
        numerically, for example [\000-\037]. Ranges can include any characters
@@ -6446,18 +6530,6 @@
        character  tables  for  a French locale are in use, [\xc8-\xcb] matches
        accented E characters in both cases.

-       The character escape sequences \d, \D, \h, \H, \p, \P, \s, \S, \v,  \V,
-       \w, and \W may appear in a character class, and add the characters that
-       they match to the class. For example, [\dABCDEF] matches any  hexadeci-
-       mal  digit.  In UTF modes, the PCRE2_UCP option affects the meanings of
-       \d, \s, \w and their upper case partners, just as  it  does  when  they
-       appear  outside a character class, as described in the section entitled
-       "Generic character types" above. The escape sequence \b has a different
-       meaning  inside  a character class; it matches the backspace character.
-       The sequences \B, \N, \R, and \X are not  special  inside  a  character
-       class.  Like  any  other  unrecognized  escape sequences, they cause an
-       error.
-
        A circumflex can conveniently be used with  the  upper  case  character
        types  to specify a more restricted set of characters than the matching
        lower case type.  For example, the class [^\W_] matches any  letter  or
@@ -6618,19 +6690,14 @@

        When one of these option changes occurs at  top  level  (that  is,  not
        inside  subpattern parentheses), the change applies to the remainder of
-       the pattern that follows. If the change is placed right at the start of
-       a  pattern,  PCRE2  extracts  it  into  the global options (and it will
-       therefore show up in data extracted by the  pcre2_pattern_info()  func-
-       tion).
+       the pattern that follows. An option change  within  a  subpattern  (see
+       below  for  a description of subpatterns) affects only that part of the
+       subpattern that follows it, so

-       An  option  change  within a subpattern (see below for a description of
-       subpatterns) affects only that part of the subpattern that follows  it,
-       so
-
          (a(?i)b)c

-       matches  abc  and  aBc and no other strings (assuming PCRE2_CASELESS is
-       not used).  By this means, options can be made to have  different  set-
+       matches abc and aBc and no other strings  (assuming  PCRE2_CASELESS  is
+       not  used).   By this means, options can be made to have different set-
        tings in different parts of the pattern. Any changes made in one alter-
        native do carry on into subsequent branches within the same subpattern.
        For example,
@@ -6637,13 +6704,13 @@

          (a(?i)b|c)

-       matches  "ab",  "aB",  "c",  and "C", even though when matching "C" the
-       first branch is abandoned before the option setting.  This  is  because
-       the  effects  of option settings happen at compile time. There would be
+       matches "ab", "aB", "c", and "C", even though  when  matching  "C"  the
+       first  branch  is  abandoned before the option setting. This is because
+       the effects of option settings happen at compile time. There  would  be
        some very weird behaviour otherwise.

-       As a convenient shorthand, if any option settings are required  at  the
-       start  of a non-capturing subpattern (see the next section), the option
+       As  a  convenient shorthand, if any option settings are required at the
+       start of a non-capturing subpattern (see the next section), the  option
        letters may appear between the "?" and the ":". Thus the two patterns

          (?i:saturday|sunday)
@@ -6651,14 +6718,14 @@

        match exactly the same set of strings.

-       Note: There are other PCRE2-specific options that can  be  set  by  the
+       Note:  There  are  other  PCRE2-specific options that can be set by the
        application when the compiling function is called. The pattern can con-
-       tain special leading sequences such as (*CRLF)  to  override  what  the
-       application  has  set  or what has been defaulted. Details are given in
-       the section entitled "Newline sequences"  above.  There  are  also  the
-       (*UTF)  and  (*UCP)  leading  sequences that can be used to set UTF and
-       Unicode property modes; they are equivalent to  setting  the  PCRE2_UTF
-       and  PCRE2_UCP  options, respectively. However, the application can set
+       tain  special  leading  sequences  such as (*CRLF) to override what the
+       application has set or what has been defaulted. Details  are  given  in
+       the  section  entitled  "Newline  sequences"  above. There are also the
+       (*UTF) and (*UCP) leading sequences that can be used  to  set  UTF  and
+       Unicode  property  modes;  they are equivalent to setting the PCRE2_UTF
+       and PCRE2_UCP options, respectively. However, the application  can  set
        the PCRE2_NEVER_UTF and PCRE2_NEVER_UCP options, which lock out the use
        of the (*UTF) and (*UCP) sequences.

@@ -6672,18 +6739,18 @@

          cat(aract|erpillar|)

-       matches "cataract", "caterpillar", or "cat". Without  the  parentheses,
+       matches  "cataract",  "caterpillar", or "cat". Without the parentheses,
        it would match "cataract", "erpillar" or an empty string.

-       2.  It  sets  up  the  subpattern as a capturing subpattern. This means
+       2. It sets up the subpattern as  a  capturing  subpattern.  This  means
        that, when the whole pattern matches, the portion of the subject string
-       that  matched  the  subpattern is passed back to the caller, separately
-       from the portion that matched the whole pattern. (This applies only  to
-       the  traditional  matching function; the DFA matching function does not
+       that matched the subpattern is passed back to  the  caller,  separately
+       from  the portion that matched the whole pattern. (This applies only to
+       the traditional matching function; the DFA matching function  does  not
        support capturing.)

        Opening parentheses are counted from left to right (starting from 1) to
-       obtain  numbers  for  the  capturing  subpatterns.  For example, if the
+       obtain numbers for the  capturing  subpatterns.  For  example,  if  the
        string "the red king" is matched against the pattern

          the ((red|white) (king|queen))
@@ -6691,12 +6758,12 @@
        the captured substrings are "red king", "red", and "king", and are num-
        bered 1, 2, and 3, respectively.

-       The  fact  that  plain  parentheses  fulfil two functions is not always
-       helpful.  There are often times when a grouping subpattern is  required
-       without  a capturing requirement. If an opening parenthesis is followed
-       by a question mark and a colon, the subpattern does not do any  captur-
-       ing,  and  is  not  counted when computing the number of any subsequent
-       capturing subpatterns. For example, if the string "the white queen"  is
+       The fact that plain parentheses fulfil  two  functions  is  not  always
+       helpful.   There are often times when a grouping subpattern is required
+       without a capturing requirement. If an opening parenthesis is  followed
+       by  a question mark and a colon, the subpattern does not do any captur-
+       ing, and is not counted when computing the  number  of  any  subsequent
+       capturing  subpatterns. For example, if the string "the white queen" is
        matched against the pattern

          the ((?:red|white) (king|queen))
@@ -6704,8 +6771,8 @@
        the captured substrings are "white queen" and "queen", and are numbered
        1 and 2. The maximum number of capturing subpatterns is 65535.

-       As a convenient shorthand, if any option settings are required  at  the
-       start  of  a  non-capturing  subpattern,  the option letters may appear
+       As  a  convenient shorthand, if any option settings are required at the
+       start of a non-capturing subpattern,  the  option  letters  may  appear
        between the "?" and the ":". Thus the two patterns

          (?i:saturday|sunday)
@@ -6712,9 +6779,9 @@
          (?:(?i)saturday|sunday)

        match exactly the same set of strings. Because alternative branches are
-       tried  from  left  to right, and options are not reset until the end of
-       the subpattern is reached, an option setting in one branch does  affect
-       subsequent  branches,  so  the above patterns match "SUNDAY" as well as
+       tried from left to right, and options are not reset until  the  end  of
+       the  subpattern is reached, an option setting in one branch does affect
+       subsequent branches, so the above patterns match "SUNDAY"  as  well  as
        "Saturday".

@@ -6721,20 +6788,20 @@
DUPLICATE SUBPATTERN NUMBERS

        Perl 5.10 introduced a feature whereby each alternative in a subpattern
-       uses  the same numbers for its capturing parentheses. Such a subpattern
-       starts with (?| and is itself a non-capturing subpattern. For  example,
+       uses the same numbers for its capturing parentheses. Such a  subpattern
+       starts  with (?| and is itself a non-capturing subpattern. For example,
        consider this pattern:

          (?|(Sat)ur|(Sun))day

-       Because  the two alternatives are inside a (?| group, both sets of cap-
-       turing parentheses are numbered one. Thus, when  the  pattern  matches,
-       you  can  look  at captured substring number one, whichever alternative
-       matched. This construct is useful when you want to  capture  part,  but
+       Because the two alternatives are inside a (?| group, both sets of  cap-
+       turing  parentheses  are  numbered one. Thus, when the pattern matches,
+       you can look at captured substring number  one,  whichever  alternative
+       matched.  This  construct  is useful when you want to capture part, but
        not all, of one of a number of alternatives. Inside a (?| group, paren-
-       theses are numbered as usual, but the number is reset at the  start  of
-       each  branch.  The numbers of any capturing parentheses that follow the
-       subpattern start after the highest number used in any branch. The  fol-
+       theses  are  numbered as usual, but the number is reset at the start of
+       each branch. The numbers of any capturing parentheses that  follow  the
+       subpattern  start after the highest number used in any branch. The fol-
        lowing example is taken from the Perl documentation. The numbers under-
        neath show in which buffer the captured content will be stored.

@@ -6742,14 +6809,14 @@
          / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
          # 1            2         2  3        2     3     4
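
The numbering rule for (?| groups can be illustrated with a small C sketch.
It is not how PCRE2 parses patterns: number_groups() and MAX_DEPTH are
hypothetical names, and the sketch deliberately ignores escapes, character
classes, and named groups (any "(?" is treated as non-capturing).

```c
#include <stddef.h>

#define MAX_DEPTH 64   /* arbitrary nesting limit for this sketch */

/* Hypothetical sketch. Stores the number given to each capturing
   parenthesis, in order of appearance, in out[]. Returns how many were
   stored, or -1 for unbalanced parentheses or exhausted space. */
int number_groups(const char *pat, int *out, int out_max)
{
    struct { int reset, start, max; } stack[MAX_DEPTH];
    int depth = 0, count = 0, n = 0;

    for (size_t i = 0; pat[i] != '\0'; i++) {
        char c = pat[i];
        if (c == '(') {
            if (depth == MAX_DEPTH) return -1;
            int reset = (pat[i + 1] == '?' && pat[i + 2] == '|');
            if (pat[i + 1] != '?') {            /* plain capturing group */
                if (n == out_max) return -1;
                out[n++] = ++count;
            }
            stack[depth].reset = reset;
            stack[depth].start = count;         /* number to reset to */
            stack[depth].max = count;           /* highest seen so far */
            depth++;
            if (reset) i += 2;                  /* skip the "?|" itself */
        } else if (c == '|') {
            /* A bar directly inside a (?| group starts a new branch. */
            if (depth > 0 && stack[depth - 1].reset) {
                if (count > stack[depth - 1].max) stack[depth - 1].max = count;
                count = stack[depth - 1].start;
            }
        } else if (c == ')') {
            if (depth == 0) return -1;
            depth--;
            /* Continue after the highest number used in any branch. */
            if (stack[depth].reset && stack[depth].max > count)
                count = stack[depth].max;
        }
    }
    return depth == 0 ? n : -1;
}
```

Applied to the pattern above (without the enclosing slashes), the sketch
stores the sequence 1 2 2 3 2 3 4, which is the numbering shown underneath
the example.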

-       A back reference to a numbered subpattern uses the  most  recent  value
-       that  is  set  for that number by any subpattern. The following pattern
+       A  back  reference  to a numbered subpattern uses the most recent value
+       that is set for that number by any subpattern.  The  following  pattern
        matches "abcabc" or "defdef":

          /(?|(abc)|(def))\1/

-       In contrast, a subroutine call to a numbered subpattern  always  refers
-       to  the  first  one in the pattern with the given number. The following
+       In  contrast,  a subroutine call to a numbered subpattern always refers
+       to the first one in the pattern with the given  number.  The  following
        pattern matches "abcabc" or "defabc":

          /(?|(abc)|(def))(?1)/
@@ -6757,47 +6824,47 @@
        A relative reference such as (?-1) is no different: it is just a conve-
        nient way of computing an absolute group number.

-       If  a condition test for a subpattern's having matched refers to a non-
-       unique number, the test is true if any of the subpatterns of that  num-
+       If a condition test for a subpattern's having matched refers to a  non-
+       unique  number, the test is true if any of the subpatterns of that num-
        ber have matched.

-       An  alternative approach to using this "branch reset" feature is to use
+       An alternative approach to using this "branch reset" feature is to  use
        duplicate named subpatterns, as described in the next section.

NAMED SUBPATTERNS

-       Identifying capturing parentheses by number is simple, but  it  can  be
-       very  hard  to keep track of the numbers in complicated regular expres-
-       sions. Furthermore, if an  expression  is  modified,  the  numbers  may
+       Identifying  capturing  parentheses  by number is simple, but it can be
+       very hard to keep track of the numbers in complicated  regular  expres-
+       sions.  Furthermore,  if  an  expression  is  modified, the numbers may
        change. To help with this difficulty, PCRE2 supports the naming of sub-
        patterns. This feature was not added to Perl until release 5.10. Python
-       had  the feature earlier, and PCRE1 introduced it at release 4.0, using
-       the Python syntax. PCRE2 supports both the Perl and the Python  syntax.
-       Perl  allows  identically numbered subpatterns to have different names,
+       had the feature earlier, and PCRE1 introduced it at release 4.0,  using
+       the  Python syntax. PCRE2 supports both the Perl and the Python syntax.
+       Perl allows identically numbered subpatterns to have  different  names,
        but PCRE2 does not.

-       In PCRE2, a subpattern can be named in one of three ways:  (?<name>...)
-       or  (?'name'...)  as in Perl, or (?P<name>...) as in Python. References
-       to capturing parentheses from other parts of the pattern, such as  back
-       references,  recursion,  and conditions, can be made by name as well as
+       In  PCRE2, a subpattern can be named in one of three ways: (?<name>...)
+       or (?'name'...) as in Perl, or (?P<name>...) as in  Python.  References
+       to  capturing parentheses from other parts of the pattern, such as back
+       references, recursion, and conditions, can be made by name as  well  as
        by number.

-       Names consist of up to 32 alphanumeric characters and underscores,  but
-       must  start  with  a  non-digit.  Named capturing parentheses are still
-       allocated numbers as well as names, exactly as if the  names  were  not
+       Names  consist of up to 32 alphanumeric characters and underscores, but
+       must start with a non-digit.  Named  capturing  parentheses  are  still
+       allocated  numbers  as  well as names, exactly as if the names were not
        present. The PCRE2 API provides function calls for extracting the name-
-       to-number translation table from a compiled  pattern.  There  are  also
+       to-number  translation  table  from  a compiled pattern. There are also
        convenience functions for extracting a captured substring by name.
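
As a rough illustration of the name-to-number table and by-name extraction described above, the sketch below uses Python's built-in re module instead of the PCRE2 C API. That substitution is an assumption made for brevity: re is a different engine, but it accepts the Python-style (?P<name>...) syntax that PCRE2 also supports, and its groupindex attribute plays a role similar to a name-to-number table. The pattern and subject string are hypothetical.

```python
import re

# Hypothetical pattern: two named groups. They are still numbered 1 and 2,
# exactly as they would be if the names were not present.
pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")

# Name-to-number mapping for the compiled pattern.
name_table = dict(pattern.groupindex)

match = pattern.search("released 2016-11")

# Extract the same substring by name and by the number looked up in the table.
by_name = match.group("month")
by_number = match.group(name_table["month"])

print(name_table, by_name, by_number)
```

Looking a substring up by name avoids hard-coding group numbers, which shift whenever the pattern is edited.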

-       By  default, a name must be unique within a pattern, but it is possible
-       to relax this constraint by setting the PCRE2_DUPNAMES option  at  com-
-       pile  time.  (Duplicate names are also always permitted for subpatterns
-       with the same number, set up as described  in  the  previous  section.)
-       Duplicate  names  can be useful for patterns where only one instance of
+       By default, a name must be unique within a pattern, but it is  possible
+       to  relax  this constraint by setting the PCRE2_DUPNAMES option at com-
+       pile time.  (Duplicate names are also always permitted for  subpatterns
+       with  the  same  number,  set up as described in the previous section.)
+       Duplicate names can be useful for patterns where only one  instance  of
        the named parentheses can match.  Suppose you want to match the name of
-       a  weekday,  either as a 3-letter abbreviation or as the full name, and
-       in both cases you  want  to  extract  the  abbreviation.  This  pattern
+       a weekday, either as a 3-letter abbreviation or as the full  name,  and
+       in  both  cases  you  want  to  extract  the abbreviation. This pattern
        (ignoring the line breaks) does the job:

          (?<DN>Mon|Fri|Sun)(?:day)?|
@@ -6806,18 +6873,18 @@
          (?<DN>Thu)(?:rsday)?|
          (?<DN>Sat)(?:urday)?

-       There  are  five capturing substrings, but only one is ever set after a
+       There are five capturing substrings, but only one is ever set  after  a
        match.  (An alternative way of solving this problem is to use a "branch
        reset" subpattern, as described in the previous section.)

-       The  convenience  functions for extracting the data by name returns the
-       substring for the first (and in this example, the only)  subpattern  of
-       that  name  that  matched.  This saves searching to find which numbered
+       The  convenience functions for extracting the data by name  return  the
+       substring  for  the first (and in this example, the only) subpattern of
+       that name that matched. This saves searching  to  find  which  numbered
        subpattern it was.

-       If you make a back reference to  a  non-unique  named  subpattern  from
-       elsewhere  in the pattern, the subpatterns to which the name refers are
-       checked in the order in which they appear in the overall  pattern.  The
+       If  you  make  a  back  reference to a non-unique named subpattern from
+       elsewhere in the pattern, the subpatterns to which the name refers  are
+       checked  in  the order in which they appear in the overall pattern. The
        first one that is set is used for the reference. For example, this pat-
        tern matches both "foofoo" and "barbar" but not "foobar" or "barfoo":

@@ -6825,22 +6892,22 @@

        If you make a subroutine call to a non-unique named subpattern, the one
-       that  corresponds  to  the first occurrence of the name is used. In the
+       that corresponds to the first occurrence of the name is  used.  In  the
        absence of duplicate numbers (see the previous section) this is the one
        with the lowest number.

        If you use a named reference in a condition test (see the section about
        conditions below), either to check whether a subpattern has matched, or
-       to  check for recursion, all subpatterns with the same name are tested.
-       If the condition is true for any one of them, the overall condition  is
-       true.  This  is  the  same  behaviour as testing by number. For further
-       details of the interfaces  for  handling  named  subpatterns,  see  the
+       to check for recursion, all subpatterns with the same name are  tested.
+       If  the condition is true for any one of them, the overall condition is
+       true. This is the same behaviour as  testing  by  number.  For  further
+       details  of  the  interfaces  for  handling  named subpatterns, see the
        pcre2api documentation.

        Warning: You cannot use different names to distinguish between two sub-
-       patterns with the same number because PCRE2 uses only the numbers  when
+       patterns  with the same number because PCRE2 uses only the numbers when
        matching. For this reason, an error is given at compile time if differ-
-       ent names are given to subpatterns with the same number.  However,  you
+       ent  names  are given to subpatterns with the same number. However, you
        can always give the same name to subpatterns with the same number, even
        when PCRE2_DUPNAMES is not set.

@@ -6847,7 +6914,7 @@

REPETITION

-       Repetition is specified by quantifiers, which can  follow  any  of  the
+       Repetition  is  specified  by  quantifiers, which can follow any of the
        following items:

          a literal data character
@@ -6861,17 +6928,17 @@
          a parenthesized subpattern (including most assertions)
          a subroutine call to a subpattern (recursive or otherwise)

-       The  general repetition quantifier specifies a minimum and maximum num-
-       ber of permitted matches, by giving the two numbers in  curly  brackets
-       (braces),  separated  by  a comma. The numbers must be less than 65536,
+       The general repetition quantifier specifies a minimum and maximum  num-
+       ber  of  permitted matches, by giving the two numbers in curly brackets
+       (braces), separated by a comma. The numbers must be  less  than  65536,
        and the first must be less than or equal to the second. For example:

          z{2,4}

-       matches "zz", "zzz", or "zzzz". A closing brace on its  own  is  not  a
-       special  character.  If  the second number is omitted, but the comma is
-       present, there is no upper limit; if the second number  and  the  comma
-       are  both omitted, the quantifier specifies an exact number of required
+       matches  "zz",  "zzz",  or  "zzzz". A closing brace on its own is not a
+       special character. If the second number is omitted, but  the  comma  is
+       present,  there  is  no upper limit; if the second number and the comma
+       are both omitted, the quantifier specifies an exact number of  required
        matches. Thus

          [aeiou]{3,}
@@ -6880,26 +6947,26 @@

          \d{8}

-       matches exactly 8 digits. An opening curly bracket that  appears  in  a
-       position  where a quantifier is not allowed, or one that does not match
-       the syntax of a quantifier, is taken as a literal character. For  exam-
+       matches  exactly  8  digits. An opening curly bracket that appears in a
+       position where a quantifier is not allowed, or one that does not  match
+       the  syntax of a quantifier, is taken as a literal character. For exam-
        ple, {,6} is not a quantifier, but a literal string of four characters.

        In UTF modes, quantifiers apply to characters rather than to individual
-       code units. Thus, for example, \x{100}{2} matches two characters,  each
+       code  units. Thus, for example, \x{100}{2} matches two characters, each
        of which is represented by a two-byte sequence in a UTF-8 string. Simi-
-       larly, \X{3} matches three Unicode extended grapheme clusters, each  of
-       which  may  be  several  code  units long (and they may be of different
+       larly,  \X{3} matches three Unicode extended grapheme clusters, each of
+       which may be several code units long (and  they  may  be  of  different
        lengths).

        The quantifier {0} is permitted, causing the expression to behave as if
        the previous item and the quantifier were not present. This may be use-
-       ful for subpatterns that are referenced as subroutines  from  elsewhere
+       ful  for  subpatterns that are referenced as subroutines from elsewhere
        in the pattern (but see also the section entitled "Defining subpatterns
-       for use by reference only" below). Items other  than  subpatterns  that
+       for  use  by  reference only" below). Items other than subpatterns that
        have a {0} quantifier are omitted from the compiled pattern.

-       For  convenience, the three most common quantifiers have single-charac-
+       For convenience, the three most common quantifiers have  single-charac-
        ter abbreviations:

          *    is equivalent to {0,}
@@ -6906,24 +6973,24 @@
          +    is equivalent to {1,}
          ?    is equivalent to {0,1}

-       It is possible to construct infinite loops by  following  a  subpattern
+       It  is  possible  to construct infinite loops by following a subpattern
        that can match no characters with a quantifier that has no upper limit,
        for example:

          (a?)*

-       Earlier versions of Perl and PCRE1 used to give  an  error  at  compile
+       Earlier  versions  of  Perl  and PCRE1 used to give an error at compile
        time for such patterns. However, because there are cases where this can
        be useful, such patterns are now accepted, but if any repetition of the
-       subpattern  does in fact match no characters, the loop is forcibly bro-
+       subpattern does in fact match no characters, the loop is forcibly  bro-
        ken.

-       By default, the quantifiers are "greedy", that is, they match  as  much
-       as  possible  (up  to  the  maximum number of permitted times), without
-       causing the rest of the pattern to fail. The classic example  of  where
+       By  default,  the quantifiers are "greedy", that is, they match as much
+       as possible (up to the maximum  number  of  permitted  times),  without
+       causing  the  rest of the pattern to fail. The classic example of where
        this gives problems is in trying to match comments in C programs. These
-       appear between /* and */ and within the comment,  individual  *  and  /
-       characters  may  appear. An attempt to match C comments by applying the
+       appear  between  /*  and  */ and within the comment, individual * and /
+       characters may appear. An attempt to match C comments by  applying  the
        pattern

          /\*.*\*/
@@ -6932,19 +6999,19 @@

          /* first comment */  not comment  /* second comment */

-       fails, because it matches the entire string owing to the greediness  of
+       fails,  because it matches the entire string owing to the greediness of
        the .*  item.

        If a quantifier is followed by a question mark, it ceases to be greedy,
-       and instead matches the minimum number of times possible, so  the  pat-
+       and  instead  matches the minimum number of times possible, so the pat-
        tern

          /\*.*?\*/

-       does  the  right  thing with the C comments. The meaning of the various
-       quantifiers is not otherwise changed,  just  the  preferred  number  of
-       matches.   Do  not  confuse this use of question mark with its use as a
-       quantifier in its own right. Because it has two uses, it can  sometimes
+       does the right thing with the C comments. The meaning  of  the  various
+       quantifiers  is  not  otherwise  changed,  just the preferred number of
+       matches.  Do not confuse this use of question mark with its  use  as  a
+       quantifier  in its own right. Because it has two uses, it can sometimes
        appear doubled, as in

          \d??\d
@@ -6953,28 +7020,28 @@
        only way the rest of the pattern matches.
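
The difference between greedy and minimizing quantifiers on the C comment example can be checked with a short sketch. It assumes Python's built-in re module as a stand-in for PCRE2 (a different engine that treats these particular quantifiers the same way); the variable names are illustrative only.

```python
import re

subject = "/* first comment */  not comment  /* second comment */"

# Greedy .* runs to the last "*/", so the whole subject is one match.
greedy = re.findall(r"/\*.*\*/", subject)

# Minimizing .*? stops at the first "*/", so each comment is found separately.
lazy = re.findall(r"/\*.*?\*/", subject)

# \d??\d prefers one digit, but can take two when the rest requires it.
prefers_one = re.match(r"\d??\d", "12").group()
forced_two = re.fullmatch(r"\d??\d", "12").group()

print(greedy)
print(lazy)
print(prefers_one, forced_two)
```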

        If the PCRE2_UNGREEDY option is set (an option that is not available in
-       Perl),  the  quantifiers are not greedy by default, but individual ones
-       can be made greedy by following them with a  question  mark.  In  other
+       Perl), the quantifiers are not greedy by default, but  individual  ones
+       can  be  made  greedy  by following them with a question mark. In other
        words, it inverts the default behaviour.

-       When  a  parenthesized  subpattern  is quantified with a minimum repeat
-       count that is greater than 1 or with a limited maximum, more memory  is
-       required  for  the  compiled  pattern, in proportion to the size of the
+       When a parenthesized subpattern is quantified  with  a  minimum  repeat
+       count  that is greater than 1 or with a limited maximum, more memory is
+       required for the compiled pattern, in proportion to  the  size  of  the
        minimum or maximum.

-       If a pattern starts with  .*  or  .{0,}  and  the  PCRE2_DOTALL  option
-       (equivalent  to  Perl's /s) is set, thus allowing the dot to match new-
-       lines, the pattern is implicitly  anchored,  because  whatever  follows
-       will  be  tried against every character position in the subject string,
-       so there is no point in retrying the  overall  match  at  any  position
+       If  a  pattern  starts  with  .*  or  .{0,} and the PCRE2_DOTALL option
+       (equivalent to Perl's /s) is set, thus allowing the dot to  match  new-
+       lines,  the  pattern  is  implicitly anchored, because whatever follows
+       will be tried against every character position in the  subject  string,
+       so  there  is  no  point  in retrying the overall match at any position
        after the first. PCRE2 normally treats such a pattern as though it were
        preceded by \A.

-       In cases where it is known that the subject  string  contains  no  new-
-       lines,  it  is worth setting PCRE2_DOTALL in order to obtain this opti-
+       In  cases  where  it  is known that the subject string contains no new-
+       lines, it is worth setting PCRE2_DOTALL in order to obtain  this  opti-
        mization, or alternatively, using ^ to indicate anchoring explicitly.

-       However, there are some cases where the optimization  cannot  be  used.
+       However,  there  are  some cases where the optimization cannot be used.
        When .*  is inside capturing parentheses that are the subject of a back
        reference elsewhere in the pattern, a match at the start may fail where
        a later one succeeds. Consider, for example:
@@ -6981,17 +7048,17 @@

          (.*)abc\1

-       If  the subject is "xyz123abc123" the match point is the fourth charac-
+       If the subject is "xyz123abc123" the match point is the fourth  charac-
        ter. For this reason, such a pattern is not implicitly anchored.

-       Another case where implicit anchoring is not applied is when the  lead-
-       ing  .* is inside an atomic group. Once again, a match at the start may
+       Another  case where implicit anchoring is not applied is when the lead-
+       ing .* is inside an atomic group. Once again, a match at the start  may
        fail where a later one succeeds. Consider this pattern:

          (?>.*?a)b

-       It matches "ab" in the subject "aab". The use of the backtracking  con-
-       trol  verbs  (*PRUNE)  and  (*SKIP) also disable this optimization, and
+       It  matches "ab" in the subject "aab". The use of the backtracking con-
+       trol verbs (*PRUNE) and (*SKIP) also disables  this  optimization,  and
        there is an option, PCRE2_NO_DOTSTAR_ANCHOR, to do so explicitly.

        When a capturing subpattern is repeated, the value captured is the sub-
@@ -7000,8 +7067,8 @@
          (tweedle[dume]{3}\s*)+

        has matched "tweedledum tweedledee" the value of the captured substring
-       is "tweedledee". However, if there are  nested  capturing  subpatterns,
-       the  corresponding captured values may have been set in previous itera-
+       is  "tweedledee".  However,  if there are nested capturing subpatterns,
+       the corresponding captured values may have been set in previous  itera-
        tions. For example, after

          (a|(b))+
@@ -7011,53 +7078,53 @@

ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS

-       With both maximizing ("greedy") and minimizing ("ungreedy"  or  "lazy")
-       repetition,  failure  of what follows normally causes the repeated item
-       to be re-evaluated to see if a different number of repeats  allows  the
-       rest  of  the pattern to match. Sometimes it is useful to prevent this,
-       either to change the nature of the match, or to cause it  fail  earlier
-       than  it otherwise might, when the author of the pattern knows there is
+       With  both  maximizing ("greedy") and minimizing ("ungreedy" or "lazy")
+       repetition, failure of what follows normally causes the  repeated  item
+       to  be  re-evaluated to see if a different number of repeats allows the
+       rest of the pattern to match. Sometimes it is useful to  prevent  this,
+       either to change the nature of the match, or to cause it to fail earlier
+       than it otherwise might, when the author of the pattern knows there  is
        no point in carrying on.

-       Consider, for example, the pattern \d+foo when applied to  the  subject
+       Consider,  for  example, the pattern \d+foo when applied to the subject
        line

          123456bar

        After matching all 6 digits and then failing to match "foo", the normal
-       action of the matcher is to try again with only 5 digits  matching  the
-       \d+  item,  and  then  with  4,  and  so on, before ultimately failing.
-       "Atomic grouping" (a term taken from Jeffrey  Friedl's  book)  provides
-       the  means for specifying that once a subpattern has matched, it is not
+       action  of  the matcher is to try again with only 5 digits matching the
+       \d+ item, and then with  4,  and  so  on,  before  ultimately  failing.
+       "Atomic  grouping"  (a  term taken from Jeffrey Friedl's book) provides
+       the means for specifying that once a subpattern has matched, it is  not
        to be re-evaluated in this way.

-       If we use atomic grouping for the previous example, the  matcher  gives
-       up  immediately  on failing to match "foo" the first time. The notation
+       If  we  use atomic grouping for the previous example, the matcher gives
+       up immediately on failing to match "foo" the first time.  The  notation
        is a kind of special parenthesis, starting with (?> as in this example:

          (?>\d+)foo

-       This kind of parenthesis "locks up" the  part of the  pattern  it  con-
-       tains  once  it  has matched, and a failure further into the pattern is
-       prevented from backtracking into it. Backtracking past it  to  previous
+       This  kind  of  parenthesis "locks up" the  part of the pattern it con-
+       tains once it has matched, and a failure further into  the  pattern  is
+       prevented  from  backtracking into it. Backtracking past it to previous
        items, however, works as normal.

-       An  alternative  description  is that a subpattern of this type matches
-       exactly the string of characters that an identical  standalone  pattern
+       An alternative description is that a subpattern of  this  type  matches
+       exactly  the  string of characters that an identical standalone pattern
        would match, if anchored at the current point in the subject string.

        Atomic grouping subpatterns are not capturing subpatterns. Simple cases
        such as the above example can be thought of as a maximizing repeat that
-       must  swallow  everything  it can. So, while both \d+ and \d+? are pre-
-       pared to adjust the number of digits they match in order  to  make  the
+       must swallow everything it can. So, while both \d+ and  \d+?  are  pre-
+       pared  to  adjust  the number of digits they match in order to make the
        rest of the pattern match, (?>\d+) can only match an entire sequence of
        digits.
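
A minimal sketch of the "swallow everything" behaviour, assuming Python's built-in re module rather than PCRE2. The (?>...) syntax is accepted by re only from Python 3.11 onwards, so the hypothetical helper below falls back to a well-known emulation (a lookahead that captures the digits, followed by a back reference that consumes them) on older interpreters. The helper name and test strings are illustrative.

```python
import re

def digits_atomic_then(tail):
    # Compile "(?>\d+)" followed by `tail`.
    try:
        return re.compile(r"(?>\d+)(?:" + tail + ")")
    except re.error:
        # Fallback for Python < 3.11: the lookahead is not re-entered on
        # backtracking, so the captured digit run behaves atomically.
        return re.compile(r"(?=(\d+))\1(?:" + tail + ")")

# Ordinary \d+ gives back one digit so that the final "5" can match.
backtracking = re.match(r"\d+5", "12345")

# The atomic version has swallowed all five digits and cannot give one back.
atomic_fails = digits_atomic_then("5").match("12345")

# When what follows is not a digit, the atomic version matches as usual.
atomic_ok = digits_atomic_then("foo").match("123456foo")

print(backtracking.group(), atomic_fails, atomic_ok.group())
```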

-       Atomic groups in general can of course contain arbitrarily  complicated
-       subpatterns,  and  can  be  nested. However, when the subpattern for an
+       Atomic  groups in general can of course contain arbitrarily complicated
+       subpatterns, and can be nested. However, when  the  subpattern  for  an
        atomic group is just a single repeated item, as in the example above, a
-       simpler  notation,  called  a "possessive quantifier" can be used. This
-       consists of an additional + character  following  a  quantifier.  Using
+       simpler notation, called a "possessive quantifier" can  be  used.  This
+       consists  of  an  additional  + character following a quantifier. Using
        this notation, the previous example can be rewritten as

          \d++foo
@@ -7067,46 +7134,46 @@

          (abc|xyz){2,3}+

-       Possessive  quantifiers  are  always  greedy;  the   setting   of   the
-       PCRE2_UNGREEDY  option  is  ignored. They are a convenient notation for
-       the simpler forms of atomic group. However, there is no  difference  in
+       Possessive   quantifiers   are   always  greedy;  the  setting  of  the
+       PCRE2_UNGREEDY option is ignored. They are a  convenient  notation  for
+       the  simpler  forms of atomic group. However, there is no difference in
        the meaning of a possessive quantifier and the equivalent atomic group,
-       though there may be a performance  difference;  possessive  quantifiers
+       though  there  may  be a performance difference; possessive quantifiers
        should be slightly faster.

-       The  possessive  quantifier syntax is an extension to the Perl 5.8 syn-
-       tax.  Jeffrey Friedl originated the idea (and the name)  in  the  first
+       The possessive quantifier syntax is an extension to the Perl  5.8  syn-
+       tax.   Jeffrey  Friedl  originated the idea (and the name) in the first
        edition of his book. Mike McCloskey liked it, so implemented it when he
        built Sun's Java package, and PCRE1 copied it from there. It ultimately
        found its way into Perl at release 5.10.

-       PCRE2  has  an  optimization  that automatically "possessifies" certain
-       simple pattern constructs. For example, the sequence A+B is treated  as
-       A++B  because  there is no point in backtracking into a sequence of A's
+       PCRE2 has an optimization  that  automatically  "possessifies"  certain
+       simple  pattern constructs. For example, the sequence A+B is treated as
+       A++B because there is no point in backtracking into a sequence  of  A's
        when B must follow.  This feature can be disabled by the PCRE2_NO_AUTO-
        POSSESS option, or starting the pattern with (*NO_AUTO_POSSESS).

-       When  a  pattern  contains an unlimited repeat inside a subpattern that
-       can itself be repeated an unlimited number of  times,  the  use  of  an
-       atomic  group  is  the  only way to avoid some failing matches taking a
+       When a pattern contains an unlimited repeat inside  a  subpattern  that
+       can  itself  be  repeated  an  unlimited number of times, the use of an
+       atomic group is the only way to avoid some  failing  matches  taking  a
        very long time indeed. The pattern

          (\D+|<\d+>)*[!?]

-       matches an unlimited number of substrings that either consist  of  non-
-       digits,  or  digits  enclosed in <>, followed by either ! or ?. When it
+       matches  an  unlimited number of substrings that either consist of non-
+       digits, or digits enclosed in <>, followed by either ! or  ?.  When  it
        matches, it runs quickly. However, if it is applied to

          aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

-       it takes a long time before reporting  failure.  This  is  because  the
-       string  can be divided between the internal \D+ repeat and the external
-       * repeat in a large number of ways, and all  have  to  be  tried.  (The
-       example  uses  [!?]  rather than a single character at the end, because
-       both PCRE2 and Perl have an optimization that allows for  fast  failure
-       when  a single character is used. They remember the last single charac-
-       ter that is required for a match, and fail early if it is  not  present
-       in  the  string.)  If  the pattern is changed so that it uses an atomic
+       it  takes  a  long  time  before reporting failure. This is because the
+       string can be divided between the internal \D+ repeat and the  external
+       *  repeat  in  a  large  number of ways, and all have to be tried. (The
+       example uses [!?] rather than a single character at  the  end,  because
+       both  PCRE2  and Perl have an optimization that allows for fast failure
+       when a single character is used. They remember the last single  charac-
+       ter  that  is required for a match, and fail early if it is not present
+       in the string.) If the pattern is changed so that  it  uses  an  atomic
        group, like this:

          ((?>\D+)|<\d+>)*[!?]
@@ -7118,38 +7185,38 @@

        Outside a character class, a backslash followed by a digit greater than
        0 (and possibly further digits) is a back reference to a capturing sub-
-       pattern earlier (that is, to its left) in the pattern,  provided  there
+       pattern  earlier  (that is, to its left) in the pattern, provided there
        have been that many previous capturing left parentheses.

-       However,  if the decimal number following the backslash is less than 8,
-       it is always taken as a back reference, and causes  an  error  only  if
-       there  are  not that many capturing left parentheses in the entire pat-
-       tern. In other words, the parentheses that are referenced need  not  be
-       to  the  left of the reference for numbers less than 8. A "forward back
-       reference" of this type can make sense when a  repetition  is  involved
-       and  the  subpattern to the right has participated in an earlier itera-
+       However, if the decimal number following the backslash is less than  8,
+       it  is  always  taken  as a back reference, and causes an error only if
+       there are not that many capturing left parentheses in the  entire  pat-
+       tern.  In  other words, the parentheses that are referenced need not be
+       to the left of the reference for numbers less than 8. A  "forward  back
+       reference"  of  this  type can make sense when a repetition is involved
+       and the subpattern to the right has participated in an  earlier  itera-
        tion.

-       It is not possible to have a numerical "forward back  reference"  to  a
-       subpattern  whose  number  is  8  or  more  using this syntax because a
-       sequence such as \50 is interpreted as a character  defined  in  octal.
+       It  is  not  possible to have a numerical "forward back reference" to a
+       subpattern whose number is 8  or  more  using  this  syntax  because  a
+       sequence  such  as  \50 is interpreted as a character defined in octal.
        See the subsection entitled "Non-printing characters" above for further
-       details of the handling of digits following a backslash.  There  is  no
-       such  problem  when named parentheses are used. A back reference to any
+       details  of  the  handling of digits following a backslash. There is no
+       such problem when named parentheses are used. A back reference  to  any
        subpattern is possible using named parentheses (see below).

-       Another way of avoiding the ambiguity inherent in  the  use  of  digits
-       following  a  backslash  is  to use the \g escape sequence. This escape
-       must be followed by an unsigned number or a negative number, optionally
-       enclosed in braces. These examples are all identical:
+       Another  way  of  avoiding  the ambiguity inherent in the use of digits
+       following a backslash is to use the \g  escape  sequence.  This  escape
+       must be followed by a signed or unsigned number, optionally enclosed in
+       braces. These examples are all identical:

          (ring), \1
          (ring), \g1
          (ring), \g{1}

-       An  unsigned number specifies an absolute reference without the ambigu-
+       An unsigned number specifies an absolute reference without the  ambigu-
        ity that is present in the older syntax. It is also useful when literal
-       digits follow the reference. A negative number is a relative reference.
+       digits follow the reference. A signed number is a  relative  reference.
        Consider this example:

          (abc(def)ghi)\g{-1}
@@ -7156,33 +7223,37 @@

        The sequence \g{-1} is a reference to the most recently started captur-
        ing subpattern before \g, that is, it is equivalent to \2 in this exam-
-       ple.  Similarly, \g{-2} would be equivalent to \1. The use of  relative
-       references  can  be helpful in long patterns, and also in patterns that
-       are created by  joining  together  fragments  that  contain  references
+       ple.   Similarly, \g{-2} would be equivalent to \1. The use of relative
+       references can be helpful in long patterns, and also in  patterns  that
+       are  created  by  joining  together  fragments  that contain references
        within themselves.

-       A  back  reference matches whatever actually matched the capturing sub-
-       pattern in the current subject string, rather  than  anything  matching
+       The sequence \g{+1} is a reference to the  next  capturing  subpattern.
+       This  kind  of forward reference can be useful in patterns that repeat.
+       Perl does not support the use of + in this way.
+
+       A back reference matches whatever actually matched the  capturing  sub-
+       pattern  in  the  current subject string, rather than anything matching
        the subpattern itself (see "Subpatterns as subroutines" below for a way
        of doing that). So the pattern

          (sens|respons)e and \1ibility

-       matches "sense and sensibility" and "response and responsibility",  but
-       not  "sense and responsibility". If caseful matching is in force at the
-       time of the back reference, the case of letters is relevant. For  exam-
+       matches  "sense and sensibility" and "response and responsibility", but
+       not "sense and responsibility". If caseful matching is in force at  the
+       time  of the back reference, the case of letters is relevant. For exam-
        ple,

          ((?i)rah)\s+\1

-       matches  "rah  rah"  and  "RAH RAH", but not "RAH rah", even though the
+       matches "rah rah" and "RAH RAH", but not "RAH  rah",  even  though  the
        original capturing subpattern is matched caselessly.
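
The point that a back reference matches the text actually captured, not the subpattern itself, can be seen in a small sketch. It assumes Python's built-in re module, which handles numbered back references the same way for this simple case; the subject strings are the ones quoted above.

```python
import re

pattern = re.compile(r"(sens|respons)e and \1ibility")

subjects = [
    "sense and sensibility",
    "response and responsibility",
    "sense and responsibility",
]

# \1 must repeat whatever group 1 captured in this particular subject.
results = {s: bool(pattern.fullmatch(s)) for s in subjects}

for subject, matched in results.items():
    print(subject, "->", matched)
```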

-       There are several different ways of writing back  references  to  named
-       subpatterns.  The  .NET syntax \k{name} and the Perl syntax \k<name> or
-       \k'name' are supported, as is the Python syntax (?P=name). Perl  5.10's
+       There  are  several  different ways of writing back references to named
+       subpatterns. The .NET syntax \k{name} and the Perl syntax  \k<name>  or
+       \k'name'  are supported, as is the Python syntax (?P=name). Perl 5.10's
        unified back reference syntax, in which \g can be used for both numeric
-       and named references, is also supported. We  could  rewrite  the  above
+       and  named  references,  is  also supported. We could rewrite the above
        example in any of the following ways:

          (?<p1>(?i)rah)\s+\k<p1>
@@ -7190,68 +7261,75 @@
          (?P<p1>(?i)rah)\s+(?P=p1)
          (?<p1>(?i)rah)\s+\g{p1}

-       A  subpattern  that  is  referenced  by  name may appear in the pattern
+       A subpattern that is referenced by  name  may  appear  in  the  pattern
        before or after the reference.
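
As a hedged sketch of the Python-style named back reference, the following uses Python's built-in re module, not PCRE2. One deliberate deviation: recent Python versions reject an inline (?i) that is not at the start of the pattern, so the scoped form (?i:rah) is used here as an assumed equivalent that makes only the group caseless.

```python
import re

# Group p1 is matched caselessly; the back reference (?P=p1) is caseful.
named = re.compile(r"(?P<p1>(?i:rah))\s+(?P=p1)")

subjects = ("rah rah", "RAH RAH", "RAH rah")
results = {s: bool(named.fullmatch(s)) for s in subjects}

print(results)
```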

-       There may be more than one back reference to the same subpattern. If  a
-       subpattern  has  not actually been used in a particular match, any back
+       There  may be more than one back reference to the same subpattern. If a
+       subpattern has not actually been used in a particular match,  any  back
        references to it always fail by default. For example, the pattern

          (a|(bc))\2

-       always fails if it starts to match "a" rather than  "bc".  However,  if
-       the  PCRE2_MATCH_UNSET_BACKREF  option  is  set at compile time, a back
+       always  fails  if  it starts to match "a" rather than "bc". However, if
+       the PCRE2_MATCH_UNSET_BACKREF option is set at  compile  time,  a  back
        reference to an unset value matches an empty string.

-       Because there may be many capturing parentheses in a pattern, all  dig-
-       its  following a backslash are taken as part of a potential back refer-
-       ence number.  If the pattern continues with  a  digit  character,  some
-       delimiter  must  be  used  to  terminate  the  back  reference.  If the
-       PCRE2_EXTENDED option is set, this can be white space.  Otherwise,  the
+       Because  there may be many capturing parentheses in a pattern, all dig-
+       its following a backslash are taken as part of a potential back  refer-
+       ence  number.   If  the  pattern continues with a digit character, some
+       delimiter must  be  used  to  terminate  the  back  reference.  If  the
+       PCRE2_EXTENDED  option  is set, this can be white space. Otherwise, the
        \g{ syntax or an empty comment (see "Comments" below) can be used.

    Recursive back references

-       A  back reference that occurs inside the parentheses to which it refers
-       fails when the subpattern is first used, so, for example,  (a\1)  never
-       matches.   However,  such references can be useful inside repeated sub-
+       A back reference that occurs inside the parentheses to which it  refers
+       fails  when  the subpattern is first used, so, for example, (a\1) never
+       matches.  However, such references can be useful inside  repeated  sub-
        patterns. For example, the pattern

          (a|b\1)+

        matches any number of "a"s and also "aba", "ababbaa" etc. At each iter-
-       ation  of  the  subpattern,  the  back  reference matches the character
-       string corresponding to the previous iteration. In order  for  this  to
-       work,  the  pattern must be such that the first iteration does not need
-       to match the back reference. This can be done using alternation, as  in
+       ation of the subpattern,  the  back  reference  matches  the  character
+       string  corresponding  to  the previous iteration. In order for this to
+       work, the pattern must be such that the first iteration does  not  need
+       to  match the back reference. This can be done using alternation, as in
        the example above, or by a quantifier with a minimum of zero.

-       Back  references of this type cause the group that they reference to be
-       treated as an atomic group.  Once the whole group has been  matched,  a
-       subsequent  matching  failure cannot cause backtracking into the middle
+       Back references of this type cause the group that they reference to  be
+       treated  as  an atomic group.  Once the whole group has been matched, a
+       subsequent matching failure cannot cause backtracking into  the  middle
        of the group.

ASSERTIONS

-       An assertion is a test on the characters  following  or  preceding  the
+       An  assertion  is  a  test on the characters following or preceding the
        current matching point that does not consume any characters. The simple
-       assertions coded as \b, \B, \A, \G, \Z,  \z,  ^  and  $  are  described
+       assertions  coded  as  \b,  \B,  \A,  \G, \Z, \z, ^ and $ are described
        above.

-       More  complicated  assertions  are  coded as subpatterns. There are two
-       kinds: those that look ahead of the current  position  in  the  subject
-       string,  and  those  that  look  behind  it. An assertion subpattern is
-       matched in the normal way, except that it does not  cause  the  current
+       More complicated assertions are coded as  subpatterns.  There  are  two
+       kinds:  those  that  look  ahead of the current position in the subject
+       string, and those that look  behind  it.  An  assertion  subpattern  is
+       matched  in  the  normal way, except that it does not cause the current
        matching position to be changed.

-       Assertion  subpatterns are not capturing subpatterns. If such an asser-
-       tion contains capturing subpatterns within it, these  are  counted  for
-       the  purposes  of numbering the capturing subpatterns in the whole pat-
-       tern. However, substring capturing is carried  out  only  for  positive
+       Assertion subpatterns are not capturing subpatterns. If such an  asser-
+       tion  contains  capturing  subpatterns within it, these are counted for
+       the purposes of numbering the capturing subpatterns in the  whole  pat-
+       tern.  However,  substring  capturing  is carried out only for positive
        assertions. (Perl sometimes, but not always, does do capturing in nega-
        tive assertions.)

+       WARNING:  If a positive assertion containing one or more capturing sub-
+       patterns succeeds, but failure to match later  in  the  pattern  causes
+       backtracking over this assertion, the captures within the assertion are
+       reset only if no higher numbered captures are  already  set.  This  is,
+       unfortunately,  a fundamental limitation of the current implementation;
+       it may get removed in a future reworking.
+
        For  compatibility  with  Perl,  most  assertion  subpatterns  may   be
        repeated;  though  it  makes  no sense to assert the same thing several
        times, the side effect of capturing  parentheses  may  occasionally  be
@@ -7340,16 +7418,28 @@
        then try to match. If there are insufficient characters before the cur-
        rent position, the assertion fails.

-       In a UTF mode, PCRE2 does not allow the \C escape (which matches a sin-
-       gle  code  unit even in a UTF mode) to appear in lookbehind assertions,
-       because it makes it impossible to calculate the length of  the  lookbe-
-       hind.  The \X and \R escapes, which can match different numbers of code
-       units, are also not permitted.
+       In UTF-8 and UTF-16 modes, PCRE2 does not allow the  \C  escape  (which
+       matches  a single code unit even in a UTF mode) to appear in lookbehind
+       assertions, because it makes it impossible to calculate the  length  of
+       the  lookbehind.  The \X and \R escapes, which can match different num-
+       bers of code units, are never permitted in lookbehinds.

        "Subroutine" calls (see below) such as (?2) or (?&X) are  permitted  in
        lookbehinds,  as  long as the subpattern matches a fixed-length string.
-       Recursion, however, is not supported.
+       However, recursion, that is, a "subroutine" call into a group  that  is
+       already active, is not supported.

+       Perl  does  not support back references in lookbehinds. PCRE2 does sup-
+       port  them,   but   only   if   certain   conditions   are   met.   The
+       PCRE2_MATCH_UNSET_BACKREF  option must not be set, there must be no use
+       of (?| in the pattern (it creates duplicate subpattern numbers), and if
+       the  back reference is by name, the name must be unique. Of course, the
+       referenced subpattern must itself be of  fixed  length.  The  following
+       pattern matches words containing at least two characters that begin and
+       end with the same character:
+
+          \b(\w)\w++(?<=\1)
+
        Possessive quantifiers can  be  used  in  conjunction  with  lookbehind
        assertions to specify efficient matching of fixed-length strings at the
        end of subject strings. Consider a simple pattern such as
@@ -7482,7 +7572,9 @@
        Perl uses the syntax (?(<name>)...) or (?('name')...)  to  test  for  a
        used  subpattern  by  name.  For compatibility with earlier versions of
        PCRE1, which had this facility before Perl, the syntax (?(name)...)  is
-       also recognized.
+       also  recognized.  Note,  however, that undelimited names consisting of
+       the letter R followed by digits are ambiguous (see the  following  sec-
+       tion).

        Rewriting the above example to use a named subpattern gives this:

@@ -7494,76 +7586,95 @@

    Checking for pattern recursion

-       If the condition is the string (R), and there is no subpattern with the
-       name R, the condition is true if a recursive call to the whole  pattern
-       or any subpattern has been made. If digits or a name preceded by amper-
-       sand follow the letter R, for example:
+       "Recursion"  in  this sense refers to any subroutine-like call from one
+       part of the pattern to another, whether or not it  is  actually  recur-
+       sive.  See  the sections entitled "Recursive patterns" and "Subpatterns
+       as subroutines" below for details of recursion and subpattern calls.

-         (?(R3)...) or (?(R&name)...)
+       If a condition is the string (R), and there is no subpattern  with  the
+       name  R,  the condition is true if matching is currently in a recursion
+       or subroutine call to the whole pattern or any  subpattern.  If  digits
+       follow  the  letter  R,  and there is no subpattern with that name, the
+       condition is true if the most recent call is into a subpattern with the
+       given  number,  which must exist somewhere in the overall pattern. This
+       is a contrived example that is equivalent to a+b:

+         ((?(R1)a+|(?1)b))
+
+       However, in both cases, if there is a subpattern with a matching  name,
+       the  condition  tests  for  its  being set, as described in the section
+       above, instead of testing for recursion. For example, creating a  group
+       with  the  name  R1  by  adding (?<R1>) to the above pattern completely
+       changes its meaning.
+
+       If a name preceded by ampersand follows the letter R, for example:
+
+         (?(R&name)...)
+
        the condition is true if the most recent recursion is into a subpattern
-       whose number or name is given. This condition does not check the entire
-       recursion stack. If the name used in a condition  of  this  kind  is  a
+       of that name (which must exist within the pattern).
+
+       This condition does not check the entire recursion stack. It tests only
+       the current level. If the name used in a condition of this  kind  is  a
        duplicate, the test is applied to all subpatterns of the same name, and
        is true if any one of them is the most recent recursion.

-       At "top level", all these recursion test  conditions  are  false.   The
-       syntax for recursive patterns is described below.
+       At "top level", all these recursion test conditions are false.

    Defining subpatterns for use by reference only

-       If  the  condition  is  the string (DEFINE), and there is no subpattern
-       with the name DEFINE, the condition is  always  false.  In  this  case,
-       there  may  be  only  one  alternative  in the subpattern. It is always
-       skipped if control reaches this point  in  the  pattern;  the  idea  of
-       DEFINE  is that it can be used to define subroutines that can be refer-
-       enced from elsewhere. (The use of subroutines is described below.)  For
-       example,  a  pattern  to match an IPv4 address such as "192.168.23.245"
-       could be written like this (ignore white space and line breaks):
+       If the condition is the string (DEFINE), the condition is always false,
+       even  if there is a group with the name DEFINE. In this case, there may
+       be only one alternative in the subpattern. It is always skipped if con-
+       trol  reaches  this point in the pattern; the idea of DEFINE is that it
+       can be used to define subroutines that can  be  referenced  from  else-
+       where. (The use of subroutines is described below.) For example, a pat-
+       tern to match an IPv4 address such as "192.168.23.245" could be written
+       like this (ignore white space and line breaks):

          (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
          \b (?&byte) (\.(?&byte)){3} \b

-       The first part of the pattern is a DEFINE group inside which a  another
-       group  named "byte" is defined. This matches an individual component of
-       an IPv4 address (a number less than 256). When  matching  takes  place,
-       this  part  of  the pattern is skipped because DEFINE acts like a false
-       condition. The rest of the pattern uses references to the  named  group
-       to  match the four dot-separated components of an IPv4 address, insist-
+       The  first part of the pattern is a DEFINE group inside which a another
+       group named "byte" is defined. This matches an individual component  of
+       an  IPv4  address  (a number less than 256). When matching takes place,
+       this part of the pattern is skipped because DEFINE acts  like  a  false
+       condition.  The  rest of the pattern uses references to the named group
+       to match the four dot-separated components of an IPv4 address,  insist-
        ing on a word boundary at each end.

    Checking the PCRE2 version

-       Programs that link with a PCRE2 library can check the version by  call-
-       ing  pcre2_config()  with  appropriate arguments. Users of applications
-       that do not have access to the underlying code cannot do this.  A  spe-
-       cial  "condition" called VERSION exists to allow such users to discover
+       Programs  that link with a PCRE2 library can check the version by call-
+       ing pcre2_config() with appropriate arguments.  Users  of  applications
+       that  do  not have access to the underlying code cannot do this. A spe-
+       cial "condition" called VERSION exists to allow such users to  discover
        which version of PCRE2 they are dealing with by using this condition to
-       match  a string such as "yesno". VERSION must be followed either by "="
+       match a string such as "yesno". VERSION must be followed either by  "="
        or ">=" and a version number.  For example:

          (?(VERSION>=10.4)yes|no)

-       This pattern matches "yes" if the PCRE2 version is greater or equal  to
-       10.4,  or "no" otherwise. The fractional part of the version number may
+       This  pattern matches "yes" if the PCRE2 version is greater or equal to
+       10.4, or "no" otherwise. The fractional part of the version number  may
        not contain more than two digits.

    Assertion conditions

-       If the condition is not in any of the above  formats,  it  must  be  an
-       assertion.   This may be a positive or negative lookahead or lookbehind
-       assertion. Consider  this  pattern,  again  containing  non-significant
+       If  the  condition  is  not  in any of the above formats, it must be an
+       assertion.  This may be a positive or negative lookahead or  lookbehind
+       assertion.  Consider  this  pattern,  again  containing non-significant
        white space, and with the two alternatives on the second line:

          (?(?=[^a-z]*[a-z])
          \d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} )

-       The  condition  is  a  positive  lookahead  assertion  that  matches an
-       optional sequence of non-letters followed by a letter. In other  words,
-       it  tests  for the presence of at least one letter in the subject. If a
-       letter is found, the subject is matched against the first  alternative;
-       otherwise  it  is  matched  against  the  second.  This pattern matches
-       strings in one of the two forms dd-aaa-dd or dd-dd-dd,  where  aaa  are
+       The condition  is  a  positive  lookahead  assertion  that  matches  an
+       optional  sequence of non-letters followed by a letter. In other words,
+       it tests for the presence of at least one letter in the subject.  If  a
+       letter  is found, the subject is matched against the first alternative;
+       otherwise it is  matched  against  the  second.  This  pattern  matches
+       strings  in  one  of the two forms dd-aaa-dd or dd-dd-dd, where aaa are
        letters and dd are digits.

@@ -7570,44 +7681,44 @@
COMMENTS

        There are two ways of including comments in patterns that are processed
-       by PCRE2. In both cases, the start of the comment  must  not  be  in  a
-       character  class,  nor  in  the middle of any other sequence of related
-       characters such as (?: or a subpattern name or number.  The  characters
+       by  PCRE2.  In  both  cases,  the start of the comment must not be in a
+       character class, nor in the middle of any  other  sequence  of  related
+       characters  such  as (?: or a subpattern name or number. The characters
        that make up a comment play no part in the pattern matching.

-       The  sequence (?# marks the start of a comment that continues up to the
-       next closing parenthesis. Nested parentheses are not permitted. If  the
-       PCRE2_EXTENDED  option is set, an unescaped # character also introduces
-       a comment, which in this case continues to immediately after  the  next
-       newline  character  or character sequence in the pattern. Which charac-
-       ters are interpreted as newlines is controlled by an option  passed  to
-       the  compiling  function  or  by a special sequence at the start of the
-       pattern, as described in the  section  entitled  "Newline  conventions"
-       above.  Note  that the end of this type of comment is a literal newline
-       sequence in the pattern; escape sequences that happen  to  represent  a
-       newline   do  not  count.  For  example,  consider  this  pattern  when
-       PCRE2_EXTENDED is set, and the default  newline  convention  (a  single
+       The sequence (?# marks the start of a comment that continues up to  the
+       next  closing parenthesis. Nested parentheses are not permitted. If the
+       PCRE2_EXTENDED option is set, an unescaped # character also  introduces
+       a  comment,  which in this case continues to immediately after the next
+       newline character or character sequence in the pattern.  Which  charac-
+       ters  are  interpreted as newlines is controlled by an option passed to
+       the compiling function or by a special sequence at  the  start  of  the
+       pattern,  as  described  in  the section entitled "Newline conventions"
+       above. Note that the end of this type of comment is a  literal  newline
+       sequence  in  the  pattern; escape sequences that happen to represent a
+       newline  do  not  count.  For  example,  consider  this  pattern   when
+       PCRE2_EXTENDED  is  set,  and  the default newline convention (a single
        linefeed character) is in force:

          abc #comment \n still comment

-       On  encountering  the # character, pcre2_compile() skips along, looking
-       for a newline in the pattern. The sequence \n is still literal at  this
-       stage,  so  it does not terminate the comment. Only an actual character
+       On encountering the # character, pcre2_compile() skips  along,  looking
+       for  a newline in the pattern. The sequence \n is still literal at this
+       stage, so it does not terminate the comment. Only an  actual  character
        with the code value 0x0a (the default newline) does so.

RECURSIVE PATTERNS

-       Consider the problem of matching a string in parentheses, allowing  for
-       unlimited  nested  parentheses.  Without the use of recursion, the best
-       that can be done is to use a pattern that  matches  up  to  some  fixed
-       depth  of  nesting.  It  is not possible to handle an arbitrary nesting
+       Consider  the problem of matching a string in parentheses, allowing for
+       unlimited nested parentheses. Without the use of  recursion,  the  best
+       that  can  be  done  is  to use a pattern that matches up to some fixed
+       depth of nesting. It is not possible to  handle  an  arbitrary  nesting
        depth.

        For some time, Perl has provided a facility that allows regular expres-
-       sions  to recurse (amongst other things). It does this by interpolating
-       Perl code in the expression at run time, and the code can refer to  the
+       sions to recurse (amongst other things). It does this by  interpolating
+       Perl  code in the expression at run time, and the code can refer to the
        expression itself. A Perl pattern using code interpolation to solve the
        parentheses problem can be created like this:

@@ -7617,214 +7728,214 @@
        refers recursively to the pattern in which it appears.

        Obviously,  PCRE2  cannot  support  the  interpolation  of  Perl  code.
-       Instead, it supports special syntax for recursion of  the  entire  pat-
+       Instead,  it  supports  special syntax for recursion of the entire pat-
        tern, and also for individual subpattern recursion. After its introduc-
-       tion in PCRE1 and Python,  this  kind  of  recursion  was  subsequently
+       tion  in  PCRE1  and  Python,  this  kind of recursion was subsequently
        introduced into Perl at release 5.10.

-       A  special  item  that consists of (? followed by a number greater than
-       zero and a closing parenthesis is a recursive subroutine  call  of  the
-       subpattern  of  the  given  number, provided that it occurs inside that
-       subpattern. (If not, it is a non-recursive subroutine  call,  which  is
-       described  in  the  next  section.)  The special item (?R) or (?0) is a
+       A special item that consists of (? followed by a  number  greater  than
+       zero  and  a  closing parenthesis is a recursive subroutine call of the
+       subpattern of the given number, provided that  it  occurs  inside  that
+       subpattern.  (If  not,  it is a non-recursive subroutine call, which is
+       described in the next section.) The special item  (?R)  or  (?0)  is  a
        recursive call of the entire regular expression.

-       This PCRE2 pattern solves the nested parentheses  problem  (assume  the
+       This  PCRE2  pattern  solves the nested parentheses problem (assume the
        PCRE2_EXTENDED option is set so that white space is ignored):

          \( ( [^()]++ | (?R) )* \)

-       First  it matches an opening parenthesis. Then it matches any number of
-       substrings which can either be a  sequence  of  non-parentheses,  or  a
-       recursive  match  of the pattern itself (that is, a correctly parenthe-
+       First it matches an opening parenthesis. Then it matches any number  of
+       substrings  which  can  either  be  a sequence of non-parentheses, or a
+       recursive match of the pattern itself (that is, a  correctly  parenthe-
        sized substring).  Finally there is a closing parenthesis. Note the use
        of a possessive quantifier to avoid backtracking into sequences of non-
        parentheses.

-       If this were part of a larger pattern, you would not  want  to  recurse
+       If  this  were  part of a larger pattern, you would not want to recurse
        the entire pattern, so instead you could use this:

          ( \( ( [^()]++ | (?1) )* \) )

-       We  have  put the pattern into parentheses, and caused the recursion to
+       We have put the pattern into parentheses, and caused the  recursion  to
        refer to them instead of the whole pattern.

-       In a larger pattern,  keeping  track  of  parenthesis  numbers  can  be
-       tricky.  This is made easier by the use of relative references. Instead
+       In  a  larger  pattern,  keeping  track  of  parenthesis numbers can be
+       tricky. This is made easier by the use of relative references.  Instead
        of (?1) in the pattern above you can write (?-2) to refer to the second
-       most  recently  opened  parentheses  preceding  the recursion. In other
-       words, a negative number counts capturing  parentheses  leftwards  from
+       most recently opened parentheses  preceding  the  recursion.  In  other
+       words,  a  negative  number counts capturing parentheses leftwards from
        the point at which it is encountered.

        Be aware however, that if duplicate subpattern numbers are in use, rel-
-       ative references refer to the earliest subpattern with the  appropriate
+       ative  references refer to the earliest subpattern with the appropriate
        number. Consider, for example:

          (?|(a)|(b)) (c) (?-2)

-       The  first  two  capturing  groups (a) and (b) are both numbered 1, and
-       group (c) is number 2. When the reference  (?-2)  is  encountered,  the
+       The first two capturing groups (a) and (b) are  both  numbered  1,  and
+       group  (c)  is  number  2. When the reference (?-2) is encountered, the
        second most recently opened parentheses has the number 1, but it is the
-       first such group (the (a) group) to which the  recursion  refers.  This
-       would  be  the  same  if  an absolute reference (?1) was used. In other
-       words, relative references are just a shorthand for computing  a  group
+       first  such  group  (the (a) group) to which the recursion refers. This
+       would be the same if an absolute reference  (?1)  was  used.  In  other
+       words,  relative  references are just a shorthand for computing a group
        number.

-       It  is  also  possible  to refer to subsequently opened parentheses, by
-       writing references such as (?+2). However, these  cannot  be  recursive
-       because  the  reference  is  not inside the parentheses that are refer-
-       enced. They are always non-recursive subroutine calls, as described  in
+       It is also possible to refer to  subsequently  opened  parentheses,  by
+       writing  references  such  as (?+2). However, these cannot be recursive
+       because the reference is not inside the  parentheses  that  are  refer-
+       enced.  They are always non-recursive subroutine calls, as described in
        the next section.

-       An  alternative  approach  is to use named parentheses. The Perl syntax
-       for this is (?&name); PCRE1's earlier syntax  (?P>name)  is  also  sup-
+       An alternative approach is to use named parentheses.  The  Perl  syntax
+       for  this  is  (?&name);  PCRE1's earlier syntax (?P>name) is also sup-
        ported. We could rewrite the above example as follows:

          (?<pn> \( ( [^()]++ | (?&pn) )* \) )

-       If  there  is more than one subpattern with the same name, the earliest
+       If there is more than one subpattern with the same name,  the  earliest
        one is used.

        The example pattern that we have been looking at contains nested unlim-
-       ited  repeats,  and  so the use of a possessive quantifier for matching
-       strings of non-parentheses is important when applying  the  pattern  to
+       ited repeats, and so the use of a possessive  quantifier  for  matching
+       strings  of  non-parentheses  is important when applying the pattern to
        strings that do not match. For example, when this pattern is applied to

          (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()

-       it  yields  "no  match" quickly. However, if a possessive quantifier is
-       not used, the match runs for a very long time indeed because there  are
-       so  many  different  ways the + and * repeats can carve up the subject,
+       it yields "no match" quickly. However, if a  possessive  quantifier  is
+       not  used, the match runs for a very long time indeed because there are
+       so many different ways the + and * repeats can carve  up  the  subject,
        and all have to be tested before failure can be reported.

-       At the end of a match, the values of capturing  parentheses  are  those
-       from  the outermost level. If you want to obtain intermediate values, a
+       At  the  end  of a match, the values of capturing parentheses are those
+       from the outermost level. If you want to obtain intermediate values,  a
        callout function can be used (see below and the pcre2callout documenta-
        tion). If the pattern above is matched against

          (ab(cd)ef)

-       the  value  for  the  inner capturing parentheses (numbered 2) is "ef",
-       which is the last value taken on at the top level. If a capturing  sub-
-       pattern  is  not  matched at the top level, its final captured value is
-       unset, even if it was (temporarily) set at a deeper  level  during  the
+       the value for the inner capturing parentheses  (numbered  2)  is  "ef",
+       which  is the last value taken on at the top level. If a capturing sub-
+       pattern is not matched at the top level, its final  captured  value  is
+       unset,  even  if  it was (temporarily) set at a deeper level during the
        matching process.

        If there are more than 15 capturing parentheses in a pattern, PCRE2 has
-       to obtain extra memory from the heap to store data during a  recursion.
-       If   no   memory   can   be   obtained,   the   match  fails  with  the
+       to  obtain extra memory from the heap to store data during a recursion.
+       If  no  memory  can   be   obtained,   the   match   fails   with   the
        PCRE2_ERROR_NOMEMORY error.

-       Do not confuse the (?R) item with the condition (R),  which  tests  for
-       recursion.   Consider  this pattern, which matches text in angle brack-
-       ets, allowing for arbitrary nesting. Only digits are allowed in  nested
-       brackets  (that is, when recursing), whereas any characters are permit-
+       Do  not  confuse  the (?R) item with the condition (R), which tests for
+       recursion.  Consider this pattern, which matches text in  angle  brack-
+       ets,  allowing for arbitrary nesting. Only digits are allowed in nested
+       brackets (that is, when recursing), whereas any characters are  permit-
        ted at the outer level.

          < (?: (?(R) \d++  | [^<>]*+) | (?R)) * >

-       In this pattern, (?(R) is the start of a conditional  subpattern,  with
-       two  different  alternatives for the recursive and non-recursive cases.
+       In  this  pattern, (?(R) is the start of a conditional subpattern, with
+       two different alternatives for the recursive and  non-recursive  cases.
        The (?R) item is the actual recursive call.

    Differences in recursion processing between PCRE2 and Perl

-       Recursion processing in PCRE2 differs from Perl in two important  ways.
+       Recursion  processing in PCRE2 differs from Perl in two important ways.
        In PCRE2 (like Python, but unlike Perl), a recursive subpattern call is
        always treated as an atomic group. That is, once it has matched some of
        the subject string, it is never re-entered, even if it contains untried
-       alternatives and there is a subsequent matching failure.  This  can  be
-       illustrated  by the following pattern, which purports to match a palin-
-       dromic string that contains an odd number of characters  (for  example,
+       alternatives  and  there  is a subsequent matching failure. This can be
+       illustrated by the following pattern, which purports to match a  palin-
+       dromic  string  that contains an odd number of characters (for example,
        "a", "aba", "abcba", "abcdcba"):

          ^(.|(.)(?1)\2)$

        The idea is that it either matches a single character, or two identical
-       characters surrounding a sub-palindrome. In Perl, this  pattern  works;
-       in  PCRE2  it  does not if the subject is longer than three characters.
+       characters  surrounding  a sub-palindrome. In Perl, this pattern works;
+       in PCRE2 it does not if the subject is longer  than  three  characters.
        Consider the subject string "abcba":

-       At the top level, the first character is matched, but as it is  not  at
+       At  the  top level, the first character is matched, but as it is not at
        the end of the string, the first alternative fails; the second alterna-
        tive is taken and the recursion kicks in. The recursive call to subpat-
-       tern  1  successfully  matches the next character ("b"). (Note that the
+       tern 1 successfully matches the next character ("b").  (Note  that  the
        beginning and end of line tests are not part of the recursion).

-       Back at the top level, the next character ("c") is compared  with  what
-       subpattern  2 matched, which was "a". This fails. Because the recursion
-       is treated as an atomic group, there are now  no  backtracking  points,
-       and  so  the  entire  match fails. (Perl is able, at this point, to re-
-       enter the recursion and try the second alternative.)  However,  if  the
+       Back  at  the top level, the next character ("c") is compared with what
+       subpattern 2 matched, which was "a". This fails. Because the  recursion
+       is  treated  as  an atomic group, there are now no backtracking points,
+       and so the entire match fails. (Perl is able, at  this  point,  to  re-
+       enter  the  recursion  and try the second alternative.) However, if the
        pattern is written with the alternatives in the other order, things are
        different:

          ^((.)(?1)\2|.)$

-       This time, the recursing alternative is tried first, and  continues  to
-       recurse  until  it runs out of characters, at which point the recursion
-       fails. But this time we do have  another  alternative  to  try  at  the
-       higher  level.  That  is  the  big difference: in the previous case the
-       remaining alternative is at a deeper recursion level, which PCRE2  can-
+       This  time,  the recursing alternative is tried first, and continues to
+       recurse until it runs out of characters, at which point  the  recursion
+       fails.  But  this  time  we  do  have another alternative to try at the
+       higher level. That is the big difference:  in  the  previous  case  the
+       remaining  alternative is at a deeper recursion level, which PCRE2 can-
        not use.

-       To  change  the pattern so that it matches all palindromic strings, not
-       just those with an odd number of characters, it is tempting  to  change
+       To change the pattern so that it matches all palindromic  strings,  not
+       just  those  with an odd number of characters, it is tempting to change
        the pattern to this:

          ^((.)(?1)\2|.?)$

-       Again,  this  works in Perl, but not in PCRE2, and for the same reason.
-       When a deeper recursion has matched a single character,  it  cannot  be
-       entered  again  in  order  to match an empty string. The solution is to
-       separate the two cases, and write out the odd and even cases as  alter-
+       Again, this works in Perl, but not in PCRE2, and for the  same  reason.
+       When  a  deeper  recursion has matched a single character, it cannot be
+       entered again in order to match an empty string.  The  solution  is  to
+       separate  the two cases, and write out the odd and even cases as alter-
        natives at the higher level:

          ^(?:((.)(?1)\2|)|((.)(?3)\4|.))

-       If  you  want  to match typical palindromic phrases, the pattern has to
+       If you want to match typical palindromic phrases, the  pattern  has  to
        ignore all non-word characters, which can be done like this:

          ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$

-       If run with the PCRE2_CASELESS option,  this  pattern  matches  phrases
-       such  as  "A  man, a plan, a canal: Panama!" and it works in both PCRE2
-       and Perl. Note the use of the possessive quantifier *+ to  avoid  back-
-       tracking  into  sequences  of  non-word characters. Without this, PCRE2
+       If  run  with  the  PCRE2_CASELESS option, this pattern matches phrases
+       such as "A man, a plan, a canal: Panama!" and it works  in  both  PCRE2
+       and  Perl.  Note the use of the possessive quantifier *+ to avoid back-
+       tracking into sequences of non-word  characters.  Without  this,  PCRE2
        takes a great deal longer (ten times or more) to match typical phrases,
        and Perl takes so long that you think it has gone into a loop.

-       WARNING:  The  palindrome-matching patterns above work only if the sub-
-       ject string does not start with a palindrome that is shorter  than  the
-       entire  string.  For example, although "abcba" is correctly matched, if
-       the subject is "ababa", PCRE2 finds the palindrome "aba" at the  start,
-       then  fails at top level because the end of the string does not follow.
-       Once again, it cannot jump back into the recursion to try other  alter-
+       WARNING: The palindrome-matching patterns above work only if  the  sub-
+       ject  string  does not start with a palindrome that is shorter than the
+       entire string.  For example, although "abcba" is correctly matched,  if
+       the  subject is "ababa", PCRE2 finds the palindrome "aba" at the start,
+       then fails at top level because the end of the string does not  follow.
+       Once  again, it cannot jump back into the recursion to try other alter-
        natives, so the entire match fails.

-       The  second  way in which PCRE2 and Perl differ in their recursion pro-
-       cessing is in the handling of captured values. In Perl, when a  subpat-
-       tern  is  called recursively or as a subpattern (see the next section),
-       it has no access to any values that were captured  outside  the  recur-
-       sion,  whereas  in  PCRE2 these values can be referenced. Consider this
+       The second way in which PCRE2 and Perl differ in their  recursion  pro-
+       cessing  is in the handling of captured values. In Perl, when a subpat-
+       tern is called recursively or as a subpattern (see the  next  section),
+       it  has  no  access to any values that were captured outside the recur-
+       sion, whereas in PCRE2 these values can be  referenced.  Consider  this
        pattern:

          ^(.)(\1|a(?2))

-       In PCRE2, this pattern matches "bab". The first  capturing  parentheses
-       match  "b",  then in the second group, when the back reference \1 fails
-       to match "b", the second alternative matches "a" and then recurses.  In
-       the  recursion,  \1 does now match "b" and so the whole match succeeds.
-       In Perl, the pattern fails to match because inside the  recursive  call
+       In  PCRE2,  this pattern matches "bab". The first capturing parentheses
+       match "b", then in the second group, when the back reference  \1  fails
+       to  match "b", the second alternative matches "a" and then recurses. In
+       the recursion, \1 does now match "b" and so the whole  match  succeeds.
+       In  Perl,  the pattern fails to match because inside the recursive call
        \1 cannot access the externally set value.

SUBPATTERNS AS SUBROUTINES

-       If  the  syntax for a recursive subpattern call (either by number or by
-       name) is used outside the parentheses to which it refers,  it  operates
-       like  a subroutine in a programming language. The called subpattern may
-       be defined before or after the reference. A numbered reference  can  be
+       If the syntax for a recursive subpattern call (either by number  or  by
+       name)  is  used outside the parentheses to which it refers, it operates
+       like a subroutine in a programming language. The called subpattern  may
+       be  defined  before or after the reference. A numbered reference can be
        absolute or relative, as in these examples:

          (...(absolute)...)...(?2)...
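
Note on the recursion hunk above: the effect of treating a recursive call
as an atomic group can be imitated without PCRE2. The following is a minimal
C sketch, assuming the atomic-recursion semantics stated in this revision of
the documentation. The function names are hypothetical, and the code is a
toy model of the two palindrome patterns, not PCRE2's matcher.

```c
#include <stddef.h>
#include <string.h>

/*
 * Toy model of group 1 in ^((.)(?1)\2|.)$ when entered at offset pos.
 * Returns the end offset of the FIRST way the group can match, or -1.
 * Handing back a single offset is what models "atomic": the caller can
 * never ask the same call for a second, different match.
 */
long g1_first_end(const char *s, size_t n, size_t pos)
{
    long r;

    if (pos >= n)
        return -1;                     /* both alternatives need a character */

    /* Alternative 1: (.)(?1)\2 with the inner call taken as it comes. */
    r = g1_first_end(s, n, pos + 1);
    if (r >= 0 && (size_t)r < n && s[r] == s[pos])
        return r + 1;

    /* Alternative 2: a single character. */
    return (long)pos + 1;
}

/* Model of ^((.)(?1)\2|.)$ : returns 1 for a match, 0 otherwise. */
int toy_match_recursing_first(const char *s)
{
    size_t n = strlen(s);
    long e = g1_first_end(s, n, 0);

    if (e >= 0 && (size_t)e == n)
        return 1;
    /* Top-level backtrack into the plain "." alternative, followed by $. */
    return n == 1;
}

/* Model of ^(.|(.)(?1)\2)$ : the atomic (?1) always settles for one
   character, so only subjects of length 1 or 3 can ever match. */
int toy_match_single_first(const char *s)
{
    size_t n = strlen(s);

    if (n == 1)
        return 1;                      /* alternative 1, then $ */
    return n == 3 && s[2] == s[0];     /* (.) then one character, then \2 $ */
}
```

Under this model the first ordering accepts "aba" but rejects "abcba", the
reordered pattern accepts "abcba", and "ababa" is still rejected because
the model, like the text's description of PCRE2, cannot go back into the
recursion once "aba" has been found at the start.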
@@ -7835,50 +7946,50 @@

          (sens|respons)e and \1ibility

-       matches  "sense and sensibility" and "response and responsibility", but
+       matches "sense and sensibility" and "response and responsibility",  but
        not "sense and responsibility". If instead the pattern

          (sens|respons)e and (?1)ibility

-       is used, it does match "sense and responsibility" as well as the  other
-       two  strings.  Another  example  is  given  in the discussion of DEFINE
+       is  used, it does match "sense and responsibility" as well as the other
+       two strings. Another example is  given  in  the  discussion  of  DEFINE
        above.

-       All subroutine calls, whether recursive or not, are always  treated  as
-       atomic  groups. That is, once a subroutine has matched some of the sub-
+       All  subroutine  calls, whether recursive or not, are always treated as
+       atomic groups. That is, once a subroutine has matched some of the  sub-
        ject string, it is never re-entered, even if it contains untried alter-
-       natives  and  there  is  a  subsequent  matching failure. Any capturing
-       parentheses that are set during the subroutine  call  revert  to  their
+       natives and there is  a  subsequent  matching  failure.  Any  capturing
+       parentheses  that  are  set  during the subroutine call revert to their
        previous values afterwards.

-       Processing  options  such as case-independence are fixed when a subpat-
-       tern is defined, so if it is used as a subroutine, such options  cannot
+       Processing options such as case-independence are fixed when  a  subpat-
+       tern  is defined, so if it is used as a subroutine, such options cannot
        be changed for different calls. For example, consider this pattern:

          (abc)(?i:(?-1))

-       It  matches  "abcabc". It does not match "abcABC" because the change of
+       It matches "abcabc". It does not match "abcABC" because the  change  of
        processing option does not affect the called subpattern.

ONIGURUMA SUBROUTINE SYNTAX

-       For compatibility with Oniguruma, the non-Perl syntax \g followed by  a
+       For  compatibility with Oniguruma, the non-Perl syntax \g followed by a
        name or a number enclosed either in angle brackets or single quotes, is
-       an alternative syntax for referencing a  subpattern  as  a  subroutine,
-       possibly  recursively. Here are two of the examples used above, rewrit-
+       an  alternative  syntax  for  referencing a subpattern as a subroutine,
+       possibly recursively. Here are two of the examples used above,  rewrit-
        ten using this syntax:

          (?<pn> \( ( (?>[^()]+) | \g<pn> )* \) )
          (sens|respons)e and \g'1'ibility

-       PCRE2 supports an extension to Oniguruma: if a number is preceded by  a
+       PCRE2  supports an extension to Oniguruma: if a number is preceded by a
        plus or a minus sign it is taken as a relative reference. For example:

          (abc)(?i:\g<-1>)

-       Note  that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are not
-       synonymous. The former is a back reference; the latter is a  subroutine
+       Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are  not
+       synonymous.  The former is a back reference; the latter is a subroutine
        call.
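
Note on relative references: the hunk above says only that a signed number
is "taken as a relative reference". The sketch below shows one way to turn
such a number into an absolute group number, assuming the usual PCRE2 rule
that a negative number counts back through the capturing groups opened
before the reference and a positive number counts forward through groups
opened after it. The function name and its parameters are hypothetical.

```c
/*
 * Convert a signed relative group reference into an absolute group number.
 * opened_before is the number of capturing groups whose opening parenthesis
 * lies to the left of the reference. Returns 0 if rel is zero or if the
 * result would fall before group 1. A positive rel is not checked against
 * the total number of groups, which is unknown here.
 */
int resolve_relative_group(int opened_before, int rel)
{
    int abs_num;

    if (rel == 0)
        return 0;
    abs_num = (rel < 0) ? opened_before + rel + 1   /* -1 is the latest group */
                        : opened_before + rel;      /* +1 is the next group   */
    return (abs_num >= 1) ? abs_num : 0;
}
```

For (abc)(?i:\g<-1>) one capturing group has been opened before the
reference, so resolve_relative_group(1, -1) gives 1, which is the (abc)
group.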

@@ -7885,54 +7996,54 @@
CALLOUTS

        Perl has a feature whereby using the sequence (?{...}) causes arbitrary
-       Perl code to be obeyed in the middle of matching a regular  expression.
+       Perl  code to be obeyed in the middle of matching a regular expression.
        This makes it possible, amongst other things, to extract different sub-
        strings that match the same pair of parentheses when there is a repeti-
        tion.

-       PCRE2  provides  a  similar feature, but of course it cannot obey arbi-
-       trary Perl code. The feature is called "callout". The caller  of  PCRE2
-       provides  an  external  function  by putting its entry point in a match
-       context using the function pcre2_set_callout(), and then  passing  that
-       context  to  pcre2_match() or pcre2_dfa_match(). If no match context is
+       PCRE2 provides a similar feature, but of course it  cannot  obey  arbi-
+       trary  Perl  code. The feature is called "callout". The caller of PCRE2
+       provides an external function by putting its entry  point  in  a  match
+       context  using  the function pcre2_set_callout(), and then passing that
+       context to pcre2_match() or pcre2_dfa_match(). If no match  context  is
        passed, or if the callout entry point is set to NULL, callouts are dis-
        abled.

-       Within  a  regular expression, (?C<arg>) indicates a point at which the
-       external function is to be called. There  are  two  kinds  of  callout:
-       those  with a numerical argument and those with a string argument. (?C)
-       on its own with no argument is treated as (?C0). A  numerical  argument
-       allows  the  application  to  distinguish  between  different callouts.
-       String arguments were added for release 10.20 to make it  possible  for
-       script  languages that use PCRE2 to embed short scripts within patterns
+       Within a regular expression, (?C<arg>) indicates a point at  which  the
+       external  function  is  to  be  called. There are two kinds of callout:
+       those with a numerical argument and those with a string argument.  (?C)
+       on  its  own with no argument is treated as (?C0). A numerical argument
+       allows the  application  to  distinguish  between  different  callouts.
+       String  arguments  were added for release 10.20 to make it possible for
+       script languages that use PCRE2 to embed short scripts within  patterns
        in a similar way to Perl.

        During matching, when PCRE2 reaches a callout point, the external func-
-       tion  is  called.  It is provided with the number or string argument of
-       the callout, the position in the pattern, and one item of data that  is
+       tion is called. It is provided with the number or  string  argument  of
+       the  callout, the position in the pattern, and one item of data that is
        also set in the match block. The callout function may cause matching to
        proceed, to backtrack, or to fail.

-       By default, PCRE2 implements a  number  of  optimizations  at  matching
-       time,  and  one  side-effect is that sometimes callouts are skipped. If
-       you need all possible callouts to happen, you need to set options  that
-       disable  the relevant optimizations. More details, including a complete
-       description of the programming interface to the callout  function,  are
+       By  default,  PCRE2  implements  a  number of optimizations at matching
+       time, and one side-effect is that sometimes callouts  are  skipped.  If
+       you  need all possible callouts to happen, you need to set options that
+       disable the relevant optimizations. More details, including a  complete
+       description  of  the programming interface to the callout function, are
        given in the pcre2callout documentation.

    Callouts with numerical arguments

-       If  you  just  want  to  have  a means of identifying different callout
-       points, put a number less than 256 after the  letter  C.  For  example,
+       If you just want to have  a  means  of  identifying  different  callout
+       points,  put  a  number  less than 256 after the letter C. For example,
        this pattern has two callout points:

          (?C1)abc(?C2)def

-       If  the PCRE2_AUTO_CALLOUT flag is passed to pcre2_compile(), numerical
-       callouts are automatically installed before each item in  the  pattern.
-       They  are all numbered 255. If there is a conditional group in the pat-
+       If the PCRE2_AUTO_CALLOUT flag is passed to pcre2_compile(),  numerical
+       callouts  are  automatically installed before each item in the pattern.
+       They are all numbered 255. If there is a conditional group in the  pat-
        tern whose condition is an assertion, an additional callout is inserted
-       just  before the condition. An explicit callout may also be set at this
+       just before the condition. An explicit callout may also be set at  this
        position, as in this example:

          (?(?C9)(?=a)abc|def)
@@ -7942,43 +8053,52 @@

    Callouts with string arguments

-       A  delimited  string may be used instead of a number as a callout argu-
-       ment. The starting delimiter must be one of ` ' " ^ % #  $  {  and  the
+       A delimited string may be used instead of a number as a  callout  argu-
+       ment.  The  starting  delimiter  must be one of ` ' " ^ % # $ { and the
        ending delimiter is the same as the start, except for {, where the end-
-       ing delimiter is }. If  the  ending  delimiter  is  needed  within  the
+       ing  delimiter  is  }.  If  the  ending  delimiter is needed within the
        string, it must be doubled. For example:

          (?C'ab ''c'' d')xyz(?C{any text})pqr

-       The  doubling  is  removed  before  the string is passed to the callout
+       The doubling is removed before the string  is  passed  to  the  callout
        function.

BACKTRACKING CONTROL

-       Perl 5.10 introduced a number of "Special Backtracking Control  Verbs",
-       which  are  still  described in the Perl documentation as "experimental
-       and subject to change or removal in a future version of Perl". It  goes
-       on  to  say:  "Their  usage in production code should be noted to avoid
+       Perl  5.10 introduced a number of "Special Backtracking Control Verbs",
+       which are still described in the Perl  documentation  as  "experimental
+       and  subject to change or removal in a future version of Perl". It goes
+       on to say: "Their usage in production code should  be  noted  to  avoid
        problems during upgrades." The same remarks apply to the PCRE2 features
        described in this section.

-       The  new verbs make use of what was previously invalid syntax: an open-
+       The new verbs make use of what was previously invalid syntax: an  open-
        ing parenthesis followed by an asterisk. They are generally of the form
        (*VERB) or (*VERB:NAME). Some verbs take either form, possibly behaving
        differently depending on whether or not a name is present.

-       By default, for compatibility with Perl, a  name  is  any  sequence  of
+       By  default,  for  compatibility  with  Perl, a name is any sequence of
        characters that does not include a closing parenthesis. The name is not
-       processed in any way, and it is  not  possible  to  include  a  closing
-       parenthesis in the name.  However, if the PCRE2_ALT_VERBNAMES option is
-       set, normal backslash processing is applied to verb names and  only  an
-       unescaped  closing parenthesis terminates the name. A closing parenthe-
-       sis can be included in a name either as \) or between \Q and \E. If the
-       PCRE2_EXTENDED  option  is  set,  unescaped whitespace in verb names is
-       skipped and #-comments are recognized, exactly as in the  rest  of  the
-       pattern.
+       processed  in  any  way,  and  it  is not possible to include a closing
+       parenthesis  in  the  name.   This  can  be  changed  by  setting   the
+       PCRE2_ALT_VERBNAMES  option,  but the result is no longer Perl-compati-
+       ble.

+       When PCRE2_ALT_VERBNAMES is set, backslash  processing  is  applied  to
+       verb  names  and  only  an unescaped closing parenthesis terminates the
+       name. However, the only backslash items that are permitted are \Q,  \E,
+       and  sequences such as \x{100} that define character code points. Char-
+       acter type escapes such as \d are faulted.
+
+       A closing parenthesis can be included in a name either as \) or between
+       \Q  and  \E. In addition to backslash processing, if the PCRE2_EXTENDED
+       option is also set, unescaped whitespace in verb names is skipped,  and
+       #-comments  are  recognized,  exactly  as  in  the rest of the pattern.
+       PCRE2_EXTENDED does not affect verb names unless PCRE2_ALT_VERBNAMES is
+       also set.
+
        The  maximum  length of a name is 255 in the 8-bit library and 65535 in
        the 16-bit and 32-bit libraries. If the name is empty, that is, if  the
        closing  parenthesis immediately follows the colon, the effect is as if
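
Note on callout string arguments: the delimiter rules in the hunk above
(eight possible opening delimiters, } closing {, and a doubled closing
delimiter standing for one literal copy) can be sketched in a few lines of
C. This is an illustration of the documented rules only. The function name
and its interface are hypothetical and are not part of the PCRE2 API.

```c
#include <stddef.h>
#include <string.h>

/*
 * Parse one delimited callout string starting at src[0], which must be the
 * opening delimiter. The text, with doubled closing delimiters collapsed to
 * one, is written to out as a NUL-terminated string. Returns the number of
 * characters consumed from src, including both delimiters, or -1 on error
 * (bad delimiter, missing terminator, or out too small).
 */
long parse_callout_string(const char *src, char *out, size_t outsize)
{
    static const char openers[] = "`'\"^%#${";
    char close;
    size_t i = 1, o = 0;

    if (outsize == 0 || src[0] == '\0' || strchr(openers, src[0]) == NULL)
        return -1;
    close = (src[0] == '{') ? '}' : src[0];

    for (;;) {
        char c = src[i];

        if (c == '\0')
            return -1;                 /* ran off the end: unterminated */
        if (c == close) {
            if (src[i + 1] != close)
                break;                 /* a single delimiter ends the string */
            i++;                       /* doubled: keep just one copy */
        }
        if (o + 1 >= outsize)
            return -1;                 /* no room for c plus the NUL */
        out[o++] = c;
        i++;
    }
    out[o] = '\0';
    return (long)(i + 1);
}
```

Applied to the text after (?C in the example pattern, the first argument
'ab ''c'' d' yields the string ab 'c' d, and {any text} yields any text.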
@@ -8367,7 +8487,7 @@

REVISION

-       Last updated: 20 June 2016
+       Last updated: 23 October 2016
        Copyright (c) 1997-2016 University of Cambridge.
 ------------------------------------------------------------------------------

@@ -9589,6 +9709,9 @@
          \n              reference by number (can be ambiguous)
          \gn             reference by number
          \g{n}           reference by number
+         \g+n            relative reference by number (PCRE2 extension)
+         \g-n            relative reference by number
+         \g{+n}          relative reference by number (PCRE2 extension)
          \g{-n}          relative reference by number
          \k<name>        reference by name (Perl)
          \k'name'        reference by name (Perl)
@@ -9625,15 +9748,19 @@
          (?(-n)              relative reference condition
          (?(<name>)          named reference condition (Perl)
          (?('name')          named reference condition (Perl)
-         (?(name)            named reference condition (PCRE2)
+         (?(name)            named reference condition (PCRE2, deprecated)
          (?(R)               overall recursion condition
-         (?(Rn)              specific group recursion condition
-         (?(R&name)          specific recursion condition
+         (?(Rn)              specific numbered group recursion condition
+         (?(R&name)          specific named group recursion condition
          (?(DEFINE)          define subpattern for reference
          (?(VERSION[>]=n.m)  test PCRE2 version
          (?(assert)          assertion condition

+       Note the ambiguity of (?(R) and (?(Rn) which might be  named  reference
+       conditions  or  recursion  tests.  Such a condition is interpreted as a
+       reference condition if the relevant named group exists.

+
BACKTRACKING CONTROL

        The following act immediately they are reached:
@@ -9684,8 +9811,8 @@

REVISION

-       Last updated: 16 October 2015
-       Copyright (c) 1997-2015 University of Cambridge.
+       Last updated: 28 September 2016
+       Copyright (c) 1997-2016 University of Cambridge.
 ------------------------------------------------------------------------------

Modified: code/trunk/doc/pcre2_code_copy.3
===================================================================
--- code/trunk/doc/pcre2_code_copy.3    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/doc/pcre2_code_copy.3    2016-11-22 15:37:02 UTC (rev 605)
@@ -1,4 +1,4 @@
-.TH PCRE2_CODE_COPY 3 "26 February 2016" "PCRE2 10.22"
+.TH PCRE2_CODE_COPY 3 "22 November 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -16,8 +16,9 @@
 This function makes a copy of the memory used for a compiled pattern, excluding
 any memory used by the JIT compiler. Without a subsequent call to
 \fBpcre2_jit_compile()\fP, the copy can be used only for non-JIT matching. The
-yield of the function is NULL if \fIcode\fP is NULL or if sufficient memory
-cannot be obtained.
+pointer to the character tables is copied, not the tables themselves (see
+\fBpcre2_code_copy_with_tables()\fP). The yield of the function is NULL if
+\fIcode\fP is NULL or if sufficient memory cannot be obtained.
 .P
 There is a complete description of the PCRE2 native API in the
 .\" HREF

Added: code/trunk/doc/pcre2_code_copy_with_tables.3
===================================================================
--- code/trunk/doc/pcre2_code_copy_with_tables.3                            (rev 0)
+++ code/trunk/doc/pcre2_code_copy_with_tables.3    2016-11-22 15:37:02 UTC (rev 605)
@@ -0,0 +1,32 @@
+.TH PCRE2_CODE_COPY_WITH_TABLES 3 "22 November 2016" "PCRE2 10.23"
+.SH NAME
+PCRE2 - Perl-compatible regular expressions (revised API)
+.SH SYNOPSIS
+.rs
+.sp
+.B #include <pcre2.h>
+.PP
+.nf
+.B pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *\fIcode\fP);
+.fi
+.
+.SH DESCRIPTION
+.rs
+.sp
+This function makes a copy of the memory used for a compiled pattern, excluding
+any memory used by the JIT compiler. Without a subsequent call to
+\fBpcre2_jit_compile()\fP, the copy can be used only for non-JIT matching. 
+Unlike \fBpcre2_code_copy()\fP, a separate copy of the character tables is also
+made, with the new code pointing to it. This memory will be automatically freed
+when \fBpcre2_code_free()\fP is called. The yield of the function is NULL if
+\fIcode\fP is NULL or if sufficient memory cannot be obtained.
+.P
+There is a complete description of the PCRE2 native API in the
+.\" HREF
+\fBpcre2api\fP
+.\"
+page and a description of the POSIX API in the
+.\" HREF
+\fBpcre2posix\fP
+.\"
+page.
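
Note on the two copy functions: the difference described in the man pages
above is the difference between sharing a tables pointer and duplicating
the tables. The sketch below models that with a stand-in structure. All
names (toy_code, toy_copy, toy_copy_with_tables, toy_free, TOY_TABLES_LEN)
are hypothetical, and nothing here reflects the real layout of pcre2_code.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_TABLES_LEN 256   /* hypothetical size, not PCRE2's real layout */

typedef struct toy_code {
    const unsigned char *tables;  /* character tables used when matching */
    int owns_tables;              /* nonzero if toy_free() must free them */
} toy_code;

/* Models pcre2_code_copy() as documented: the tables pointer is shared. */
toy_code *toy_copy(const toy_code *code)
{
    toy_code *copy;

    if (code == NULL)
        return NULL;
    copy = malloc(sizeof *copy);
    if (copy == NULL)
        return NULL;
    *copy = *code;
    copy->owns_tables = 0;        /* the tables still belong to someone else */
    return copy;
}

/* Models pcre2_code_copy_with_tables() as documented: tables are copied. */
toy_code *toy_copy_with_tables(const toy_code *code)
{
    toy_code *copy;
    unsigned char *newtables;

    if (code == NULL)
        return NULL;
    copy = malloc(sizeof *copy);
    newtables = malloc(TOY_TABLES_LEN);
    if (copy == NULL || newtables == NULL) {
        free(copy);
        free(newtables);
        return NULL;
    }
    memcpy(newtables, code->tables, TOY_TABLES_LEN);
    copy->tables = newtables;
    copy->owns_tables = 1;        /* released together with the copy */
    return copy;
}

/* Frees a copy and, if it owns them, its private tables as well. */
void toy_free(toy_code *code)
{
    if (code == NULL)
        return;
    if (code->owns_tables)
        free((void *)code->tables);
    free(code);
}
```

In this model a later change to the original tables is visible through a
copy made by toy_copy() but not through one made by toy_copy_with_tables(),
which is the situation the new function is meant to cover.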

Modified: code/trunk/doc/pcre2api.3
===================================================================
--- code/trunk/doc/pcre2api.3    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/doc/pcre2api.3    2016-11-22 15:37:02 UTC (rev 605)
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "30 September 2016" "PCRE2 10.23"
+.TH PCRE2API 3 "22 November 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -235,6 +235,8 @@
 .nf
 .B pcre2_code *pcre2_code_copy(const pcre2_code *\fIcode\fP);
 .sp
+.B pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *\fIcode\fP);
+.sp
 .B int pcre2_get_error_message(int \fIerrorcode\fP, PCRE2_UCHAR *\fIbuffer\fP,
 .B "  PCRE2_SIZE \fIbufflen\fP);"
 .sp
@@ -509,8 +511,9 @@
 (perhaps waiting to see if the pattern is used often enough) similar logic is
 required. JIT compilation updates a pointer within the compiled code block, so
 a thread must gain unique write access to the pointer before calling
-\fBpcre2_jit_compile()\fP. Alternatively, \fBpcre2_code_copy()\fP can be used
-to obtain a private copy of the compiled code.
+\fBpcre2_jit_compile()\fP. Alternatively, \fBpcre2_code_copy()\fP or 
+\fBpcre2_code_copy_with_tables()\fP can be used to obtain a private copy of the
+compiled code.
 .
 .
 .SS "Context blocks"
@@ -1027,6 +1030,8 @@
 .B void pcre2_code_free(pcre2_code *\fIcode\fP);
 .sp
 .B pcre2_code *pcre2_code_copy(const pcre2_code *\fIcode\fP);
+.sp
+.B pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *\fIcode\fP);
 .fi
 .P
 The \fBpcre2_compile()\fP function compiles a pattern into an internal form.
@@ -1049,10 +1054,25 @@
 .\"
 the JIT information cannot be copied (because it is position-dependent).
 The new copy can initially be used only for non-JIT matching, though it can be
-passed to \fBpcre2_jit_compile()\fP if required. The \fBpcre2_code_copy()\fP
-function provides a way for individual threads in a multithreaded application
-to acquire a private copy of shared compiled code.
+passed to \fBpcre2_jit_compile()\fP if required. 
 .P
+The \fBpcre2_code_copy()\fP function provides a way for individual threads in a
+multithreaded application to acquire a private copy of shared compiled code. 
+However, it does not make a copy of the character tables used by the compiled 
+pattern; the new pattern code points to the same tables as the original code.
+(See
+.\" HTML <a href="#jitcompiling">
+.\" </a>
+"Locale Support"
+.\"
+below for details of these character tables.) In many applications the same
+tables are used throughout, so this behaviour is appropriate. Nevertheless, 
+there are occasions when copies of a compiled pattern and the relevant tables 
+are needed. The \fBpcre2_code_copy_with_tables()\fP function provides this facility. 
+Copies of both the code and the tables are made, with the new code pointing to 
+the new tables. The memory for the new tables is automatically freed when
+\fBpcre2_code_free()\fP is called for the new copy of the compiled code.
+.P
 NOTE: When one of the matching functions is called, pointers to the compiled
 pattern and the subject string are set in the match data block so that they can
 be referenced by the substring extraction functions. After running a match, you
@@ -3299,6 +3319,6 @@
 .rs
 .sp
 .nf
-Last updated: 30 September 2016
+Last updated: 22 November 2016
 Copyright (c) 1997-2016 University of Cambridge.
 .fi

Modified: code/trunk/doc/pcre2grep.txt
===================================================================
--- code/trunk/doc/pcre2grep.txt    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/doc/pcre2grep.txt    2016-11-22 15:37:02 UTC (rev 605)
@@ -51,42 +51,51 @@
        boundary is controlled by the -N (--newline) option.

        The amount of memory used for buffering files that are being scanned is
-       controlled  by a parameter that can be set by the --buffer-size option.
-       The default value for this parameter is  specified  when  pcre2grep  is
-       built,  with  the  default  default  being 20K. A block of memory three
-       times this size is used (to allow for buffering  "before"  and  "after"
-       lines). An error occurs if a line overflows the buffer.
+       controlled  by  parameters  that  can  be  set by the --buffer-size and
+       --max-buffer-size options. The first of these sets the size  of  buffer
+       that  is obtained at the start of processing. If an input file contains
+       very long lines, a larger buffer may be  needed;  this  is  handled  by
+       automatically extending the buffer, up to the limit specified by --max-
+       buffer-size. The default values for these parameters are specified when
+       pcre2grep  is built, with the default defaults being 20K and 1M respec-
+       tively. An error occurs if a line is too long and  the  buffer  can  no
+       longer be expanded.

-       Patterns  can  be  no  longer than 8K or BUFSIZ bytes, whichever is the
-       greater.  BUFSIZ is defined in <stdio.h>. When there is more  than  one
+       The  block  of  memory that is actually used is three times the "buffer
+       size", to allow for buffering "before" and "after" lines. If the buffer
+       size  is too small, fewer than requested "before" and "after" lines may
+       be output.
+
+       Patterns can be no longer than 8K or BUFSIZ  bytes,  whichever  is  the
+       greater.   BUFSIZ  is defined in <stdio.h>. When there is more than one
        pattern (specified by the use of -e and/or -f), each pattern is applied
-       to each line in the order in which they are defined,  except  that  all
+       to  each  line  in the order in which they are defined, except that all
        the -e patterns are tried before the -f patterns.

-       By  default, as soon as one pattern matches a line, no further patterns
+       By default, as soon as one pattern matches a line, no further  patterns
        are considered. However, if --colour (or --color) is used to colour the
-       matching  substrings, or if --only-matching, --file-offsets, or --line-
-       offsets is used to output only  the  part  of  the  line  that  matched
+       matching substrings, or if --only-matching, --file-offsets, or  --line-
+       offsets  is  used  to  output  only  the  part of the line that matched
        (either shown literally, or as an offset), scanning resumes immediately
-       following the match, so that further matches on the same  line  can  be
-       found.  If  there  are  multiple  patterns,  they  are all tried on the
-       remainder of the line, but patterns that follow the  one  that  matched
+       following  the  match,  so that further matches on the same line can be
+       found. If there are multiple  patterns,  they  are  all  tried  on  the
+       remainder  of  the  line, but patterns that follow the one that matched
        are not tried on the earlier part of the line.

-       This  behaviour  means  that  the  order in which multiple patterns are
-       specified can affect the output when one of the above options is  used.
-       This  is no longer the same behaviour as GNU grep, which now manages to
-       display earlier matches for later patterns (as  long  as  there  is  no
+       This behaviour means that the order  in  which  multiple  patterns  are
+       specified  can affect the output when one of the above options is used.
+       This is no longer the same behaviour as GNU grep, which now manages  to
+       display  earlier  matches  for  later  patterns (as long as there is no
        overlap).

-       Patterns  that can match an empty string are accepted, but empty string
+       Patterns that can match an empty string are accepted, but empty  string
        matches   are   never   recognized.   An   example   is   the   pattern
-       "(super)?(man)?",  in  which  all components are optional. This pattern
-       finds all occurrences of both "super" and  "man";  the  output  differs
-       from  matching  with  "super|man" when only the matching substrings are
+       "(super)?(man)?", in which all components are  optional.  This  pattern
+       finds  all  occurrences  of  both "super" and "man"; the output differs
+       from matching with "super|man" when only the  matching  substrings  are
        being shown.

-       If the LC_ALL or LC_CTYPE environment variable is set,  pcre2grep  uses
+       If  the  LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses
        the value to set a locale when calling the PCRE2 library.  The --locale
        option can be used to override this.

@@ -93,47 +102,48 @@

SUPPORT FOR COMPRESSED FILES

-       It is possible to compile pcre2grep so that it uses libz or  libbz2  to
-       read  files  whose names end in .gz or .bz2, respectively. You can find
+       It  is  possible to compile pcre2grep so that it uses libz or libbz2 to
+       read files whose names end in .gz or .bz2, respectively. You  can  find
        out whether your binary has support for one or both of these file types
        by running it with the --help option. If the appropriate support is not
-       present, files are treated as plain text. The standard input is  always
+       present,  files are treated as plain text. The standard input is always
        so treated.

BINARY FILES

-       By  default,  a  file that contains a binary zero byte within the first
-       1024 bytes is identified as a binary file, and is processed  specially.
-       (GNU  grep  also  identifies  binary  files  in  this  manner.) See the
-       --binary-files option for a means of changing the way binary files  are
+       By default, a file that contains a binary zero byte  within  the  first
+       1024  bytes is identified as a binary file, and is processed specially.
+       (GNU grep also  identifies  binary  files  in  this  manner.)  See  the
+       --binary-files  option for a means of changing the way binary files are
        handled.

OPTIONS

-       The  order  in  which some of the options appear can affect the output.
-       For example, both the -h and -l options affect  the  printing  of  file
-       names.  Whichever  comes later in the command line will be the one that
-       takes effect. Similarly, except where noted  below,  if  an  option  is
-       given  twice,  the  later setting is used. Numerical values for options
-       may be followed by K  or  M,  to  signify  multiplication  by  1024  or
+       The order in which some of the options appear can  affect  the  output.
+       For  example,  both  the  -h and -l options affect the printing of file
+       names. Whichever comes later in the command line will be the  one  that
+       takes  effect.  Similarly,  except  where  noted below, if an option is
+       given twice, the later setting is used. Numerical  values  for  options
+       may  be  followed  by  K  or  M,  to  signify multiplication by 1024 or
        1024*1024 respectively.

        --        This terminates the list of options. It is useful if the next
-                 item on the command line starts with a hyphen but is  not  an
-                 option.  This  allows for the processing of patterns and file
+                 item  on  the command line starts with a hyphen but is not an
+                 option. This allows for the processing of patterns  and  file
                  names that start with hyphens.

        -A number, --after-context=number
-                 Output number lines of context after each matching  line.  If
-                 file  names  and/or  line  numbers are being output, a hyphen
-                 separator is used instead of a colon for the context lines. A
-                 line  containing  "--" is output between each group of lines,
-                 unless they are in fact contiguous in  the  input  file.  The
-                 value  of number is expected to be relatively small. However,
-                 pcre2grep guarantees to have  up  to  8K  of  following  text
-                 available for context output.
+                 Output  up  to  number  lines  of context after each matching
+                 line. Fewer lines are output if the next match or the end  of
+                 the  file  is  reached,  or if the processing buffer size has
+                 been set too small. If file names  and/or  line  numbers  are
+                 being  output,  a hyphen separator is used instead of a colon
+                 for the context lines.  A  line  containing  "--"  is  output
+                 between each group of lines, unless they are in fact contigu-
+                 ous in the input file. The value of number is expected to  be
+                 relatively small. When -c is used, -A is ignored.

        -a, --text
                  Treat  binary  files as text. This is equivalent to --binary-
@@ -140,14 +150,16 @@
                  files=text.

        -B number, --before-context=number
-                 Output number lines of context before each matching line.  If
-                 file  names  and/or  line  numbers are being output, a hyphen
-                 separator is used instead of a colon for the context lines. A
-                 line  containing  "--" is output between each group of lines,
-                 unless they are in fact contiguous in  the  input  file.  The
-                 value  of number is expected to be relatively small. However,
-                 pcre2grep guarantees to have  up  to  8K  of  preceding  text
-                 available for context output.
+                 Output up to number lines of  context  before  each  matching
+                 line.  Fewer  lines  are  output if the previous match or the
+                 start of the file is within number lines, or if the  process-
+                 ing  buffer size has been set too small. If file names and/or
+                 line numbers are being output, a  hyphen  separator  is  used
+                 instead  of  a colon for the context lines. A line containing
+                 "--" is output between each group of lines, unless  they  are
+                 in  fact contiguous in the input file. The value of number is
+                 expected to be relatively small.  When  -c  is  used,  -B  is
+                 ignored.

        --binary-files=word
                  Specify  how binary files are to be processed. If the word is
@@ -164,54 +176,58 @@
                  any output or affecting the return code.

        --buffer-size=number
-                 Set the parameter that controls how much memory is  used  for
-                 buffering files that are being scanned.
+                 Set the parameter that controls how much memory  is  obtained
+                 at the start of processing for buffering files that are being
+                 scanned. See also --max-buffer-size below.

        -C number, --context=number
-                 Output  number  lines  of  context both before and after each
-                 matching line.  This is equivalent to setting both -A and  -B
+                 Output number lines of context both  before  and  after  each
+                 matching  line.  This is equivalent to setting both -A and -B
                  to the same value.

        -c, --count
-                 Do  not  output  lines from the files that are being scanned;
-                 instead output the number of matches (or non-matches if -v is
-                 used)  that would otherwise have caused lines to be shown. By
-                 default, this count is the same as the number  of  suppressed
-                 lines, but if the -M (multiline) option is used (without -v),
-                 there may  be  more  suppressed  lines  than  the  number  of
-                 matches.
+                 Do not output lines from the files that  are  being  scanned;
+                 instead  output  the  number  of  lines  that would have been
+                 shown, either because they matched, or, if -v is set, because
+                 they  failed  to match. By default, this count is exactly the
+                 same as the number of lines that would have been output,  but
+                 if  the -M (multiline) option is used (without -v), there may
+                 be more suppressed lines than the count (that is, the  number
+                 of matches).

                  If  no lines are selected, the number zero is output. If sev-
                  eral  files  are  being  scanned,  a count is output for  each
-                 of  them. However, if the --files-with-matches option is also
-                 used, only those files whose counts are greater than zero are
-                 listed.  When  -c  is  used,  the  -A, -B, and -C options are
-                 ignored.
+                 of  them and the -t option can be used to cause a total to be
+                 output at  the  end.  However,  if  the  --files-with-matches
+                 option  is  also  used,  only  those  files  whose counts are
+                 greater than zero are listed. When -c is used,  the  -A,  -B,
+                 and -C options are ignored.

        --colour, --color
                  If this option is given without any data, it is equivalent to
-                 "--colour=auto".   If  data  is required, it must be given in
+                 "--colour=auto".  If data is required, it must  be  given  in
                  the same shell item, separated by an equals sign.

        --colour=value, --color=value
                  This option specifies under what circumstances the parts of a
                  line that matched a pattern should be coloured in the output.
-                 By default, the output is not coloured. The value  (which  is
-                 optional,  see above) may be "never", "always", or "auto". In
-                 the latter case, colouring happens only if the standard  out-
-                 put  is connected to a terminal. More resources are used when
+                 By  default,  the output is not coloured. The value (which is
+                 optional, see above) may be "never", "always", or "auto".  In
+                 the  latter case, colouring happens only if the standard out-
+                 put is connected to a terminal. More resources are used  when
                  colouring is enabled, because pcre2grep has to search for all
-                 possible  matches in a line, not just one, in order to colour
+                 possible matches in a line, not just one, in order to  colour
                  them all.

                  The colour that is used can be specified by setting the envi-
-                 ronment  variable  PCRE2GREP_COLOUR  or  PCRE2GREP_COLOR. The
-                 value of this variable should be a  string  of  two  numbers,
-                 separated  by  a semicolon. They are copied directly into the
-                 control string for setting colour on a  terminal,  so  it  is
-                 your  responsibility  to ensure that they make sense. If nei-
-                 ther of the environment variables  is  set,  the  default  is
-                 "1;31", which gives red.
+                 ronment variable PCRE2GREP_COLOUR or PCRE2GREP_COLOR. If nei-
+                 ther  of  these  are  set, pcre2grep looks for GREP_COLOUR or
+                 GREP_COLOR. The value of the variable should be a  string  of
+                 two  numbers,  separated  by  a  semicolon.  They  are copied
+                 directly into the control string for setting colour on a ter-
+                 minal,  so it is your responsibility to ensure that they make
+                 sense. If neither of the environment variables  is  set,  the
+                 default is "1;31", which gives red.

        -D action, --devices=action
                  If  an  input  path  is  not  a  regular file or a directory,
@@ -299,12 +315,12 @@
                  Read patterns from the file, one per  line,  and  match  them
                  against  each  line of input. What constitutes a newline when
                  reading the file  is  the  operating  system's  default.  The
-                 --newline option has no effect on this option. Trailing white
-                 space is removed from each line, and blank lines are ignored.
-                 An  empty  file  contains  no  patterns and therefore matches
-                 nothing. See also the comments about multiple patterns versus
-                 a  single  pattern with alternatives in the description of -e
-                 above.
+                 --newline  option  has  no  effect  on this option.  Trailing
+                 white space is removed from each line, and  blank  lines  are
+                 ignored.  An  empty  file  contains no patterns and therefore
+                 matches nothing. See also the comments  about  multiple  pat-
+                 terns  versus  a  single  pattern  with  alternatives  in the
+                 description of -e above.

                  If this option is given more than  once,  all  the  specified
                  files  are read. A data line is output if any of the patterns
@@ -482,102 +498,101 @@
                  tings are specified when the PCRE2 library is compiled,  with
                  the default default being 10 million.

+       --max-buffer-size=number
+                 This  limits  the  expansion  of the processing buffer, whose
+                 initial size can be set by --buffer-size. The maximum  buffer
+                 size  is  silently  forced to be no smaller than the starting
+                 buffer size.
+
        -M, --multiline
-                 Allow  patterns to match more than one line. When this option
-                 is given, patterns may usefully contain literal newline char-
-                 acters  and  internal  occurrences of ^ and $ characters. The
-                 output for a successful match may consist of  more  than  one
-                 line.  The  first is the line in which the match started, and
-                 the last is the line in which the match ended. If the matched
-                 string  ends  with  a newline sequence the output ends at the
-                 end of that line.
+                 Allow patterns to match more than one line. When this  option
+                 is set, the PCRE2 library is called in "multiline" mode. This
+                 allows a matched string to extend past the end of a line  and
+                 continue  on one or more subsequent lines. Patterns used with
+                 -M may usefully contain literal newline characters and inter-
+                 nal  occurrences of ^ and $ characters. The output for a suc-
+                 cessful match may consist of more than one  line.  The  first
+                 line  is  the  line  in which the match started, and the last
+                 line is the line in which the match  ended.  If  the  matched
+                 string  ends  with a newline sequence, the output ends at the
+                 end of that line.  If -v is set,  none  of  the  lines  in  a
+                 multi-line  match  are output. Once a match has been handled,
+                 scanning restarts at the beginning of the line after the  one
+                 in which the match ended.

-                 When this option is set, the PCRE2 library is called in "mul-
-                 tiline" mode. This allows a matched string to extend past the
-                 end of a line and continue on one or more  subsequent  lines.
-                 However,  pcre2grep  still  processes the input line by line.
-                 Once a match has  been  handled,  scanning  restarts  at  the
-                 beginning  of  the  next line, just as it does when -M is not
-                 present. This means that it is possible  for  the  second  or
-                 subsequent  lines  in a multiline match to be output again as
-                 part of another match.
-
-                 The newline sequence that separates multiple  lines  must  be
-                 matched  as  part  of  the  pattern. For example, to find the
-                 phrase "regular expression" in a file where  "regular"  might
-                 be  at the end of a line and "expression" at the start of the
+                 The  newline  sequence  that separates multiple lines must be
+                 matched as part of the pattern.  For  example,  to  find  the
+                 phrase  "regular  expression" in a file where "regular" might
+                 be at the end of a line and "expression" at the start of  the
                  next line, you could use this command:

                    pcre2grep -M 'regular\s+expression' <file>

-                 The \s escape sequence matches  any  white  space  character,
-                 including  newlines,  and  is  followed  by  + so as to match
-                 trailing white space on the first line as  well  as  possibly
+                 The  \s  escape  sequence  matches any white space character,
+                 including newlines, and is followed  by  +  so  as  to  match
+                 trailing  white  space  on the first line as well as possibly
                  handling a two-character newline sequence.

-                 There  is a limit to the number of lines that can be matched,
-                 imposed by the way that pcre2grep buffers the input  file  as
-                 it  scans  it.  However,  pcre2grep  ensures that at least 8K
-                 characters or the rest of the file (whichever is the shorter)
-                 are  available for forward matching, and similarly the previ-
-                 ous 8K characters (or all the previous characters,  if  fewer
-                 than 8K) are guaranteed to be available for lookbehind asser-
-                 tions. The -M option does not work when input is read line by
-                 line (see --line-buffered.)
+                 There is a limit to the number of lines that can be  matched,
+                 imposed  by  the way that pcre2grep buffers the input file as
+                 it scans it. With a  sufficiently  large  processing  buffer,
+                 this should not be a problem, but the -M option does not work
+                 when input is read line by line (see --line-buffered.)

        -N newline-type, --newline=newline-type
-                 The  PCRE2  library  supports  five different conventions for
-                 indicating the ends of lines. They are  the  single-character
-                 sequences  CR  (carriage  return) and LF (linefeed), the two-
-                 character sequence CRLF, an "anycrlf" convention, which  rec-
-                 ognizes  any  of the preceding three types, and an "any" con-
+                 The PCRE2 library supports  five  different  conventions  for
+                 indicating  the  ends of lines. They are the single-character
+                 sequences CR (carriage return) and LF  (linefeed),  the  two-
+                 character  sequence CRLF, an "anycrlf" convention, which rec-
+                 ognizes any of the preceding three types, and an  "any"  con-
                  vention, in which any Unicode line ending sequence is assumed
-                 to  end a line. The Unicode sequences are the three just men-
-                 tioned, plus  VT  (vertical  tab,  U+000B),  FF  (form  feed,
-                 U+000C),   NEL  (next  line,  U+0085),  LS  (line  separator,
+                 to end a line. The Unicode sequences are the three just  men-
+                 tioned,  plus  VT  (vertical  tab,  U+000B),  FF  (form feed,
+                 U+000C),  NEL  (next  line,  U+0085),  LS  (line   separator,
                  U+2028), and PS (paragraph separator, U+2029).

-                 When the  PCRE2  library  is  built,  a  default  line-ending
-                 sequence   is  specified.   This  is  normally  the  standard
+                 When  the  PCRE2  library  is  built,  a  default line-ending
+                 sequence  is  specified.   This  is  normally  the   standard
                  sequence for the operating system. Unless otherwise specified
-                 by  this  option,  pcre2grep uses the library's default.  The
+                 by this option, pcre2grep uses the  library's  default.   The
                  possible values for this option are CR, LF, CRLF, ANYCRLF, or
-                 ANY.  This  makes  it possible to use pcre2grep to scan files
+                 ANY. This makes it possible to use pcre2grep  to  scan  files
                  that have come from other environments without having to mod-
-                 ify  their  line  endings.  If the data that is being scanned
-                 does not agree  with  the  convention  set  by  this  option,
-                 pcre2grep  may  behave in strange ways. Note that this option
-                 does not apply to files specified by the -f,  --exclude-from,
-                 or  --include-from  options,  which  are  expected to use the
+                 ify their line endings. If the data  that  is  being  scanned
+                 does  not  agree  with  the  convention  set  by this option,
+                 pcre2grep may behave in strange ways. Note that  this  option
+                 does  not apply to files specified by the -f, --exclude-from,
+                 or --include-from options, which  are  expected  to  use  the
                  operating system's standard newline sequence.

        -n, --line-number
                  Precede each output line by its line number in the file, fol-
-                 lowed  by  a colon for matching lines or a hyphen for context
+                 lowed by a colon for matching lines or a hyphen  for  context
                  lines. If the file name is also being output, it precedes the
-                 line  number.  When  the  -M option causes a pattern to match
-                 more than one line, only the first is preceded  by  its  line
+                 line number. When the -M option causes  a  pattern  to  match
+                 more  than  one  line, only the first is preceded by its line
                  number. This option is forced if --line-offsets is used.

-       --no-jit  If  the  PCRE2 library is built with support for just-in-time
+       --no-jit  If the PCRE2 library is built with support  for  just-in-time
                  compiling (which speeds up matching), pcre2grep automatically
                  makes use of this, unless it was explicitly disabled at build
-                 time. This option can be used to disable the use  of  JIT  at
-                 run  time. It is provided for testing and working round prob-
+                 time.  This  option  can be used to disable the use of JIT at
+                 run time. It is provided for testing and working round  prob-
                  lems.  It should never be needed in normal use.

        -o, --only-matching
                  Show only the part of the line that matched a pattern instead
-                 of  the  whole  line. In this mode, no context is shown. That
-                 is, the -A, -B, and -C options are ignored. If there is  more
-                 than  one  match in a line, each of them is shown separately.
-                 If -o is combined with -v (invert the sense of the  match  to
-                 find  non-matching  lines),  no  output is generated, but the
-                 return code is set appropriately. If the matched  portion  of
-                 the  line is empty, nothing is output unless the file name or
-                 line number are being printed, in which case they  are  shown
-                 on an otherwise empty line. This option is mutually exclusive
-                 with --file-offsets and --line-offsets.
+                 of the whole line. In this mode, no context  is  shown.  That
+                 is,  the -A, -B, and -C options are ignored. If there is more
+                 than one match in a line, each of them is  shown  separately,
+                 on  a  separate  line  of  output.  If -o is combined with -v
+                 (invert the sense of the match to find  non-matching  lines),
+                 no  output is generated, but the return code is set appropri-
+                 ately. If the matched portion of the line is  empty,  nothing
+                 is  output  unless  the  file  name  or line number are being
+                 printed, in which case they are shown on an  otherwise  empty
+                 line.  This  option is mutually exclusive with --file-offsets
+                 and --line-offsets.

        -onumber, --only-matching=number
                  Show only the part of the line  that  matched  the  capturing
@@ -593,27 +608,28 @@
                  put.

                  If this option is given multiple times,  multiple  substrings
-                 are  output, in the order the options are given. For example,
-                 -o3 -o1 -o3 causes the substrings matched by capturing paren-
-                 theses  3  and  1  and then 3 again to be output. By default,
-                 there is no separator (but see the next option).
+                 are  output  for  each  match,  in  the order the options are
+                 given, and all on one line. For example, -o3 -o1  -o3  causes
+                 the  substrings  matched by capturing parentheses 3 and 1 and
+                 then 3 again to be output. By default, there is no  separator
+                 (but see the next option).

        --om-separator=text
-                 Specify a separating string for multiple occurrences  of  -o.
-                 The  default is an empty string. Separating strings are never
+                 Specify  a  separating string for multiple occurrences of -o.
+                 The default is an empty string. Separating strings are  never
                  coloured.

        -q, --quiet
                  Work quietly, that is, display nothing except error messages.
-                 The  exit  status  indicates  whether or not any matches were
+                 The exit status indicates whether or  not  any  matches  were
                  found.

        -r, --recursive
-                 If any given path is a directory, recursively scan the  files
-                 it  contains, taking note of any --include and --exclude set-
-                 tings. By default, a directory is read as a normal  file;  in
-                 some  operating  systems this gives an immediate end-of-file.
-                 This option is a shorthand  for  setting  the  -d  option  to
+                 If  any given path is a directory, recursively scan the files
+                 it contains, taking note of any --include and --exclude  set-
+                 tings.  By  default, a directory is read as a normal file; in
+                 some operating systems this gives an  immediate  end-of-file.
+                 This  option  is  a  shorthand  for  setting the -d option to
                  "recurse".

        --recursion-limit=number
@@ -620,38 +636,52 @@
                  See --match-limit above.

        -s, --no-messages
-                 Suppress  error  messages  about  non-existent  or unreadable
-                 files. Such files are quietly skipped.  However,  the  return
+                 Suppress error  messages  about  non-existent  or  unreadable
+                 files.  Such  files  are quietly skipped. However, the return
                  code is still 2, even if matches were found in other files.

+       -t, --total-count
+                 This option is useful when scanning more than  one  file.  If
+                 used  on its own, -t suppresses all output except for a grand
+                 total number of matching lines (or non-matching lines  if  -v
+                 is  used)  in  all  the files. If -t is used with -c, a grand
+                 total is output except when the previous output is  just  one
+                 line.  In  other words, it is not output when just one file's
+                 count is listed. If file names are being  output,  the  grand
+                 total  is preceded by "TOTAL:". Otherwise, it appears as just
+                 another number. The -t option is ignored when  used  with  -L
+                 (list  files  without matches), because the grand total would
+                 always be zero.
+
        -u, --utf-8
                  Operate in UTF-8 mode. This option is available only if PCRE2
                  has been compiled with UTF-8 support. All patterns (including
-                 those  for  any --exclude and --include options) and all sub-
-                 ject lines that are scanned must be valid  strings  of  UTF-8
+                 those for any --exclude and --include options) and  all  sub-
+                 ject  lines  that  are scanned must be valid strings of UTF-8
                  characters.

        -V, --version
-                 Write  the version numbers of pcre2grep and the PCRE2 library
-                 to the standard output and then exit. Anything  else  on  the
+                 Write the version numbers of pcre2grep and the PCRE2  library
+                 to  the  standard  output and then exit. Anything else on the
                  command line is ignored.

        -v, --invert-match
-                 Invert  the  sense  of  the match, so that lines which do not
+                 Invert the sense of the match, so that  lines  which  do  not
                  match any of the patterns are the ones that are found.

        -w, --word-regex, --word-regexp
                  Force the patterns to match only whole words. This is equiva-
-                 lent  to  having \b at the start and end of the pattern. This
-                 option applies only to the patterns that are matched  against
-                 the  contents  of files; it does not apply to patterns speci-
+                 lent to having \b at the start and end of the  pattern.  This
+                 option  applies only to the patterns that are matched against
+                 the contents of files; it does not apply to  patterns  speci-
                  fied by any of the --include or --exclude options.

        -x, --line-regex, --line-regexp
-                 Force the patterns to be anchored (each must  start  matching
-                 at  the beginning of a line) and in addition, require them to
-                 match entire lines. This is equivalent  to  having  ^  and  $
-                 characters at the start and end of each alternative top-level
+                 Force  the  patterns to be anchored (each must start matching
+                 at the beginning of a line) and in addition, require them  to
+                 match  entire  lines. In multiline mode the match may be more
+                 than one line. This is equivalent to having \A and \Z charac-
+                 ters  at  the  start  and  end  of each alternative top-level
                  branch in every pattern. This option applies only to the pat-
                  terns that are matched against the contents of files; it does
                  not apply to patterns specified by any of  the  --include  or
@@ -822,5 +852,5 @@

REVISION

-       Last updated: 19 June 2016
+       Last updated: 31 October 2016
        Copyright (c) 1997-2016 University of Cambridge.

Modified: code/trunk/doc/pcre2test.txt
===================================================================
--- code/trunk/doc/pcre2test.txt    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/doc/pcre2test.txt    2016-11-22 15:37:02 UTC (rev 605)
@@ -558,6 +558,7 @@
              pushcopy                  push a copy onto the stack
              stackguard=<number>       test the stackguard feature
              tables=[0|1|2]            select internal tables
+             use_length                do not zero-terminate the pattern
              utf8_input                treat input as UTF-8

        The effects of these modifiers are described in the following sections.
@@ -631,6 +632,16 @@
        testing  that  pcre2_compile()  behaves correctly in this case (it uses
        default values).

+   Specifying the pattern's length
+
+       By default, patterns are passed to the compiling functions as zero-ter-
+       minated  strings.  When  using the POSIX wrapper API, there is no other
+       option. However, when using PCRE2's native API, patterns can be  passed
+       by  length  instead  of  being zero-terminated. The use_length modifier
+       causes this to happen.  Using a length happens  automatically  (whether
+       or  not  use_length is set) when hex is set, because patterns specified
+       in hexadecimal may contain binary zeros.
+
    Specifying pattern characters in hexadecimal

        The hex modifier specifies that the characters of the  pattern,  except
@@ -652,26 +663,27 @@
        ing the delimiter within a substring. The hex and expand modifiers  are
        mutually exclusive.

-       By  default,  pcre2test  passes  patterns as zero-terminated strings to
-       pcre2_compile(), giving the length as  PCRE2_ZERO_TERMINATED.  However,
-       for  patterns specified with the hex modifier, the actual length of the
-       pattern is passed.
+       The  POSIX  API  cannot  be used with patterns specified in hexadecimal
+       because they may contain binary zeros, which conflicts with regcomp()'s
+       requirement  for  a  zero-terminated  string.  Such patterns are always
+       passed to pcre2_compile() as a string with a length, not as zero-termi-
+       nated.

    Specifying wide characters in 16-bit and 32-bit modes

        In 16-bit and 32-bit modes, all input is automatically treated as UTF-8
-       and  translated  to  UTF-16 or UTF-32 when the utf modifier is set. For
+       and translated to UTF-16 or UTF-32 when the utf modifier  is  set.  For
        testing the 16-bit and 32-bit libraries in non-UTF mode, the utf8_input
-       modifier  can  be  used. It is mutually exclusive with utf. Input lines
+       modifier can be used. It is mutually exclusive with  utf.  Input  lines
        are interpreted as UTF-8 as a means of specifying wide characters. More
        details are given in "Input encoding" above.

    Generating long repetitive patterns

-       Some  tests use long patterns that are very repetitive. Instead of cre-
-       ating a very long input line for such a pattern, you can use a  special
-       repetition  feature,  similar  to  the  one described for subject lines
-       above. If the expand modifier is present on a  pattern,  parts  of  the
+       Some tests use long patterns that are very repetitive. Instead of  cre-
+       ating  a very long input line for such a pattern, you can use a special
+       repetition feature, similar to the  one  described  for  subject  lines
+       above.  If  the  expand  modifier is present on a pattern, parts of the
        pattern that have the form

          \[<characters>]{<count>}
@@ -678,34 +690,34 @@

        are expanded before the pattern is passed to pcre2_compile(). For exam-
        ple, \[AB]{6000} is expanded to "ABAB..." 6000 times. This construction
-       cannot  be  nested. An initial "\[" sequence is recognized only if "]{"
-       followed by decimal digits and "}" is found later in  the  pattern.  If
+       cannot be nested. An initial "\[" sequence is recognized only  if  "]{"
+       followed  by  decimal  digits and "}" is found later in the pattern. If
        not, the characters remain in the pattern unaltered. The expand and hex
        modifiers are mutually exclusive.

-       If part of an expanded pattern looks like an expansion, but  is  really
+       If  part  of an expanded pattern looks like an expansion, but is really
        part of the actual pattern, unwanted expansion can be avoided by giving
        two values in the quantifier. For example, \[AB]{6000,6000} is not rec-
        ognized as an expansion item.

-       If  the  info modifier is set on an expanded pattern, the result of the
+       If the info modifier is set on an expanded pattern, the result  of  the
        expansion is included in the information that is output.

    JIT compilation

-       Just-in-time (JIT) compiling is a  heavyweight  optimization  that  can
-       greatly  speed  up pattern matching. See the pcre2jit documentation for
-       details. JIT compiling happens, optionally, after a  pattern  has  been
-       successfully  compiled into an internal form. The JIT compiler converts
+       Just-in-time  (JIT)  compiling  is  a heavyweight optimization that can
+       greatly speed up pattern matching. See the pcre2jit  documentation  for
+       details.  JIT  compiling  happens, optionally, after a pattern has been
+       successfully compiled into an internal form. The JIT compiler  converts
        this to optimized machine code. It needs to know whether the match-time
        options PCRE2_PARTIAL_HARD and PCRE2_PARTIAL_SOFT are going to be used,
-       because different code is generated for the different  cases.  See  the
-       partial  modifier in "Subject Modifiers" below for details of how these
+       because  different  code  is generated for the different cases. See the
+       partial modifier in "Subject Modifiers" below for details of how  these
        options are specified for each match attempt.

-       JIT compilation is requested by the /jit pattern  modifier,  which  may
+       JIT  compilation  is  requested by the /jit pattern modifier, which may
        optionally be followed by an equals sign and a number in the range 0 to
-       7.  The three bits that make up the number specify which of  the  three
+       7.   The  three bits that make up the number specify which of the three
        JIT operating modes are to be compiled:

          1  compile JIT code for non-partial matching
@@ -722,31 +734,31 @@
          6  soft and hard partial matching only
          7  all three modes

-       If  no  number  is  given,  7 is assumed. The phrase "partial matching"
+       If no number is given, 7 is  assumed.  The  phrase  "partial  matching"
        means a call to pcre2_match() with either the PCRE2_PARTIAL_SOFT or the
-       PCRE2_PARTIAL_HARD  option set. Note that such a call may return a com-
+       PCRE2_PARTIAL_HARD option set. Note that such a call may return a  com-
        plete match; the options enable the possibility of a partial match, but
-       do  not  require it. Note also that if you request JIT compilation only
-       for partial matching (for example, /jit=2) but do not set  the  partial
-       modifier  on  a  subject line, that match will not use JIT code because
+       do not require it. Note also that if you request JIT  compilation  only
+       for  partial  matching (for example, /jit=2) but do not set the partial
+       modifier on a subject line, that match will not use  JIT  code  because
        none was compiled for non-partial matching.

-       If JIT compilation is successful, the compiled JIT code will  automati-
-       cally  be  used  when  an appropriate type of match is run, except when
-       incompatible run-time options are specified. For more details, see  the
-       pcre2jit  documentation. See also the jitstack modifier below for a way
+       If  JIT compilation is successful, the compiled JIT code will automati-
+       cally be used when an appropriate type of match  is  run,  except  when
+       incompatible  run-time options are specified. For more details, see the
+       pcre2jit documentation. See also the jitstack modifier below for a  way
        of setting the size of the JIT stack.

-       If the jitfast modifier is specified, matching is done  using  the  JIT
-       "fast  path" interface, pcre2_jit_match(), which skips some of the san-
-       ity checks that are done by pcre2_match(), and of course does not  work
-       when  JIT  is not supported. If jitfast is specified without jit, jit=7
+       If  the  jitfast  modifier is specified, matching is done using the JIT
+       "fast path" interface, pcre2_jit_match(), which skips some of the  san-
+       ity  checks that are done by pcre2_match(), and of course does not work
+       when JIT is not supported. If jitfast is specified without  jit,  jit=7
        is assumed.

-       If the jitverify modifier is specified, information about the  compiled
-       pattern  shows  whether  JIT  compilation was or was not successful. If
-       jitverify is specified without jit, jit=7 is assumed. If  JIT  compila-
-       tion  is successful when jitverify is set, the text "(JIT)" is added to
+       If  the jitverify modifier is specified, information about the compiled
+       pattern shows whether JIT compilation was or  was  not  successful.  If
+       jitverify  is  specified without jit, jit=7 is assumed. If JIT compila-
+       tion is successful when jitverify is set, the text "(JIT)" is added  to
        the first output line after a match or non match when JIT-compiled code
        was actually used in the match.

@@ -757,18 +769,18 @@
          /pattern/locale=fr_FR

        The given locale is set, pcre2_maketables() is called to build a set of
-       character tables for the locale, and this is then passed to  pcre2_com-
-       pile()  when compiling the regular expression. The same tables are used
+       character  tables for the locale, and this is then passed to pcre2_com-
+       pile() when compiling the regular expression. The same tables are  used
        when matching the following subject lines. The /locale modifier applies
        only to the pattern on which it appears, but can be given in a #pattern
-       command if a default is needed. Setting a locale and alternate  charac-
+       command  if a default is needed. Setting a locale and alternate charac-
        ter tables are mutually exclusive.

    Showing pattern memory

-       The  /memory  modifier  causes  the size in bytes of the memory used to
-       hold the compiled pattern to be output. This does not include the  size
-       of  the  pcre2_code  block; it is just the actual compiled data. If the
+       The /memory modifier causes the size in bytes of  the  memory  used  to
+       hold  the compiled pattern to be output. This does not include the size
+       of the pcre2_code block; it is just the actual compiled  data.  If  the
        pattern is subsequently passed to the JIT compiler, the size of the JIT
        compiled code is also output. Here is an example:

@@ -779,27 +791,27 @@

    Limiting nested parentheses

-       The  parens_nest_limit  modifier  sets  a  limit on the depth of nested
-       parentheses in a pattern. Breaching  the  limit  causes  a  compilation
-       error.   The  default  for  the library is set when PCRE2 is built, but
-       pcre2test sets its own default of 220, which is  required  for  running
+       The parens_nest_limit modifier sets a limit  on  the  depth  of  nested
+       parentheses  in  a  pattern.  Breaching  the limit causes a compilation
+       error.  The default for the library is set when  PCRE2  is  built,  but
+       pcre2test  sets  its  own default of 220, which is required for running
        the standard test suite.

    Limiting the pattern length

-       The  max_pattern_length  modifier  sets  a limit, in code units, to the
+       The max_pattern_length modifier sets a limit, in  code  units,  to  the
        length of pattern that pcre2_compile() will accept. Breaching the limit
-       causes  a  compilation  error.  The  default  is  the  largest number a
+       causes a compilation  error.  The  default  is  the  largest  number  a
        PCRE2_SIZE variable can hold (essentially unlimited).

    Using the POSIX wrapper API

-       The /posix and posix_nosub modifiers cause pcre2test to call PCRE2  via
-       the  POSIX  wrapper API rather than its native API. When posix_nosub is
-       used, the POSIX option REG_NOSUB is  passed  to  regcomp().  The  POSIX
-       wrapper  supports  only  the 8-bit library. Note that it does not imply
+       The  /posix and posix_nosub modifiers cause pcre2test to call PCRE2 via
+       the POSIX wrapper API rather than its native API. When  posix_nosub  is
+       used,  the  POSIX  option  REG_NOSUB  is passed to regcomp(). The POSIX
+       wrapper supports only the 8-bit library. Note that it  does  not  imply
        POSIX matching semantics; for more detail see the pcre2posix documenta-
-       tion.  The  following  pattern  modifiers set options for the regcomp()
+       tion. The following pattern modifiers set  options  for  the  regcomp()
        function:

          caseless           REG_ICASE
@@ -809,35 +821,35 @@
          ucp                REG_UCP        )   the POSIX standard
          utf                REG_UTF8       )

-       The regerror_buffsize modifier specifies a size for  the  error  buffer
-       that  is  passed to regerror() in the event of a compilation error. For
+       The  regerror_buffsize  modifier  specifies a size for the error buffer
+       that is passed to regerror() in the event of a compilation  error.  For
        example:

          /abc/posix,regerror_buffsize=20

-       This provides a means of testing the behaviour of regerror()  when  the
-       buffer  is  too  small  for the error message. If this modifier has not
+       This  provides  a means of testing the behaviour of regerror() when the
+       buffer is too small for the error message. If  this  modifier  has  not
        been set, a large buffer is used.

-       The aftertext and allaftertext  subject  modifiers  work  as  described
-       below.  All other modifiers are either ignored, with a warning message,
+       The  aftertext  and  allaftertext  subject  modifiers work as described
+       below. All other modifiers are either ignored, with a warning  message,
        or cause an error.

    Testing the stack guard feature

-       The /stackguard modifier is used to  test  the  use  of  pcre2_set_com-
-       pile_recursion_guard(),  a  function  that  is provided to enable stack
-       availability to be checked during compilation (see the  pcre2api  docu-
-       mentation  for  details).  If  the  number specified by the modifier is
+       The  /stackguard  modifier  is  used  to test the use of pcre2_set_com-
+       pile_recursion_guard(), a function that is  provided  to  enable  stack
+       availability  to  be checked during compilation (see the pcre2api docu-
+       mentation for details). If the number  specified  by  the  modifier  is
        greater than zero, pcre2_set_compile_recursion_guard() is called to set
-       up  callback  from pcre2_compile() to a local function. The argument it
-       receives is the current nesting parenthesis depth; if this  is  greater
+       up callback from pcre2_compile() to a local function. The  argument  it
+       receives  is  the current nesting parenthesis depth; if this is greater
        than the value given by the modifier, non-zero is returned, causing the
        compilation to be aborted.

    Using alternative character tables

-       The value specified for the /tables modifier must be one of the  digits
+       The  value specified for the /tables modifier must be one of the digits
        0, 1, or 2. It causes a specific set of built-in character tables to be
        passed to pcre2_compile(). This is used in the PCRE2 tests to check be-
        haviour with different character tables. The digit specifies the tables
@@ -848,15 +860,15 @@
                pcre2_chartables.c.dist
          2   a set of tables defining ISO 8859 characters

-       In table 2, some characters whose codes are greater than 128 are  iden-
-       tified  as  letters,  digits,  spaces, etc. Setting alternate character
+       In  table 2, some characters whose codes are greater than 128 are iden-
+       tified as letters, digits, spaces,  etc.  Setting  alternate  character
        tables and a locale are mutually exclusive.

    Setting certain match controls

        The following modifiers are really subject modifiers, and are described
-       below.   However, they may be included in a pattern's modifier list, in
-       which case they are applied to every subject  line  that  is  processed
+       below.  However, they may be included in a pattern's modifier list,  in
+       which  case  they  are  applied to every subject line that is processed
        with that pattern. They may not appear in #pattern commands. These mod-
        ifiers do not affect the compilation process.

@@ -873,24 +885,24 @@
              substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
              substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY

-       These modifiers may not appear in a #pattern command. If you want  them
+       These  modifiers may not appear in a #pattern command. If you want them
        as defaults, set them in a #subject command.

    Saving a compiled pattern

-       When  a  pattern with the push modifier is successfully compiled, it is
-       pushed onto a stack of compiled patterns,  and  pcre2test  expects  the
-       next  line to contain a new pattern (or a command) instead of a subject
+       When a pattern with the push modifier is successfully compiled,  it  is
+       pushed  onto  a  stack  of compiled patterns, and pcre2test expects the
+       next line to contain a new pattern (or a command) instead of a  subject
        line. This facility is used when saving compiled patterns to a file, as
-       described  in  the section entitled "Saving and restoring compiled pat-
-       terns" below. If pushcopy is used instead of push, a copy of  the  com-
-       piled  pattern  is  stacked,  leaving the original as current, ready to
-       match the following input lines. This provides a  way  of  testing  the
-       pcre2_code_copy()  function.   The  push  and  pushcopy   modifiers are
-       incompatible with compilation modifiers such  as  global  that  act  at
-       match  time. Any that are specified are ignored (for the stacked copy),
+       described in the section entitled "Saving and restoring  compiled  pat-
+       terns"  below.  If pushcopy is used instead of push, a copy of the com-
+       piled pattern is stacked, leaving the original  as  current,  ready  to
+       match  the  following  input  lines. This provides a way of testing the
+       pcre2_code_copy() function.   The  push  and  pushcopy   modifiers  are
+       incompatible  with  compilation  modifiers  such  as global that act at
+       match time. Any that are specified are ignored (for the stacked  copy),
        with a warning message, except for replace, which causes an error. Note
-       that  jitverify, which is allowed, does not carry through to any subse-
+       that jitverify, which is allowed, does not carry through to any  subse-
        quent matching that uses a stacked pattern.

@@ -901,7 +913,7 @@

    Setting match options

-       The    following   modifiers   set   options   for   pcre2_match()   or
+       The   following   modifiers   set   options   for   pcre2_match()    or
        pcre2_dfa_match(). See pcre2api for a description of their effects.

              anchored                  set PCRE2_ANCHORED
@@ -916,20 +928,20 @@
              partial_hard (or ph)      set PCRE2_PARTIAL_HARD
              partial_soft (or ps)      set PCRE2_PARTIAL_SOFT

-       The partial matching modifiers are provided with abbreviations  because
+       The  partial matching modifiers are provided with abbreviations because
        they appear frequently in tests.

-       If  the  /posix  modifier was present on the pattern, causing the POSIX
+       If the /posix modifier was present on the pattern,  causing  the  POSIX
        wrapper API to be used, the only option-setting modifiers that have any
-       effect   are   notbol,   notempty,   and  noteol,  causing  REG_NOTBOL,
-       REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to  regexec().
+       effect  are  notbol,  notempty,   and   noteol,   causing   REG_NOTBOL,
+       REG_NOTEMPTY,  and REG_NOTEOL, respectively, to be passed to regexec().
        The other modifiers are ignored, with a warning message.

    Setting match controls

-       The  following  modifiers  affect the matching process or request addi-
-       tional information. Some of them may also be  specified  on  a  pattern
-       line  (see  above), in which case they apply to every subject line that
+       The following modifiers affect the matching process  or  request  addi-
+       tional  information.  Some  of  them may also be specified on a pattern
+       line (see above), in which case they apply to every subject  line  that
        is matched against that pattern.

              aftertext                  show text after match
@@ -966,29 +978,29 @@
              zero_terminate             pass the subject as zero-terminated

        The effects of these modifiers are described in the following sections.
-       When  matching  via the POSIX wrapper API, the aftertext, allaftertext,
-       and ovector subject modifiers work as described below. All other  modi-
+       When matching via the POSIX wrapper API, the  aftertext,  allaftertext,
+       and  ovector subject modifiers work as described below. All other modi-
        fiers are either ignored, with a warning message, or cause an error.

    Showing more text

-       The  aftertext modifier requests that as well as outputting the part of
+       The aftertext modifier requests that as well as outputting the part  of
        the subject string that matched the entire pattern, pcre2test should in
        addition output the remainder of the subject string. This is useful for
        tests where the subject contains multiple copies of the same substring.
-       The  allaftertext  modifier  requests the same action for captured sub-
+       The allaftertext modifier requests the same action  for  captured  sub-
        strings as well as the main matched substring. In each case the remain-
        der is output on the following line with a plus character following the
        capture number.

-       The allusedtext modifier requests that all the text that was  consulted
-       during  a  successful pattern match by the interpreter should be shown.
-       This feature is not supported for JIT matching, and if  requested  with
-       JIT  it  is  ignored  (with  a  warning message). Setting this modifier
+       The  allusedtext modifier requests that all the text that was consulted
+       during a successful pattern match by the interpreter should  be  shown.
+       This  feature  is not supported for JIT matching, and if requested with
+       JIT it is ignored (with  a  warning  message).  Setting  this  modifier
        affects the output if there is a lookbehind at the start of a match, or
-       a  lookahead  at  the  end, or if \K is used in the pattern. Characters
-       that precede or follow the start and end of the actual match are  indi-
-       cated  in  the output by '<' or '>' characters underneath them. Here is
+       a lookahead at the end, or if \K is used  in  the  pattern.  Characters
+       that  precede or follow the start and end of the actual match are indi-
+       cated in the output by '<' or '>' characters underneath them.  Here  is
        an example:

            re> /(?<=pqr)abc(?=xyz)/
@@ -996,16 +1008,16 @@
           0: pqrabcxyz
              <<<   >>>

-       This shows that the matched string is "abc",  with  the  preceding  and
-       following  strings  "pqr"  and  "xyz"  having been consulted during the
+       This  shows  that  the  matched string is "abc", with the preceding and
+       following strings "pqr" and "xyz"  having  been  consulted  during  the
        match (when processing the assertions).

-       The startchar modifier requests that the  starting  character  for  the
-       match  be  indicated,  if  it  is different to the start of the matched
+       The  startchar  modifier  requests  that the starting character for the
+       match be indicated, if it is different to  the  start  of  the  matched
        string. The only time when this occurs is when \K has been processed as
        part of the match. In this situation, the output for the matched string
-       is displayed from the starting character  instead  of  from  the  match
-       point,  with  circumflex  characters  under the earlier characters. For
+       is  displayed  from  the  starting  character instead of from the match
+       point, with circumflex characters under  the  earlier  characters.  For
        example:

            re> /abc\Kxyz/
@@ -1013,7 +1025,7 @@
           0: abcxyz
              ^^^

-       Unlike allusedtext, the startchar modifier can be used with JIT.   How-
+       Unlike  allusedtext, the startchar modifier can be used with JIT.  How-
        ever, these two modifiers are mutually exclusive.

    Showing the value of all capture groups
@@ -1021,90 +1033,90 @@
        The allcaptures modifier requests that the values of all potential cap-
        tured parentheses be output after a match. By default, only those up to
        the highest one actually used in the match are output (corresponding to
-       the return code from pcre2_match()). Groups that did not take  part  in
-       the  match  are  output as "<unset>". This modifier is not relevant for
-       DFA matching (which does no capturing); it is ignored, with  a  warning
+       the  return  code from pcre2_match()). Groups that did not take part in
+       the match are output as "<unset>". This modifier is  not  relevant  for
+       DFA  matching  (which does no capturing); it is ignored, with a warning
        message, if present.

    Testing callouts

-       A  callout function is supplied when pcre2test calls the library match-
-       ing functions, unless callout_none is specified. If callout_capture  is
+       A callout function is supplied when pcre2test calls the library  match-
+       ing  functions, unless callout_none is specified. If callout_capture is
        set, the current captured groups are output when a callout occurs.

-       The  callout_fail modifier can be given one or two numbers. If there is
+       The callout_fail modifier can be given one or two numbers. If there  is
        only one number, 1 is returned instead of 0 when a callout of that num-
-       ber  is  reached.  If two numbers are given, 1 is returned when callout
+       ber is reached. If two numbers are given, 1 is  returned  when  callout
        <n> is reached for the <m>th time. Note that callouts with string argu-
-       ments  are  always  given  the  number zero. See "Callouts" below for a
+       ments are always given the number zero.  See  "Callouts"  below  for  a
        description of the output when a callout is taken.

-       The callout_data modifier can be given an unsigned or a  negative  num-
-       ber.   This  is  set  as the "user data" that is passed to the matching
-       function, and passed back when the callout  function  is  invoked.  Any
-       value  other  than  zero  is  used as a return from pcre2test's callout
+       The  callout_data  modifier can be given an unsigned or a negative num-
+       ber.  This is set as the "user data" that is  passed  to  the  matching
+       function,  and  passed  back  when the callout function is invoked. Any
+       value other than zero is used as  a  return  from  pcre2test's  callout
        function.

    Finding all matches in a string

        Searching for all possible matches within a subject can be requested by
-       the  global or /altglobal modifier. After finding a match, the matching
-       function is called again to search the remainder of  the  subject.  The
-       difference  between  global  and  altglobal is that the former uses the
-       start_offset argument to pcre2_match() or  pcre2_dfa_match()  to  start
-       searching  at  a new point within the entire string (which is what Perl
+       the global or /altglobal modifier. After finding a match, the  matching
+       function  is  called  again to search the remainder of the subject. The
+       difference between global and altglobal is that  the  former  uses  the
+       start_offset  argument  to  pcre2_match() or pcre2_dfa_match() to start
+       searching at a new point within the entire string (which is  what  Perl
        does), whereas the latter passes over a shortened subject. This makes a
        difference to the matching process if the pattern begins with a lookbe-
        hind assertion (including \b or \B).

-       If an empty string  is  matched,  the  next  match  is  done  with  the
+       If  an  empty  string  is  matched,  the  next  match  is done with the
        PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
        for another, non-empty, match at the same point in the subject. If this
-       match  fails,  the  start  offset  is advanced, and the normal match is
-       retried. This imitates the way Perl handles such cases when  using  the
-       /g  modifier  or  the  split()  function. Normally, the start offset is
-       advanced by one character, but if  the  newline  convention  recognizes
-       CRLF  as  a newline, and the current character is CR followed by LF, an
+       match fails, the start offset is advanced,  and  the  normal  match  is
+       retried.  This  imitates the way Perl handles such cases when using the
+       /g modifier or the split() function.  Normally,  the  start  offset  is
+       advanced  by  one  character,  but if the newline convention recognizes
+       CRLF as a newline, and the current character is CR followed by  LF,  an
        advance of two characters occurs.

    Testing substring extraction functions

-       The copy  and  get  modifiers  can  be  used  to  test  the  pcre2_sub-
+       The  copy  and  get  modifiers  can  be  used  to  test  the pcre2_sub-
        string_copy_xxx() and pcre2_substring_get_xxx() functions.  They can be
-       given more than once, and each can specify a group name or number,  for
+       given  more than once, and each can specify a group name or number, for
        example:

           abcd\=copy=1,copy=3,get=G1

-       If  the  #subject command is used to set default copy and/or get lists,
-       these can be unset by specifying a negative number to cancel  all  num-
+       If the #subject command is used to set default copy and/or  get  lists,
+       these  can  be unset by specifying a negative number to cancel all num-
        bered groups and an empty name to cancel all named groups.

-       The  getall  modifier  tests pcre2_substring_list_get(), which extracts
+       The getall modifier tests  pcre2_substring_list_get(),  which  extracts
        all captured substrings.

-       If the subject line is successfully matched, the  substrings  extracted
-       by  the  convenience  functions  are  output  with C, G, or L after the
-       string number instead of a colon. This is in  addition  to  the  normal
-       full  list.  The string length (that is, the return from the extraction
+       If  the  subject line is successfully matched, the substrings extracted
+       by the convenience functions are output with  C,  G,  or  L  after  the
+       string  number  instead  of  a colon. This is in addition to the normal
+       full list. The string length (that is, the return from  the  extraction
        function) is given in parentheses after each substring, followed by the
        name when the extraction was by name.

    Testing the substitution function

-       If  the  replace  modifier  is  set, the pcre2_substitute() function is
-       called instead of one of the matching functions. Note that  replacement
-       strings  cannot  contain commas, because a comma signifies the end of a
+       If the replace modifier is  set,  the  pcre2_substitute()  function  is
+       called  instead of one of the matching functions. Note that replacement
+       strings cannot contain commas, because a comma signifies the end  of  a
        modifier. This is not thought to be an issue in a test program.

-       Unlike subject strings, pcre2test does not process replacement  strings
-       for  escape  sequences. In UTF mode, a replacement string is checked to
-       see if it is a valid UTF-8 string. If so, it is correctly converted  to
-       a  UTF  string of the appropriate code unit width. If it is not a valid
-       UTF-8 string, the individual code units are copied directly. This  pro-
+       Unlike  subject strings, pcre2test does not process replacement strings
+       for escape sequences. In UTF mode, a replacement string is  checked  to
+       see  if it is a valid UTF-8 string. If so, it is correctly converted to
+       a UTF string of the appropriate code unit width. If it is not  a  valid
+       UTF-8  string, the individual code units are copied directly. This pro-
        vides a means of passing an invalid UTF-8 string for testing purposes.

-       The  following modifiers set options (in addition to the normal  match
+       The following modifiers set options (in addition to  the  normal  match
        options) for pcre2_substitute():

          global                      PCRE2_SUBSTITUTE_GLOBAL
@@ -1114,8 +1126,8 @@
          substitute_unset_empty      PCRE2_SUBSTITUTE_UNSET_EMPTY

-       After a successful substitution, the modified string  is  output,  pre-
-       ceded  by the number of replacements. This may be zero if there were no
+       After  a  successful  substitution, the modified string is output, pre-
+       ceded by the number of replacements. This may be zero if there were  no
        matches. Here is a simple example of a substitution test:

          /abc/replace=xxx
@@ -1124,12 +1136,12 @@
              =abc=abc=\=global
           2: =xxx=xxx=
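
For comparison, the same substitution can be tried with Python's re.subn(), which also returns the modified string together with the number of replacements. This is an analogy only: re.subn() is not pcre2_substitute(), and the two use different replacement syntax in general. Here count=1 stands in for a non-global substitution, which replaces only the first match.

```python
import re

# One replacement only, as without the global modifier.
print(re.subn("abc", "xxx", "=abc=abc=", count=1))  # ('=xxx=abc=', 1)

# Replace every match, as with the global modifier.
print(re.subn("abc", "xxx", "=abc=abc="))           # ('=xxx=xxx=', 2)
```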

-       Subject and replacement strings should be kept relatively short  (fewer
-       than  256 characters) for substitution tests, as fixed-size buffers are
-       used. To make it easy to test for buffer overflow, if  the  replacement
-       string  starts  with a number in square brackets, that number is passed
-       to pcre2_substitute() as the  size  of  the  output  buffer,  with  the
-       replacement  string  starting at the next character. Here is an example
+       Subject  and replacement strings should be kept relatively short (fewer
+       than 256 characters) for substitution tests, as fixed-size buffers  are
+       used.  To  make it easy to test for buffer overflow, if the replacement
+       string starts with a number in square brackets, that number  is  passed
+       to  pcre2_substitute()  as  the  size  of  the  output buffer, with the
+       replacement string starting at the next character. Here is  an  example
        that tests the edge case:

          /abc/
@@ -1138,11 +1150,11 @@
              123abc123\=replace=[9]XYZ
          Failed: error -47: no more memory

-       The   default   action   of    pcre2_substitute()    is    to    return
-       PCRE2_ERROR_NOMEMORY  when  the output buffer is too small. However, if
-       the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using  the  sub-
-       stitute_overflow_length  modifier),  pcre2_substitute() continues to go
-       through the motions of matching and substituting, in order  to  compute
+       The    default    action    of    pcre2_substitute()   is   to   return
+       PCRE2_ERROR_NOMEMORY when the output buffer is too small.  However,  if
+       the  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  option is set (by using the sub-
+       stitute_overflow_length modifier), pcre2_substitute() continues  to  go
+       through  the  motions of matching and substituting, in order to compute
        the size of buffer that is required. When this happens, pcre2test shows
        the required buffer length (which includes space for the trailing zero)
        as part of the error message. For example:
@@ -1152,13 +1164,13 @@
          Failed: error -47: no more memory: 10 code units are needed
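
The buffer arithmetic behind this message can be modelled in a few lines. The sketch below is not the PCRE2 API: BufferTooSmall and substitute_into_buffer are hypothetical names, Python's re.sub() stands in for the substitution, and lengths are counted in Python characters where PCRE2 counts code units.

```python
import re

class BufferTooSmall(Exception):
    """Hypothetical stand-in for the "no more memory" failure."""
    def __init__(self, needed=None):
        super().__init__("no more memory")
        self.needed = needed  # only filled in when overflow_length is set

def substitute_into_buffer(pattern, replacement, subject, buffer_size,
                           overflow_length=False):
    """Substitute once, failing if the result does not fit the buffer."""
    result = re.sub(pattern, replacement, subject, count=1)
    needed = len(result) + 1  # one extra unit for the trailing zero
    if needed > buffer_size:
        raise BufferTooSmall(needed if overflow_length else None)
    return result

print(substitute_into_buffer("abc", "XYZ", "123abc123", 10))  # 123XYZ123
try:
    substitute_into_buffer("abc", "XYZ", "123abc123", 9, overflow_length=True)
except BufferTooSmall as err:
    print(err, err.needed)  # no more memory 10
```

The result "123XYZ123" is nine units long, so a buffer of 9 fails and 10 is reported as the required size.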

        A replacement string is ignored with POSIX and DFA matching. Specifying
-       partial matching provokes an error return  ("bad  option  value")  from
+       partial  matching  provokes  an  error return ("bad option value") from
        pcre2_substitute().

    Setting the JIT stack size

-       The  jitstack modifier provides a way of setting the maximum stack size
-       that is used by the just-in-time optimization code. It  is  ignored  if
+       The jitstack modifier provides a way of setting the maximum stack  size
+       that  is  used  by the just-in-time optimization code. It is ignored if
        JIT optimization is not being used. The value is a number of kilobytes.
        Providing a stack that is larger than the default 32K is necessary only
        for very complicated patterns.
@@ -1165,29 +1177,29 @@

    Setting match and recursion limits

-       The  match_limit and recursion_limit modifiers set the appropriate lim-
+       The match_limit and recursion_limit modifiers set the appropriate  lim-
        its in the match context. These values are ignored when the find_limits
        modifier is specified.

    Finding minimum limits

-       If  the  find_limits modifier is present, pcre2test calls pcre2_match()
-       several times, setting  different  values  in  the  match  context  via
-       pcre2_set_match_limit()  and pcre2_set_recursion_limit() until it finds
-       the minimum values for each parameter that allow pcre2_match() to  com-
+       If the find_limits modifier is present, pcre2test  calls  pcre2_match()
+       several  times,  setting  different  values  in  the  match context via
+       pcre2_set_match_limit() and pcre2_set_recursion_limit() until it  finds
+       the  minimum values for each parameter that allow pcre2_match() to com-
        plete without error.

        If JIT is being used, only the match limit is relevant. If DFA matching
-       is being used, neither limit is relevant, and this modifier is  ignored
+       is  being used, neither limit is relevant, and this modifier is ignored
        (with a warning message).

-       The  match_limit number is a measure of the amount of backtracking that
-       takes place, and learning the minimum value  can  be  instructive.  For
-       most  simple  matches, the number is quite small, but for patterns with
-       very large numbers of matching possibilities, it can become large  very
-       quickly    with    increasing    length    of   subject   string.   The
-       match_limit_recursion number is a measure of how  much  stack  (or,  if
-       PCRE2  is  compiled with NO_RECURSE, how much heap) memory is needed to
+       The match_limit number is a measure of the amount of backtracking  that
+       takes  place,  and  learning  the minimum value can be instructive. For
+       most simple matches, the number is quite small, but for  patterns  with
+       very  large numbers of matching possibilities, it can become large very
+       quickly   with   increasing   length    of    subject    string.    The
+       match_limit_recursion  number  is  a  measure of how much stack (or, if
+       PCRE2 is compiled with NO_RECURSE, how much heap) memory is  needed  to
        complete the match attempt.
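
One way to find such a minimum is to double a trial value until the match completes and then bisect. The sketch below shows that idea only; it is not taken from the pcre2test source, and it assumes a hypothetical succeeds(limit) callback that is false below some threshold and true at or above it.

```python
def find_minimum_limit(succeeds, start=64):
    """Return the smallest limit for which succeeds(limit) is true."""
    high = start
    while not succeeds(high):   # grow until the match completes
        high *= 2
    low = 0                     # limit 0 is assumed to fail and is never tested
    while high - low > 1:       # bisect between a failing and a passing value
        mid = (low + high) // 2
        if succeeds(mid):
            high = mid
        else:
            low = mid
    return high

# Pretend the match needs a limit of at least 100.
print(find_minimum_limit(lambda limit: limit >= 100))  # 100
```

The loop never terminates if no limit succeeds, so a real caller would also cap the doubling step.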

    Showing MARK names
@@ -1194,42 +1206,42 @@

        The mark modifier causes the names from backtracking control verbs that
-       are  returned from calls to pcre2_match() to be displayed. If a mark is
-       returned for a match, non-match, or partial match, pcre2test shows  it.
-       For  a  match, it is on a line by itself, tagged with "MK:". Otherwise,
+       are returned from calls to pcre2_match() to be displayed. If a mark  is
+       returned  for a match, non-match, or partial match, pcre2test shows it.
+       For a match, it is on a line by itself, tagged with  "MK:".  Otherwise,
        it is added to the non-match message.

    Showing memory usage

-       The memory modifier causes pcre2test to log all memory  allocation  and
+       The  memory  modifier causes pcre2test to log all memory allocation and
        freeing calls that occur during a match operation.

    Setting a starting offset

-       The  offset  modifier  sets  an  offset  in the subject string at which
+       The offset modifier sets an offset  in  the  subject  string  at  which
        matching starts. Its value is a number of code units, not characters.

    Setting an offset limit

-       The offset_limit modifier sets a limit for  unanchored  matches.  If  a
+       The  offset_limit  modifier  sets  a limit for unanchored matches. If a
        match cannot be found starting at or before this offset in the subject,
        a "no match" return is given. The data value is a number of code units,
-       not  characters. When this modifier is used, the use_offset_limit modi-
+       not characters. When this modifier is used, the use_offset_limit  modi-
        fier must have been set for the pattern; if not, an error is generated.

    Setting the size of the output vector

-       The ovector modifier applies only to  the  subject  line  in  which  it
-       appears,  though  of  course  it can also be used to set a default in a
-       #subject command. It specifies the number of pairs of offsets that  are
+       The  ovector  modifier  applies  only  to  the subject line in which it
+       appears, though of course it can also be used to set  a  default  in  a
+       #subject  command. It specifies the number of pairs of offsets that are
        available for storing matching information. The default is 15.

-       A  value of zero is useful when testing the POSIX API because it causes
+       A value of zero is useful when testing the POSIX API because it  causes
        regexec() to be called with a NULL capture vector. When not testing the
-       POSIX  API,  a  value  of  zero  is used to cause pcre2_match_data_cre-
-       ate_from_pattern() to be called, in order to create a  match  block  of
+       POSIX API, a value of  zero  is  used  to  cause  pcre2_match_data_cre-
+       ate_from_pattern()  to  be  called, in order to create a match block of
        exactly the right size for the pattern. (It is not possible to create a
-       match block with a zero-length ovector; there is always  at  least  one
+       match  block  with  a zero-length ovector; there is always at least one
        pair of offsets.)

    Passing the subject as zero-terminated
@@ -1236,56 +1248,56 @@

        By default, the subject string is passed to a native API matching func-
        tion with its correct length. In order to test the facility for passing
-       a  zero-terminated  string, the zero_terminate modifier is provided. It
+       a zero-terminated string, the zero_terminate modifier is  provided.  It
        causes the length to be passed as PCRE2_ZERO_TERMINATED. (When matching
-       via  the  POSIX  interface, this modifier has no effect, as there is no
+       via the POSIX interface, this modifier has no effect, as  there  is  no
        facility for passing a length.)

-       When testing pcre2_substitute(), this modifier also has the  effect  of
+       When  testing  pcre2_substitute(), this modifier also has the effect of
        passing the replacement string as zero-terminated.

    Passing a NULL context

-       Normally,   pcre2test   passes   a   context  block  to  pcre2_match(),
+       Normally,  pcre2test  passes  a   context   block   to   pcre2_match(),
        pcre2_dfa_match() or pcre2_jit_match(). If the null_context modifier is
-       set,  however,  NULL  is  passed. This is for testing that the matching
+       set, however, NULL is passed. This is for  testing  that  the  matching
        functions behave correctly in this case (they use default values). This
-       modifier  cannot  be used with the find_limits modifier or when testing
+       modifier cannot be used with the find_limits modifier or  when  testing
        the substitution function.

THE ALTERNATIVE MATCHING FUNCTION

-       By default,  pcre2test  uses  the  standard  PCRE2  matching  function,
+       By  default,  pcre2test  uses  the  standard  PCRE2  matching function,
        pcre2_match() to match each subject line. PCRE2 also supports an alter-
-       native matching function, pcre2_dfa_match(), which operates in  a  dif-
-       ferent  way, and has some restrictions. The differences between the two
+       native  matching  function, pcre2_dfa_match(), which operates in a dif-
+       ferent way, and has some restrictions. The differences between the  two
        functions are described in the pcre2matching documentation.

-       If the dfa modifier is set, the alternative matching function is  used.
-       This  function  finds all possible matches at a given point in the sub-
-       ject. If, however, the dfa_shortest modifier is set,  processing  stops
-       after  the  first  match is found. This is always the shortest possible
+       If  the dfa modifier is set, the alternative matching function is used.
+       This function finds all possible matches at a given point in  the  sub-
+       ject.  If,  however, the dfa_shortest modifier is set, processing stops
+       after the first match is found. This is always  the  shortest  possible
        match.

DEFAULT OUTPUT FROM pcre2test

-       This section describes the output when the  normal  matching  function,
+       This  section  describes  the output when the normal matching function,
        pcre2_match(), is being used.

-       When  a  match  succeeds,  pcre2test  outputs the list of captured sub-
-       strings, starting with number 0 for the string that matched  the  whole
-       pattern.    Otherwise,  it  outputs  "No  match"  when  the  return  is
-       PCRE2_ERROR_NOMATCH, or "Partial  match:"  followed  by  the  partially
-       matching  substring  when the return is PCRE2_ERROR_PARTIAL. (Note that
-       this is the entire substring that  was  inspected  during  the  partial
-       match;  it  may  include  characters before the actual match start if a
+       When a match succeeds, pcre2test outputs  the  list  of  captured  sub-
+       strings,  starting  with number 0 for the string that matched the whole
+       pattern.   Otherwise,  it  outputs  "No  match"  when  the  return   is
+       PCRE2_ERROR_NOMATCH,  or  "Partial  match:"  followed  by the partially
+       matching substring when the return is PCRE2_ERROR_PARTIAL.  (Note  that
+       this  is  the  entire  substring  that was inspected during the partial
+       match; it may include characters before the actual  match  start  if  a
        lookbehind assertion, \K, \b, or \B was involved.)

        For any other return, pcre2test outputs the PCRE2 negative error number
-       and  a  short  descriptive  phrase. If the error is a failed UTF string
-       check, the code unit offset of the start of the  failing  character  is
+       and a short descriptive phrase. If the error is  a  failed  UTF  string
+       check,  the  code  unit offset of the start of the failing character is
        also output. Here is an example of an interactive pcre2test run.

          $ pcre2test
@@ -1301,8 +1313,8 @@
        Unset capturing substrings that are not followed by one that is set are
        not shown by pcre2test unless the allcaptures modifier is specified. In
        the following example, there are two capturing substrings, but when the
-       first data line is matched, the second, unset substring is  not  shown.
-       An  "internal" unset substring is shown as "<unset>", as for the second
+       first  data  line is matched, the second, unset substring is not shown.
+       An "internal" unset substring is shown as "<unset>", as for the  second
        data line.

            re> /(a)|(b)/
@@ -1314,11 +1326,11 @@
           1: <unset>
           2: b

-       If the strings contain any non-printing characters, they are output  as
-       \xhh  escapes  if  the  value is less than 256 and UTF mode is not set.
+       If  the strings contain any non-printing characters, they are output as
+       \xhh escapes if the value is less than 256 and UTF  mode  is  not  set.
        Otherwise they are output as \x{hh...} escapes. See below for the defi-
-       nition  of  non-printing characters. If the /aftertext modifier is set,
-       the output for substring 0 is followed  by  the  rest  of  the  subject
+       nition of non-printing characters. If the /aftertext modifier  is  set,
+       the  output  for  substring  0 is followed by the rest of the subject
        string, identified by "0+" like this:

            re> /cat/aftertext
@@ -1326,7 +1338,7 @@
           0: cat
           0+ aract

-       If  global  matching  is  requested, the results of successive matching
+       If global matching is requested, the  results  of  successive  matching
        attempts are output in sequence, like this:

            re> /\Bi(\w\w)/g
@@ -1338,8 +1350,8 @@
           0: ipp
           1: pp

-       "No match" is output only if the first match attempt fails. Here is  an
-       example  of  a  failure  message (the offset 4 that is specified by the
+       "No  match" is output only if the first match attempt fails. Here is an
+       example of a failure message (the offset 4 that  is  specified  by  the
        offset modifier is past the end of the subject string):

            re> /xyz/
@@ -1347,7 +1359,7 @@
          Error -24 (bad offset value)

        Note that whereas patterns can be continued over several lines (a plain
-       ">"  prompt  is used for continuations), subject lines may not. However
+       ">" prompt is used for continuations), subject lines may  not.  However
        newlines can be included in a subject by means of the \n escape (or \r,
        \r\n, etc., depending on the newline sequence setting).

@@ -1355,7 +1367,7 @@
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION

        When the alternative matching function, pcre2_dfa_match(), is used, the
-       output consists of a list of all the matches that start  at  the  first
+       output  consists  of  a list of all the matches that start at the first
        point in the subject where there is at least one match. For example:

            re> /(tang|tangerine|tan)/
@@ -1364,11 +1376,11 @@
           1: tang
           2: tan

-       Using  the normal matching function on this data finds only "tang". The
-       longest matching string is always  given  first  (and  numbered  zero).
-       After  a  PCRE2_ERROR_PARTIAL  return,  the output is "Partial match:",
-       followed by the partially matching substring. Note  that  this  is  the
-       entire  substring  that  was inspected during the partial match; it may
+       Using the normal matching function on this data finds only "tang".  The
+       longest  matching  string  is  always  given first (and numbered zero).
+       After a PCRE2_ERROR_PARTIAL return, the  output  is  "Partial  match:",
+       followed  by  the  partially  matching substring. Note that this is the
+       entire substring that was inspected during the partial  match;  it  may
        include characters before the actual match start if a lookbehind asser-
        tion, \b, or \B was involved. (\K is not supported for DFA matching.)
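
The "all matches at the first matching point, longest first" behaviour can be imitated for the tangerine example by brute force. This sketch is not a DFA matcher and only handles literal alternatives; the function name is hypothetical. Python's re.search() is shown alongside as an analogue of the normal matching function.

```python
import re

def all_matches_at_first_point(alternatives, subject):
    """Every literal alternative matching at the first possible start."""
    for start in range(len(subject)):
        hits = [alt for alt in alternatives if subject.startswith(alt, start)]
        if hits:
            return sorted(set(hits), key=len, reverse=True)  # longest first
    return []

alts = ["tang", "tangerine", "tan"]
print(all_matches_at_first_point(alts, "yellow tangerine"))
# ['tangerine', 'tang', 'tan']

# A backtracking matcher stops at the first alternative that succeeds.
print(re.search("|".join(alts), "yellow tangerine").group(0))  # tang
```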

@@ -1384,16 +1396,16 @@
           1: tan
           0: tan

-       The alternative matching function does not support  substring  capture,
-       so  the  modifiers  that are concerned with captured substrings are not
+       The  alternative  matching function does not support substring capture,
+       so the modifiers that are concerned with captured  substrings  are  not
        relevant.

RESTARTING AFTER A PARTIAL MATCH

-       When the alternative matching function has given  the  PCRE2_ERROR_PAR-
+       When  the  alternative matching function has given the PCRE2_ERROR_PAR-
        TIAL return, indicating that the subject partially matched the pattern,
-       you can restart the match with additional subject data by means of  the
+       you  can restart the match with additional subject data by means of the
        dfa_restart modifier. For example:

            re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
@@ -1402,7 +1414,7 @@
          data> n05\=dfa,dfa_restart
           0: n05

-       For  further  information  about partial matching, see the pcre2partial
+       For further information about partial matching,  see  the  pcre2partial
        documentation.

@@ -1409,38 +1421,38 @@
CALLOUTS

        If the pattern contains any callout requests, pcre2test's callout func-
-       tion  is called during matching unless callout_none is specified.  This
+       tion is called during matching unless callout_none is specified.   This
        works with both matching functions.

-       The callout function in pcre2test returns zero (carry on  matching)  by
-       default,  but you can use a callout_fail modifier in a subject line (as
+       The  callout  function in pcre2test returns zero (carry on matching) by
+       default, but you can use a callout_fail modifier in a subject line  (as
        described above) to change this and other parameters of the callout.

        Inserting callouts can be helpful when using pcre2test to check compli-
-       cated  regular expressions. For further information about callouts, see
+       cated regular expressions. For further information about callouts,  see
        the pcre2callout documentation.

-       The output for callouts with numerical arguments and those with  string
+       The  output for callouts with numerical arguments and those with string
        arguments is slightly different.

    Callouts with numerical arguments

        By default, the callout function displays the callout number, the start
-       and current positions in the subject text at the callout time, and  the
+       and  current positions in the subject text at the callout time, and the
        next pattern item to be tested. For example:

          --->pqrabcdef
            0    ^  ^     \d

-       This  output  indicates  that  callout  number  0  occurred for a match
-       attempt starting at the fourth character of the  subject  string,  when
-       the  pointer  was  at  the seventh character, and when the next pattern
-       item was \d. Just one circumflex is output if  the  start  and  current
-       positions  are  the same, or if the current position precedes the start
+       This output indicates that  callout  number  0  occurred  for  a  match
+       attempt  starting  at  the fourth character of the subject string, when
+       the pointer was at the seventh character, and  when  the  next  pattern
+       item  was  \d.  Just  one circumflex is output if the start and current
+       positions are the same, or if the current position precedes  the  start
        position, which can happen if the callout is in a lookbehind assertion.

        Callouts numbered 255 are assumed to be automatic callouts, inserted as
-       a  result  of the /auto_callout pattern modifier. In this case, instead
+       a result of the /auto_callout pattern modifier. In this  case,  instead
        of showing the callout number, the offset in the pattern, preceded by a
        plus, is output. For example:

@@ -1454,7 +1466,7 @@
           0: E*

        If a pattern contains (*MARK) items, an additional line is output when-
-       ever a change of latest mark is passed to  the  callout  function.  For
+       ever  a  change  of  latest mark is passed to the callout function. For
        example:

            re> /a(*MARK:X)bc/auto_callout
@@ -1468,17 +1480,17 @@
          +12 ^  ^
           0: abc

-       The  mark  changes between matching "a" and "b", but stays the same for
-       the rest of the match, so nothing more is output. If, as  a  result  of
-       backtracking,  the  mark  reverts to being unset, the text "<unset>" is
+       The mark changes between matching "a" and "b", but stays the  same  for
+       the  rest  of  the match, so nothing more is output. If, as a result of
+       backtracking, the mark reverts to being unset, the  text  "<unset>"  is
        output.

    Callouts with string arguments

        The output for a callout with a string argument is similar, except that
-       instead  of outputting a callout number before the position indicators,
-       the callout string and its offset in  the  pattern  string  are  output
-       before  the reflection of the subject string, and the subject string is
+       instead of outputting a callout number before the position  indicators,
+       the  callout  string  and  its  offset in the pattern string are output
+       before the reflection of the subject string, and the subject string  is
        reflected for each callout. For example:

            re> /^ab(?C'first')cd(?C"second")ef/
@@ -1495,43 +1507,43 @@
 NON-PRINTING CHARACTERS

        When pcre2test is outputting text in the compiled version of a pattern,
-       bytes  other  than 32-126 are always treated as non-printing characters
+       bytes other than 32-126 are always treated as  non-printing  characters
        and are therefore shown as hex escapes.

-       When pcre2test is outputting text that is a matched part of  a  subject
-       string,  it behaves in the same way, unless a different locale has been
-       set for the pattern (using the /locale modifier).  In  this  case,  the
-       isprint()  function  is  used  to distinguish printing and non-printing
+       When  pcre2test  is outputting text that is a matched part of a subject
+       string, it behaves in the same way, unless a different locale has  been
+       set  for  the  pattern  (using the /locale modifier). In this case, the
+       isprint() function is used to  distinguish  printing  and  non-printing
        characters.
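
The default rule (no locale set) can be combined with the \xhh and \x{hh...} rule from the default output section in a short formatter. This is a sketch, not pcre2test's code: the function name is hypothetical, and the lowercase digits and two-digit minimum width are assumptions.

```python
def show_code_units(code_units, utf=False):
    """Render code units, escaping everything outside 32-126."""
    out = []
    for value in code_units:
        if 32 <= value <= 126:
            out.append(chr(value))              # printing character
        elif value < 256 and not utf:
            out.append("\\x%02x" % value)       # \xhh form
        else:
            out.append("\\x{%02x}" % value)     # \x{hh...} form
    return "".join(out)

print(show_code_units([97, 10, 255]))       # a\x0a\xff
print(show_code_units([0x100]))             # \x{100}
print(show_code_units([10], utf=True))      # \x{0a}
```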

SAVING AND RESTORING COMPILED PATTERNS

-       It is possible to save compiled patterns  on  disc  or  elsewhere,  and
+       It  is  possible  to  save  compiled patterns on disc or elsewhere, and
        reload them later, subject to a number of restrictions. JIT data cannot
-       be saved. The host on which the patterns are reloaded must  be  running
+       be  saved.  The host on which the patterns are reloaded must be running
        the same version of PCRE2, with the same code unit width, and must also
-       have the same endianness, pointer width  and  PCRE2_SIZE  type.  Before
-       compiled  patterns  can be saved they must be serialized, that is, con-
-       verted to a stream of bytes. A single byte stream may contain any  num-
-       ber  of  compiled  patterns,  but  they must all use the same character
+       have  the  same  endianness,  pointer width and PCRE2_SIZE type. Before
+       compiled patterns can be saved they must be serialized, that  is,  con-
+       verted  to a stream of bytes. A single byte stream may contain any num-
+       ber of compiled patterns, but they must  all  use  the  same  character
        tables. A single copy of the tables is included in the byte stream (its
        size is 1088 bytes).

-       The  functions  whose  names  begin  with pcre2_serialize_ are used for
-       serializing and de-serializing. They are described in the  pcre2serial-
+       The functions whose names begin  with  pcre2_serialize_  are  used  for
+       serializing  and de-serializing. They are described in the pcre2serial-
        ize  documentation.  In  this  section  we  describe  the  features  of
        pcre2test that can be used to test these functions.

-       When a pattern with push  modifier  is  successfully  compiled,  it  is
-       pushed  onto  a  stack  of compiled patterns, and pcre2test expects the
-       next line to contain a new pattern (or command) instead  of  a  subject
-       line.  By contrast, the pushcopy modifier causes a copy of the compiled
-       pattern to be stacked, leaving the  original  available  for  immediate
-       matching.  By  using  push and/or pushcopy, a number of patterns can be
+       When  a  pattern  with  push  modifier  is successfully compiled, it is
+       pushed onto a stack of compiled patterns,  and  pcre2test  expects  the
+       next  line  to  contain a new pattern (or command) instead of a subject
+       line. By contrast, the pushcopy modifier causes a copy of the  compiled
+       pattern  to  be  stacked,  leaving the original available for immediate
+       matching. By using push and/or pushcopy, a number of  patterns  can  be
        compiled and retained. These modifiers are incompatible with posix, and
-       control  modifiers  that act at match time are ignored (with a message)
-       for the stacked patterns. The jitverify modifier applies only  at  com-
+       control modifiers that act at match time are ignored (with  a  message)
+       for  the  stacked patterns. The jitverify modifier applies only at com-
        pile time.

        The command
@@ -1539,21 +1551,21 @@
          #save <filename>

        causes all the stacked patterns to be serialized and the result written
-       to the named file. Afterwards, all the stacked patterns are freed.  The
+       to  the named file. Afterwards, all the stacked patterns are freed. The
        command

          #load <filename>

-       reads  the  data in the file, and then arranges for it to be de-serial-
-       ized, with the resulting compiled patterns added to the pattern  stack.
-       The  pattern  on the top of the stack can be retrieved by the #pop com-
-       mand, which must be followed by  lines  of  subjects  that  are  to  be
-       matched  with  the pattern, terminated as usual by an empty line or end
-       of file. This command may be followed by  a  modifier  list  containing
-       only  control  modifiers that act after a pattern has been compiled. In
+       reads the data in the file, and then arranges for it to  be  de-serial-
+       ized,  with the resulting compiled patterns added to the pattern stack.
+       The pattern on the top of the stack can be retrieved by the  #pop  com-
+       mand,  which  must  be  followed  by  lines  of subjects that are to be
+       matched with the pattern, terminated as usual by an empty line  or  end
+       of  file.  This  command  may be followed by a modifier list containing
+       only control modifiers that act after a pattern has been  compiled.  In
        particular,  hex,  posix,  posix_nosub,  push,  and  pushcopy  are  not
-       allowed,  nor are any option-setting modifiers.  The JIT modifiers are,
-       however permitted. Here is an example that saves and reloads  two  pat-
+       allowed, nor are any option-setting modifiers.  The JIT modifiers  are,
+       however  permitted.  Here is an example that saves and reloads two pat-
        terns.

          /abc/push
@@ -1566,10 +1578,10 @@
          #pop jit,bincode
          abc

-       If  jitverify  is  used with #pop, it does not automatically imply jit,
+       If jitverify is used with #pop, it does not  automatically  imply  jit,
        which is different behaviour from when it is used on a pattern.

-       The #popcopy command is analagous to the pushcopy modifier in  that  it
+       The  #popcopy  command is analagous to the pushcopy modifier in that it
        makes current a copy of the topmost stack pattern, leaving the original
        still on the stack.

@@ -1589,5 +1601,5 @@

REVISION

-       Last updated: 02 August 2016
+       Last updated: 04 November 2016
        Copyright (c) 1997-2016 University of Cambridge.

Modified: code/trunk/src/pcre2.h
===================================================================
--- code/trunk/src/pcre2.h    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/src/pcre2.h    2016-11-22 15:37:02 UTC (rev 605)
@@ -465,7 +465,9 @@
 PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \
   pcre2_code_free(pcre2_code *); \
 PCRE2_EXP_DECL pcre2_code PCRE2_CALL_CONVENTION \
-  *pcre2_code_copy(const pcre2_code *);
+  *pcre2_code_copy(const pcre2_code *); \
+PCRE2_EXP_DECL pcre2_code PCRE2_CALL_CONVENTION \
+  *pcre2_code_copy_with_tables(const pcre2_code *);

/* Functions that give information about a compiled pattern. */
@@ -629,6 +631,7 @@

 #define pcre2_callout_enumerate               PCRE2_SUFFIX(pcre2_callout_enumerate_)
 #define pcre2_code_copy                       PCRE2_SUFFIX(pcre2_code_copy_)
+#define pcre2_code_copy_with_tables           PCRE2_SUFFIX(pcre2_code_copy_with_tables_)
 #define pcre2_code_free                       PCRE2_SUFFIX(pcre2_code_free_)
 #define pcre2_compile                         PCRE2_SUFFIX(pcre2_compile_)
 #define pcre2_compile_context_copy            PCRE2_SUFFIX(pcre2_compile_context_copy_)

Modified: code/trunk/src/pcre2.h.in
===================================================================
--- code/trunk/src/pcre2.h.in    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/src/pcre2.h.in    2016-11-22 15:37:02 UTC (rev 605)
@@ -465,7 +465,9 @@
 PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \
   pcre2_code_free(pcre2_code *); \
 PCRE2_EXP_DECL pcre2_code PCRE2_CALL_CONVENTION \
-  *pcre2_code_copy(const pcre2_code *);
+  *pcre2_code_copy(const pcre2_code *); \
+PCRE2_EXP_DECL pcre2_code PCRE2_CALL_CONVENTION \
+  *pcre2_code_copy_with_tables(const pcre2_code *);

/* Functions that give information about a compiled pattern. */
@@ -629,6 +631,7 @@

 #define pcre2_callout_enumerate               PCRE2_SUFFIX(pcre2_callout_enumerate_)
 #define pcre2_code_copy                       PCRE2_SUFFIX(pcre2_code_copy_)
+#define pcre2_code_copy_with_tables           PCRE2_SUFFIX(pcre2_code_copy_with_tables_)
 #define pcre2_code_free                       PCRE2_SUFFIX(pcre2_code_free_)
 #define pcre2_compile                         PCRE2_SUFFIX(pcre2_compile_)
 #define pcre2_compile_context_copy            PCRE2_SUFFIX(pcre2_compile_context_copy_)

Modified: code/trunk/src/pcre2_compile.c
===================================================================
--- code/trunk/src/pcre2_compile.c    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/src/pcre2_compile.c    2016-11-22 15:37:02 UTC (rev 605)
@@ -1036,7 +1036,46 @@
   ref_count = (PCRE2_SIZE *)(code->tables + tables_length);
   (*ref_count)++;
   }
+  
+return newcode;
+}

+
+
+/*************************************************
+*     Copy compiled code and character tables    *
+*************************************************/
+
+/* Compiled JIT code cannot be copied, so the new compiled block has no
+associated JIT data. This version of code_copy also makes a separate copy of
+the character tables. */
+
+PCRE2_EXP_DEFN pcre2_code * PCRE2_CALL_CONVENTION
+pcre2_code_copy_with_tables(const pcre2_code *code)
+{
+PCRE2_SIZE* ref_count;
+pcre2_code *newcode;
+uint8_t *newtables;
+
+if (code == NULL) return NULL;
+newcode = code->memctl.malloc(code->blocksize, code->memctl.memory_data);
+if (newcode == NULL) return NULL;
+memcpy(newcode, code, code->blocksize);
+newcode->executable_jit = NULL;
+
+newtables = code->memctl.malloc(tables_length + sizeof(PCRE2_SIZE),
+  code->memctl.memory_data);
+if (newtables == NULL)
+  {
+  code->memctl.free((void *)newcode, code->memctl.memory_data);
+  return NULL;
+  }
+memcpy(newtables, code->tables, tables_length);
+ref_count = (PCRE2_SIZE *)(newtables + tables_length);
+*ref_count = 1;
+
+newcode->tables = newtables;
+newcode->flags |= PCRE2_DEREF_TABLES;
 return newcode;
 }

@@ -2367,7 +2406,7 @@
assertion, possibly preceded by a callout. If the value is 1, we have just
had the callout and expect an assertion. There must be at least 3 more
characters in all cases. We know that the current character is an opening
- parenthesis, as otherwise we wouldn't be here. Note that expect_cond_assert
+ parenthesis, as otherwise we wouldn't be here. Note that expect_cond_assert
may be negative, since all callouts just decrement it. */

   if (expect_cond_assert > 0)
@@ -2377,23 +2416,23 @@
       {
       case CHAR_C:
       ok = expect_cond_assert == 2;
-      break;  
- 
+      break;
+
       case CHAR_EQUALS_SIGN:
       case CHAR_EXCLAMATION_MARK:
       break;
-      
+
       case CHAR_LESS_THAN_SIGN:
       ok = ptr[2] == CHAR_EQUALS_SIGN || ptr[2] == CHAR_EXCLAMATION_MARK;
       break;
-      
+
       default:
-      ok = FALSE;       
-      }   
+      ok = FALSE;
+      }

     if (!ok)
       {
-      ptr--;   /* Adjust error offset */ 
+      ptr--;   /* Adjust error offset */
       errorcode = ERR28;
       goto FAILED;
       }
@@ -3559,7 +3598,7 @@
       if (*ptr == CHAR_QUESTION_MARK)
         {
         *parsed_pattern++ = META_COND_ASSERT;
-        ptr--;   /* Pull pointer back to the opening parenthesis. */ 
+        ptr--;   /* Pull pointer back to the opening parenthesis. */
         expect_cond_assert = 2;
         break;  /* End of conditional */
         }

Modified: code/trunk/src/pcre2test.c
===================================================================
--- code/trunk/src/pcre2test.c    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/src/pcre2test.c    2016-11-22 15:37:02 UTC (rev 605)
@@ -427,16 +427,14 @@
 #define CTL_NULLCONTEXT                  0x00200000u
 #define CTL_POSIX                        0x00400000u
 #define CTL_POSIX_NOSUB                  0x00800000u
-#define CTL_PUSH                         0x01000000u
-#define CTL_PUSHCOPY                     0x02000000u
-#define CTL_STARTCHAR                    0x04000000u
-#define CTL_USE_LENGTH                   0x08000000u  /* Same word as HEXPAT */
-#define CTL_UTF8_INPUT                   0x10000000u
-#define CTL_ZERO_TERMINATE               0x20000000u
+#define CTL_PUSH                         0x01000000u  /* These three must be */
+#define CTL_PUSHCOPY                     0x02000000u  /*   all in the same */
+#define CTL_PUSHTABLESCOPY               0x04000000u  /*     word. */          
+#define CTL_STARTCHAR                    0x08000000u
+#define CTL_USE_LENGTH                   0x10000000u  /* Same word as HEXPAT */
+#define CTL_UTF8_INPUT                   0x20000000u
+#define CTL_ZERO_TERMINATE               0x40000000u

-#define CTL_NL_SET                       0x40000000u  /* Informational */
-#define CTL_BSR_SET                      0x80000000u  /* Informational */
-
 /* Second control word */

 #define CTL2_SUBSTITUTE_EXTENDED         0x00000001u
@@ -444,6 +442,9 @@
 #define CTL2_SUBSTITUTE_UNKNOWN_UNSET    0x00000004u
 #define CTL2_SUBSTITUTE_UNSET_EMPTY      0x00000008u

+#define CTL_NL_SET                       0x40000000u  /* Informational */
+#define CTL_BSR_SET                      0x80000000u  /* Informational */
+
 /* Combinations */

 #define CTL_DEBUG            (CTL_FULLBINCODE|CTL_INFO)  /* For setting */
@@ -607,7 +608,8 @@
   { "posix_nosub",                MOD_PAT,  MOD_CTL, CTL_POSIX|CTL_POSIX_NOSUB,  PO(control) },
   { "ps",                         MOD_DAT,  MOD_OPT, PCRE2_PARTIAL_SOFT,         DO(options) },
   { "push",                       MOD_PAT,  MOD_CTL, CTL_PUSH,                   PO(control) },
-  { "pushcopy",                   MOD_PAT,  MOD_CTL, CTL_PUSHCOPY,              PO(control) },
+  { "pushcopy",                   MOD_PAT,  MOD_CTL, CTL_PUSHCOPY,               PO(control) },
+  { "pushtablescopy",             MOD_PAT,  MOD_CTL, CTL_PUSHTABLESCOPY,         PO(control) },
   { "recursion_limit",            MOD_CTM,  MOD_INT, 0,                          MO(recursion_limit) },
   { "regerror_buffsize",          MOD_PAT,  MOD_INT, 0,                          PO(regerror_buffsize) },
   { "replace",                    MOD_PND,  MOD_STR, REPLACE_MODSIZE,            PO(replacement) },
@@ -651,10 +653,10 @@

#define PUSH_SUPPORTED_COMPILE_CONTROLS ( \
CTL_BINCODE|CTL_CALLOUT_INFO|CTL_FULLBINCODE|CTL_HEXPAT|CTL_INFO| \
- CTL_JITVERIFY|CTL_MEMORY|CTL_PUSH|CTL_PUSHCOPY|CTL_BSR_SET|CTL_NL_SET| \
+ CTL_JITVERIFY|CTL_MEMORY|CTL_PUSH|CTL_PUSHCOPY|CTL_PUSHTABLESCOPY| \
CTL_USE_LENGTH)

-#define PUSH_SUPPORTED_COMPILE_CONTROLS2 (0)
+#define PUSH_SUPPORTED_COMPILE_CONTROLS2 (CTL_BSR_SET|CTL_NL_SET)

/* Controls that apply only at compile time with 'push'. */

@@ -664,7 +666,7 @@
/* Controls that are forbidden with #pop or #popcopy. */

#define NOTPOP_CONTROLS (CTL_HEXPAT|CTL_POSIX|CTL_POSIX_NOSUB|CTL_PUSH| \
- CTL_PUSHCOPY|CTL_USE_LENGTH)
+ CTL_PUSHCOPY|CTL_PUSHTABLESCOPY|CTL_USE_LENGTH)

/* Pattern controls that are mutually exclusive. At present these are all in
the first control word. Note that CTL_POSIX_NOSUB is always accompanied by
@@ -674,6 +676,7 @@
CTL_POSIX | CTL_HEXPAT,
CTL_POSIX | CTL_PUSH,
CTL_POSIX | CTL_PUSHCOPY,
+ CTL_POSIX | CTL_PUSHTABLESCOPY,
CTL_POSIX | CTL_USE_LENGTH,
CTL_EXPAND | CTL_HEXPAT };

@@ -973,6 +976,14 @@
   else \
     a = (void *)pcre2_code_copy_32(G(b,32))

+#define PCRE2_CODE_COPY_WITH_TABLES_TO_VOID(a,b) \
+  if (test_mode == PCRE8_MODE) \
+    a = (void *)pcre2_code_copy_with_tables_8(G(b,8)); \
+  else if (test_mode == PCRE16_MODE) \
+    a = (void *)pcre2_code_copy_with_tables_16(G(b,16)); \
+  else \
+    a = (void *)pcre2_code_copy_with_tables_32(G(b,32))
+
 #define PCRE2_COMPILE(a,b,c,d,e,f,g) \
   if (test_mode == PCRE8_MODE) \
     G(a,8) = pcre2_compile_8(G(b,8),c,d,e,f,g); \
@@ -1436,6 +1447,12 @@
   else \
     a = (void *)G(pcre2_code_copy_,BITTWO)(G(b,BITTWO))

+#define PCRE2_CODE_COPY_WITH_TABLES_TO_VOID(a,b) \
+  if (test_mode == G(G(PCRE,BITONE),_MODE)) \
+    a = (void *)G(pcre2_code_copy_with_tables_,BITONE)(G(b,BITONE)); \
+  else \
+    a = (void *)G(pcre2_code_copy_with_tables_,BITTWO)(G(b,BITTWO))
+
 #define PCRE2_COMPILE(a,b,c,d,e,f,g) \
   if (test_mode == G(G(PCRE,BITONE),_MODE)) \
     G(a,BITONE) = G(pcre2_compile_,BITONE)(G(b,BITONE),c,d,e,f,g); \
@@ -1773,6 +1790,7 @@
      (int (*)(struct pcre2_callout_enumerate_block_8 *, void *))b,c)
 #define PCRE2_CODE_COPY_FROM_VOID(a,b) G(a,8) = pcre2_code_copy_8(b)
 #define PCRE2_CODE_COPY_TO_VOID(a,b) a = (void *)pcre2_code_copy_8(G(b,8))
+#define PCRE2_CODE_COPY_WITH_TABLES_TO_VOID(a,b) a = (void *)pcre2_code_copy_with_tables_8(G(b,8))
 #define PCRE2_COMPILE(a,b,c,d,e,f,g) \
   G(a,8) = pcre2_compile_8(G(b,8),c,d,e,f,g)
 #define PCRE2_DFA_MATCH(a,b,c,d,e,f,g,h,i,j) \
@@ -1868,6 +1886,7 @@
      (int (*)(struct pcre2_callout_enumerate_block_16 *, void *))b,c)
 #define PCRE2_CODE_COPY_FROM_VOID(a,b) G(a,16) = pcre2_code_copy_16(b)
 #define PCRE2_CODE_COPY_TO_VOID(a,b) a = (void *)pcre2_code_copy_16(G(b,16))
+#define PCRE2_CODE_COPY_WITH_TABLES_TO_VOID(a,b) a = (void *)pcre2_code_copy_with_tables_16(G(b,16))
 #define PCRE2_COMPILE(a,b,c,d,e,f,g) \
   G(a,16) = pcre2_compile_16(G(b,16),c,d,e,f,g)
 #define PCRE2_DFA_MATCH(a,b,c,d,e,f,g,h,i,j) \
@@ -1963,6 +1982,7 @@
      (int (*)(struct pcre2_callout_enumerate_block_32 *, void *))b,c)
 #define PCRE2_CODE_COPY_FROM_VOID(a,b) G(a,32) = pcre2_code_copy_32(b)
 #define PCRE2_CODE_COPY_TO_VOID(a,b) a = (void *)pcre2_code_copy_32(G(b,32))
+#define PCRE2_CODE_COPY_WITH_TABLES_TO_VOID(a,b) a = (void *)pcre2_code_copy_with_tables_32(G(b,32))
 #define PCRE2_COMPILE(a,b,c,d,e,f,g) \
   G(a,32) = pcre2_compile_32(G(b,32),c,d,e,f,g)
 #define PCRE2_DFA_MATCH(a,b,c,d,e,f,g,h,i,j) \
@@ -3435,8 +3455,8 @@
 #else
       *((uint16_t *)field) = PCRE2_BSR_UNICODE;
 #endif
-      if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control &= ~CTL_BSR_SET;
-        else dctl->control &= ~CTL_BSR_SET;
+      if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 &= ~CTL_BSR_SET;
+        else dctl->control2 &= ~CTL_BSR_SET;
       }
     else
       {
@@ -3445,8 +3465,8 @@
       else if (len == 7 && strncmpic(pp, (const uint8_t *)"unicode", 7) == 0)
         *((uint16_t *)field) = PCRE2_BSR_UNICODE;
       else goto INVALID_VALUE;
-      if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control |= CTL_BSR_SET;
-        else dctl->control |= CTL_BSR_SET;
+      if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 |= CTL_BSR_SET;
+        else dctl->control2 |= CTL_BSR_SET;
       }
     pp = ep;
     break;
@@ -3513,14 +3533,14 @@
     if (i == 0)
       {
       *((uint16_t *)field) = NEWLINE_DEFAULT;
-      if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control &= ~CTL_NL_SET;
-        else dctl->control &= ~CTL_NL_SET;
+      if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 &= ~CTL_NL_SET;
+        else dctl->control2 &= ~CTL_NL_SET;
       }
     else
       {
       *((uint16_t *)field) = i;
-      if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control |= CTL_NL_SET;
-        else dctl->control |= CTL_NL_SET;
+      if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 |= CTL_NL_SET;
+        else dctl->control2 |= CTL_NL_SET;
       }
     pp = ep;
     break;
@@ -3691,7 +3711,7 @@
 static void
 show_controls(uint32_t controls, uint32_t controls2, const char *before)
 {
-fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
+fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
   before,
   ((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
   ((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
@@ -3699,7 +3719,7 @@
   ((controls & CTL_ALLUSEDTEXT) != 0)? " allusedtext" : "",
   ((controls & CTL_ALTGLOBAL) != 0)? " altglobal" : "",
   ((controls & CTL_BINCODE) != 0)? " bincode" : "",
-  ((controls & CTL_BSR_SET) != 0)? " bsr" : "",
+  ((controls2 & CTL_BSR_SET) != 0)? " bsr" : "",
   ((controls & CTL_CALLOUT_CAPTURE) != 0)? " callout_capture" : "",
   ((controls & CTL_CALLOUT_INFO) != 0)? " callout_info" : "",
   ((controls & CTL_CALLOUT_NONE) != 0)? " callout_none" : "",
@@ -3715,12 +3735,13 @@
   ((controls & CTL_JITVERIFY) != 0)? " jitverify" : "",
   ((controls & CTL_MARK) != 0)? " mark" : "",
   ((controls & CTL_MEMORY) != 0)? " memory" : "",
-  ((controls & CTL_NL_SET) != 0)? " newline" : "",
+  ((controls2 & CTL_NL_SET) != 0)? " newline" : "",
   ((controls & CTL_NULLCONTEXT) != 0)? " null_context" : "",
   ((controls & CTL_POSIX) != 0)? " posix" : "",
   ((controls & CTL_POSIX_NOSUB) != 0)? " posix_nosub" : "",
   ((controls & CTL_PUSH) != 0)? " push" : "",
   ((controls & CTL_PUSHCOPY) != 0)? " pushcopy" : "",
+  ((controls & CTL_PUSHTABLESCOPY) != 0)? " pushtablescopy" : "",
   ((controls & CTL_STARTCHAR) != 0)? " startchar" : "",
   ((controls2 & CTL2_SUBSTITUTE_EXTENDED) != 0)? " substitute_extended" : "",
   ((controls2 & CTL2_SUBSTITUTE_OVERFLOW_LENGTH) != 0)? " substitute_overflow_length" : "",
@@ -4061,7 +4082,7 @@

if (jchanged) fprintf(outfile, "Duplicate name status changes\n");

-  if ((pat_patctl.control & CTL_BSR_SET) != 0 ||
+  if ((pat_patctl.control2 & CTL_BSR_SET) != 0 ||
       (FLD(compiled_code, flags) & PCRE2_BSR_SET) != 0)
     fprintf(outfile, "\\R matches %s\n", (bsr_convention == PCRE2_BSR_UNICODE)?
       "any Unicode newline" : "CR, LF, or CRLF");
@@ -4930,7 +4951,7 @@
 /* Handle compiling via the native interface. Controls that act later are
 ignored with "push". Replacements are locked out. */

-if ((pat_patctl.control & (CTL_PUSH|CTL_PUSHCOPY)) != 0)
+if ((pat_patctl.control & (CTL_PUSH|CTL_PUSHCOPY|CTL_PUSHTABLESCOPY)) != 0)
   {
   if (pat_patctl.replacement[0] != 0)
     {
@@ -5031,7 +5052,7 @@
 appropriate default newline setting, local_newline_default will be non-zero. We
 use this if there is no explicit newline modifier. */

-if ((pat_patctl.control & CTL_NL_SET) == 0 && local_newline_default != 0)
+if ((pat_patctl.control2 & CTL_NL_SET) == 0 && local_newline_default != 0)
{
SETFLD(pat_context, newline_convention, local_newline_default);
}
@@ -5163,7 +5184,7 @@
/* If an explicit newline modifier was given, set the information flag in the
pattern so that it is preserved over push/pop. */

-if ((pat_patctl.control & CTL_NL_SET) != 0)
+if ((pat_patctl.control2 & CTL_NL_SET) != 0)
{
SETFLD(compiled_code, flags, FLD(compiled_code, flags) | PCRE2_NL_SET);
}
@@ -5191,10 +5212,11 @@
SET(compiled_code, NULL);
}

-/* The "pushcopy" control is similar, but pushes a copy of the pattern. This
-tests the pcre2_code_copy() function. */
+/* The "pushcopy" and "pushtablescopy" controls are similar, but push a
+copy of the pattern, the latter with a copy of its character tables. This tests
+the pcre2_code_copy() and pcre2_code_copy_with_tables() functions. */

-if ((pat_patctl.control & CTL_PUSHCOPY) != 0)
+if ((pat_patctl.control & (CTL_PUSHCOPY|CTL_PUSHTABLESCOPY)) != 0)
   {
   if (patstacknext >= PATSTACKSIZE)
     {
@@ -5201,7 +5223,14 @@
     fprintf(outfile, "** Too many pushed patterns (max %d)\n", PATSTACKSIZE);
     return PR_ABEND;
     }
-  PCRE2_CODE_COPY_TO_VOID(patstack[patstacknext++], compiled_code);
+  if ((pat_patctl.control & CTL_PUSHCOPY) != 0)
+    {
+    PCRE2_CODE_COPY_TO_VOID(patstack[patstacknext++], compiled_code);
+    }
+  else
+    {     
+    PCRE2_CODE_COPY_WITH_TABLES_TO_VOID(patstack[patstacknext++],
+      compiled_code); }
   }

return PR_OK;

Modified: code/trunk/testdata/testinput20
===================================================================
--- code/trunk/testdata/testinput20    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/testdata/testinput20    2016-11-22 15:37:02 UTC (rev 605)
@@ -88,4 +88,13 @@

#pop should give an error

+/abcd/pushtablescopy
+    abcd
+
+#popcopy 
+    abcd
+    
+#pop
+    abcd 
+
 # End of testinput20

Modified: code/trunk/testdata/testoutput20
===================================================================
--- code/trunk/testdata/testoutput20    2016-11-22 12:31:03 UTC (rev 604)
+++ code/trunk/testdata/testoutput20    2016-11-22 15:37:02 UTC (rev 605)
@@ -135,4 +135,16 @@
 #pop should give an error
 ** Can't pop off an empty stack

+/abcd/pushtablescopy
+    abcd
+ 0: abcd
+
+#popcopy 
+    abcd
+ 0: abcd
+    
+#pop
+    abcd 
+ 0: abcd
+
 # End of testinput20
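
Note on the pcre2_compile.c hunk above: the difference between pcre2_code_copy() and the new pcre2_code_copy_with_tables() can be sketched with a small stand-alone model. This is not PCRE2 code. The struct, the model_* function names, the MODEL_DEREF_TABLES flag and the table size are all hypothetical stand-ins, and the guard on the shared-table branch is an assumption based only on the context lines of the hunk. The model uses plain malloc/free where the real function goes through code->memctl. It only illustrates the layout the diff relies on: a table block followed by a trailing reference count, which a plain copy shares and increments, and which the with-tables copy duplicates and resets to 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative values, not taken from the PCRE2 sources. TABLES_LENGTH is a
   multiple of sizeof(size_t) so the trailing counter is suitably aligned. */
#define TABLES_LENGTH       1088u
#define MODEL_DEREF_TABLES  0x1u   /* stand-in for PCRE2_DEREF_TABLES */

/* Hypothetical, heavily reduced stand-in for a compiled pattern block. */
typedef struct model_code {
  uint8_t  *tables;          /* TABLES_LENGTH bytes, then a size_t ref count */
  void     *executable_jit;  /* never copied */
  uint32_t  flags;
} model_code;

/* The reference count lives immediately after the table bytes. */
size_t *model_ref_count_ptr(const model_code *code)
{
  return (size_t *)(code->tables + TABLES_LENGTH);
}

size_t model_ref_count(const model_code *code)
{
  return *model_ref_count_ptr(code);
}

/* Create a block that owns freshly allocated (zeroed) tables. */
model_code *model_code_create(void)
{
  model_code *code = malloc(sizeof(model_code));
  if (code == NULL) return NULL;
  code->tables = calloc(1, TABLES_LENGTH + sizeof(size_t));
  if (code->tables == NULL) { free(code); return NULL; }
  code->executable_jit = NULL;
  code->flags = MODEL_DEREF_TABLES;
  *model_ref_count_ptr(code) = 1;
  return code;
}

/* Plain copy: the new block shares the tables and bumps the count. */
model_code *model_copy(const model_code *code)
{
  model_code *newcode;
  if (code == NULL) return NULL;
  newcode = malloc(sizeof(model_code));
  if (newcode == NULL) return NULL;
  memcpy(newcode, code, sizeof(model_code));
  newcode->executable_jit = NULL;
  if ((code->flags & MODEL_DEREF_TABLES) != 0)
    (*model_ref_count_ptr(code))++;
  return newcode;
}

/* Copy with tables: the new block gets a private table copy, count 1. */
model_code *model_copy_with_tables(const model_code *code)
{
  model_code *newcode;
  uint8_t *newtables;
  if (code == NULL) return NULL;
  newcode = malloc(sizeof(model_code));
  if (newcode == NULL) return NULL;
  memcpy(newcode, code, sizeof(model_code));
  newcode->executable_jit = NULL;
  newtables = malloc(TABLES_LENGTH + sizeof(size_t));
  if (newtables == NULL) { free(newcode); return NULL; }
  memcpy(newtables, code->tables, TABLES_LENGTH);
  newcode->tables = newtables;
  newcode->flags |= MODEL_DEREF_TABLES;
  *model_ref_count_ptr(newcode) = 1;
  return newcode;
}

/* Free a block; the tables go only when the last reference is dropped. */
void model_code_free(model_code *code)
{
  if (code == NULL) return;
  if ((code->flags & MODEL_DEREF_TABLES) != 0)
    {
    size_t *ref_count = model_ref_count_ptr(code);
    assert(*ref_count > 0);
    if (--(*ref_count) == 0) free(code->tables);
    }
  free(code);
}
```

In this model, freeing the original after a plain copy leaves the tables alive for the copy, while a with-tables copy is unaffected by anything done to the original's tables. Independence from the original's tables is presumably what the new function is for, though the log message does not say so.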
