[Pcre-svn] [932] code/trunk: Re-factor pcre2_dfa_match() to use the heap instead of the stack for workspace

Author: Subversion repository
To: pcre-svn

Revision: 932

          http://www.exim.org/viewvc/pcre2?view=rev&revision=932
Author:   ph10
Date:     2018-04-27 17:48:35 +0100 (Fri, 27 Apr 2018)
Log Message:
-----------
Re-factor pcre2_dfa_match() to use the heap instead of the stack for workspace 
vectors when doing recursive function calls.

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/README
    code/trunk/configure.ac
    code/trunk/doc/html/NON-AUTOTOOLS-BUILD.txt
    code/trunk/doc/html/README.txt
    code/trunk/doc/html/pcre2_dfa_match.html
    code/trunk/doc/html/pcre2api.html
    code/trunk/doc/html/pcre2build.html
    code/trunk/doc/html/pcre2callout.html
    code/trunk/doc/html/pcre2pattern.html
    code/trunk/doc/html/pcre2perform.html
    code/trunk/doc/html/pcre2test.html
    code/trunk/doc/pcre2.txt
    code/trunk/doc/pcre2_dfa_match.3
    code/trunk/doc/pcre2api.3
    code/trunk/doc/pcre2build.3
    code/trunk/doc/pcre2callout.3
    code/trunk/doc/pcre2pattern.3
    code/trunk/doc/pcre2perform.3
    code/trunk/doc/pcre2test.1
    code/trunk/doc/pcre2test.txt
    code/trunk/src/config.h.in
    code/trunk/src/pcre2_dfa_match.c
    code/trunk/src/pcre2_internal.h
    code/trunk/src/pcre2_intmodedep.h
    code/trunk/src/pcre2test.c
    code/trunk/testdata/testinput6
    code/trunk/testdata/testoutput6

Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/ChangeLog    2018-04-27 16:48:35 UTC (rev 932)
@@ -50,9 +50,17 @@

(c) Support for non-C99 snprintf() that returns -1 in the overflow case.

-11. Minor tidy of pcre2_dfa_matgch() code.
+11. Minor tidy of pcre2_dfa_match() code.

+12. Refactored pcre2_dfa_match() so that the internal recursive calls no longer
+use the stack for local workspace and local ovectors. Instead, an initial block
+of stack is reserved, but if this is insufficient, heap memory is used. The
+heap limit parameter now applies to pcre2_dfa_match().

+13. If a "find limits" test of DFA matching in pcre2test resulted in too many
+matches for the ovector, no matches were displayed.
+
+
Version 10.31 12-February-2018
------------------------------
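
Illustration of ChangeLog item 12 (not part of the commit): a minimal sketch
of the "reserved block first, heap only on overflow" idea, with a heap limit
given in kilobytes. Every name here (sketch_pool, sketch_get_workspace, the
SKETCH_ constants, the block size of 64) is hypothetical and chosen for the
example; this is not the code in pcre2_dfa_match.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_INITIAL_INTS    64    /* hypothetical size of the reserved block */
#define SKETCH_ERROR_HEAPLIMIT (-1)  /* stand-in for PCRE2_ERROR_HEAPLIMIT */
#define SKETCH_ERROR_NOMEMORY  (-2)  /* stand-in for an allocation failure */

typedef struct {
  int32_t  initial[SKETCH_INITIAL_INTS]; /* block reserved up front */
  size_t   initial_used;                 /* elements already handed out */
  uint64_t heap_bytes;                   /* total bytes taken from the heap */
  uint32_t heap_limit_kib;               /* limit in kilobytes; 0 disables heap */
} sketch_pool;

void sketch_pool_init(sketch_pool *p, uint32_t heap_limit_kib)
{
  assert(p != NULL);
  p->initial_used = 0;
  p->heap_bytes = 0;
  p->heap_limit_kib = heap_limit_kib;
}

/* Hand out a workspace of n elements. The reserved block is used while it
   has room; after that the heap is used, subject to the kilobyte limit.
   Heap blocks are freed by the caller; the running total is never reduced
   in this sketch. Returns 0 on success or a negative SKETCH_ERROR_ value. */
int sketch_get_workspace(sketch_pool *p, size_t n, int32_t **out, int *from_heap)
{
  assert(p != NULL && out != NULL && from_heap != NULL);

  if (n <= SKETCH_INITIAL_INTS - p->initial_used) {
    *out = p->initial + p->initial_used;
    p->initial_used += n;
    *from_heap = 0;
    return 0;
  }

  /* Guard the multiplication below against overflow. */
  if (n > SIZE_MAX / sizeof(int32_t)) return SKETCH_ERROR_HEAPLIMIT;

  uint64_t bytes = (uint64_t)n * sizeof(int32_t);
  uint64_t limit = (uint64_t)p->heap_limit_kib * 1024u;
  if (bytes > limit || p->heap_bytes > limit - bytes)
    return SKETCH_ERROR_HEAPLIMIT;

  int32_t *mem = malloc((size_t)bytes);
  if (mem == NULL) return SKETCH_ERROR_NOMEMORY;
  p->heap_bytes += bytes;
  *out = mem;
  *from_heap = 1;
  return 0;
}
```

With a limit of zero, any request that does not fit the reserved block fails
with the heap-limit error, which mirrors the documented behaviour that a zero
heap limit disables use of the heap.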

Modified: code/trunk/README
===================================================================
--- code/trunk/README    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/README    2018-04-27 16:48:35 UTC (rev 932)
@@ -241,9 +241,11 @@
   discussion in the pcre2api man page (search for pcre2_set_match_limit).

. There is a separate counter that limits the depth of nested backtracking
- during a matching process, which indirectly limits the amount of heap memory
- that is used. This also has a default of ten million, which is essentially
- "unlimited". You can change the default by setting, for example,
+ (pcre2_match()) or nested function calls (pcre2_dfa_match()) during a
+ matching process, which indirectly limits the amount of heap memory that is
+ used, and in the case of pcre2_dfa_match() the amount of stack as well. This
+ counter also has a default of ten million, which is essentially "unlimited".
+ You can change the default by setting, for example,

--with-match-limit-depth=5000

@@ -251,7 +253,7 @@
pcre2_set_depth_limit).

. You can also set an explicit limit on the amount of heap memory used by
- the pcre2_match() interpreter:
+ the pcre2_match() and pcre2_dfa_match() interpreters:

--with-heap-limit=500

@@ -885,4 +887,4 @@
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
-Last updated: 25 February 2018
+Last updated: 27 April 2018

Modified: code/trunk/configure.ac
===================================================================
--- code/trunk/configure.ac    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/configure.ac    2018-04-27 16:48:35 UTC (rev 932)
@@ -142,7 +142,7 @@
               AS_HELP_STRING([--enable-jit],
                              [enable Just-In-Time compiling support]),
               , enable_jit=no)
-              
+
 # This code enables JIT if the hardware supports it.
 if test "$enable_jit" = "auto"; then
   AC_LANG(C)
@@ -718,10 +718,11 @@
 AC_DEFINE_UNQUOTED([MATCH_LIMIT], [$with_match_limit], [
   The value of MATCH_LIMIT determines the default number of times the
   pcre2_match() function can record a backtrack position during a single
-  matching attempt. There is a runtime interface for setting a different limit.
-  The limit exists in order to catch runaway regular expressions that take for
-  ever to determine that they do not match. The default is set very large so
-  that it does not accidentally catch legitimate cases.])
+  matching attempt. The value is also used to limit a loop counter in
+  pcre2_dfa_match(). There is a runtime interface for setting a different
+  limit. The limit exists in order to catch runaway regular expressions that
+  take for ever to determine that they do not match. The default is set very
+  large so that it does not accidentally catch legitimate cases.])

# --with-match-limit-recursion is an obsolete synonym for --with-match-limit-depth

@@ -745,11 +746,15 @@
the maximum amount of heap memory that is used. The value of
MATCH_LIMIT_DEPTH provides this facility. To have any useful effect, it must
be less than the value of MATCH_LIMIT. The default is to use the same value
- as MATCH_LIMIT. There is a runtime method for setting a different limit.])
+ as MATCH_LIMIT. There is a runtime method for setting a different limit. In
+ the case of pcre2_dfa_match(), this limit controls the depth of the internal
+ nested function calls that are used for pattern recursions, lookarounds, and
+ atomic groups.])

AC_DEFINE_UNQUOTED([HEAP_LIMIT], [$with_heap_limit], [
- This limits the amount of memory that pcre2_match() may use while matching
- a pattern. The value is in kilobytes.])
+ This limits the amount of memory that may be used while matching
+ a pattern. It applies to both pcre2_match() and pcre2_dfa_match(). It does
+ not apply to JIT matching. The value is in kilobytes.])

AC_DEFINE([MAX_NAME_SIZE], [32], [
This limit is parameterized just in case anybody ever wants to

Modified: code/trunk/doc/html/NON-AUTOTOOLS-BUILD.txt
===================================================================
--- code/trunk/doc/html/NON-AUTOTOOLS-BUILD.txt    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/doc/html/NON-AUTOTOOLS-BUILD.txt    2018-04-27 16:48:35 UTC (rev 932)
@@ -10,6 +10,7 @@
   Calling conventions in Windows environments
   Comments about Win32 builds
   Building PCRE2 on Windows with CMake
+  Building PCRE2 on Windows with Visual Studio
   Testing with RunTest.bat
   Building PCRE2 on native z/OS and z/VM

@@ -328,8 +329,20 @@
     most recent build configuration is targeted by the tests. A summary of
     test results is presented. Complete test output is subsequently
     available for review in Testing\Temporary under your build dir.
+

+BUILDING PCRE2 ON WINDOWS WITH VISUAL STUDIO

+The code currently cannot be compiled without a stdint.h header, which is
+available only in relatively recent versions of Visual Studio. However, this
+portable and permissively-licensed implementation of the header worked without
+issue:
+
+ http://www.azillionmonkeys.com/qed/pstdint.h
+
+Just rename it and drop it into the top level of the build tree.
+
+
TESTING WITH RUNTEST.BAT

If configured with CMake, building the test project ("make test" or building
@@ -382,6 +395,6 @@
z/OS file formats. The port provides an API for LE languages such as COBOL and
for the z/OS and z/VM versions of the Rexx languages.

-===============================
-Last Updated: 13 September 2017
-===============================
+===========================
+Last Updated: 19 April 2018
+===========================

Modified: code/trunk/doc/html/README.txt
===================================================================
--- code/trunk/doc/html/README.txt    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/doc/html/README.txt    2018-04-27 16:48:35 UTC (rev 932)
@@ -241,9 +241,11 @@
   discussion in the pcre2api man page (search for pcre2_set_match_limit).

Modified: code/trunk/doc/html/pcre2_dfa_match.html
===================================================================
--- code/trunk/doc/html/pcre2_dfa_match.html    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/doc/html/pcre2_dfa_match.html    2018-04-27 16:48:35 UTC (rev 932)
@@ -46,9 +46,9 @@
   <i>wscount</i>      Number of elements in the vector
 </pre>
 For <b>pcre2_dfa_match()</b>, a match context is needed only if you want to set
-up a callout function or specify the match and/or the recursion depth limits.
-The <i>length</i> and <i>startoffset</i> values are code units, not characters.
-The options are:
+up a callout function or specify the heap limit or the match or the recursion
+depth limits. The <i>length</i> and <i>startoffset</i> values are code units, not
+characters. The options are:
 <pre>
   PCRE2_ANCHORED          Match only at the first position
   PCRE2_ENDANCHORED       Pattern can match only at end of subject

Modified: code/trunk/doc/html/pcre2api.html
===================================================================
--- code/trunk/doc/html/pcre2api.html    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/doc/html/pcre2api.html    2018-04-27 16:48:35 UTC (rev 932)
@@ -951,14 +951,15 @@
 <br>
 The <i>heap_limit</i> parameter specifies, in units of kilobytes, the maximum
 amount of heap memory that <b>pcre2_match()</b> may use to hold backtracking
-information when running an interpretive match. This limit does not apply to
-matching with the JIT optimization, which has its own memory control
-arrangements (see the
+information when running an interpretive match. This limit also applies to
+<b>pcre2_dfa_match()</b>, which may use the heap when processing patterns with a
+lot of nested pattern recursion or lookarounds or atomic groups. This limit
+does not apply to matching with the JIT optimization, which has its own memory
+control arrangements (see the
 <a href="pcre2jit.html"><b>pcre2jit</b></a>
-documentation for more details), nor does it apply to <b>pcre2_dfa_match()</b>.
-If the limit is reached, the negative error code PCRE2_ERROR_HEAPLIMIT is
-returned. The default limit is set when PCRE2 is built; the default default is
-very large and is essentially "unlimited".
+documentation for more details). If the limit is reached, the negative error
+code PCRE2_ERROR_HEAPLIMIT is returned. The default limit is set when PCRE2 is
+built; the default default is very large and is essentially "unlimited".
 </P>
 <P>
 A value for the heap limit may also be supplied by an item at the start of a
@@ -978,6 +979,12 @@
 is set to a value less than 21 (in particular, zero) no heap memory will be
 used. In this case, only patterns that do not have a lot of nested backtracking
 can be successfully processed.
+</P>
+<P>
+Similarly, for <b>pcre2_dfa_match()</b>, a vector on the system stack is used 
+when processing pattern recursions, lookarounds, or atomic groups, and only if 
+this is not big enough is heap memory used. In this case, too, setting a value 
+of zero disables the use of the heap.
 <br>
 <br>
 <b>int pcre2_set_match_limit(pcre2_match_context *<i>mcontext</i>,</b>
@@ -1035,13 +1042,24 @@
 <P>
 The depth limit is not relevant, and is ignored, when matching is done using
 JIT compiled code. However, it is supported by <b>pcre2_dfa_match()</b>, which
-uses it to limit the depth of internal recursive function calls that implement
-atomic groups, lookaround assertions, and pattern recursions. This is,
-therefore, an indirect limit on the amount of system stack that is used. A
-recursive pattern such as /(.)(?1)/, when matched to a very long string using
-<b>pcre2_dfa_match()</b>, can use a great deal of stack.
+uses it to limit the depth of nested internal recursive function calls that
+implement atomic groups, lookaround assertions, and pattern recursions. This
+limits, indirectly, the amount of system stack that is used. It was more useful
+in versions before 10.32, when stack memory was used for local workspace
+vectors for recursive function calls. From version 10.32, only local variables
+are allocated on the stack and as each call uses only a few hundred bytes, even
+a small stack can support quite a lot of recursion.
 </P>
 <P>
+If the depth of internal recursive function calls is great enough, local
+workspace vectors are allocated on the heap from version 10.32 onwards, so the
+depth limit also indirectly limits the amount of heap memory that is used. A
+recursive pattern such as /(.(?2))((?1)|)/, when matched to a very long string
+using <b>pcre2_dfa_match()</b>, can use a great deal of memory. However, it is
+probably better to limit heap usage directly by calling
+<b>pcre2_set_heap_limit()</b>.
+</P>
+<P>
 The default value for the depth limit can be set when PCRE2 is built; the
 default default is the same value as the default for the match limit. If the
 limit is exceeded, <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b> returns
@@ -1096,15 +1114,16 @@
   PCRE2_CONFIG_DEPTHLIMIT
 </pre>
 The output is a uint32_t integer that gives the default limit for the depth of
-nested backtracking in <b>pcre2_match()</b> or the depth of nested recursions
-and lookarounds in <b>pcre2_dfa_match()</b>. Further details are given with
-<b>pcre2_set_depth_limit()</b> above.
+nested backtracking in <b>pcre2_match()</b> or the depth of nested recursions,
+lookarounds, and atomic groups in <b>pcre2_dfa_match()</b>. Further details are
+given with <b>pcre2_set_depth_limit()</b> above.
 <pre>
   PCRE2_CONFIG_HEAPLIMIT
 </pre>
 The output is a uint32_t integer that gives, in kilobytes, the default limit
-for the amount of heap memory used by <b>pcre2_match()</b>. Further details are
-given with <b>pcre2_set_heap_limit()</b> above.
+for the amount of heap memory used by <b>pcre2_match()</b> or 
+<b>pcre2_dfa_match()</b>. Further details are given with
+<b>pcre2_set_heap_limit()</b> above.
 <pre>
   PCRE2_CONFIG_JIT
 </pre>
@@ -3510,17 +3529,7 @@
 Calls to the convenience functions that extract substrings by name
 return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used after a
 DFA match. The convenience functions that extract substrings by number never
-return PCRE2_ERROR_NOSUBSTRING, and the meanings of some other errors are
-slightly different:
-<pre>
-  PCRE2_ERROR_UNAVAILABLE
-</pre>
-The ovector is not big enough to include a slot for the given substring number.
-<pre>
-  PCRE2_ERROR_UNSET
-</pre>
-There is a slot in the ovector for this substring, but there were insufficient
-matches to fill it.
+return PCRE2_ERROR_NOSUBSTRING.
 </P>
 <P>
 The matched strings are stored in the ovector in reverse order of length; that
@@ -3594,9 +3603,9 @@
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 31 December 2017
+Last updated: 27 April 2018
 <br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2018 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
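
Illustration of the depth limit described in the pcre2api.html hunks (not
part of the commit): a small sketch of how a depth counter can bound nested
internal function calls. The names sketch_ctx, sketch_nested and
SKETCH_ERROR_DEPTHLIMIT are hypothetical, and the exact comparison used by
PCRE2 may differ; the point is only that each nested call checks its level
against the limit before doing any work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ERROR_DEPTHLIMIT (-1)  /* stand-in for PCRE2_ERROR_DEPTHLIMIT */

typedef struct {
  uint32_t depth_limit;     /* maximum nesting level allowed */
  uint32_t max_depth_seen;  /* deepest level actually reached */
} sketch_ctx;

/* Simulate a matcher that must make "nest" further nested calls, as it
   might for a recursion, lookaround, or atomic group. "depth" is the
   current level, 0 at the top. Returns 0 on success, or the depth-limit
   error as soon as a level beyond the limit is entered. */
int sketch_nested(sketch_ctx *ctx, uint32_t nest, uint32_t depth)
{
  assert(ctx != NULL);
  if (depth > ctx->depth_limit) return SKETCH_ERROR_DEPTHLIMIT;
  if (depth > ctx->max_depth_seen) ctx->max_depth_seen = depth;
  if (nest == 0) return 0;
  return sketch_nested(ctx, nest - 1, depth + 1);
}
```

Because each level in this sketch keeps only a few local variables on the
stack, the limit mainly bounds how many per-level workspaces could be
requested, which is why the text recommends limiting heap usage directly.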

Modified: code/trunk/doc/html/pcre2build.html
===================================================================
--- code/trunk/doc/html/pcre2build.html    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/doc/html/pcre2build.html    2018-04-27 16:48:35 UTC (rev 932)
@@ -295,9 +295,10 @@
   --with-heap-limit=500
 </pre>
 which limits the amount of heap to 500 kilobytes. This limit applies only to
-interpretive matching in pcre2_match(). It does not apply when JIT (which has
-its own memory arrangements) is used, nor does it apply to
-<b>pcre2_dfa_match()</b>.
+interpretive matching in <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, which
+may also use the heap for internal workspace when processing complicated
+patterns. This limit does not apply when JIT (which has its own memory
+arrangements) is used.
 </P>
 <P>
 You can also explicitly limit the depth of nested backtracking in the
@@ -573,7 +574,7 @@
 </P>
 <br><a name="SEC25" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 25 February 2018
+Last updated: 26 April 2018
 <br>
 Copyright &copy; 1997-2018 University of Cambridge.
 <br>

Modified: code/trunk/doc/html/pcre2callout.html
===================================================================
--- code/trunk/doc/html/pcre2callout.html    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/doc/html/pcre2callout.html    2018-04-27 16:48:35 UTC (rev 932)
@@ -310,10 +310,12 @@
 </P>
 <P>
 For DFA matching, the <i>offset_vector</i> field points to the ovector that was
-passed to the matching function in the match data block, but it holds no useful
-information at callout time because <b>pcre2_dfa_match()</b> does not support
-substring capturing. The value of <i>capture_top</i> is always 1 and the value
-of <i>capture_last</i> is always 0 for DFA matching.
+passed to the matching function in the match data block for callouts at the top
+level, but to an internal ovector during the processing of pattern recursions,
+lookarounds, and atomic groups. However, these ovectors hold no useful
+information because <b>pcre2_dfa_match()</b> does not support substring
+capturing. The value of <i>capture_top</i> is always 1 and the value of
+<i>capture_last</i> is always 0 for DFA matching.
 </P>
 <P>
 The <i>subject</i> and <i>subject_length</i> fields contain copies of the values
@@ -461,9 +463,9 @@
 </P>
 <br><a name="SEC8" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 22 December 2017
+Last updated: 26 April 2018
 <br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2018 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.

Modified: code/trunk/doc/html/pcre2pattern.html
===================================================================
--- code/trunk/doc/html/pcre2pattern.html    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/doc/html/pcre2pattern.html    2018-04-27 16:48:35 UTC (rev 932)
@@ -173,12 +173,12 @@
 Setting match resource limits
 </b><br>
 <P>
-The pcre2_match() function contains a counter that is incremented every time it
-goes round its main loop. The caller of <b>pcre2_match()</b> can set a limit on
-this counter, which therefore limits the amount of computing resource used for
-a match. The maximum depth of nested backtracking can also be limited; this
-indirectly restricts the amount of heap memory that is used, but there is also
-an explicit memory limit that can be set.
+The <b>pcre2_match()</b> function contains a counter that is incremented every
+time it goes round its main loop. The caller of <b>pcre2_match()</b> can set a
+limit on this counter, which therefore limits the amount of computing resource
+used for a match. The maximum depth of nested backtracking can also be limited;
+this indirectly restricts the amount of heap memory that is used, but there is
+also an explicit memory limit that can be set.
 </P>
 <P>
 These facilities are provided to catch runaway matches that are provoked by
@@ -195,7 +195,8 @@
 be less than the value set (or defaulted) by the caller of <b>pcre2_match()</b>
 for it to have any effect. In other words, the pattern writer can lower the
 limits set by the programmer, but not raise them. If there is more than one
-setting of one of these limits, the lower value is used.
+setting of one of these limits, the lower value is used. The heap limit is 
+specified in kilobytes.
 </P>
 <P>
 Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
@@ -202,13 +203,14 @@
 still recognized for backwards compatibility.
 </P>
 <P>
-The heap limit applies only when the <b>pcre2_match()</b> interpreter is used
-for matching. It does not apply to JIT or DFA matching. The match limit is used
-(but in a different way) when JIT is being used, or when
-<b>pcre2_dfa_match()</b> is called, to limit computing resource usage by those
-matching functions. The depth limit is ignored by JIT but is relevant for DFA
-matching, which uses function recursion for recursions within the pattern. In
-this case, the depth limit controls the amount of system stack that is used.
+The heap limit applies only when the <b>pcre2_match()</b> or
+<b>pcre2_dfa_match()</b> interpreters are used for matching. It does not apply
+to JIT. The match limit is used (but in a different way) when JIT is being
+used, or when <b>pcre2_dfa_match()</b> is called, to limit computing resource
+usage by those matching functions. The depth limit is ignored by JIT but is
+relevant for DFA matching, which uses function recursion for recursions within
+the pattern and for lookaround assertions and atomic groups. In this case, the
+depth limit controls the depth of such recursion.
 <a name="newlines"></a></P>
 <br><b>
 Newline conventions
@@ -2818,11 +2820,6 @@
 (temporarily) set at a deeper level during the matching process.
 </P>
 <P>
-If there are more than 15 capturing parentheses in a pattern, PCRE2 has to
-obtain extra memory from the heap to store data during a recursion. If no
-memory can be obtained, the match fails with the PCRE2_ERROR_NOMEMORY error.
-</P>
-<P>
 Do not confuse the (?R) item with the condition (R), which tests for recursion.
 Consider this pattern, which matches text in angle brackets, allowing for
 arbitrary nesting. Only digits are allowed in nested brackets (that is, when
@@ -3479,9 +3476,9 @@
 </P>
 <br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 12 September 2017
+Last updated: 25 April 2018
 <br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2018 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
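
Illustration of the in-pattern limit rule in the pcre2pattern.html hunks (not
part of the commit): the text says a limit given in the pattern can lower the
caller's limit but not raise it, and that the lowest of several settings
wins. A sketch of that rule follows; sketch_effective_limit is a hypothetical
helper, not a PCRE2 function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine the caller's limit with any limits found at the start of a
   pattern. A pattern setting takes effect only when it is lower than the
   value so far, so the result is the minimum of all the values. */
uint32_t sketch_effective_limit(uint32_t caller_limit,
                                const uint32_t *pattern_limits, size_t count)
{
  assert(count == 0 || pattern_limits != NULL);
  uint32_t effective = caller_limit;
  for (size_t i = 0; i < count; i++) {
    if (pattern_limits[i] < effective) effective = pattern_limits[i];
  }
  return effective;
}
```

For the heap limit the values being compared are in kilobytes, as the added
sentence in this hunk points out.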

Modified: code/trunk/doc/html/pcre2perform.html
===================================================================
--- code/trunk/doc/html/pcre2perform.html    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/doc/html/pcre2perform.html    2018-04-27 16:48:35 UTC (rev 932)
@@ -93,10 +93,18 @@
 <P>
 In contrast to <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b> does use recursive
 function calls, but only for processing atomic groups, lookaround assertions,
-and recursion within the pattern. Too much nested recursion may cause stack
-issues. The "match depth" parameter can be used to limit the depth of function
-recursion in <b>pcre2_dfa_match()</b>.
+and recursion within the pattern. The original version of the code used to
+allocate quite large internal workspace vectors on the stack, which caused some 
+problems for some patterns in environments with small stacks. From release
+10.32 the code for <b>pcre2_dfa_match()</b> has been re-factored to use heap
+memory when necessary for internal workspace when recursing, though recursive
+function calls are still used.
 </P>
+<P>
+The "match depth" parameter can be used to limit the depth of function
+recursion, and the "match heap" parameter to limit heap memory in
+<b>pcre2_dfa_match()</b>.
+</P>
 <br><a name="SEC4" href="#TOC1">PROCESSING TIME</a><br>
 <P>
 Certain items in regular expression patterns are processed more efficiently
@@ -244,9 +252,9 @@
 </P>
 <br><a name="SEC6" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 08 April 2017
+Last updated: 25 April 2018
 <br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2018 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.

Modified: code/trunk/doc/html/pcre2test.html
===================================================================
--- code/trunk/doc/html/pcre2test.html    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/doc/html/pcre2test.html    2018-04-27 16:48:35 UTC (rev 932)
@@ -1199,7 +1199,7 @@
       get=&#60;number or name&#62;       extract captured substring
       getall                     extract all captured substrings
   /g  global                     global matching
-      heap_limit=&#60;n&#62;             set a limit on heap memory
+      heap_limit=&#60;n&#62;             set a limit on heap memory (Kbytes)
       jitstack=&#60;n&#62;               set size of JIT stack
       mark                       show mark values
       match_limit=&#60;n&#62;            set a match limit
@@ -1438,22 +1438,19 @@
 <P>
 If the <b>find_limits</b> modifier is present on a subject line, <b>pcre2test</b>
 calls the relevant matching function several times, setting different values in
-the match context via <b>pcre2_set_heap_limit(), \fBpcre2_set_match_limit()</b>,
-or <b>pcre2_set_depth_limit()</b> until it finds the minimum values for each
-parameter that allows the match to complete without error.
+the match context via <b>pcre2_set_heap_limit()</b>,
+<b>pcre2_set_match_limit()</b>, or <b>pcre2_set_depth_limit()</b> until it finds
+the minimum values for each parameter that allows the match to complete without
+error. If JIT is being used, only the match limit is relevant.
 </P>
 <P>
-If JIT is being used, only the match limit is relevant. If DFA matching is
-being used, only the depth limit is relevant.
+When using this modifier, the pattern should not contain any limit settings 
+such as (*LIMIT_MATCH=...) within it. If such a setting is present and is 
+lower than the minimum matching value, the minimum value cannot be found 
+because <b>pcre2_set_match_limit()</b> etc. are only able to reduce the value of 
+an in-pattern limit; they cannot increase it.
 </P>
 <P>
-The <i>match_limit</i> number is a measure of the amount of backtracking
-that takes place, and learning the minimum value can be instructive. For most
-simple matches, the number is quite small, but for patterns with very large
-numbers of matching possibilities, it can become large very quickly with
-increasing length of subject string.
-</P>
-<P>
 For non-DFA matching, the minimum <i>depth_limit</i> number is a measure of how
 much nested backtracking happens (that is, how deeply the pattern's tree is
 searched). In the case of DFA matching, <i>depth_limit</i> controls the depth of
@@ -1460,6 +1457,22 @@
 recursive calls of the internal function that is used for handling pattern
 recursion, lookaround assertions, and atomic groups.
 </P>
+<P>
+For non-DFA matching, the <i>match_limit</i> number is a measure of the amount
+of backtracking that takes place, and learning the minimum value can be
+instructive. For most simple matches, the number is quite small, but for
+patterns with very large numbers of matching possibilities, it can become large
+very quickly with increasing length of subject string. In the case of DFA 
+matching, <i>match_limit</i> controls the total number of calls, both recursive 
+and non-recursive, to the internal matching function, thus controlling the 
+overall amount of computing resource that is used.
+</P>
+<P>
+For both kinds of matching, the <i>heap_limit</i> number (which is in kilobytes) 
+limits the amount of heap memory used for matching. A value of zero disables 
+the use of any heap memory; many simple pattern matches can be done without 
+using the heap, so this is not an unreasonable setting.
+</P>
 <br><b>
 Showing MARK names
 </b><br>
@@ -1476,13 +1489,14 @@
 <P>
 The <b>memory</b> modifier causes <b>pcre2test</b> to log the sizes of all heap
 memory allocation and freeing calls that occur during a call to
-<b>pcre2_match()</b>. These occur only when a match requires a bigger vector
-than the default for remembering backtracking points. In many cases there will
-be no heap memory used and therefore no additional output. No heap memory is
-allocated during matching with <b>pcre2_dfa_match</b> or with JIT, so in those
-cases the <b>memory</b> modifier never has any effect. For this modifier to
-work, the <b>null_context</b> modifier must not be set on both the pattern and
-the subject, though it can be set on one or the other.
+<b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>. These occur only when a match
+requires a bigger vector than the default for remembering backtracking points
+(<b>pcre2_match()</b>) or for internal workspace (<b>pcre2_dfa_match()</b>). In
+many cases there will be no heap memory used and therefore no additional
+output. No heap memory is allocated during matching with JIT, so in that case
+the <b>memory</b> modifier never has any effect. For this modifier to work, the
+<b>null_context</b> modifier must not be set on both the pattern and the
+subject, though it can be set on one or the other.
 </P>
 <br><b>
 Setting a starting offset
@@ -1982,9 +1996,9 @@
 </P>
 <br><a name="SEC21" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 21 December 2017
+Last updated: 25 April 2018
 <br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2018 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
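
Illustration of the find_limits description in the pcre2test.html hunks (not
part of the commit): the modifier repeatedly runs the match with different
limit values until the minimum workable value is found. The sketch below
uses a plain binary search and assumes that the match succeeds at the given
upper bound and that success is monotonic in the limit. It is not the search
that pcre2test itself performs, and sketch_find_min_limit, sketch_try_fn and
sketch_needs_at_least are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Callback: returns nonzero if the match completes without a limit error
   when run with the given limit. */
typedef int (*sketch_try_fn)(uint32_t limit, void *data);

/* Find the smallest limit in [0, upper] for which try_match succeeds.
   Invariant: every value below lo fails and hi succeeds. */
uint32_t sketch_find_min_limit(sketch_try_fn try_match, void *data,
                               uint32_t upper)
{
  assert(try_match != NULL);
  uint32_t lo = 0, hi = upper;
  while (lo < hi) {
    uint32_t mid = lo + (hi - lo) / 2;
    if (try_match(mid, data)) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}

/* Demo predicate standing in for a real match attempt: the "match"
   completes only when the limit is at least the value pointed to by data. */
int sketch_needs_at_least(uint32_t limit, void *data)
{
  return limit >= *(const uint32_t *)data;
}
```

The same search would be run once per parameter (heap, match, depth). As the
text notes, a limit embedded in the pattern that is lower than the true
minimum would make every attempt fail, so no minimum could be found.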

Modified: code/trunk/doc/pcre2.txt
===================================================================
--- code/trunk/doc/pcre2.txt    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/doc/pcre2.txt    2018-04-27 16:48:35 UTC (rev 932)
@@ -959,13 +959,15 @@

        The heap_limit parameter specifies, in units of kilobytes, the  maximum
        amount  of  heap memory that pcre2_match() may use to hold backtracking
-       information when running an interpretive match.  This  limit  does  not
-       apply  to  matching with the JIT optimization, which has its own memory
-       control arrangements (see the pcre2jit documentation for more details),
-       nor  does  it apply to pcre2_dfa_match().  If the limit is reached, the
-       negative error code  PCRE2_ERROR_HEAPLIMIT  is  returned.  The  default
-       limit is set when PCRE2 is built; the default default is very large and
-       is essentially "unlimited".
+       information when running an interpretive match. This limit also applies
+       to  pcre2_dfa_match(),  which may use the heap when processing patterns
+       with a lot of nested pattern recursion or lookarounds or atomic groups.
+       This  limit does not apply to matching with the JIT optimization, which
+       has its own memory control arrangements (see the pcre2jit documentation
+       for  more  details).  If  the limit is reached, the negative error code
+       PCRE2_ERROR_HEAPLIMIT is returned. The default limit is set when  PCRE2
+       is  built; the default default is very large and is essentially "unlim-
+       ited".

        A value for the heap limit may also be supplied by an item at the start
        of a pattern of the form
@@ -984,38 +986,43 @@
        zero) no heap memory will be used. In this case, only patterns that  do
        not have a lot of nested backtracking can be successfully processed.

+       Similarly,  for pcre2_dfa_match(), a vector on the system stack is used
+       when processing pattern recursions, lookarounds, or atomic groups,  and
+       only  if this is not big enough is heap memory used. In this case, too,
+       setting a value of zero disables the use of the heap.
+
        int pcre2_set_match_limit(pcre2_match_context *mcontext,
          uint32_t value);

-       The  match_limit  parameter  provides  a means of preventing PCRE2 from
+       The match_limit parameter provides a means  of  preventing  PCRE2  from
        using up too many computing resources when processing patterns that are
        not going to match, but which have a very large number of possibilities
-       in their search trees. The classic  example  is  a  pattern  that  uses
+       in  their  search  trees.  The  classic  example is a pattern that uses
        nested unlimited repeats.

-       There  is an internal counter in pcre2_match() that is incremented each
-       time round its main matching loop. If  this  value  reaches  the  match
+       There is an internal counter in pcre2_match() that is incremented  each
+       time  round  its  main  matching  loop. If this value reaches the match
        limit, pcre2_match() returns the negative value PCRE2_ERROR_MATCHLIMIT.
-       This has the effect of limiting the amount  of  backtracking  that  can
+       This  has  the  effect  of limiting the amount of backtracking that can
        take place. For patterns that are not anchored, the count restarts from
-       zero for each position in the subject string. This limit  also  applies
+       zero  for  each position in the subject string. This limit also applies
        to pcre2_dfa_match(), though the counting is done in a different way.

-       When  pcre2_match() is called with a pattern that was successfully pro-
+       When pcre2_match() is called with a pattern that was successfully  pro-
        cessed by pcre2_jit_compile(), the way in which matching is executed is
-       entirely  different. However, there is still the possibility of runaway
-       matching that goes on for a very long  time,  and  so  the  match_limit
-       value  is  also used in this case (but in a different way) to limit how
+       entirely different. However, there is still the possibility of  runaway
+       matching  that  goes  on  for  a very long time, and so the match_limit
+       value is also used in this case (but in a different way) to  limit  how
        long the matching can continue.

-       The default value for the limit can be set when  PCRE2  is  built;  the
-       default  default  is 10 million, which handles all but the most extreme
-       cases. A value for the match limit may also be supplied by an  item  at
+       The  default  value  for  the limit can be set when PCRE2 is built; the
+       default default is 10 million, which handles all but the  most  extreme
+       cases.  A  value for the match limit may also be supplied by an item at
        the start of a pattern of the form

          (*LIMIT_MATCH=ddd)

-       where  ddd  is  a  decimal  number.  However, such a setting is ignored
+       where ddd is a decimal number.  However,  such  a  setting  is  ignored
        unless ddd is less than the limit set by the caller of pcre2_match() or
        pcre2_dfa_match() or, if no such limit is set, less than the default.
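
The rule for a pattern-supplied limit can be sketched as a small helper. This is only an illustration of the comparison described above, not PCRE2 source code: effective_limit() and its parameter names are hypothetical, and the flags simply record whether the caller or the pattern supplied a value.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the PCRE2 API. It models the rule that a
   (*LIMIT_MATCH=ddd) item is ignored unless ddd is less than the limit set
   by the caller or, if the caller set none, less than the build default. */
uint32_t effective_limit(uint32_t build_default,
                         int caller_set, uint32_t caller_limit,
                         int pattern_set, uint32_t pattern_limit)
{
  /* Base limit: the caller's value if one was set, else the default. */
  uint32_t base = caller_set ? caller_limit : build_default;

  /* A pattern item can only lower the limit, never raise it. */
  if (pattern_set && pattern_limit < base) return pattern_limit;
  return base;
}
```

With a default of 10 million and no caller limit, a pattern item of 100 gives 100, while a pattern item of 20 million is ignored and the default stays in force.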

@@ -1022,33 +1029,43 @@
        int pcre2_set_depth_limit(pcre2_match_context *mcontext,
          uint32_t value);

-       This   parameter   limits   the   depth   of   nested  backtracking  in
-       pcre2_match().  Each time a nested backtracking point is passed, a  new
+       This  parameter  limits   the   depth   of   nested   backtracking   in
+       pcre2_match().   Each time a nested backtracking point is passed, a new
        memory "frame" is used to remember the state of matching at that point.
-       Thus, this parameter indirectly limits the amount  of  memory  that  is
-       used  in  a  match.  However,  because  the size of each memory "frame"
+       Thus,  this  parameter  indirectly  limits the amount of memory that is
+       used in a match. However, because  the  size  of  each  memory  "frame"
        depends on the number of capturing parentheses, the actual memory limit
-       varies  from pattern to pattern. This limit was more useful in versions
+       varies from pattern to pattern. This limit was more useful in  versions
        before 10.30, where function recursion was used for backtracking.

-       The depth limit is not relevant, and is ignored, when matching is  done
+       The  depth limit is not relevant, and is ignored, when matching is done
        using JIT compiled code. However, it is supported by pcre2_dfa_match(),
-       which uses it to limit the depth of internal recursive  function  calls
-       that implement atomic groups, lookaround assertions, and pattern recur-
-       sions. This is, therefore, an indirect limit on the  amount  of  system
-       stack that is used. A recursive pattern such as /(.)(?1)/, when matched
-       to a very long string using pcre2_dfa_match(), can use a great deal  of
-       stack.
+       which  uses it to limit the depth of nested internal recursive function
+       calls that implement atomic groups, lookaround assertions, and  pattern
+       recursions. This limits, indirectly, the amount of system stack that is
+       used. It was more useful in versions before 10.32,  when  stack  memory
+       was used for local workspace vectors for recursive function calls. From
+       version 10.32, only local variables are allocated on the stack  and  as
+       each call uses only a few hundred bytes, even a small stack can support
+       quite a lot of recursion.

-       The  default  value for the depth limit can be set when PCRE2 is built;
-       the default default is the same value as  the  default  for  the  match
-       limit.  If  the  limit  is exceeded, pcre2_match() or pcre2_dfa_match()
+       If the depth of internal recursive  function  calls  is  great  enough,
+       local  workspace  vectors  are allocated on the heap from version 10.32
+       onwards, so the depth limit also indirectly limits the amount  of  heap
+       memory that is used. A recursive pattern such as /(.(?2))((?1)|)/, when
+       matched to a very long string using pcre2_dfa_match(), can use a  great
+       deal  of  memory.  However,  it  is probably better to limit heap usage
+       directly by calling pcre2_set_heap_limit().
+
+       The default value for the depth limit can be set when PCRE2  is  built;
+       the  default  default  is  the  same value as the default for the match
+       limit. If the limit is  exceeded,  pcre2_match()  or  pcre2_dfa_match()
        returns PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be
        supplied by an item at the start of a pattern of the form

          (*LIMIT_DEPTH=ddd)

-       where  ddd  is  a  decimal  number.  However, such a setting is ignored
+       where ddd is a decimal number.  However,  such  a  setting  is  ignored
        unless ddd is less than the limit set by the caller of pcre2_match() or
        pcre2_dfa_match() or, if no such limit is set, less than the default.

@@ -1057,52 +1074,53 @@

        int pcre2_config(uint32_t what, void *where);

-       The  function  pcre2_config()  makes  it possible for a PCRE2 client to
-       discover which optional features have  been  compiled  into  the  PCRE2
-       library.  The  pcre2build  documentation  has  more details about these
+       The function pcre2_config() makes it possible for  a  PCRE2  client  to
+       discover  which  optional  features  have  been compiled into the PCRE2
+       library. The pcre2build documentation  has  more  details  about  these
        optional features.

-       The first argument for pcre2_config() specifies  which  information  is
-       required.  The  second  argument  is a pointer to memory into which the
-       information is placed. If NULL is  passed,  the  function  returns  the
-       amount  of  memory  that  is  needed for the requested information. For
-       calls that return  numerical  values,  the  value  is  in  bytes;  when
-       requesting  these  values,  where should point to appropriately aligned
-       memory. For calls that return strings, the required length is given  in
+       The  first  argument  for pcre2_config() specifies which information is
+       required. The second argument is a pointer to  memory  into  which  the
+       information  is  placed.  If  NULL  is passed, the function returns the
+       amount of memory that is needed  for  the  requested  information.  For
+       calls  that  return  numerical  values,  the  value  is  in bytes; when
+       requesting these values, where should point  to  appropriately  aligned
+       memory.  For calls that return strings, the required length is given in
        code units, not counting the terminating zero.

-       When  requesting information, the returned value from pcre2_config() is
-       non-negative on success, or the negative error code  PCRE2_ERROR_BADOP-
-       TION  if the value in the first argument is not recognized. The follow-
+       When requesting information, the returned value from pcre2_config()  is
+       non-negative  on success, or the negative error code PCRE2_ERROR_BADOP-
+       TION if the value in the first argument is not recognized. The  follow-
        ing information is available:

          PCRE2_CONFIG_BSR

-       The output is a uint32_t integer whose value indicates  what  character
-       sequences  the  \R  escape  sequence  matches  by  default.  A value of
+       The  output  is a uint32_t integer whose value indicates what character
+       sequences the \R  escape  sequence  matches  by  default.  A  value  of
        PCRE2_BSR_UNICODE  means  that  \R  matches  any  Unicode  line  ending
-       sequence;  a  value of PCRE2_BSR_ANYCRLF means that \R matches only CR,
+       sequence; a value of PCRE2_BSR_ANYCRLF means that \R matches  only  CR,
        LF, or CRLF. The default can be overridden when a pattern is compiled.

          PCRE2_CONFIG_COMPILED_WIDTHS

-       The output is a uint32_t integer whose lower bits indicate  which  code
-       unit  widths  were  selected  when PCRE2 was built. The 1-bit indicates
-       8-bit support, and the 2-bit and 4-bit indicate 16-bit and 32-bit  sup-
+       The  output  is a uint32_t integer whose lower bits indicate which code
+       unit widths were selected when PCRE2 was  built.  The  1-bit  indicates
+       8-bit  support, and the 2-bit and 4-bit indicate 16-bit and 32-bit sup-
        port, respectively.
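
The bit layout described for PCRE2_CONFIG_COMPILED_WIDTHS can be decoded as in the sketch below. The function name has_width() is an assumption made for illustration. In a real program the value would come from pcre2_config(PCRE2_CONFIG_COMPILED_WIDTHS, &widths), which is left out here so that the example does not depend on the library being installed.

```c
#include <stdint.h>

/* Hypothetical helper: reports whether a given code unit width (8, 16 or 32)
   is marked as built in the uint32_t value that the text describes for
   PCRE2_CONFIG_COMPILED_WIDTHS. */
int has_width(uint32_t widths, unsigned code_unit_bits)
{
  switch (code_unit_bits) {
    case 8:  return (widths & 1u) != 0;  /* the 1-bit: 8-bit support  */
    case 16: return (widths & 2u) != 0;  /* the 2-bit: 16-bit support */
    case 32: return (widths & 4u) != 0;  /* the 4-bit: 32-bit support */
    default: return 0;                   /* no other widths exist     */
  }
}
```

For example, a value of 5 would indicate that the 8-bit and 32-bit libraries were built but the 16-bit library was not.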

          PCRE2_CONFIG_DEPTHLIMIT

-       The  output  is a uint32_t integer that gives the default limit for the
-       depth of nested backtracking in pcre2_match() or the  depth  of  nested
-       recursions  and  lookarounds  in pcre2_dfa_match(). Further details are
-       given with pcre2_set_depth_limit() above.
+       The output is a uint32_t integer that gives the default limit  for  the
+       depth  of  nested  backtracking in pcre2_match() or the depth of nested
+       recursions, lookarounds, and atomic groups in  pcre2_dfa_match().  Fur-
+       ther details are given with pcre2_set_depth_limit() above.

          PCRE2_CONFIG_HEAPLIMIT

-       The output is a uint32_t integer that gives, in kilobytes, the  default
-       limit  for  the  amount  of  heap memory used by pcre2_match(). Further
-       details are given with pcre2_set_heap_limit() above.
+       The  output is a uint32_t integer that gives, in kilobytes, the default
+       limit  for  the  amount  of  heap  memory  used  by  pcre2_match()   or
+       pcre2_dfa_match().      Further      details     are     given     with
+       pcre2_set_heap_limit() above.

          PCRE2_CONFIG_JIT

@@ -3396,74 +3414,63 @@
        Calls to the convenience functions  that  extract  substrings  by  name
        return  the  error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used
        after a DFA match. The convenience functions that extract substrings by
-       number  never  return PCRE2_ERROR_NOSUBSTRING, and the meanings of some
-       other errors are slightly different:
+       number never return PCRE2_ERROR_NOSUBSTRING.

-         PCRE2_ERROR_UNAVAILABLE
-
-       The ovector is not big enough to include a slot for the given substring
-       number.
-
-         PCRE2_ERROR_UNSET
-
-       There  is  a  slot  in  the  ovector for this substring, but there were
-       insufficient matches to fill it.
-
-       The matched strings are stored in  the  ovector  in  reverse  order  of
-       length;  that  is,  the longest matching string is first. If there were
-       too many matches to fit into the ovector, the yield of the function  is
+       The  matched  strings  are  stored  in  the ovector in reverse order of
+       length; that is, the longest matching string is first.  If  there  were
+       too  many matches to fit into the ovector, the yield of the function is
        zero, and the vector is filled with the longest matches.

-       NOTE:  PCRE2's  "auto-possessification" optimization usually applies to
-       character repeats at the end of a pattern (as well as internally).  For
-       example,  the pattern "a\d+" is compiled as if it were "a\d++". For DFA
-       matching, this means that only one possible  match  is  found.  If  you
-       really  do  want multiple matches in such cases, either use an ungreedy
-       repeat such as "a\d+?" or set  the  PCRE2_NO_AUTO_POSSESS  option  when
+       NOTE: PCRE2's "auto-possessification" optimization usually  applies  to
+       character  repeats at the end of a pattern (as well as internally). For
+       example, the pattern "a\d+" is compiled as if it were "a\d++". For  DFA
+       matching,  this  means  that  only  one possible match is found. If you
+       really do want multiple matches in such cases, either use  an  ungreedy
+       repeat  such  as  "a\d+?"  or set the PCRE2_NO_AUTO_POSSESS option when
        compiling.

    Error returns from pcre2_dfa_match()

        The pcre2_dfa_match() function returns a negative number when it fails.
-       Many of the errors are the same  as  for  pcre2_match(),  as  described
+       Many  of  the  errors  are  the same as for pcre2_match(), as described
        above.  There are in addition the following errors that are specific to
        pcre2_dfa_match():

          PCRE2_ERROR_DFA_UITEM

-       This return is given if pcre2_dfa_match() encounters  an  item  in  the
-       pattern  that it does not support, for instance, the use of \C in a UTF
+       This  return  is  given  if pcre2_dfa_match() encounters an item in the
+       pattern that it does not support, for instance, the use of \C in a  UTF
        mode or a back reference.

          PCRE2_ERROR_DFA_UCOND

-       This return is given if pcre2_dfa_match() encounters a  condition  item
-       that  uses  a back reference for the condition, or a test for recursion
+       This  return  is given if pcre2_dfa_match() encounters a condition item
+       that uses a back reference for the condition, or a test  for  recursion
        in a specific group. These are not supported.

          PCRE2_ERROR_DFA_WSSIZE

-       This return is given if pcre2_dfa_match() runs  out  of  space  in  the
+       This  return  is  given  if  pcre2_dfa_match() runs out of space in the
        workspace vector.

          PCRE2_ERROR_DFA_RECURSE

-       When  a  recursive subpattern is processed, the matching function calls
+       When a recursive subpattern is processed, the matching  function  calls
        itself recursively, using private memory for the ovector and workspace.
-       This  error  is given if the internal ovector is not large enough. This
+       This error is given if the internal ovector is not large  enough.  This
        should be extremely rare, as a vector of size 1000 is used.

          PCRE2_ERROR_DFA_BADRESTART

-       When pcre2_dfa_match() is called  with  the  PCRE2_DFA_RESTART  option,
-       some  plausibility  checks  are  made on the contents of the workspace,
-       which should contain data about the previous partial match. If  any  of
+       When  pcre2_dfa_match()  is  called  with the PCRE2_DFA_RESTART option,
+       some plausibility checks are made on the  contents  of  the  workspace,
+       which  should  contain data about the previous partial match. If any of
        these checks fail, this error is given.

SEE ALSO

-       pcre2build(3),    pcre2callout(3),    pcre2demo(3),   pcre2matching(3),
+       pcre2build(3),   pcre2callout(3),    pcre2demo(3),    pcre2matching(3),
        pcre2partial(3), pcre2posix(3), pcre2sample(3), pcre2unicode(3).

@@ -3476,8 +3483,8 @@

REVISION

-       Last updated: 31 December 2017
-       Copyright (c) 1997-2017 University of Cambridge.
+       Last updated: 27 April 2018
+       Copyright (c) 1997-2018 University of Cambridge.
 ------------------------------------------------------------------------------

@@ -3746,28 +3753,29 @@
          --with-heap-limit=500

        which limits the amount of heap to 500 kilobytes.  This  limit  applies
-       only  to interpretive matching in pcre2_match(). It does not apply when
-       JIT (which has its own memory arrangements) is used, nor does it  apply
-       to pcre2_dfa_match().
+       only  to  interpretive matching in pcre2_match() and pcre2_dfa_match(),
+       which may also use the heap for internal workspace when processing com-
+       plicated  patterns.  This  limit does not apply when JIT (which has its
+       own memory arrangements) is used.

-       You  can  also explicitly limit the depth of nested backtracking in the
+       You can also explicitly limit the depth of nested backtracking  in  the
        pcre2_match() interpreter. This limit defaults to the value that is set
-       for  --with-match-limit.  You  can set a lower default limit by adding,
+       for --with-match-limit. You can set a lower default  limit  by  adding,
        for example,

          --with-match-limit-depth=10000

-       to the configure command. This value can be  overridden  at  run  time.
-       This  depth  limit  indirectly limits the amount of heap memory that is
-       used, but because the size of each backtracking "frame" depends on  the
-       number  of  capturing parentheses in a pattern, the amount of heap that
-       is used before the limit is reached varies  from  pattern  to  pattern.
-       This  limit  was  more  useful in versions before 10.30, where function
+       to  the  configure  command.  This value can be overridden at run time.
+       This depth limit indirectly limits the amount of heap  memory  that  is
+       used,  but because the size of each backtracking "frame" depends on the
+       number of capturing parentheses in a pattern, the amount of  heap  that
+       is  used  before  the  limit is reached varies from pattern to pattern.
+       This limit was more useful in versions  before  10.30,  where  function
        recursion was used for backtracking.

        As well as applying to pcre2_match(), the depth limit also controls the
-       depth  of recursive function calls in pcre2_dfa_match(). These are used
-       for lookaround assertions, atomic groups,  and  recursion  within  pat-
+       depth of recursive function calls in pcre2_dfa_match(). These are  used
+       for  lookaround  assertions,  atomic  groups, and recursion within pat-
        terns.  The limit does not apply to JIT matching.

@@ -3775,24 +3783,24 @@

        PCRE2 uses fixed tables for processing characters whose code points are
        less than 256. By default, PCRE2 is built with a set of tables that are
-       distributed  in  the file src/pcre2_chartables.c.dist. These tables are
+       distributed in the file src/pcre2_chartables.c.dist. These  tables  are
        for ASCII codes only. If you add

          --enable-rebuild-chartables

-       to the configure command, the distributed tables are  no  longer  used.
-       Instead,  a  program  called dftables is compiled and run. This outputs
+       to  the  configure  command, the distributed tables are no longer used.
+       Instead, a program called dftables is compiled and  run.  This  outputs
        the source for a new set of tables, created in the default locale of your
        C run-time system. This method of replacing the tables does not work if
-       you are cross compiling, because dftables is run on the local host.  If
-       you  need  to  create alternative tables when cross compiling, you will
+       you  are cross compiling, because dftables is run on the local host. If
+       you need to create alternative tables when cross  compiling,  you  will
        have to do so "by hand".

USING EBCDIC CODE

-       PCRE2 assumes by default that it will run in an environment  where  the
-       character  code is ASCII or Unicode, which is a superset of ASCII. This
+       PCRE2  assumes  by default that it will run in an environment where the
+       character code is ASCII or Unicode, which is a superset of ASCII.  This
        is the case for most computer operating systems. PCRE2 can, however, be
        compiled to run in an 8-bit EBCDIC environment by adding

@@ -3799,21 +3807,21 @@
          --enable-ebcdic --disable-unicode

        to the configure command. This setting implies --enable-rebuild-charta-
-       bles. You should only use it if you know that  you  are  in  an  EBCDIC
+       bles.  You  should  only  use  it if you know that you are in an EBCDIC
        environment (for example, an IBM mainframe operating system).

-       It  is  not possible to support both EBCDIC and UTF-8 codes in the same
-       version of the library. Consequently,  --enable-unicode  and  --enable-
+       It is not possible to support both EBCDIC and UTF-8 codes in  the  same
+       version  of  the  library. Consequently, --enable-unicode and --enable-
        ebcdic are mutually exclusive.

        The EBCDIC character that corresponds to an ASCII LF is assumed to have
-       the value 0x15 by default. However, in some EBCDIC  environments,  0x25
+       the  value  0x15 by default. However, in some EBCDIC environments, 0x25
        is used. In such an environment you should use

          --enable-ebcdic-nl25

        as well as, or instead of, --enable-ebcdic. The EBCDIC character for CR
-       has the same value as in ASCII, namely, 0x0d.  Whichever  of  0x15  and
+       has  the  same  value  as in ASCII, namely, 0x0d. Whichever of 0x15 and
        0x25 is not chosen as LF is made to correspond to the Unicode NEL char-
        acter (which, in Unicode, is 0x85).

@@ -3826,15 +3834,15 @@

        By default, on non-Windows systems, pcre2grep supports the use of call-
        outs with string arguments within the patterns it is matching, in order
-       to run external scripts. For details, see the pcre2grep  documentation.
-       This  support  can be disabled by adding --disable-pcre2grep-callout to
+       to  run external scripts. For details, see the pcre2grep documentation.
+       This support can be disabled by adding  --disable-pcre2grep-callout  to
        the configure command.

PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT

-       By default, pcre2grep reads all files as plain text. You can  build  it
-       so  that  it recognizes files whose names end in .gz or .bz2, and reads
+       By  default,  pcre2grep reads all files as plain text. You can build it
+       so that it recognizes files whose names end in .gz or .bz2,  and  reads
        them with libz or libbz2, respectively, by adding one or both of

          --enable-pcre2grep-libz
@@ -3841,19 +3849,19 @@
          --enable-pcre2grep-libbz2

        to the configure command. These options naturally require that the rel-
-       evant  libraries  are installed on your system. Configuration will fail
+       evant libraries are installed on your system. Configuration  will  fail
        if they are not.

PCRE2GREP BUFFER SIZE

-       pcre2grep uses an internal buffer to hold a "window" on the file it  is
+       pcre2grep  uses an internal buffer to hold a "window" on the file it is
        scanning, in order to be able to output "before" and "after" lines when
-       it finds a match. The starting size of the buffer is  controlled  by  a
-       parameter  whose default value is 20K. The buffer itself is three times
-       this size, but because of the way  it  is  used  for  holding  "before"
-       lines,  the  longest  line  that is guaranteed to be processable is the
-       parameter size. If a longer line is  encountered,  pcre2grep  automati-
+       it  finds  a  match. The starting size of the buffer is controlled by a
+       parameter whose default value is 20K. The buffer itself is three  times
+       this  size,  but  because  of  the  way it is used for holding "before"
+       lines, the longest line that is guaranteed to  be  processable  is  the
+       parameter  size.  If  a longer line is encountered, pcre2grep automati-
        cally expands the buffer, up to a specified maximum size, whose default
        is 1M or the starting size, whichever is the larger. You can change the
        default parameter values by adding, for example,
@@ -3861,8 +3869,8 @@
          --with-pcre2grep-bufsize=51200
          --with-pcre2grep-max-bufsize=2097152

-       to  the  configure  command. The caller of pcre2grep can override these
-       values by using --buffer-size  and  --max-buffer-size  on  the  command
+       to the configure command. The caller of pcre2grep  can  override  these
+       values  by  using  --buffer-size  and  --max-buffer-size on the command
        line.
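
The size relationships described above can be worked through in a few lines. This is a sketch of the arithmetic only, not pcre2grep's code: the function names are hypothetical, and it assumes that "20K" and "1M" mean 20*1024 and 1024*1024 bytes.

```c
#include <stddef.h>

#define ONE_MEGABYTE ((size_t)1024 * 1024)

/* The buffer itself is three times the size parameter. */
size_t allocated_buffer_bytes(size_t bufsize)
{
  return 3 * bufsize;
}

/* The longest line guaranteed to be processable without expanding the
   buffer is the parameter size itself. */
size_t guaranteed_line_bytes(size_t bufsize)
{
  return bufsize;
}

/* The default maximum is 1M or the starting size, whichever is larger. */
size_t default_max_bufsize(size_t bufsize)
{
  return bufsize > ONE_MEGABYTE ? bufsize : ONE_MEGABYTE;
}
```

Under these assumptions, --with-pcre2grep-bufsize=51200 gives a 153600-byte buffer in which lines of up to 51200 bytes are guaranteed to fit before any expansion is needed.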

@@ -3873,19 +3881,19 @@
          --enable-pcre2test-libreadline
          --enable-pcre2test-libedit

-       to  the  configure  command,  pcre2test  is linked with the libreadline
+       to the configure command, pcre2test  is  linked  with  the  libreadline
        or libedit library, respectively, and when its input is from a terminal,
-       it  reads  it using the readline() function. This provides line-editing
-       and history facilities. Note that libreadline is  GPL-licensed,  so  if
-       you  distribute  a binary of pcre2test linked in this way, there may be
+       it reads it using the readline() function. This  provides  line-editing
+       and  history  facilities.  Note that libreadline is GPL-licensed, so if
+       you distribute a binary of pcre2test linked in this way, there  may  be
        licensing issues. These can be avoided by linking instead with libedit,
        which has a BSD licence.

-       Setting  --enable-pcre2test-libreadline causes the -lreadline option to
-       be added to the pcre2test build. In many operating environments with  a
-       system-installed  readline library this is sufficient. However, in some
+       Setting --enable-pcre2test-libreadline causes the -lreadline option  to
+       be  added to the pcre2test build. In many operating environments with a
+       system-installed readline library this is sufficient. However, in  some
        environments (e.g. if an unmodified distribution version of readline is
-       in  use),  some  extra configuration may be necessary. The INSTALL file
+       in use), some extra configuration may be necessary.  The  INSTALL  file
        for libreadline says this:

          "Readline uses the termcap functions, but does not link with
@@ -3892,7 +3900,7 @@
          the termcap or curses library itself, allowing applications
          which link with readline the to choose an appropriate library."

-       If your environment has not been set up so that an appropriate  library
+       If  your environment has not been set up so that an appropriate library
        is automatically included, you may need to add something like

          LIBS="-lncurses"
@@ -3906,7 +3914,7 @@

          --enable-debug

-       to  the configure command, additional debugging code is included in the
+       to the configure command, additional debugging code is included in  the
        build. This feature is intended for use by the PCRE2 maintainers.

@@ -3916,15 +3924,15 @@

          --enable-valgrind

-       to the configure command, PCRE2 will use valgrind annotations  to  mark
-       certain  memory  regions  as  unaddressable.  This  allows it to detect
-       invalid memory accesses, and  is  mostly  useful  for  debugging  PCRE2
+       to  the  configure command, PCRE2 will use valgrind annotations to mark
+       certain memory regions as  unaddressable.  This  allows  it  to  detect
+       invalid  memory  accesses,  and  is  mostly  useful for debugging PCRE2
        itself.

CODE COVERAGE REPORTING

-       If  your  C  compiler is gcc, you can build a version of PCRE2 that can
+       If your C compiler is gcc, you can build a version of  PCRE2  that  can
        generate a code coverage report for its test suite. To enable this, you
        must install lcov version 1.6 or above. Then specify

@@ -3933,7 +3941,7 @@
        to the configure command and build PCRE2 in the usual way.

        Note that using ccache (a caching C compiler) is incompatible with code
-       coverage reporting. If you have configured ccache to run  automatically
+       coverage  reporting. If you have configured ccache to run automatically
        on your system, you must set the environment variable

          CCACHE_DISABLE=1
@@ -3940,13 +3948,13 @@

        before running make to build PCRE2, so that ccache is not used.

-       When  --enable-coverage  is  used, the following additional targets are
+       When --enable-coverage is used,  the  following additional targets  are
        added to the Makefile:

          make coverage

-       This creates a fresh coverage report for the PCRE2 test  suite.  It  is
-       equivalent  to running "make coverage-reset", "make coverage-baseline",
+       This  creates  a  fresh coverage report for the PCRE2 test suite. It is
+       equivalent to running "make coverage-reset", "make  coverage-baseline",
        "make check", and then "make coverage-report".

          make coverage-reset
@@ -3963,56 +3971,56 @@

          make coverage-clean-report

-       This removes the generated coverage report without cleaning the  cover-
+       This  removes the generated coverage report without cleaning the cover-
        age data itself.

          make coverage-clean-data

-       This  removes  the captured coverage data without removing the coverage
+       This removes the captured coverage data without removing  the  coverage
        files created at compile time (*.gcno).

          make coverage-clean

-       This cleans all coverage data including the generated coverage  report.
-       For  more  information about code coverage, see the gcov and lcov docu-
+       This  cleans all coverage data including the generated coverage report.
+       For more information about code coverage, see the gcov and  lcov  docu-
        mentation.

SUPPORT FOR FUZZERS

-       There is a special option for use by people who  want  to  run  fuzzing
+       There  is  a  special  option for use by people who want to run fuzzing
        tests on PCRE2:

          --enable-fuzz-support

        At present this applies only to the 8-bit library. If set, it causes an
-       extra library  called  libpcre2-fuzzsupport.a  to  be  built,  but  not
-       installed.  This contains a single function called LLVMFuzzerTestOneIn-
-       put() whose arguments are a pointer to a string and the length  of  the
-       string.  When  called,  this  function tries to compile the string as a
-       pattern, and if that succeeds, to match it.  This is done both with  no
-       options  and  with some random options bits that are generated from the
+       extra  library  called  libpcre2-fuzzsupport.a  to  be  built,  but not
+       installed. This contains a single function called  LLVMFuzzerTestOneIn-
+       put()  whose  arguments are a pointer to a string and the length of the
+       string. When called, this function tries to compile  the  string  as  a
+       pattern,  and if that succeeds, to match it.  This is done both with no
+       options and with some random options bits that are generated  from  the
        string.

-       Setting --enable-fuzz-support also causes  a  binary  called  pcre2fuz-
-       zcheck  to be created. This is normally run under valgrind or used when
+       Setting  --enable-fuzz-support  also  causes  a binary called pcre2fuz-
+       zcheck to be created. This is normally run under valgrind or used  when
        PCRE2 is compiled with address sanitizing enabled. It calls the fuzzing
-       function  and outputs information about what it is doing. The input strings
-       are specified by arguments: if an argument starts with "=" the rest  of
-       it  is  a  literal  input string. Otherwise, it is assumed to be a file
+       function and outputs information about what it is doing. The  input strings
+       are  specified by arguments: if an argument starts with "=" the rest of
+       it is a literal input string. Otherwise, it is assumed  to  be  a  file
        name, and the contents of the file are the test string.
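
The argument convention described above for pcre2fuzzcheck can be sketched as follows. This is only an illustration of the rule in the text, not the program's actual source: the function name is hypothetical, and encoding the literal string as UTF-8 is an assumption made for the sketch.

```python
from pathlib import Path


def fuzz_input_from_argument(arg):
    """Return the bytes that one command-line argument stands for."""
    if arg.startswith("="):
        # Everything after the leading "=" is the literal input string.
        # (UTF-8 encoding is an assumption of this sketch.)
        return arg[1:].encode("utf-8")
    # Any other argument is taken to be a file name whose contents
    # are the test string.
    return Path(arg).read_bytes()
```

For instance, the argument "=a(b)*" would stand for the six-byte string a(b)*, while "case1.txt" (a hypothetical file name) would be read from disk.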

OBSOLETE OPTION

-       In versions of PCRE2 prior to 10.30, there were two  ways  of  handling
-       backtracking  in the pcre2_match() function. The default was to use the
+       In  versions  of  PCRE2 prior to 10.30, there were two ways of handling
+       backtracking in the pcre2_match() function. The default was to use  the
        system stack, but if

          --disable-stack-for-recursion

-       was set, memory on the heap was used. From release 10.30  onwards  this
-       has  changed  (the  stack  is  no longer used) and this option now does
+       was  set,  memory on the heap was used. From release 10.30 onwards this
+       has changed (the stack is no longer used)  and  this  option  now  does
        nothing except give a warning.

@@ -4030,7 +4038,7 @@

REVISION

-       Last updated: 25 February 2018
+       Last updated: 26 April 2018
        Copyright (c) 1997-2018 University of Cambridge.
 ------------------------------------------------------------------------------

@@ -4311,10 +4319,12 @@
        their ovector slots set to PCRE2_UNSET.

        For  DFA  matching,  the offset_vector field points to the ovector that
-       was passed to the matching function in the match  data  block,  but  it
-       holds  no  useful information at callout time because pcre2_dfa_match()
-       does not support substring  capturing.  The  value  of  capture_top  is
-       always 1 and the value of capture_last is always 0 for DFA matching.
+       was passed to the matching function in the match data block  for  call-
+       outs at the top level, but to an internal ovector during the processing
+       of pattern recursions, lookarounds, and atomic groups.  However,  these
+       ovectors  hold no useful information because pcre2_dfa_match() does not
+       support substring capturing. The value of capture_top is always  1  and
+       the value of capture_last is always 0 for DFA matching.

        The subject and subject_length fields contain copies of the values that
        were passed to the matching function.
@@ -4454,8 +4464,8 @@

REVISION

-       Last updated: 22 December 2017
-       Copyright (c) 1997-2017 University of Cambridge.
+       Last updated: 26 April 2018
+       Copyright (c) 1997-2018 University of Cambridge.
 ------------------------------------------------------------------------------

@@ -5919,19 +5929,19 @@
        pcre2_match() for it to have any effect. In other  words,  the  pattern
        writer  can lower the limits set by the programmer, but not raise them.
        If there is more than one setting of one of  these  limits,  the  lower
-       value is used.
+       value is used. The heap limit is specified in kilobytes.
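
The rule stated above (a pattern can lower a limit but not raise it, and the lower of several settings wins) amounts to taking a minimum. A minimal sketch, with hypothetical names and with the limit's units left to the caller:

```python
def effective_limit(program_limit, pattern_settings):
    """Combine the programmer's limit with settings found in the pattern.

    program_limit    -- the value set by the calling program
    pattern_settings -- values from items such as (*LIMIT_HEAP=d) in the
                        pattern, in any order (may be empty)
    """
    # A pattern setting larger than the program's limit has no effect,
    # and among several settings the lowest one is used.
    return min([program_limit, *pattern_settings])
```

With a program limit of 1000, pattern settings of 500 and 2000 would give 500 under this model, and a lone setting of 2000 would leave the limit at 1000.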

        Prior  to  release  10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This
        name is still recognized for backwards compatibility.

-       The heap limit applies only when the pcre2_match() interpreter is  used
-       for matching. It does not apply to JIT or DFA matching. The match limit
-       is used (but in a different way)  when  JIT  is  being  used,  or  when
+       The heap limit applies only when the pcre2_match() or pcre2_dfa_match()
+       interpreters are used for matching. It does not apply to JIT. The match
+       limit is used (but in a different way) when JIT is being used, or  when
        pcre2_dfa_match() is called, to limit computing resource usage by those
        matching functions. The depth limit is ignored by JIT but  is  relevant
        for  DFA  matching, which uses function recursion for recursions within
-       the pattern. In this case, the depth limit controls the amount of  sys-
-       tem stack that is used.
+       the pattern and for lookaround assertions and atomic  groups.  In  this
+       case, the depth limit controls the depth of such recursion.

    Newline conventions

@@ -8260,21 +8270,16 @@
        unset,  even  if  it was (temporarily) set at a deeper level during the
        matching process.

-       If there are more than 15 capturing parentheses in a pattern, PCRE2 has
-       to  obtain extra memory from the heap to store data during a recursion.
-       If  no  memory  can   be   obtained,   the   match   fails   with   the
-       PCRE2_ERROR_NOMEMORY error.
-
-       Do  not  confuse  the (?R) item with the condition (R), which tests for
-       recursion.  Consider this pattern, which matches text in  angle  brack-
-       ets,  allowing for arbitrary nesting. Only digits are allowed in nested
-       brackets (that is, when recursing), whereas any characters are  permit-
+       Do not confuse the (?R) item with the condition (R),  which  tests  for
+       recursion.   Consider  this pattern, which matches text in angle brack-
+       ets, allowing for arbitrary nesting. Only digits are allowed in  nested
+       brackets  (that is, when recursing), whereas any characters are permit-
        ted at the outer level.

          < (?: (?(R) \d++  | [^<>]*+) | (?R)) * >

-       In  this  pattern, (?(R) is the start of a conditional subpattern, with
-       two different alternatives for the recursive and  non-recursive  cases.
+       In this pattern, (?(R) is the start of a conditional  subpattern,  with
+       two  different  alternatives for the recursive and non-recursive cases.
        The (?R) item is the actual recursive call.

    Differences in recursion processing between PCRE2 and Perl
@@ -8281,65 +8286,65 @@

        Some former differences between PCRE2 and Perl no longer exist.

-       Before  release 10.30, recursion processing in PCRE2 differed from Perl
-       in that a recursive subpattern call was always  treated  as  an  atomic
-       group.  That is, once it had matched some of the subject string, it was
-       never re-entered, even if it contained untried alternatives  and  there
-       was  a  subsequent matching failure. (Historical note: PCRE implemented
+       Before release 10.30, recursion processing in PCRE2 differed from  Perl
+       in  that  a  recursive  subpattern call was always treated as an atomic
+       group. That is, once it had matched some of the subject string, it  was
+       never  re-entered,  even if it contained untried alternatives and there
+       was a subsequent matching failure. (Historical note:  PCRE  implemented
        recursion before Perl did.)

-       Starting with release 10.30, recursive subroutine calls are  no  longer
+       Starting  with  release 10.30, recursive subroutine calls are no longer
        treated as atomic. That is, they can be re-entered to try unused alter-
-       natives if there is a matching failure later in the  pattern.  This  is
-       now  compatible  with the way Perl works. If you want a subroutine call
+       natives  if  there  is a matching failure later in the pattern. This is
+       now compatible with the way Perl works. If you want a  subroutine  call
        to be atomic, you must explicitly enclose it in an atomic group.

-       Supporting backtracking into recursions  simplifies  certain  types  of
+       Supporting  backtracking  into  recursions  simplifies certain types of
        recursive  pattern.  For  example,  this  pattern  matches  palindromic
        strings:

          ^((.)(?1)\2|.?)$

-       The second branch in the group matches a single  central  character  in
-       the  palindrome  when there are an odd number of characters, or nothing
-       when there are an even number of characters, but in order  to  work  it
-       has  to  be  able  to  try the second case when the rest of the pattern
+       The  second  branch  in the group matches a single central character in
+       the palindrome when there are an odd number of characters,  or  nothing
+       when  there  are  an even number of characters, but in order to work it
+       has to be able to try the second case when  the  rest  of  the  pattern
        match fails. If you want to match typical palindromic phrases, the pat-
-       tern  has  to  ignore  all  non-word characters, which can be done like
+       tern has to ignore all non-word characters,  which  can  be  done  like
        this:

          ^\W*+((.)\W*+(?1)\W*+\2|\W*+.?)\W*+$

-       If run with the PCRE2_CASELESS option,  this  pattern  matches  phrases
-       such  as "A man, a plan, a canal: Panama!". Note the use of the posses-
-       sive quantifier *+ to avoid backtracking  into  sequences  of  non-word
+       If  run  with  the  PCRE2_CASELESS option, this pattern matches phrases
+       such as "A man, a plan, a canal: Panama!". Note the use of the  posses-
+       sive  quantifier  *+  to  avoid backtracking into sequences of non-word
        characters. Without this, PCRE2 takes a great deal longer (ten times or
-       more) to match typical phrases, and Perl takes so long that  you  think
+       more)  to  match typical phrases, and Perl takes so long that you think
        it has gone into a loop.
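
Why the simple palindrome pattern ^((.)(?1)\2|.?)$ needs backtracking into the recursive call can be seen in a small model. The sketch below is not PCRE2 code: it is a hand-written simulation of that one pattern, with hypothetical function names, assuming that "." matches any character and that "$" matches only at the very end of the subject. The atomic_calls flag models the pre-10.30 behaviour, in which only the first way a recursive call matched was kept.

```python
import itertools


def group1_ends(subject, start, atomic_calls):
    # Yield, in the order a backtracking matcher would try them, every
    # offset at which group 1 of ^((.)(?1)\2|.?)$ can end when it starts
    # at offset `start`.
    n = len(subject)
    if start < n:
        # First branch: (.) then the recursive call (?1) then \2.
        inner = group1_ends(subject, start + 1, atomic_calls)
        if atomic_calls:
            # Atomic call: keep only the first way the call matched.
            inner = itertools.islice(inner, 1)
        for end in inner:
            # \2 must repeat the character captured by (.) at this level.
            if end < n and subject[end] == subject[start]:
                yield end + 1
        # Second branch, greedy case of .? : consume one character.
        yield start + 1
    # Second branch, remaining case of .? : match the empty string.
    yield start


def is_palindrome_match(subject, atomic_calls=False):
    # The trailing $ requires group 1 to end at the end of the subject.
    return any(end == len(subject)
               for end in group1_ends(subject, 0, atomic_calls))
```

In this model "abba" is accepted only when atomic_calls is false: the innermost calls first match a single character with .? and must be re-entered to match the empty string instead.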

-       Another  way  in which PCRE2 and Perl used to differ in their recursion
-       processing is in the handling of captured  values.  Formerly  in  Perl,
-       when  a  subpattern  was called recursively or as a subroutine (see the
-       next section), it had no access to any values that were  captured  out-
-       side  the  recursion,  whereas in PCRE2 these values can be referenced.
+       Another way in which PCRE2 and Perl used to differ in  their  recursion
+       processing  is  in  the  handling of captured values. Formerly in Perl,
+       when a subpattern was called recursively or as a  subroutine  (see  the
+       next  section),  it had no access to any values that were captured out-
+       side the recursion, whereas in PCRE2 these values  can  be  referenced.
        Consider this pattern:

          ^(.)(\1|a(?2))

-       This pattern matches "bab". The first capturing parentheses match  "b",
-       then  in  the  second  group, when the back reference \1 fails to match
-       "b", the second alternative matches  "a"  and  then  recurses.  In  the
-       recursion,  \1 does now match "b" and so the whole match succeeds. This
-       match used to fail in Perl, but in later versions (I  tried  5.024)  it
+       This  pattern matches "bab". The first capturing parentheses match "b",
+       then in the second group, when the back reference  \1  fails  to  match
+       "b",  the  second  alternative  matches  "a"  and then recurses. In the
+       recursion, \1 does now match "b" and so the whole match succeeds.  This
+       match  used  to  fail in Perl, but in later versions (I tried 5.024) it
        now works.

SUBPATTERNS AS SUBROUTINES

-       If  the  syntax for a recursive subpattern call (either by number or by
-       name) is used outside the parentheses to which it refers,  it  operates
-       like  a subroutine in a programming language. The called subpattern may
-       be defined before or after the reference. A numbered reference  can  be
+       If the syntax for a recursive subpattern call (either by number  or  by
+       name)  is  used outside the parentheses to which it refers, it operates
+       like a subroutine in a programming language. The called subpattern  may
+       be  defined  before or after the reference. A numbered reference can be
        absolute or relative, as in these examples:

          (...(absolute)...)...(?2)...
@@ -8350,48 +8355,48 @@

          (sens|respons)e and \1ibility

-       matches  "sense and sensibility" and "response and responsibility", but
+       matches "sense and sensibility" and "response and responsibility",  but
        not "sense and responsibility". If instead the pattern

          (sens|respons)e and (?1)ibility

-       is used, it does match "sense and responsibility" as well as the  other
-       two  strings.  Another  example  is  given  in the discussion of DEFINE
+       is  used, it does match "sense and responsibility" as well as the other
+       two strings. Another example is  given  in  the  discussion  of  DEFINE
        above.
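
The difference between a back reference and a subroutine call can be tried out in Python, with one caveat: Python's re module has no (?1) syntax, so the sketch below emulates the subroutine call by writing the group's sub-pattern out a second time. That emulation is an assumption of the example, valid only because the called group is not recursive.

```python
import re

# Back reference: \1 must match the same text that group 1 matched.
backref = re.compile(r"(sens|respons)e and \1ibility")

# Emulated subroutine call: the group's *pattern* is applied again,
# so it may match different text the second time.
subroutine_like = re.compile(r"(sens|respons)e and (?:sens|respons)ibility")


def matches(pattern, text):
    """True if the whole of `text` matches `pattern`."""
    return pattern.fullmatch(text) is not None
```

Here matches(backref, "sense and responsibility") is false, while matches(subroutine_like, "sense and responsibility") is true, mirroring the behaviour described above for \1 and (?1).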

-       Like recursions, subroutine calls used to be  treated  as  atomic,  but
-       this  changed  at  PCRE2 release 10.30, so backtracking into subroutine
-       calls can now occur. However, any capturing parentheses  that  are  set
+       Like  recursions,  subroutine  calls  used to be treated as atomic, but
+       this changed at PCRE2 release 10.30, so  backtracking  into  subroutine
+       calls  can  now  occur. However, any capturing parentheses that are set
        during the subroutine call revert to their previous values afterwards.

-       Processing  options  such as case-independence are fixed when a subpat-
-       tern is defined, so if it is used as a subroutine, such options  cannot
+       Processing options such as case-independence are fixed when  a  subpat-
+       tern  is defined, so if it is used as a subroutine, such options cannot
        be changed for different calls. For example, consider this pattern:

          (abc)(?i:(?-1))

-       It  matches  "abcabc". It does not match "abcABC" because the change of
+       It matches "abcabc". It does not match "abcABC" because the  change  of
        processing option does not affect the called subpattern.

ONIGURUMA SUBROUTINE SYNTAX

-       For compatibility with Oniguruma, the non-Perl syntax \g followed by  a
+       For  compatibility with Oniguruma, the non-Perl syntax \g followed by a
        name or a number enclosed either in angle brackets or single quotes, is
-       an alternative syntax for referencing a  subpattern  as  a  subroutine,
-       possibly  recursively. Here are two of the examples used above, rewrit-
+       an  alternative  syntax  for  referencing a subpattern as a subroutine,
+       possibly recursively. Here are two of the examples used above,  rewrit-
        ten using this syntax:

          (?<pn> \( ( (?>[^()]+) | \g<pn> )* \) )
          (sens|respons)e and \g'1'ibility

-       PCRE2 supports an extension to Oniguruma: if a number is preceded by  a
+       PCRE2  supports an extension to Oniguruma: if a number is preceded by a
        plus or a minus sign it is taken as a relative reference. For example:

          (abc)(?i:\g<-1>)

-       Note  that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are not
-       synonymous. The former is a back reference; the latter is a  subroutine
+       Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are  not
+       synonymous.  The former is a back reference; the latter is a subroutine
        call.

@@ -8398,54 +8403,54 @@
CALLOUTS

        Perl has a feature whereby using the sequence (?{...}) causes arbitrary
-       Perl code to be obeyed in the middle of matching a regular  expression.
+       Perl  code to be obeyed in the middle of matching a regular expression.
        This makes it possible, amongst other things, to extract different sub-
        strings that match the same pair of parentheses when there is a repeti-
        tion.

-       PCRE2  provides  a  similar feature, but of course it cannot obey arbi-
-       trary Perl code. The feature is called "callout". The caller  of  PCRE2
-       provides  an  external  function  by putting its entry point in a match
-       context using the function pcre2_set_callout(), and then  passing  that
-       context  to  pcre2_match() or pcre2_dfa_match(). If no match context is
+       PCRE2 provides a similar feature, but of course it  cannot  obey  arbi-
+       trary  Perl  code. The feature is called "callout". The caller of PCRE2
+       provides an external function by putting its entry  point  in  a  match
+       context  using  the function pcre2_set_callout(), and then passing that
+       context to pcre2_match() or pcre2_dfa_match(). If no match  context  is
        passed, or if the callout entry point is set to NULL, callouts are dis-
        abled.

-       Within  a  regular expression, (?C<arg>) indicates a point at which the
-       external function is to be called. There  are  two  kinds  of  callout:
-       those  with a numerical argument and those with a string argument. (?C)
-       on its own with no argument is treated as (?C0). A  numerical  argument
-       allows  the  application  to  distinguish  between  different callouts.
-       String arguments were added for release 10.20 to make it  possible  for
-       script  languages that use PCRE2 to embed short scripts within patterns
+       Within a regular expression, (?C<arg>) indicates a point at  which  the
+       external  function  is  to  be  called. There are two kinds of callout:
+       those with a numerical argument and those with a string argument.  (?C)
+       on  its  own with no argument is treated as (?C0). A numerical argument
+       allows the  application  to  distinguish  between  different  callouts.
+       String  arguments  were added for release 10.20 to make it possible for
+       script languages that use PCRE2 to embed short scripts within  patterns
        in a similar way to Perl.

        During matching, when PCRE2 reaches a callout point, the external func-
-       tion  is  called.  It is provided with the number or string argument of
-       the callout, the position in the pattern, and one item of data that  is
+       tion is called. It is provided with the number or  string  argument  of
+       the  callout, the position in the pattern, and one item of data that is
        also set in the match block. The callout function may cause matching to
        proceed, to backtrack, or to fail.

-       By default, PCRE2 implements a  number  of  optimizations  at  matching
-       time,  and  one  side-effect is that sometimes callouts are skipped. If
-       you need all possible callouts to happen, you need to set options  that
-       disable  the relevant optimizations. More details, including a complete
-       description of the programming interface to the callout  function,  are
+       By  default,  PCRE2  implements  a  number of optimizations at matching
+       time, and one side-effect is that sometimes callouts  are  skipped.  If
+       you  need all possible callouts to happen, you need to set options that
+       disable the relevant optimizations. More details, including a  complete
+       description  of  the programming interface to the callout function, are
        given in the pcre2callout documentation.

    Callouts with numerical arguments

-       If  you  just  want  to  have  a means of identifying different callout
-       points, put a number less than 256 after the  letter  C.  For  example,
+       If you just want to have  a  means  of  identifying  different  callout
+       points,  put  a  number  less than 256 after the letter C. For example,
        this pattern has two callout points:

          (?C1)abc(?C2)def

-       If  the PCRE2_AUTO_CALLOUT flag is passed to pcre2_compile(), numerical
-       callouts are automatically installed before each item in  the  pattern.
-       They  are all numbered 255. If there is a conditional group in the pat-
+       If the PCRE2_AUTO_CALLOUT flag is passed to pcre2_compile(),  numerical
+       callouts  are  automatically installed before each item in the pattern.
+       They are all numbered 255. If there is a conditional group in the  pat-
        tern whose condition is an assertion, an additional callout is inserted
-       just  before the condition. An explicit callout may also be set at this
+       just before the condition. An explicit callout may also be set at  this
        position, as in this example:

          (?(?C9)(?=a)abc|def)
@@ -8455,60 +8460,60 @@

    Callouts with string arguments

-       A  delimited  string may be used instead of a number as a callout argu-
-       ment. The starting delimiter must be one of ` ' " ^ % #  $  {  and  the
+       A delimited string may be used instead of a number as a  callout  argu-
+       ment.  The  starting  delimiter  must be one of ` ' " ^ % # $ { and the
        ending delimiter is the same as the start, except for {, where the end-
-       ing delimiter is }. If  the  ending  delimiter  is  needed  within  the
+       ing  delimiter  is  }.  If  the  ending  delimiter is needed within the
        string, it must be doubled. For example:

          (?C'ab ''c'' d')xyz(?C{any text})pqr

-       The  doubling  is  removed  before  the string is passed to the callout
+       The doubling is removed before the string  is  passed  to  the  callout
        function.
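
The delimiter rules for string callout arguments can be sketched as a small decoder. This is an illustration of the rules in the text, not PCRE2's own parser; the function name is hypothetical, and the input is assumed to be exactly the delimited argument that follows "?C".

```python
START_DELIMITERS = "`'\"^%#${"


def callout_string_value(arg):
    """Decode a delimited callout argument such as 'ab ''c'' d'."""
    if not arg or arg[0] not in START_DELIMITERS:
        raise ValueError("not a delimited callout string")
    # The ending delimiter is the same as the start, except that { ends with }.
    close = "}" if arg[0] == "{" else arg[0]
    out = []
    i = 1
    while i < len(arg):
        ch = arg[i]
        if ch == close:
            if i + 1 < len(arg) and arg[i + 1] == close:
                # A doubled ending delimiter stands for one literal delimiter.
                out.append(close)
                i += 2
                continue
            if i + 1 != len(arg):
                raise ValueError("text after closing delimiter")
            return "".join(out)
        out.append(ch)
        i += 1
    raise ValueError("missing closing delimiter")
```

Applied to the two arguments in the example pattern above, this gives ab 'c' d for the first and any text for the second.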

BACKTRACKING CONTROL

-       There are a number of special  "Backtracking  Control  Verbs"  (to  use
-       Perl's  terminology)  that  modify the behaviour of backtracking during
-       matching. They are generally of the form (*VERB) or (*VERB:NAME).  Some
-       verbs  take  either  form,  possibly  behaving differently depending on
+       There  are  a  number  of  special "Backtracking Control Verbs" (to use
+       Perl's terminology) that modify the behaviour  of  backtracking  during
+       matching.  They are generally of the form (*VERB) or (*VERB:NAME). Some
+       verbs take either form,  possibly  behaving  differently  depending  on
        whether or not a name is present.

-       By default, for compatibility with Perl, a  name  is  any  sequence  of
+       By  default,  for  compatibility  with  Perl, a name is any sequence of
        characters that does not include a closing parenthesis. The name is not
-       processed in any way, and it is  not  possible  to  include  a  closing
-       parenthesis   in  the  name.   This  can  be  changed  by  setting  the
-       PCRE2_ALT_VERBNAMES option, but the result is no  longer  Perl-compati-
+       processed  in  any  way,  and  it  is not possible to include a closing
+       parenthesis  in  the  name.   This  can  be  changed  by  setting   the
+       PCRE2_ALT_VERBNAMES  option,  but the result is no longer Perl-compati-
        ble.

-       When  PCRE2_ALT_VERBNAMES  is  set,  backslash processing is applied to
-       verb names and only an unescaped  closing  parenthesis  terminates  the
-       name.  However, the only backslash items that are permitted are \Q, \E,
-       and sequences such as \x{100} that define character code points.  Char-
+       When PCRE2_ALT_VERBNAMES is set, backslash  processing  is  applied  to
+       verb  names  and  only  an unescaped closing parenthesis terminates the
+       name. However, the only backslash items that are permitted are \Q,  \E,
+       and  sequences such as \x{100} that define character code points. Char-
        acter type escapes such as \d are faulted.

        A closing parenthesis can be included in a name either as \) or between
-       \Q and \E. In addition to backslash processing, if  the  PCRE2_EXTENDED
-       option  is also set, unescaped whitespace in verb names is skipped, and
-       #-comments are recognized, exactly as  in  the  rest  of  the  pattern.
+       \Q  and  \E. In addition to backslash processing, if the PCRE2_EXTENDED
+       option is also set, unescaped whitespace in verb names is skipped,  and
+       #-comments  are  recognized,  exactly  as  in  the rest of the pattern.
        PCRE2_EXTENDED does not affect verb names unless PCRE2_ALT_VERBNAMES is
        also set.

-       The maximum length of a name is 255 in the 8-bit library and  65535  in
-       the  16-bit and 32-bit libraries. If the name is empty, that is, if the
-       closing parenthesis immediately follows the colon, the effect is as  if
+       The  maximum  length of a name is 255 in the 8-bit library and 65535 in
+       the 16-bit and 32-bit libraries. If the name is empty, that is, if  the
+       closing  parenthesis immediately follows the colon, the effect is as if
        the colon were not there. Any number of these verbs may occur in a pat-
        tern.

-       Since these verbs are specifically related  to  backtracking,  most  of
-       them  can be used only when the pattern is to be matched using the tra-
+       Since  these  verbs  are  specifically related to backtracking, most of
+       them can be used only when the pattern is to be matched using the  tra-
        ditional matching function, because that uses a backtracking algorithm.
-       With  the  exception  of (*FAIL), which behaves like a failing negative
+       With the exception of (*FAIL), which behaves like  a  failing  negative
        assertion, the backtracking control verbs cause an error if encountered
        by the DFA matching function.

-       The  behaviour  of  these  verbs in repeated groups, assertions, and in
+       The behaviour of these verbs in repeated  groups,  assertions,  and  in
        subpatterns called as subroutines (whether or not recursively) is docu-
        mented below.

@@ -8516,71 +8521,71 @@

        PCRE2 contains some optimizations that are used to speed up matching by
        running some checks at the start of each match attempt. For example, it
-       may  know  the minimum length of matching subject, or that a particular
+       may know the minimum length of matching subject, or that  a  particular
        character must be present. When one of these optimizations bypasses the
-       running  of  a  match,  any  included  backtracking  verbs will not, of
+       running of a match,  any  included  backtracking  verbs  will  not,  of
        course, be processed. You can suppress the start-of-match optimizations
-       by  setting  the PCRE2_NO_START_OPTIMIZE option when calling pcre2_com-
-       pile(), or by starting the pattern with (*NO_START_OPT). There is  more
+       by setting the PCRE2_NO_START_OPTIMIZE option when  calling  pcre2_com-
+       pile(),  or by starting the pattern with (*NO_START_OPT). There is more
        discussion of this option in the section entitled "Compiling a pattern"
        in the pcre2api documentation.

-       Experiments with Perl suggest that it too  has  similar  optimizations,
+       Experiments  with  Perl  suggest that it too has similar optimizations,
        sometimes leading to anomalous results.

    Verbs that act immediately

-       The  following  verbs act as soon as they are encountered. They may not
+       The following verbs act as soon as they are encountered. They  may  not
        be followed by a name.

           (*ACCEPT)

-       This verb causes the match to end successfully, skipping the  remainder
-       of  the pattern. However, when it is inside a subpattern that is called
-       as a subroutine, only that subpattern is ended  successfully.  Matching
+       This  verb causes the match to end successfully, skipping the remainder
+       of the pattern. However, when it is inside a subpattern that is  called
+       as  a  subroutine, only that subpattern is ended successfully. Matching
        then continues at the outer level. If (*ACCEPT) is triggered in a posi-
-       tive assertion, the assertion succeeds; in a  negative  assertion,  the
+       tive  assertion,  the  assertion succeeds; in a negative assertion, the
        assertion fails.

-       If  (*ACCEPT)  is inside capturing parentheses, the data so far is cap-
+       If (*ACCEPT) is inside capturing parentheses, the data so far  is  cap-
        tured. For example:

          A((?:A|B(*ACCEPT)|C)D)

-       This matches "AB", "AAD", or "ACD"; when it matches "AB", "B"  is  cap-
+       This  matches  "AB", "AAD", or "ACD"; when it matches "AB", "B" is cap-
        tured by the outer parentheses.

          (*FAIL) or (*F)

-       This  verb causes a matching failure, forcing backtracking to occur. It
-       is equivalent to (?!) but easier to read. The Perl documentation  notes
-       that  it  is  probably  useful only when combined with (?{}) or (??{}).
-       Those are, of course, Perl features that are not present in PCRE2.  The
-       nearest  equivalent is the callout feature, as for example in this pat-
+       This verb causes a matching failure, forcing backtracking to occur.  It
+       is  equivalent to (?!) but easier to read. The Perl documentation notes
+       that it is probably useful only when combined  with  (?{})  or  (??{}).
+       Those  are, of course, Perl features that are not present in PCRE2. The
+       nearest equivalent is the callout feature, as for example in this  pat-
        tern:

          a+(?C)(*FAIL)

-       A match with the string "aaaa" always fails, but the callout  is  taken
+       A  match  with the string "aaaa" always fails, but the callout is taken
        before each backtrack happens (in this example, 10 times).
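
The figure of 10 can be reproduced with a little arithmetic. The sketch below is a model of this one pattern, a+(?C)(*FAIL), not a general matcher, and it assumes an unanchored match in which every starting offset is attempted: at each offset the greedy a+ first takes the whole run of "a" characters and then gives them back one at a time, reaching the callout once per length.

```python
def count_callouts(subject):
    """Count callouts for a+(?C)(*FAIL) in this simplified model."""
    total = 0
    for start in range(len(subject)):
        # Length of the run of "a" characters beginning at this offset.
        run = 0
        while start + run < len(subject) and subject[start + run] == "a":
            run += 1
        # a+ can match 1..run characters here, and the callout is
        # reached once for each of those lengths before (*FAIL).
        total += run
    return total
```

For "aaaa" the runs at offsets 0 to 3 have lengths 4, 3, 2 and 1, giving 4 + 3 + 2 + 1 = 10.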

    Recording which path was taken

-       There  is  one  verb  whose  main  purpose  is to track how a match was
-       arrived at, though it also has a  secondary  use  in  conjunction  with
+       There is one verb whose main purpose  is  to  track  how  a  match  was
+       arrived  at,  though  it  also  has a secondary use in conjunction with
        advancing the match starting point (see (*SKIP) below).

          (*MARK:NAME) or (*:NAME)

-       A  name  is  always  required  with  this  verb.  There  may be as many
-       instances of (*MARK) as you like in a pattern, and their names  do  not
+       A name is always  required  with  this  verb.  There  may  be  as  many
+       instances  of  (*MARK) as you like in a pattern, and their names do not
        have to be unique.

-       When  a  match succeeds, the name of the last-encountered (*MARK:NAME),
-       (*PRUNE:NAME), or (*THEN:NAME) on the matching path is passed  back  to
-       the  caller  as  described  in  the section entitled "Other information
-       about the match" in the pcre2api documentation. Here is an  example  of
-       pcre2test  output, where the "mark" modifier requests the retrieval and
+       When a match succeeds, the name of the  last-encountered  (*MARK:NAME),
+       (*PRUNE:NAME),  or  (*THEN:NAME) on the matching path is passed back to
+       the caller as described in  the  section  entitled  "Other  information
+       about  the  match" in the pcre2api documentation. Here is an example of
+       pcre2test output, where the "mark" modifier requests the retrieval  and
        outputting of (*MARK) data:

            re> /X(*MARK:A)Y|X(*MARK:B)Z/mark
@@ -8592,16 +8597,16 @@
          MK: B

        The (*MARK) name is tagged with "MK:" in this output, and in this exam-
-       ple  it indicates which of the two alternatives matched. This is a more
-       efficient way of obtaining this information than putting each  alterna-
+       ple it indicates which of the two alternatives matched. This is a  more
+       efficient  way of obtaining this information than putting each alterna-
        tive in its own capturing parentheses.

-       If  a  verb  with a name is encountered in a positive assertion that is
-       true, the name is recorded and passed back if it  is  the  last-encoun-
+       If a verb with a name is encountered in a positive  assertion  that  is
+       true,  the  name  is recorded and passed back if it is the last-encoun-
        tered. This does not happen for negative assertions or failing positive
        assertions.

-       After a partial match or a failed match, the last encountered  name  in
+       After  a  partial match or a failed match, the last encountered name in
        the entire match process is returned. For example:

            re> /X(*MARK:A)Y|X(*MARK:B)Z/mark
@@ -8608,56 +8613,56 @@
          data> XP
          No match, mark = B

-       Note  that  in  this  unanchored  example the mark is retained from the
+       Note that in this unanchored example the  mark  is  retained  from  the
        match attempt that started at the letter "X" in the subject. Subsequent
        match attempts starting at "P" and then with an empty string do not get
        as far as the (*MARK) item, but nevertheless do not reset it.

-       If you are interested in  (*MARK)  values  after  failed  matches,  you
-       should  probably  set the PCRE2_NO_START_OPTIMIZE option (see above) to
+       If  you  are  interested  in  (*MARK)  values after failed matches, you
+       should probably set the PCRE2_NO_START_OPTIMIZE option (see  above)  to
        ensure that the match is always attempted.

    Verbs that act after backtracking

        The following verbs do nothing when they are encountered. Matching con-
-       tinues  with what follows, but if there is no subsequent match, causing
-       a backtrack to the verb, a failure is  forced.  That  is,  backtracking
-       cannot  pass  to the left of the verb. However, when one of these verbs
-       appears inside an atomic group or in an assertion  that  is  true,  its
-       effect  is  confined  to  that  group,  because once the group has been
-       matched, there is never any backtracking into it.  In  this  situation,
-       backtracking  has  to  jump  to  the left of the entire atomic group or
+       tinues with what follows, but if there is no subsequent match,  causing
+       a  backtrack  to  the  verb, a failure is forced. That is, backtracking
+       cannot pass to the left of the verb. However, when one of  these  verbs
+       appears  inside  an  atomic  group or in an assertion that is true, its
+       effect is confined to that group,  because  once  the  group  has  been
+       matched,  there  is  never any backtracking into it. In this situation,
+       backtracking has to jump to the left of  the  entire  atomic  group  or
        assertion.

-       These verbs differ in exactly what kind of failure  occurs  when  back-
-       tracking  reaches  them.  The behaviour described below is what happens
-       when the verb is not in a subroutine or an assertion.  Subsequent  sec-
+       These  verbs  differ  in exactly what kind of failure occurs when back-
+       tracking reaches them. The behaviour described below  is  what  happens
+       when  the  verb is not in a subroutine or an assertion. Subsequent sec-
        tions cover these special cases.

          (*COMMIT)

-       This  verb, which may not be followed by a name, causes the whole match
+       This verb, which may not be followed by a name, causes the whole  match
        to fail outright if there is a later matching failure that causes back-
-       tracking  to  reach  it.  Even if the pattern is unanchored, no further
+       tracking to reach it. Even if the pattern  is  unanchored,  no  further
        attempts to find a match by advancing the starting point take place. If
-       (*COMMIT)  is  the  only backtracking verb that is encountered, once it
-       has been passed pcre2_match() is committed to finding a  match  at  the
+       (*COMMIT) is the only backtracking verb that is  encountered,  once  it
+       has  been  passed  pcre2_match() is committed to finding a match at the
        current starting point, or not at all. For example:

          a+(*COMMIT)b

-       This  matches  "xxaab" but not "aacaab". It can be thought of as a kind
+       This matches "xxaab" but not "aacaab". It can be thought of as  a  kind
        of dynamic anchor, or "I've started, so I must finish." The name of the
-       most  recently passed (*MARK) in the path is passed back when (*COMMIT)
+       most recently passed (*MARK) in the path is passed back when  (*COMMIT)
        forces a match failure.

-       If there is more than one backtracking verb in a pattern,  a  different
-       one  that  follows  (*COMMIT) may be triggered first, so merely passing
+       If  there  is more than one backtracking verb in a pattern, a different
+       one that follows (*COMMIT) may be triggered first,  so  merely  passing
        (*COMMIT) during a match does not always guarantee that a match must be
        at this starting point.

-       Note  that  (*COMMIT)  at  the start of a pattern is not the same as an
-       anchor, unless PCRE2's start-of-match optimizations are turned off,  as
+       Note that (*COMMIT) at the start of a pattern is not  the  same  as  an
+       anchor,  unless PCRE2's start-of-match optimizations are turned off, as
        shown in this output from pcre2test:

            re> /(*COMMIT)abc/
@@ -8668,49 +8673,49 @@
          data> xyzabc
          No match

-       For  the first pattern, PCRE2 knows that any match must start with "a",
-       so the optimization skips along the subject to "a" before applying  the
-       pattern  to the first set of data. The match attempt then succeeds. The
-       second pattern disables the optimization that skips along to the  first
-       character.  The  pattern  is  now  applied  starting at "x", and so the
-       (*COMMIT) causes the match to fail without trying  any  other  starting
+       For the first pattern, PCRE2 knows that any match must start with  "a",
+       so  the optimization skips along the subject to "a" before applying the
+       pattern to the first set of data. The match attempt then succeeds.  The
+       second  pattern disables the optimization that skips along to the first
+       character. The pattern is now applied  starting  at  "x",  and  so  the
+       (*COMMIT)  causes  the  match to fail without trying any other starting
        points.

          (*PRUNE) or (*PRUNE:NAME)

-       This  verb causes the match to fail at the current starting position in
+       This verb causes the match to fail at the current starting position  in
        the subject if there is a later matching failure that causes backtrack-
-       ing  to  reach it. If the pattern is unanchored, the normal "bumpalong"
-       advance to the next starting character then happens.  Backtracking  can
-       occur  as  usual to the left of (*PRUNE), before it is reached, or when
-       matching to the right of (*PRUNE), but if there  is  no  match  to  the
-       right,  backtracking cannot cross (*PRUNE). In simple cases, the use of
-       (*PRUNE) is just an alternative to an atomic group or possessive  quan-
+       ing to reach it. If the pattern is unanchored, the  normal  "bumpalong"
+       advance  to  the next starting character then happens. Backtracking can
+       occur as usual to the left of (*PRUNE), before it is reached,  or  when
+       matching  to  the  right  of  (*PRUNE), but if there is no match to the
+       right, backtracking cannot cross (*PRUNE). In simple cases, the use  of
+       (*PRUNE)  is just an alternative to an atomic group or possessive quan-
        tifier, but there are some uses of (*PRUNE) that cannot be expressed in
-       any other way. In an anchored pattern (*PRUNE) has the same  effect  as
+       any  other  way. In an anchored pattern (*PRUNE) has the same effect as
        (*COMMIT).

        The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE).
        It is like (*MARK:NAME) in that the name is remembered for passing back
-       to  the  caller. However, (*SKIP:NAME) searches only for names set with
+       to the caller. However, (*SKIP:NAME) searches only for names  set  with
        (*MARK), ignoring those set by (*PRUNE) or (*THEN).

          (*SKIP)

-       This verb, when given without a name, is like (*PRUNE), except that  if
-       the  pattern  is unanchored, the "bumpalong" advance is not to the next
+       This  verb, when given without a name, is like (*PRUNE), except that if
+       the pattern is unanchored, the "bumpalong" advance is not to  the  next
        character, but to the position in the subject where (*SKIP) was encoun-
-       tered.  (*SKIP)  signifies that whatever text was matched leading up to
+       tered. (*SKIP) signifies that whatever text was matched leading  up  to
        it cannot be part of a successful match. Consider:

          a+(*SKIP)b

-       If the subject is "aaaac...",  after  the  first  match  attempt  fails
-       (starting  at  the  first  character in the string), the starting point
+       If  the  subject  is  "aaaac...",  after  the first match attempt fails
+       (starting at the first character in the  string),  the  starting  point
        skips on to start the next attempt at "c". Note that a possessive quan-
-       tifier does not have the same effect as this example; although it would
-       suppress backtracking  during  the  first  match  attempt,  the  second
-       attempt  would  start at the second character instead of skipping on to
+       tifier does not have the same effect as this example; although it  would
+       suppress  backtracking  during  the  first  match  attempt,  the second
+       attempt would start at the second character instead of skipping  on  to
        "c".

          (*SKIP:NAME)
@@ -8717,164 +8722,164 @@

        When (*SKIP) has an associated name, its behaviour is modified. When it
        is triggered, the previous path through the pattern is searched for the
-       most recent (*MARK) that has the  same  name.  If  one  is  found,  the
+       most  recent  (*MARK)  that  has  the  same  name. If one is found, the
        "bumpalong" advance is to the subject position that corresponds to that
        (*MARK) instead of to where (*SKIP) was encountered. If no (*MARK) with
        a matching name is found, the (*SKIP) is ignored.

-       Note  that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It
+       Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME).  It
        ignores names that are set by (*PRUNE:NAME) or (*THEN:NAME).

          (*THEN) or (*THEN:NAME)

-       This verb causes a skip to the next innermost  alternative  when  back-
-       tracking  reaches  it.  That  is,  it  cancels any further backtracking
-       within the current alternative. Its name  comes  from  the  observation
+       This  verb  causes  a skip to the next innermost alternative when back-
+       tracking reaches it. That  is,  it  cancels  any  further  backtracking
+       within  the  current  alternative.  Its name comes from the observation
        that it can be used for a pattern-based if-then-else block:

          ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...

-       If  the COND1 pattern matches, FOO is tried (and possibly further items
-       after the end of the group if FOO succeeds); on  failure,  the  matcher
-       skips  to  the second alternative and tries COND2, without backtracking
-       into COND1. If that succeeds and BAR fails, COND3 is tried.  If  subse-
-       quently  BAZ fails, there are no more alternatives, so there is a back-
-       track to whatever came before the  entire  group.  If  (*THEN)  is  not
+       If the COND1 pattern matches, FOO is tried (and possibly further  items
+       after  the  end  of the group if FOO succeeds); on failure, the matcher
+       skips to the second alternative and tries COND2,  without  backtracking
+       into  COND1.  If that succeeds and BAR fails, COND3 is tried. If subse-
+       quently BAZ fails, there are no more alternatives, so there is a  back-
+       track  to  whatever  came  before  the  entire group. If (*THEN) is not
        inside an alternation, it acts like (*PRUNE).

-       The    behaviour    of    (*THEN:NAME)    is    not    the    same   as
-       (*MARK:NAME)(*THEN).  It is like  (*MARK:NAME)  in  that  the  name  is
-       remembered  for  passing  back  to  the  caller.  However, (*SKIP:NAME)
-       searches only for  names  set  with  (*MARK),  ignoring  those  set  by
+       The   behaviour    of    (*THEN:NAME)    is    not    the    same    as
+       (*MARK:NAME)(*THEN).   It  is  like  (*MARK:NAME)  in  that the name is
+       remembered for  passing  back  to  the  caller.  However,  (*SKIP:NAME)
+       searches  only  for  names  set  with  (*MARK),  ignoring  those set by
        (*PRUNE) and (*THEN).

-       A  subpattern that does not contain a | character is just a part of the
-       enclosing alternative; it is not a nested  alternation  with  only  one
-       alternative.  The effect of (*THEN) extends beyond such a subpattern to
-       the enclosing alternative. Consider this pattern, where A, B, etc.  are
-       complex  pattern fragments that do not contain any | characters at this
+       A subpattern that does not contain a | character is just a part of  the
+       enclosing  alternative;  it  is  not a nested alternation with only one
+       alternative. The effect of (*THEN) extends beyond such a subpattern  to
+       the  enclosing alternative. Consider this pattern, where A, B, etc. are
+       complex pattern fragments that do not contain any | characters at  this
        level:

          A (B(*THEN)C) | D

-       If A and B are matched, but there is a failure in C, matching does  not
+       If  A and B are matched, but there is a failure in C, matching does not
        backtrack into A; instead it moves to the next alternative, that is, D.
-       However, if the subpattern containing (*THEN) is given an  alternative,
+       However,  if the subpattern containing (*THEN) is given an alternative,
        it behaves differently:

          A (B(*THEN)C | (*FAIL)) | D

-       The  effect of (*THEN) is now confined to the inner subpattern. After a
+       The effect of (*THEN) is now confined to the inner subpattern. After  a
        failure in C, matching moves to (*FAIL), which causes the whole subpat-
-       tern  to  fail  because  there are no more alternatives to try. In this
+       tern to fail because there are no more alternatives  to  try.  In  this
        case, matching does now backtrack into A.

-       Note that a conditional subpattern is  not  considered  as  having  two
-       alternatives,  because  only  one  is  ever used. In other words, the |
+       Note  that  a  conditional  subpattern  is not considered as having two
+       alternatives, because only one is ever used.  In  other  words,  the  |
        character in a conditional subpattern has a different meaning. Ignoring
        white space, consider:

          ^.*? (?(?=a) a | b(*THEN)c )

-       If  the  subject  is  "ba", this pattern does not match. Because .*? is
-       ungreedy, it initially matches zero  characters.  The  condition  (?=a)
-       then  fails,  the  character  "b"  is  matched, but "c" is not. At this
-       point, matching does not backtrack to .*? as might perhaps be  expected
-       from  the  presence  of  the | character. The conditional subpattern is
+       If the subject is "ba", this pattern does not  match.  Because  .*?  is
+       ungreedy,  it  initially  matches  zero characters. The condition (?=a)
+       then fails, the character "b" is matched,  but  "c"  is  not.  At  this
+       point,  matching does not backtrack to .*? as might perhaps be expected
+       from the presence of the | character.  The  conditional  subpattern  is
        part of the single alternative that comprises the whole pattern, and so
-       the  match  fails.  (If  there was a backtrack into .*?, allowing it to
+       the match fails. (If there was a backtrack into  .*?,  allowing  it  to
        match "b", the match would succeed.)

-       The verbs just described provide four different "strengths" of  control
+       The  verbs just described provide four different "strengths" of control
        when subsequent matching fails. (*THEN) is the weakest, carrying on the
-       match at the next alternative. (*PRUNE) comes next, failing  the  match
-       at  the  current starting position, but allowing an advance to the next
-       character (for an unanchored pattern). (*SKIP) is similar, except  that
+       match  at  the next alternative. (*PRUNE) comes next, failing the match
+       at the current starting position, but allowing an advance to  the  next
+       character  (for an unanchored pattern). (*SKIP) is similar, except that
        the advance may be more than one character. (*COMMIT) is the strongest,
        causing the entire match to fail.

    More than one backtracking verb

-       If more than one backtracking verb is present in  a  pattern,  the  one
-       that  is  backtracked  onto first acts. For example, consider this pat-
+       If  more  than  one  backtracking verb is present in a pattern, the one
+       that is backtracked onto first acts. For example,  consider  this  pat-
        tern, where A, B, etc. are complex pattern fragments:

          (A(*COMMIT)B(*THEN)C|ABD)

-       If A matches but B fails, the backtrack to (*COMMIT) causes the  entire
+       If  A matches but B fails, the backtrack to (*COMMIT) causes the entire
        match to fail. However, if A and B match, but C fails, the backtrack to
-       (*THEN) causes the next alternative (ABD) to be tried.  This  behaviour
-       is  consistent,  but is not always the same as Perl's. It means that if
-       two or more backtracking verbs appear in succession, all but  the  last
+       (*THEN)  causes  the next alternative (ABD) to be tried. This behaviour
+       is consistent, but is not always the same as Perl's. It means  that  if
+       two  or  more backtracking verbs appear in succession, all but the last
        of them has no effect. Consider this example:

          ...(*COMMIT)(*PRUNE)...

        If there is a matching failure to the right, backtracking onto (*PRUNE)
-       causes it to be triggered, and its action is taken. There can never  be
+       causes  it to be triggered, and its action is taken. There can never be
        a backtrack onto (*COMMIT).

    Backtracking verbs in repeated groups

-       PCRE2  differs  from  Perl  in  its  handling  of backtracking verbs in
+       PCRE2 differs from Perl  in  its  handling  of  backtracking  verbs  in
        repeated groups. For example, consider:

          /(a(*COMMIT)b)+ac/

-       If the subject is "abac", Perl matches, but  PCRE2  fails  because  the
+       If  the  subject  is  "abac", Perl matches, but PCRE2 fails because the
        (*COMMIT) in the second repeat of the group acts.

    Backtracking verbs in assertions

-       (*FAIL)  in any assertion has its normal effect: it forces an immediate
-       backtrack. The behaviour of the other  backtracking  verbs  depends  on
-       whether  or  not the assertion is standalone or acting as the condition
+       (*FAIL) in any assertion has its normal effect: it forces an  immediate
+       backtrack.  The  behaviour  of  the other backtracking verbs depends on
+       whether or not the assertion is standalone or acting as  the  condition
        in a conditional subpattern.

-       (*ACCEPT) in a standalone positive assertion causes  the  assertion  to
-       succeed  without any further processing; captured strings are retained.
-       In a standalone negative assertion, (*ACCEPT) causes the  assertion  to
+       (*ACCEPT)  in  a  standalone positive assertion causes the assertion to
+       succeed without any further processing; captured strings are  retained.
+       In  a  standalone negative assertion, (*ACCEPT) causes the assertion to
        fail without any further processing; captured substrings are discarded.

-       If  the  assertion is a condition, (*ACCEPT) causes the condition to be
-       true for a positive assertion and false for a  negative  one;  captured
+       If the assertion is a condition, (*ACCEPT) causes the condition  to  be
+       true  for  a  positive assertion and false for a negative one; captured
        substrings are retained in both cases.

-       The  effect of (*THEN) is not allowed to escape beyond an assertion. If
-       there are no more branches to try, (*THEN) causes a positive  assertion
+       The effect of (*THEN) is not allowed to escape beyond an assertion.  If
+       there  are no more branches to try, (*THEN) causes a positive assertion
        to be false, and a negative assertion to be true.

-       The  other  backtracking verbs are not treated specially if they appear
-       in a standalone positive assertion. In a  conditional  positive  asser-
+       The other backtracking verbs are not treated specially if  they  appear
+       in  a  standalone  positive assertion. In a conditional positive asser-
        tion, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the con-
-       dition to be false. However, for both standalone and conditional  nega-
-       tive  assertions,  backtracking  into  (*COMMIT),  (*SKIP), or (*PRUNE)
+       dition  to be false. However, for both standalone and conditional nega-
+       tive assertions, backtracking  into  (*COMMIT),  (*SKIP),  or  (*PRUNE)
        causes the assertion to be true, without considering any further alter-
        native branches.

    Backtracking verbs in subroutines

-       These  behaviours  occur whether or not the subpattern is called recur-
+       These behaviours occur whether or not the subpattern is  called  recur-
        sively.  Perl's treatment of subroutines is different in some cases.

-       (*FAIL) in a subpattern called as a subroutine has its  normal  effect:
+       (*FAIL)  in  a subpattern called as a subroutine has its normal effect:
        it forces an immediate backtrack.

-       (*ACCEPT)  in a subpattern called as a subroutine causes the subroutine
-       match to succeed without any further processing. Matching then  contin-
+       (*ACCEPT) in a subpattern called as a subroutine causes the  subroutine
+       match  to succeed without any further processing. Matching then contin-
        ues after the subroutine call.

        (*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine
        cause the subroutine match to fail.

-       (*THEN) skips to the next alternative in the innermost enclosing  group
-       within  the subpattern that has alternatives. If there is no such group
+       (*THEN)  skips to the next alternative in the innermost enclosing group
+       within the subpattern that has alternatives. If there is no such  group
        within the subpattern, (*THEN) causes the subroutine match to fail.

SEE ALSO

-       pcre2api(3),   pcre2callout(3),    pcre2matching(3),    pcre2syntax(3),
+       pcre2api(3),    pcre2callout(3),    pcre2matching(3),   pcre2syntax(3),
        pcre2(3).

@@ -8887,8 +8892,8 @@

REVISION

-       Last updated: 12 September 2017
-       Copyright (c) 1997-2017 University of Cambridge.
+       Last updated: 25 April 2018
+       Copyright (c) 1997-2018 University of Cambridge.
 ------------------------------------------------------------------------------

@@ -8973,11 +8978,19 @@

        In  contrast  to  pcre2_match(),  pcre2_dfa_match()  does use recursive
        function calls, but  only  for  processing  atomic  groups,  lookaround
-       assertions, and recursion within the pattern. Too much nested recursion
-       may cause stack issues. The "match depth"  parameter  can  be  used  to
-       limit the depth of function recursion in pcre2_dfa_match().
+       assertions,  and  recursion within the pattern. The original version of
+       the code used to allocate quite large internal workspace vectors on the
+       stack,  which  caused  some  problems for some patterns in environments
+       with small stacks. From release 10.32 the  code  for  pcre2_dfa_match()
+       has  been  re-factored  to  use heap memory when necessary for internal
+       workspace when recursing, though recursive  function  calls  are  still
+       used.

+       The  "match depth" parameter can be used to limit the depth of function
+       recursion, and the "match heap"  parameter  to  limit  heap  memory  in
+       pcre2_dfa_match().

+
PROCESSING TIME

        Certain  items  in regular expression patterns are processed more effi-
@@ -9115,8 +9128,8 @@

REVISION

-       Last updated: 08 April 2017
-       Copyright (c) 1997-2017 University of Cambridge.
+       Last updated: 25 April 2018
+       Copyright (c) 1997-2018 University of Cambridge.
 ------------------------------------------------------------------------------
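
Editorial sketch (not part of the commit): the pcre2.txt hunks above
describe how (*PRUNE), (*SKIP) and (*COMMIT) change the "bumpalong"
advance after a failed attempt. The C fragment below is a toy model of
that behaviour for the single pattern shape a+(*VERB)b only. It does not
call PCRE2; every name in it (model_verb, model_search) is hypothetical,
and it ignores start-of-match optimizations and the final empty-string
attempt at the end of the subject.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout: a toy model, not PCRE2 code. */
typedef enum { VERB_NONE, VERB_PRUNE, VERB_SKIP, VERB_COMMIT } model_verb;

/*
 * Model an unanchored search for a+(*VERB)b in subject.
 * Each starting offset that is attempted is stored in tried[] (up to
 * max_tried entries); *ntried receives the total number of attempts.
 * Returns the start offset of the match, or -1 if there is none.
 */
static long model_search(const char *subject, model_verb verb,
                         size_t *tried, size_t max_tried, size_t *ntried)
{
    assert(subject != NULL && tried != NULL && ntried != NULL);
    size_t len = strlen(subject);
    size_t start = 0;
    *ntried = 0;

    while (start < len) {
        if (*ntried < max_tried)
            tried[*ntried] = start;
        (*ntried)++;

        /* Greedy a+: consume the whole run of 'a' characters. */
        size_t pos = start;
        while (pos < len && subject[pos] == 'a')
            pos++;

        if (pos > start) {
            /* The verb is passed at pos; now try to match 'b'. */
            if (pos < len && subject[pos] == 'b')
                return (long)start;

            /* 'b' failed, so backtracking reaches the verb. */
            if (verb == VERB_COMMIT)
                return -1;          /* no further starting points */
            if (verb == VERB_SKIP) {
                start = pos;        /* bump along to where (*SKIP) was met */
                continue;
            }
            /* VERB_PRUNE, or no verb: a shorter run of 'a' is followed
               by another 'a', never 'b', so this start fails as well. */
        }
        start++;                    /* normal one-character bumpalong */
    }
    return -1;
}
```

In this model the subject "aaaacaab" is attempted at offsets 0, 4 and 5
with VERB_SKIP, but at every offset from 0 to 5 with VERB_PRUNE, which
is the difference the (*SKIP) text describes.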

Modified: code/trunk/doc/pcre2_dfa_match.3
===================================================================
--- code/trunk/doc/pcre2_dfa_match.3    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/doc/pcre2_dfa_match.3    2018-04-27 16:48:35 UTC (rev 932)
@@ -1,4 +1,4 @@
-.TH PCRE2_DFA_MATCH 3 "30 May 2017" "PCRE2 10.30"
+.TH PCRE2_DFA_MATCH 3 "26 April 2018" "PCRE2 10.32"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -34,9 +34,9 @@
   \fIwscount\fP      Number of elements in the vector
 .sp
 For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set
-up a callout function or specify the match and/or the recursion depth limits.
-The \fIlength\fP and \fIstartoffset\fP values are code units, not characters.
-The options are:
+up a callout function or specify the heap limit or the match or the recursion
+depth limits. The \fIlength\fP and \fIstartoffset\fP values are code units, not
+characters. The options are:
 .sp
   PCRE2_ANCHORED          Match only at the first position
   PCRE2_ENDANCHORED       Pattern can match only at end of subject

Modified: code/trunk/doc/pcre2api.3
===================================================================
--- code/trunk/doc/pcre2api.3    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/doc/pcre2api.3    2018-04-27 16:48:35 UTC (rev 932)
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "31 December 2017" "PCRE2 10.31"
+.TH PCRE2API 3 "27 April 2018" "PCRE2 10.32"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -887,16 +887,17 @@
 .sp
 The \fIheap_limit\fP parameter specifies, in units of kilobytes, the maximum
 amount of heap memory that \fBpcre2_match()\fP may use to hold backtracking
-information when running an interpretive match. This limit does not apply to
-matching with the JIT optimization, which has its own memory control
-arrangements (see the
+information when running an interpretive match. This limit also applies to
+\fBpcre2_dfa_match()\fP, which may use the heap when processing patterns with a
+lot of nested pattern recursion or lookarounds or atomic groups. This limit
+does not apply to matching with the JIT optimization, which has its own memory
+control arrangements (see the
 .\" HREF
 \fBpcre2jit\fP
 .\"
-documentation for more details), nor does it apply to \fBpcre2_dfa_match()\fP.
-If the limit is reached, the negative error code PCRE2_ERROR_HEAPLIMIT is
-returned. The default limit is set when PCRE2 is built; the default default is
-very large and is essentially "unlimited".
+documentation for more details). If the limit is reached, the negative error
+code PCRE2_ERROR_HEAPLIMIT is returned. The default limit is set when PCRE2 is
+built; the default default is very large and is essentially "unlimited".
 .P
 A value for the heap limit may also be supplied by an item at the start of a
 pattern of the form
@@ -914,6 +915,11 @@
 is set to a value less than 21 (in particular, zero) no heap memory will be
 used. In this case, only patterns that do not have a lot of nested backtracking
 can be successfully processed.
+.P
+Similarly, for \fBpcre2_dfa_match()\fP, a vector on the system stack is used 
+when processing pattern recursions, lookarounds, or atomic groups, and only if 
+this is not big enough is heap memory used. In this case, too, setting a value 
+of zero disables the use of the heap.
 .sp
 .nf
 .B int pcre2_set_match_limit(pcre2_match_context *\fImcontext\fP,
@@ -967,12 +973,22 @@
 .P
 The depth limit is not relevant, and is ignored, when matching is done using
 JIT compiled code. However, it is supported by \fBpcre2_dfa_match()\fP, which
-uses it to limit the depth of internal recursive function calls that implement
-atomic groups, lookaround assertions, and pattern recursions. This is,
-therefore, an indirect limit on the amount of system stack that is used. A
-recursive pattern such as /(.)(?1)/, when matched to a very long string using
-\fBpcre2_dfa_match()\fP, can use a great deal of stack.
+uses it to limit the depth of nested internal recursive function calls that
+implement atomic groups, lookaround assertions, and pattern recursions. This
+limits, indirectly, the amount of system stack that is used. It was more useful
+in versions before 10.32, when stack memory was used for local workspace
+vectors for recursive function calls. From version 10.32, only local variables
+are allocated on the stack and as each call uses only a few hundred bytes, even
+a small stack can support quite a lot of recursion.
 .P
+If the depth of internal recursive function calls is great enough, local
+workspace vectors are allocated on the heap from version 10.32 onwards, so the
+depth limit also indirectly limits the amount of heap memory that is used. A
+recursive pattern such as /(.(?2))((?1)|)/, when matched to a very long string
+using \fBpcre2_dfa_match()\fP, can use a great deal of memory. However, it is
+probably better to limit heap usage directly by calling
+\fBpcre2_set_heap_limit()\fP.
+.P
 The default value for the depth limit can be set when PCRE2 is built; the
 default default is the same value as the default for the match limit. If the
 limit is exceeded, \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP returns
@@ -1028,15 +1044,16 @@
   PCRE2_CONFIG_DEPTHLIMIT
 .sp
 The output is a uint32_t integer that gives the default limit for the depth of
-nested backtracking in \fBpcre2_match()\fP or the depth of nested recursions
-and lookarounds in \fBpcre2_dfa_match()\fP. Further details are given with
-\fBpcre2_set_depth_limit()\fP above.
+nested backtracking in \fBpcre2_match()\fP or the depth of nested recursions,
+lookarounds, and atomic groups in \fBpcre2_dfa_match()\fP. Further details are
+given with \fBpcre2_set_depth_limit()\fP above.
 .sp
   PCRE2_CONFIG_HEAPLIMIT
 .sp
 The output is a uint32_t integer that gives, in kilobytes, the default limit
-for the amount of heap memory used by \fBpcre2_match()\fP. Further details are
-given with \fBpcre2_set_heap_limit()\fP above.
+for the amount of heap memory used by \fBpcre2_match()\fP or 
+\fBpcre2_dfa_match()\fP. Further details are given with
+\fBpcre2_set_heap_limit()\fP above.
 .sp
   PCRE2_CONFIG_JIT
 .sp
@@ -3514,17 +3531,7 @@
 Calls to the convenience functions that extract substrings by name
 return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used after a
 DFA match. The convenience functions that extract substrings by number never
-return PCRE2_ERROR_NOSUBSTRING, and the meanings of some other errors are
-slightly different:
-.sp
-  PCRE2_ERROR_UNAVAILABLE
-.sp
-The ovector is not big enough to include a slot for the given substring number.
-.sp
-  PCRE2_ERROR_UNSET
-.sp
-There is a slot in the ovector for this substring, but there were insufficient
-matches to fill it.
+return PCRE2_ERROR_NOSUBSTRING.
 .P
 The matched strings are stored in the ovector in reverse order of length; that
 is, the longest matching string is first. If there were too many matches to fit
@@ -3605,6 +3612,6 @@
 .rs
 .sp
 .nf
-Last updated: 31 December 2017
-Copyright (c) 1997-2017 University of Cambridge.
+Last updated: 27 April 2018
+Copyright (c) 1997-2018 University of Cambridge.
 .fi

Modified: code/trunk/doc/pcre2build.3
===================================================================
--- code/trunk/doc/pcre2build.3    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/doc/pcre2build.3    2018-04-27 16:48:35 UTC (rev 932)
@@ -1,4 +1,4 @@
-.TH PCRE2BUILD 3 "25 February 2018" "PCRE2 10.32"
+.TH PCRE2BUILD 3 "26 April 2018" "PCRE2 10.32"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .
@@ -292,9 +292,10 @@
   --with-heap-limit=500
 .sp
 which limits the amount of heap to 500 kilobytes. This limit applies only to
-interpretive matching in pcre2_match(). It does not apply when JIT (which has
-its own memory arrangements) is used, nor does it apply to
-\fBpcre2_dfa_match()\fP.
+interpretive matching in \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP, which
+may also use the heap for internal workspace when processing complicated
+patterns. This limit does not apply when JIT (which has its own memory
+arrangements) is used.
 .P
 You can also explicitly limit the depth of nested backtracking in the
 \fBpcre2_match()\fP interpreter. This limit defaults to the value that is set
@@ -590,6 +591,6 @@
 .rs
 .sp
 .nf
-Last updated: 25 February 2018
+Last updated: 26 April 2018
 Copyright (c) 1997-2018 University of Cambridge.
 .fi

Modified: code/trunk/doc/pcre2callout.3
===================================================================
--- code/trunk/doc/pcre2callout.3    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/doc/pcre2callout.3    2018-04-27 16:48:35 UTC (rev 932)
@@ -1,4 +1,4 @@
-.TH PCRE2CALLOUT 3 "22 December 2017" "PCRE2 10.31"
+.TH PCRE2CALLOUT 3 "26 April 2018" "PCRE2 10.32"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -291,10 +291,12 @@
 PCRE2_UNSET.
 .P
 For DFA matching, the \fIoffset_vector\fP field points to the ovector that was
-passed to the matching function in the match data block, but it holds no useful
-information at callout time because \fBpcre2_dfa_match()\fP does not support
-substring capturing. The value of \fIcapture_top\fP is always 1 and the value
-of \fIcapture_last\fP is always 0 for DFA matching.
+passed to the matching function in the match data block for callouts at the top
+level, but to an internal ovector during the processing of pattern recursions,
+lookarounds, and atomic groups. However, these ovectors hold no useful
+information because \fBpcre2_dfa_match()\fP does not support substring
+capturing. The value of \fIcapture_top\fP is always 1 and the value of
+\fIcapture_last\fP is always 0 for DFA matching.
 .P
 The \fIsubject\fP and \fIsubject_length\fP fields contain copies of the values
 that were passed to the matching function.
@@ -441,6 +443,6 @@
 .rs
 .sp
 .nf
-Last updated: 22 December 2017
-Copyright (c) 1997-2017 University of Cambridge.
+Last updated: 26 April 2018
+Copyright (c) 1997-2018 University of Cambridge.
 .fi

Modified: code/trunk/doc/pcre2pattern.3
===================================================================
--- code/trunk/doc/pcre2pattern.3    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/doc/pcre2pattern.3    2018-04-27 16:48:35 UTC (rev 932)
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "12 September 2017" "PCRE2 10.31"
+.TH PCRE2PATTERN 3 "25 April 2018" "PCRE2 10.32"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -141,12 +141,12 @@
 .SS "Setting match resource limits"
 .rs
 .sp
-The pcre2_match() function contains a counter that is incremented every time it
-goes round its main loop. The caller of \fBpcre2_match()\fP can set a limit on
-this counter, which therefore limits the amount of computing resource used for
-a match. The maximum depth of nested backtracking can also be limited; this
-indirectly restricts the amount of heap memory that is used, but there is also
-an explicit memory limit that can be set.
+The \fBpcre2_match()\fP function contains a counter that is incremented every
+time it goes round its main loop. The caller of \fBpcre2_match()\fP can set a
+limit on this counter, which therefore limits the amount of computing resource
+used for a match. The maximum depth of nested backtracking can also be limited;
+this indirectly restricts the amount of heap memory that is used, but there is
+also an explicit memory limit that can be set.
 .P
 These facilities are provided to catch runaway matches that are provoked by
 patterns with huge matching trees (a typical example is a pattern with nested
@@ -162,18 +162,20 @@
 be less than the value set (or defaulted) by the caller of \fBpcre2_match()\fP
 for it to have any effect. In other words, the pattern writer can lower the
 limits set by the programmer, but not raise them. If there is more than one
-setting of one of these limits, the lower value is used.
+setting of one of these limits, the lower value is used. The heap limit is 
+specified in kilobytes.
 .P
 Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
 still recognized for backwards compatibility.
 .P
-The heap limit applies only when the \fBpcre2_match()\fP interpreter is used
-for matching. It does not apply to JIT or DFA matching. The match limit is used
-(but in a different way) when JIT is being used, or when
-\fBpcre2_dfa_match()\fP is called, to limit computing resource usage by those
-matching functions. The depth limit is ignored by JIT but is relevant for DFA
-matching, which uses function recursion for recursions within the pattern. In
-this case, the depth limit controls the amount of system stack that is used.
+The heap limit applies only when the \fBpcre2_match()\fP or
+\fBpcre2_dfa_match()\fP interpreters are used for matching. It does not apply
+to JIT. The match limit is used (but in a different way) when JIT is being
+used, or when \fBpcre2_dfa_match()\fP is called, to limit computing resource
+usage by those matching functions. The depth limit is ignored by JIT but is
+relevant for DFA matching, which uses function recursion for recursions within
+the pattern and for lookaround assertions and atomic groups. In this case, the
+depth limit controls the depth of such recursion.
 .
 .
 .\" HTML <a name="newlines"></a>
@@ -2838,10 +2840,6 @@
 matched at the top level, its final captured value is unset, even if it was
 (temporarily) set at a deeper level during the matching process.
 .P
-If there are more than 15 capturing parentheses in a pattern, PCRE2 has to
-obtain extra memory from the heap to store data during a recursion. If no
-memory can be obtained, the match fails with the PCRE2_ERROR_NOMEMORY error.
-.P
 Do not confuse the (?R) item with the condition (R), which tests for recursion.
 Consider this pattern, which matches text in angle brackets, allowing for
 arbitrary nesting. Only digits are allowed in nested brackets (that is, when
@@ -3505,6 +3503,6 @@
 .rs
 .sp
 .nf
-Last updated: 12 September 2017
-Copyright (c) 1997-2017 University of Cambridge.
+Last updated: 25 April 2018
+Copyright (c) 1997-2018 University of Cambridge.
 .fi
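
Note on the pcre2pattern.3 hunks above: the rule that an in-pattern setting
such as (*LIMIT_HEAP=d) can lower the caller's limit but not raise it, and
that the lowest of several settings wins, amounts to taking a minimum. The
following is only a sketch of that rule; effective_limit() is a hypothetical
helper written for illustration and is not a PCRE2 function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of PCRE2: combine the limit set (or
   defaulted) by the caller with any limits given inside the pattern. */
uint32_t effective_limit(uint32_t caller_limit,
                         const uint32_t *pattern_limits, size_t count)
{
    uint32_t limit = caller_limit;
    for (size_t i = 0; i < count; i++) {
        /* An in-pattern setting only has an effect if it is lower. */
        if (pattern_limits[i] < limit) limit = pattern_limits[i];
    }
    return limit;
}
```

With a caller limit of 1000 and in-pattern settings of 500 and 200, the
sketch yields 200; a single in-pattern setting of 5000 leaves the limit at
1000.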

Modified: code/trunk/doc/pcre2perform.3
===================================================================
--- code/trunk/doc/pcre2perform.3    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/doc/pcre2perform.3    2018-04-27 16:48:35 UTC (rev 932)
@@ -1,4 +1,4 @@
-.TH PCRE2PERFORM 3 "08 April 2017" "PCRE2 10.30"
+.TH PCRE2PERFORM 3 "25 April 2018" "PCRE2 10.32"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 PERFORMANCE"
@@ -78,9 +78,16 @@
 .P
 In contrast to \fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP does use recursive
 function calls, but only for processing atomic groups, lookaround assertions,
-and recursion within the pattern. Too much nested recursion may cause stack
-issues. The "match depth" parameter can be used to limit the depth of function
-recursion in \fBpcre2_dfa_match()\fP.
+and recursion within the pattern. The original version of the code used to
+allocate quite large internal workspace vectors on the stack, which caused some 
+problems for some patterns in environments with small stacks. From release
+10.32 the code for \fBpcre2_dfa_match()\fP has been re-factored to use heap
+memory when necessary for internal workspace when recursing, though recursive
+function calls are still used.
+.P
+The "match depth" parameter can be used to limit the depth of function
+recursion, and the "match heap" parameter to limit heap memory in
+\fBpcre2_dfa_match()\fP.
 .
 .
 .SH "PROCESSING TIME"
@@ -232,6 +239,6 @@
 .rs
 .sp
 .nf
-Last updated: 08 April 2017
-Copyright (c) 1997-2017 University of Cambridge.
+Last updated: 25 April 2018
+Copyright (c) 1997-2018 University of Cambridge.
 .fi
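
Note on the pcre2perform.3 hunk above: the general technique it describes
(keep the recursive calls, but take each level's workspace from the heap and
charge it against a limit given in kilobytes) can be sketched as follows.
This is not the code in pcre2_dfa_match.c. The names, the error values, the
4096-byte workspace size, and the use of 1024 bytes per kilobyte are all
assumptions made for the illustration.

```c
#include <stdint.h>
#include <stdlib.h>

#define WS_BYTES 4096u            /* hypothetical workspace per level */

enum { SK_OK = 0, SK_ERR_DEPTH = -1, SK_ERR_HEAP = -2, SK_ERR_NOMEM = -3 };

typedef struct {
    size_t   heap_used;           /* bytes currently held by all levels */
    size_t   heap_limit_kb;       /* heap limit, in units of 1024 bytes */
    uint32_t depth_limit;         /* maximum recursion depth */
} sk_limits;

/* Recurse levels_wanted times; each level takes its workspace from the
   heap rather than declaring a large array on the stack. */
int sk_recurse(sk_limits *lim, uint32_t depth, uint32_t levels_wanted)
{
    if (depth > lim->depth_limit) return SK_ERR_DEPTH;
    if (levels_wanted == 0) return SK_OK;

    /* Refuse the allocation if it would take usage past the limit. */
    if (lim->heap_used + WS_BYTES > lim->heap_limit_kb * 1024u)
        return SK_ERR_HEAP;

    unsigned char *ws = malloc(WS_BYTES);
    if (ws == NULL) return SK_ERR_NOMEM;
    lim->heap_used += WS_BYTES;
    ws[0] = 0;                    /* stand-in for real matching work */

    int rc = sk_recurse(lim, depth + 1, levels_wanted - 1);

    lim->heap_used -= WS_BYTES;   /* release on the way back up */
    free(ws);
    return rc;
}
```

In this sketch a heap limit of zero makes any level that needs workspace fail
at once, while a call that needs no workspace still succeeds. The stack cost
per level is a few words, whatever the workspace size.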

Modified: code/trunk/doc/pcre2test.1
===================================================================
--- code/trunk/doc/pcre2test.1    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/doc/pcre2test.1    2018-04-27 16:48:35 UTC (rev 932)
@@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "21 Decbmber 2017" "PCRE 10.31"
+.TH PCRE2TEST 1 "25 April 2018" "PCRE 10.32"
 .SH NAME
 pcre2test - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -1168,7 +1168,7 @@
       get=<number or name>       extract captured substring
       getall                     extract all captured substrings
   /g  global                     global matching
-      heap_limit=<n>             set a limit on heap memory
+      heap_limit=<n>             set a limit on heap memory (Kbytes)
       jitstack=<n>               set size of JIT stack
       mark                       show mark values
       match_limit=<n>            set a match limit
@@ -1401,24 +1401,36 @@
 .sp
 If the \fBfind_limits\fP modifier is present on a subject line, \fBpcre2test\fP
 calls the relevant matching function several times, setting different values in
-the match context via \fBpcre2_set_heap_limit(), \fBpcre2_set_match_limit()\fP,
-or \fBpcre2_set_depth_limit()\fP until it finds the minimum values for each
-parameter that allows the match to complete without error.
+the match context via \fBpcre2_set_heap_limit()\fP,
+\fBpcre2_set_match_limit()\fP, or \fBpcre2_set_depth_limit()\fP until it finds
+the minimum values for each parameter that allows the match to complete without
+error. If JIT is being used, only the match limit is relevant.
 .P
-If JIT is being used, only the match limit is relevant. If DFA matching is
-being used, only the depth limit is relevant.
+When using this modifier, the pattern should not contain any limit settings 
+such as (*LIMIT_MATCH=...) within it. If such a setting is present and is 
+lower than the minimum matching value, the minimum value cannot be found 
+because \fBpcre2_set_match_limit()\fP etc. are only able to reduce the value of 
+an in-pattern limit; they cannot increase it.
 .P
-The \fImatch_limit\fP number is a measure of the amount of backtracking
-that takes place, and learning the minimum value can be instructive. For most
-simple matches, the number is quite small, but for patterns with very large
-numbers of matching possibilities, it can become large very quickly with
-increasing length of subject string.
-.P
 For non-DFA matching, the minimum \fIdepth_limit\fP number is a measure of how
 much nested backtracking happens (that is, how deeply the pattern's tree is
 searched). In the case of DFA matching, \fIdepth_limit\fP controls the depth of
 recursive calls of the internal function that is used for handling pattern
 recursion, lookaround assertions, and atomic groups.
+.P
+For non-DFA matching, the \fImatch_limit\fP number is a measure of the amount
+of backtracking that takes place, and learning the minimum value can be
+instructive. For most simple matches, the number is quite small, but for
+patterns with very large numbers of matching possibilities, it can become large
+very quickly with increasing length of subject string. In the case of DFA 
+matching, \fImatch_limit\fP controls the total number of calls, both recursive 
+and non-recursive, to the internal matching function, thus controlling the 
+overall amount of computing resource that is used.
+.P
+For both kinds of matching, the \fIheap_limit\fP number (which is in kilobytes) 
+limits the amount of heap memory used for matching. A value of zero disables 
+the use of any heap memory; many simple pattern matches can be done without 
+using the heap, so this is not an unreasonable setting.
 .
 .
 .SS "Showing MARK names"
@@ -1437,13 +1449,14 @@
 .sp
 The \fBmemory\fP modifier causes \fBpcre2test\fP to log the sizes of all heap
 memory allocation and freeing calls that occur during a call to
-\fBpcre2_match()\fP. These occur only when a match requires a bigger vector
-than the default for remembering backtracking points. In many cases there will
-be no heap memory used and therefore no additional output. No heap memory is
-allocated during matching with \fBpcre2_dfa_match\fP or with JIT, so in those
-cases the \fBmemory\fP modifier never has any effect. For this modifier to
-work, the \fBnull_context\fP modifier must not be set on both the pattern and
-the subject, though it can be set on one or the other.
+\fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP. These occur only when a match
+requires a bigger vector than the default for remembering backtracking points
+(\fBpcre2_match()\fP) or for internal workspace (\fBpcre2_dfa_match()\fP). In
+many cases there will be no heap memory used and therefore no additional
+output. No heap memory is allocated during matching with JIT, so in that case
+the \fBmemory\fP modifier never has any effect. For this modifier to work, the
+\fBnull_context\fP modifier must not be set on both the pattern and the
+subject, though it can be set on one or the other.
 .
 .
 .SS "Setting a starting offset"
@@ -1962,6 +1975,6 @@
 .rs
 .sp
 .nf
-Last updated: 21 December 2017
-Copyright (c) 1997-2017 University of Cambridge.
+Last updated: 25 April 2018
+Copyright (c) 1997-2018 University of Cambridge.
 .fi
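
Note on the find_limits text in the pcre2test.1 hunks above: finding the
minimum value of a limit that still lets a match complete is a search over
repeated match attempts. The sketch below uses plain bisection. It is not
claimed to be the search that pcre2test performs. find_min_limit() and
needs_at_least() are hypothetical names, and the sketch assumes that the
match completes at the upper bound and that raising a limit never turns a
completed match into a failure.

```c
#include <stdint.h>

/* Returns nonzero if the match completes without a limit error when run
   with the given limit. In real use this would wrap a matching call. */
typedef int (*sk_match_fn)(uint32_t limit, void *arg);

/* Smallest limit in [0, max] for which completes() is true, assuming
   completes(max) is true and the outcome is monotonic in the limit. */
uint32_t find_min_limit(sk_match_fn completes, void *arg, uint32_t max)
{
    uint32_t lo = 0, hi = max;    /* every value below lo fails; hi succeeds */
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        if (completes(mid, arg)) hi = mid;
        else lo = mid + 1;
    }
    return lo;
}

/* Stand-in predicate for demonstration: the "match" needs at least the
   value that arg points to. */
int needs_at_least(uint32_t limit, void *arg)
{
    return limit >= *(const uint32_t *)arg;
}
```

This also shows why an in-pattern setting such as (*LIMIT_MATCH=...) defeats
the search: if the pattern caps the limit below the true minimum, no value
passed by the caller can make the match complete, so the assumption that the
upper bound succeeds no longer holds.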

Modified: code/trunk/doc/pcre2test.txt
===================================================================
--- code/trunk/doc/pcre2test.txt    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/doc/pcre2test.txt    2018-04-27 16:48:35 UTC (rev 932)
@@ -1071,7 +1071,7 @@
              get=<number or name>       extract captured substring
              getall                     extract all captured substrings
          /g  global                     global matching
-             heap_limit=<n>             set a limit on heap memory
+             heap_limit=<n>             set a limit on heap memory (Kbytes)
              jitstack=<n>               set size of JIT stack
              mark                       show mark values
              match_limit=<n>            set a match limit
@@ -1291,71 +1291,84 @@
        values   in   the    match    context    via    pcre2_set_heap_limit(),
        pcre2_set_match_limit(),  or pcre2_set_depth_limit() until it finds the
        minimum values for each parameter that allows  the  match  to  complete
-       without error.
+       without error. If JIT is being used, only the match limit is relevant.

-       If JIT is being used, only the match limit is relevant. If DFA matching
-       is being used, only the depth limit is relevant.
+       When using this modifier, the pattern should not contain any limit set-
+       tings such as (*LIMIT_MATCH=...)  within  it.  If  such  a  setting  is
+       present and is lower than the minimum matching value, the minimum value
+       cannot be found because pcre2_set_match_limit() etc. are only  able  to
+       reduce the value of an in-pattern limit; they cannot increase it.

-       The match_limit number is a measure of the amount of backtracking  that
-       takes  place,  and  learning  the minimum value can be instructive. For
-       most simple matches, the number is quite small, but for  patterns  with
-       very  large numbers of matching possibilities, it can become large very
-       quickly with increasing length of subject string.
-
-       For non-DFA matching, the minimum depth_limit number is  a  measure  of
+       For  non-DFA  matching,  the minimum depth_limit number is a measure of
        how much nested backtracking happens (that is, how deeply the pattern's
-       tree is searched). In the case of DFA  matching,  depth_limit  controls
-       the  depth of recursive calls of the internal function that is used for
+       tree  is  searched).  In the case of DFA matching, depth_limit controls
+       the depth of recursive calls of the internal function that is used  for
        handling pattern recursion, lookaround assertions, and atomic groups.

+       For non-DFA matching, the match_limit number is a measure of the amount
+       of backtracking that takes place, and learning the minimum value can be
+       instructive.  For  most  simple matches, the number is quite small, but
+       for patterns with very large numbers of matching possibilities, it  can
+       become  large very quickly with increasing length of subject string. In
+       the case of DFA matching, match_limit  controls  the  total  number  of
+       calls, both recursive and non-recursive, to the internal matching func-
+       tion, thus controlling the overall amount of computing resource that is
+       used.
+
+       For  both  kinds  of matching, the heap_limit number (which is in kilo-
+       bytes) limits the amount of heap memory used for matching. A  value  of
+       zero  disables  the use of any heap memory; many simple pattern matches
+       can be done without using the heap, so this is not an unreasonable set-
+       ting.
+
    Showing MARK names

        The mark modifier causes the names from backtracking control verbs that
-       are  returned from calls to pcre2_match() to be displayed. If a mark is
-       returned for a match, non-match, or partial match, pcre2test shows  it.
-       For  a  match, it is on a line by itself, tagged with "MK:". Otherwise,
+       are returned from calls to pcre2_match() to be displayed. If a mark  is
+       returned  for a match, non-match, or partial match, pcre2test shows it.
+       For a match, it is on a line by itself, tagged with  "MK:".  Otherwise,
        it is added to the non-match message.

    Showing memory usage

-       The memory modifier causes pcre2test to log the sizes of all heap  mem-
-       ory   allocation  and  freeing  calls  that  occur  during  a  call  to
-       pcre2_match(). These occur only when a match requires a  bigger  vector
-       than  the  default  for  remembering backtracking points. In many cases
-       there will be no heap memory used and therefore no  additional  output.
-       No  heap  memory  is  allocated during matching with pcre2_dfa_match or
-       with JIT, so in those cases the memory modifier never has  any  effect.
-       For this modifier to work, the null_context modifier must not be set on
-       both the pattern and the subject, though it can be set on  one  or  the
-       other.
+       The  memory modifier causes pcre2test to log the sizes of all heap mem-
+       ory  allocation  and  freeing  calls  that  occur  during  a  call   to
+       pcre2_match()  or  pcre2_dfa_match().  These  occur  only  when a match
+       requires a bigger vector than the default for remembering  backtracking
+       points  (pcre2_match())  or for internal workspace (pcre2_dfa_match()).
+       In many cases there will be no heap memory used and therefore no  addi-
+       tional output. No heap memory is allocated during matching with JIT, so
+       in that case the memory modifier never has any effect. For  this  modi-
+       fier  to  work,  the  null_context modifier must not be set on both the
+       pattern and the subject, though it can be set on one or the other.

    Setting a starting offset

-       The  offset  modifier  sets  an  offset  in the subject string at which
+       The offset modifier sets an offset  in  the  subject  string  at  which
        matching starts. Its value is a number of code units, not characters.

    Setting an offset limit

-       The offset_limit modifier sets a limit for  unanchored  matches.  If  a
+       The  offset_limit  modifier  sets  a limit for unanchored matches. If a
        match cannot be found starting at or before this offset in the subject,
        a "no match" return is given. The data value is a number of code units,
-       not  characters. When this modifier is used, the use_offset_limit modi-
+       not characters. When this modifier is used, the use_offset_limit  modi-
        fier must have been set for the pattern; if not, an error is generated.

    Setting the size of the output vector

-       The ovector modifier applies only to  the  subject  line  in  which  it
-       appears,  though  of  course  it can also be used to set a default in a
-       #subject command. It specifies the number of pairs of offsets that  are
+       The  ovector  modifier  applies  only  to  the subject line in which it
+       appears, though of course it can also be used to set  a  default  in  a
+       #subject  command. It specifies the number of pairs of offsets that are
        available for storing matching information. The default is 15.

-       A  value of zero is useful when testing the POSIX API because it causes
+       A value of zero is useful when testing the POSIX API because it  causes
        regexec() to be called with a NULL capture vector. When not testing the
-       POSIX  API,  a  value  of  zero  is used to cause pcre2_match_data_cre-
-       ate_from_pattern() to be called, in order to create a  match  block  of
+       POSIX API, a value of  zero  is  used  to  cause  pcre2_match_data_cre-
+       ate_from_pattern()  to  be  called, in order to create a match block of
        exactly the right size for the pattern. (It is not possible to create a
-       match block with a zero-length ovector; there is always  at  least  one
+       match  block  with  a zero-length ovector; there is always at least one
        pair of offsets.)

    Passing the subject as zero-terminated
@@ -1362,55 +1375,55 @@

        By default, the subject string is passed to a native API matching func-
        tion with its correct length. In order to test the facility for passing
-       a  zero-terminated  string, the zero_terminate modifier is provided. It
-       causes the length to be passed as PCRE2_ZERO_TERMINATED. When  matching
+       a zero-terminated string, the zero_terminate modifier is  provided.  It
+       causes  the length to be passed as PCRE2_ZERO_TERMINATED. When matching
        via the POSIX interface, this modifier is ignored, with a warning.

-       When  testing  pcre2_substitute(), this modifier also has the effect of
+       When testing pcre2_substitute(), this modifier also has the  effect  of
        passing the replacement string as zero-terminated.

    Passing a NULL context

-       Normally,  pcre2test  passes  a   context   block   to   pcre2_match(),
+       Normally,   pcre2test   passes   a   context  block  to  pcre2_match(),
        pcre2_dfa_match() or pcre2_jit_match(). If the null_context modifier is
-       set, however, NULL is passed. This is for  testing  that  the  matching
+       set,  however,  NULL  is  passed. This is for testing that the matching
        functions behave correctly in this case (they use default values). This
-       modifier cannot be used with the find_limits modifier or  when  testing
+       modifier  cannot  be used with the find_limits modifier or when testing
        the substitution function.

THE ALTERNATIVE MATCHING FUNCTION

-       By  default,  pcre2test  uses  the  standard  PCRE2  matching function,
+       By default,  pcre2test  uses  the  standard  PCRE2  matching  function,
        pcre2_match() to match each subject line. PCRE2 also supports an alter-
-       native  matching  function, pcre2_dfa_match(), which operates in a dif-
-       ferent way, and has some restrictions. The differences between the  two
+       native matching function, pcre2_dfa_match(), which operates in  a  dif-
+       ferent  way, and has some restrictions. The differences between the two
        functions are described in the pcre2matching documentation.

-       If  the dfa modifier is set, the alternative matching function is used.
-       This function finds all possible matches at a given point in  the  sub-
-       ject.  If,  however, the dfa_shortest modifier is set, processing stops
-       after the first match is found. This is always  the  shortest  possible
+       If the dfa modifier is set, the alternative matching function is  used.
+       This  function  finds all possible matches at a given point in the sub-
+       ject. If, however, the dfa_shortest modifier is set,  processing  stops
+       after  the  first  match is found. This is always the shortest possible
        match.

DEFAULT OUTPUT FROM pcre2test

-       This  section  describes  the output when the normal matching function,
+       This section describes the output when the  normal  matching  function,
        pcre2_match(), is being used.

-       When a match succeeds, pcre2test outputs  the  list  of  captured  sub-
-       strings,  starting  with number 0 for the string that matched the whole
-       pattern.   Otherwise,  it  outputs  "No  match"  when  the  return   is
-       PCRE2_ERROR_NOMATCH,  or  "Partial  match:"  followed  by the partially
-       matching substring when the return is PCRE2_ERROR_PARTIAL.  (Note  that
-       this  is  the  entire  substring  that was inspected during the partial
-       match; it may include characters before the actual  match  start  if  a
+       When  a  match  succeeds,  pcre2test  outputs the list of captured sub-
+       strings, starting with number 0 for the string that matched  the  whole
+       pattern.    Otherwise,  it  outputs  "No  match"  when  the  return  is
+       PCRE2_ERROR_NOMATCH, or "Partial  match:"  followed  by  the  partially
+       matching  substring  when the return is PCRE2_ERROR_PARTIAL. (Note that
+       this is the entire substring that  was  inspected  during  the  partial
+       match;  it  may  include  characters before the actual match start if a
        lookbehind assertion, \K, \b, or \B was involved.)

        For any other return, pcre2test outputs the PCRE2 negative error number
-       and a short descriptive phrase. If the error is  a  failed  UTF  string
-       check,  the  code  unit offset of the start of the failing character is
+       and  a  short  descriptive  phrase. If the error is a failed UTF string
+       check, the code unit offset of the start of the  failing  character  is
        also output. Here is an example of an interactive pcre2test run.

          $ pcre2test
@@ -1426,8 +1439,8 @@
        Unset capturing substrings that are not followed by one that is set are
        not shown by pcre2test unless the allcaptures modifier is specified. In
        the following example, there are two capturing substrings, but when the
-       first  data  line is matched, the second, unset substring is not shown.
-       An "internal" unset substring is shown as "<unset>", as for the  second
+       first data line is matched, the second, unset substring is  not  shown.
+       An  "internal" unset substring is shown as "<unset>", as for the second
        data line.

            re> /(a)|(b)/
@@ -1439,11 +1452,11 @@
           1: <unset>
           2: b

-       If  the strings contain any non-printing characters, they are output as
-       \xhh escapes if the value is less than 256 and UTF  mode  is  not  set.
+       If the strings contain any non-printing characters, they are output  as
+       \xhh  escapes  if  the  value is less than 256 and UTF mode is not set.
        Otherwise they are output as \x{hh...} escapes. See below for the defi-
-       nition of non-printing characters. If the aftertext  modifier  is  set,
-       the  output  for substring 0 is followed by the the rest of the subject
+       nition  of  non-printing  characters. If the aftertext modifier is set,
+       the output for substring 0 is followed by the the rest of  the  subject
        string, identified by "0+" like this:

            re> /cat/aftertext
@@ -1451,7 +1464,7 @@
           0: cat
           0+ aract

-       If global matching is requested, the  results  of  successive  matching
+       If  global  matching  is  requested, the results of successive matching
        attempts are output in sequence, like this:

            re> /\Bi(\w\w)/g
@@ -1463,8 +1476,8 @@
           0: ipp
           1: pp

-       "No  match" is output only if the first match attempt fails. Here is an
-       example of a failure message (the offset 4 that  is  specified  by  the
+       "No match" is output only if the first match attempt fails. Here is  an
+       example  of  a  failure  message (the offset 4 that is specified by the
        offset modifier is past the end of the subject string):

            re> /xyz/
@@ -1472,7 +1485,7 @@
          Error -24 (bad offset value)

        Note that whereas patterns can be continued over several lines (a plain
-       ">" prompt is used for continuations), subject lines may  not.  However
+       ">"  prompt  is used for continuations), subject lines may not. However
        newlines can be included in a subject by means of the \n escape (or \r,
        \r\n, etc., depending on the newline sequence setting).

@@ -1480,7 +1493,7 @@
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION

        When the alternative matching function, pcre2_dfa_match(), is used, the
-       output  consists  of  a list of all the matches that start at the first
+       output consists of a list of all the matches that start  at  the  first
        point in the subject where there is at least one match. For example:

            re> /(tang|tangerine|tan)/
@@ -1489,11 +1502,11 @@
           1: tang
           2: tan

-       Using the normal matching function on this data finds only "tang".  The
-       longest  matching  string  is  always  given first (and numbered zero).
-       After a PCRE2_ERROR_PARTIAL return, the  output  is  "Partial  match:",
-       followed  by  the  partially  matching substring. Note that this is the
-       entire substring that was inspected during the partial  match;  it  may
+       Using  the normal matching function on this data finds only "tang". The
+       longest matching string is always  given  first  (and  numbered  zero).
+       After  a  PCRE2_ERROR_PARTIAL  return,  the output is "Partial match:",
+       followed by the partially matching substring. Note  that  this  is  the
+       entire  substring  that  was inspected during the partial match; it may
        include characters before the actual match start if a lookbehind asser-
        tion, \b, or \B was involved. (\K is not supported for DFA matching.)

@@ -1509,16 +1522,16 @@
           1: tan
           0: tan

-       The  alternative  matching function does not support substring capture,
-       so the modifiers that are concerned with captured  substrings  are  not
+       The alternative matching function does not support  substring  capture,
+       so  the  modifiers  that are concerned with captured substrings are not
        relevant.

RESTARTING AFTER A PARTIAL MATCH

-       When  the  alternative matching function has given the PCRE2_ERROR_PAR-
+       When the alternative matching function has given  the  PCRE2_ERROR_PAR-
        TIAL return, indicating that the subject partially matched the pattern,
-       you  can restart the match with additional subject data by means of the
+       you can restart the match with additional subject data by means of  the
        dfa_restart modifier. For example:

            re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
@@ -1527,7 +1540,7 @@
          data> n05\=dfa,dfa_restart
           0: n05

-       For further information about partial matching,  see  the  pcre2partial
+       For  further  information  about partial matching, see the pcre2partial
        documentation.

@@ -1534,30 +1547,30 @@
 CALLOUTS

        If the pattern contains any callout requests, pcre2test's callout func-
-       tion is called during matching unless callout_none is  specified.  This
+       tion  is  called during matching unless callout_none is specified. This
        works with both matching functions, and with JIT, though there are some
-       differences in behaviour. The output for callouts with numerical  argu-
+       differences  in behaviour. The output for callouts with numerical argu-
        ments and those with string arguments is slightly different.

    Callouts with numerical arguments

        By default, the callout function displays the callout number, the start
-       and current positions in the subject text at the callout time, and  the
+       and  current positions in the subject text at the callout time, and the
        next pattern item to be tested. For example:

          --->pqrabcdef
            0    ^  ^     \d

-       This  output  indicates  that  callout  number  0  occurred for a match
-       attempt starting at the fourth character of the  subject  string,  when
-       the  pointer  was  at  the seventh character, and when the next pattern
-       item was \d. Just one circumflex is output if  the  start  and  current
-       positions  are  the same, or if the current position precedes the start
+       This output indicates that  callout  number  0  occurred  for  a  match
+       attempt  starting  at  the fourth character of the subject string, when
+       the pointer was at the seventh character, and  when  the  next  pattern
+       item  was  \d.  Just  one circumflex is output if the start and current
+       positions are the same, or if the current position precedes  the  start
        position, which can happen if the callout is in a lookbehind assertion.

        Callouts numbered 255 are assumed to be automatic callouts, inserted as
        a result of the auto_callout pattern modifier. In this case, instead of
-       showing the callout number, the offset in the pattern,  preceded  by  a
+       showing  the  callout  number, the offset in the pattern, preceded by a
        plus, is output. For example:

            re> /\d?[A-E]\*/auto_callout
@@ -1570,7 +1583,7 @@
           0: E*

        If a pattern contains (*MARK) items, an additional line is output when-
-       ever a change of latest mark is passed to  the  callout  function.  For
+       ever  a  change  of  latest mark is passed to the callout function. For
        example:

            re> /a(*MARK:X)bc/auto_callout
@@ -1584,17 +1597,17 @@
          +12 ^  ^
           0: abc

-       The  mark  changes between matching "a" and "b", but stays the same for
-       the rest of the match, so nothing more is output. If, as  a  result  of
-       backtracking,  the  mark  reverts to being unset, the text "<unset>" is
+       The mark changes between matching "a" and "b", but stays the  same  for
+       the  rest  of  the match, so nothing more is output. If, as a result of
+       backtracking, the mark reverts to being unset, the  text  "<unset>"  is
        output.

    Callouts with string arguments

        The output for a callout with a string argument is similar, except that
-       instead  of outputting a callout number before the position indicators,
-       the callout string and its offset in  the  pattern  string  are  output
-       before  the reflection of the subject string, and the subject string is
+       instead of outputting a callout number before the position  indicators,
+       the  callout  string  and  its  offset in the pattern string are output
+       before the reflection of the subject string, and the subject string  is
        reflected for each callout. For example:

            re> /^ab(?C'first')cd(?C"second")ef/
@@ -1610,26 +1623,26 @@

    Callout modifiers

-       The callout function in pcre2test returns zero (carry on  matching)  by
-       default,  but  you can use a callout_fail modifier in a subject line to
+       The  callout  function in pcre2test returns zero (carry on matching) by
+       default, but you can use a callout_fail modifier in a subject  line  to
        change this and other parameters of the callout (see below).

        If the callout_capture modifier is set, the current captured groups are
        output when a callout occurs. This is useful only for non-DFA matching,
-       as pcre2_dfa_match() does not support capturing,  so  no  captures  are
+       as  pcre2_dfa_match()  does  not  support capturing, so no captures are
        ever shown.

        The normal callout output, showing the callout number or pattern offset
-       (as described above) is suppressed if the callout_no_where modifier  is
+       (as  described above) is suppressed if the callout_no_where modifier is
        set.

-       When  using  the  interpretive  matching function pcre2_match() without
-       JIT, setting the callout_extra modifier causes additional  output  from
-       pcre2test's  callout function to be generated. For the first callout in
-       a match attempt at a new starting position in the subject,  "New  match
-       attempt"  is output. If there has been a backtrack since the last call-
+       When using the interpretive  matching  function  pcre2_match()  without
+       JIT,  setting  the callout_extra modifier causes additional output from
+       pcre2test's callout function to be generated. For the first callout  in
+       a  match  attempt at a new starting position in the subject, "New match
+       attempt" is output. If there has been a backtrack since the last  call-
        out (or start of matching if this is the first callout), "Backtrack" is
-       output,  followed  by  "No other matching paths" if the backtrack ended
+       output, followed by "No other matching paths" if  the  backtrack  ended
        the previous match attempt. For example:

           re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
@@ -1666,39 +1679,39 @@
           +1    ^    a+
          No match

-       Notice that various optimizations must be turned off if  you  want  all
-       possible  matching  paths  to  be  scanned. If no_start_optimize is not
-       used, there is an immediate "no match", without any  callouts,  because
-       the  starting  optimization  fails to find "b" in the subject, which it
-       knows must be present for any match. If no_auto_possess  is  not  used,
-       the  "a+"  item is turned into "a++", which reduces the number of back-
+       Notice  that  various  optimizations must be turned off if you want all
+       possible matching paths to be  scanned.  If  no_start_optimize  is  not
+       used,  there  is an immediate "no match", without any callouts, because
+       the starting optimization fails to find "b" in the  subject,  which  it
+       knows  must  be  present for any match. If no_auto_possess is not used,
+       the "a+" item is turned into "a++", which reduces the number  of  back-
        tracks.

-       The callout_extra modifier has no effect if used with the DFA  matching
+       The  callout_extra modifier has no effect if used with the DFA matching
        function, or with JIT.

    Return values from callouts

-       The  default  return  from  the  callout function is zero, which allows
+       The default return from the callout  function  is  zero,  which  allows
        matching to continue. The callout_fail modifier can be given one or two
        numbers. If there is only one number, 1 is returned instead of 0 (caus-
        ing matching to backtrack) when a callout of that number is reached. If
-       two  numbers  (<n>:<m>)  are  given,  1 is returned when callout <n> is
-       reached and there have been at least <m>  callouts.  The  callout_error
+       two numbers (<n>:<m>) are given, 1 is  returned  when  callout  <n>  is
+       reached  and  there  have been at least <m> callouts. The callout_error
        modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, caus-
-       ing the entire matching process to be aborted. If both these  modifiers
-       are  set  for  the same callout number, callout_error takes precedence.
-       Note that callouts with string arguments are always  given  the  number
+       ing  the entire matching process to be aborted. If both these modifiers
+       are set for the same callout number,  callout_error  takes  precedence.
+       Note  that  callouts  with string arguments are always given the number
        zero.

-       The  callout_data  modifier can be given an unsigned or a negative num-
-       ber.  This is set as the "user data" that is  passed  to  the  matching
-       function,  and  passed  back  when the callout function is invoked. Any
-       value other than zero is used as  a  return  from  pcre2test's  callout
+       The callout_data modifier can be given an unsigned or a  negative  num-
+       ber.   This  is  set  as the "user data" that is passed to the matching
+       function, and passed back when the callout  function  is  invoked.  Any
+       value  other  than  zero  is  used as a return from pcre2test's callout
        function.

        Inserting callouts can be helpful when using pcre2test to check compli-
-       cated regular expressions. For further information about callouts,  see
+       cated  regular expressions. For further information about callouts, see
        the pcre2callout documentation.

@@ -1705,43 +1718,43 @@
 NON-PRINTING CHARACTERS

        When pcre2test is outputting text in the compiled version of a pattern,
-       bytes other than 32-126 are always treated as  non-printing  characters
+       bytes  other  than 32-126 are always treated as non-printing characters
        and are therefore shown as hex escapes.

-       When  pcre2test  is outputting text that is a matched part of a subject
-       string, it behaves in the same way, unless a different locale has  been
-       set  for  the  pattern  (using  the locale modifier). In this case, the
-       isprint() function is used to  distinguish  printing  and  non-printing
+       When pcre2test is outputting text that is a matched part of  a  subject
+       string,  it behaves in the same way, unless a different locale has been
+       set for the pattern (using the locale  modifier).  In  this  case,  the
+       isprint()  function  is  used  to distinguish printing and non-printing
        characters.

 SAVING AND RESTORING COMPILED PATTERNS

-       It  is  possible  to  save  compiled patterns on disc or elsewhere, and
+       It is possible to save compiled patterns  on  disc  or  elsewhere,  and
        reload them later, subject to a number of restrictions. JIT data cannot
-       be  saved.  The host on which the patterns are reloaded must be running
+       be saved. The host on which the patterns are reloaded must  be  running
        the same version of PCRE2, with the same code unit width, and must also
-       have  the  same  endianness,  pointer width and PCRE2_SIZE type. Before
-       compiled patterns can be saved they must be serialized, that  is,  con-
-       verted  to a stream of bytes. A single byte stream may contain any num-
-       ber of compiled patterns, but they must  all  use  the  same  character
+       have the same endianness, pointer width  and  PCRE2_SIZE  type.  Before
+       compiled  patterns  can be saved they must be serialized, that is, con-
+       verted to a stream of bytes. A single byte stream may contain any  num-
+       ber  of  compiled  patterns,  but  they must all use the same character
        tables. A single copy of the tables is included in the byte stream (its
        size is 1088 bytes).

-       The functions whose names begin  with  pcre2_serialize_  are  used  for
-       serializing  and de-serializing. They are described in the pcre2serial-
+       The  functions  whose  names  begin  with pcre2_serialize_ are used for
+       serializing and de-serializing. They are described in the  pcre2serial-
        ize  documentation.  In  this  section  we  describe  the  features  of
        pcre2test that can be used to test these functions.

-       When  a  pattern  with  push  modifier  is successfully compiled, it is
-       pushed onto a stack of compiled patterns,  and  pcre2test  expects  the
-       next  line  to  contain a new pattern (or command) instead of a subject
-       line. By contrast, the pushcopy modifier causes a copy of the  compiled
-       pattern  to  be  stacked,  leaving the original available for immediate
-       matching. By using push and/or pushcopy, a number of  patterns  can  be
+       When a pattern with push  modifier  is  successfully  compiled,  it  is
+       pushed  onto  a  stack  of compiled patterns, and pcre2test expects the
+       next line to contain a new pattern (or command) instead  of  a  subject
+       line.  By contrast, the pushcopy modifier causes a copy of the compiled
+       pattern to be stacked, leaving the  original  available  for  immediate
+       matching.  By  using  push and/or pushcopy, a number of patterns can be
        compiled and retained. These modifiers are incompatible with posix, and
-       control modifiers that act at match time are ignored (with  a  message)
-       for  the  stacked patterns. The jitverify modifier applies only at com-
+       control  modifiers  that act at match time are ignored (with a message)
+       for the stacked patterns. The jitverify modifier applies only  at  com-
        pile time.

        The command
@@ -1749,21 +1762,21 @@
          #save <filename>

        causes all the stacked patterns to be serialized and the result written
-       to  the named file. Afterwards, all the stacked patterns are freed. The
+       to the named file. Afterwards, all the stacked patterns are freed.  The
        command

          #load <filename>

-       reads the data in the file, and then arranges for it to  be  de-serial-
-       ized,  with the resulting compiled patterns added to the pattern stack.
-       The pattern on the top of the stack can be retrieved by the  #pop  com-
-       mand,  which  must  be  followed  by  lines  of subjects that are to be
-       matched with the pattern, terminated as usual by an empty line  or  end
-       of  file.  This  command  may be followed by a modifier list containing
-       only control modifiers that act after a pattern has been  compiled.  In
+       reads  the  data in the file, and then arranges for it to be de-serial-
+       ized, with the resulting compiled patterns added to the pattern  stack.
+       The  pattern  on the top of the stack can be retrieved by the #pop com-
+       mand, which must be followed by  lines  of  subjects  that  are  to  be
+       matched  with  the pattern, terminated as usual by an empty line or end
+       of file. This command may be followed by  a  modifier  list  containing
+       only  control  modifiers that act after a pattern has been compiled. In
        particular,  hex,  posix,  posix_nosub,  push,  and  pushcopy  are  not
-       allowed, nor are any option-setting modifiers.  The JIT modifiers  are,
-       however  permitted.  Here is an example that saves and reloads two pat-
+       allowed,  nor are any option-setting modifiers.  The JIT modifiers are,
+       however permitted. Here is an example that saves and reloads  two  pat-
        terns.

          /abc/push
@@ -1776,10 +1789,10 @@
          #pop jit,bincode
          abc

-       If jitverify is used with #pop, it does not  automatically  imply  jit,
+       If  jitverify  is  used with #pop, it does not automatically imply jit,
        which is different behaviour from when it is used on a pattern.

-       The  #popcopy  command is analogous to the pushcopy modifier in that it
+       The #popcopy command is analogous to the pushcopy modifier in  that  it
        makes current a copy of the topmost stack pattern, leaving the original
        still on the stack.

@@ -1799,5 +1812,5 @@

 REVISION

-       Last updated: 21 December 2017
-       Copyright (c) 1997-2017 University of Cambridge.
+       Last updated: 25 April 2018
+       Copyright (c) 1997-2018 University of Cambridge.
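
The callout_fail and callout_error rules described in the pcre2test.txt text above (act on callout <n> once at least <m> callouts have occurred, with callout_error taking precedence) can be sketched as follows. This is an illustrative sketch only, not the pcre2test source: the names callout_rule, callout_result and the CALLOUT_* codes are hypothetical, and it assumes the callout count includes the current callout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes for this sketch. The real abort code is
   PCRE2_ERROR_CALLOUT from pcre2.h; its value is not reproduced here. */
enum { CALLOUT_CONTINUE = 0, CALLOUT_FAIL = 1, CALLOUT_ABORT = -1 };

/* One <n>:<m> setting. "enabled" is 0 when the modifier was not given. */
typedef struct {
  int enabled;
  unsigned int number;     /* <n>: the callout number to act on */
  unsigned int min_count;  /* <m>: callouts required first; 0 if omitted */
} callout_rule;

/* Decide what a callout function should return for one callout.
   Assumption: callouts_so_far already counts the current callout. */
static int
callout_result(unsigned int callout_number, unsigned int callouts_so_far,
  const callout_rule *fail, const callout_rule *error)
{
assert(fail != NULL && error != NULL);

/* callout_error is checked first, so it takes precedence. */
if (error->enabled && callout_number == error->number &&
    callouts_so_far >= error->min_count)
  return CALLOUT_ABORT;

/* callout_fail makes the matcher backtrack instead of aborting. */
if (fail->enabled && callout_number == fail->number &&
    callouts_so_far >= fail->min_count)
  return CALLOUT_FAIL;

return CALLOUT_CONTINUE;   /* default: carry on matching */
}
```

With a fail rule of 0:2 the first callout numbered 0 continues and the second one backtracks; adding an error rule for the same number turns the result into an abort once its own count is reached.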

Modified: code/trunk/src/config.h.in
===================================================================
--- code/trunk/src/config.h.in    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/src/config.h.in    2018-04-27 16:48:35 UTC (rev 932)
@@ -132,8 +132,9 @@
 /* Define to 1 if you have the <zlib.h> header file. */
 #undef HAVE_ZLIB_H

-/* This limits the amount of memory that pcre2_match() may use while matching
- a pattern. The value is in kilobytes. */
+/* This limits the amount of memory that may be used while matching a pattern.
+ It applies to both pcre2_match() and pcre2_dfa_match(). It does not apply
+ to JIT matching. The value is in kilobytes. */
 #undef HEAP_LIMIT

 /* The value of LINK_SIZE determines the number of bytes used to store links
@@ -148,7 +149,8 @@

 /* The value of MATCH_LIMIT determines the default number of times the
    pcre2_match() function can record a backtrack position during a single
-   matching attempt. There is a runtime interface for setting a different
+   matching attempt. The value is also used to limit a loop counter in
+   pcre2_dfa_match(). There is a runtime interface for setting a different
    limit. The limit exists in order to catch runaway regular expressions that
    take for ever to determine that they do not match. The default is set very
    large so that it does not accidentally catch legitimate cases. */
@@ -161,7 +163,9 @@
    MATCH_LIMIT_DEPTH provides this facility. To have any useful effect, it
    must be less than the value of MATCH_LIMIT. The default is to use the same
    value as MATCH_LIMIT. There is a runtime method for setting a different
-   limit. */
+   limit. In the case of pcre2_dfa_match(), this limit controls the depth of
+   the internal nested function calls that are used for pattern recursions,
+   lookarounds, and atomic groups. */
 #undef MATCH_LIMIT_DEPTH

 /* This limit is parameterized just in case anybody ever wants to change it.
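
The pcre2_dfa_match.c diff below replaces fixed stack arrays with a chain of workspace blocks that grows on the heap up to HEAP_LIMIT (in kilobytes, as described in the config.h.in comment above). A minimal standalone sketch of that idea follows. It is not the PCRE2 code: the names ws_block, ws_pool, ws_init, ws_more, ws_take, ws_give and the WS_* codes are hypothetical, plain malloc() stands in for the library's memory-control hooks, and the base block is assumed to be suitably aligned for the anchor structure.

```c
#include <assert.h>
#include <stdlib.h>

/* Anchor at the start of every block; all sizes are counted in ints. */
typedef struct ws_block {
  struct ws_block *next;
  unsigned int size;   /* total ints in the block, anchor included */
  unsigned int free;   /* ints not yet handed out */
} ws_block;

#define WS_ANCHOR_INTS ((unsigned int)(sizeof(ws_block)/sizeof(int)))

typedef struct { size_t heap_limit_kib; size_t heap_used_ints; } ws_pool;

enum { WS_OK = 0, WS_ERR_HEAPLIMIT = -1, WS_ERR_NOMEMORY = -2 };

/* Turn caller-supplied, suitably aligned memory into the first block. */
static ws_block *ws_init(void *mem, unsigned int size_ints)
{
ws_block *b = (ws_block *)mem;
assert(b != NULL && size_ints > WS_ANCHOR_INTS);
b->next = NULL;
b->size = size_ints;
b->free = size_ints - WS_ANCHOR_INTS;
return b;
}

/* Move to the next block, reusing one if it exists; otherwise allocate a
   block twice the current size, clipped to what the heap limit allows. */
static int ws_more(ws_block **cur, unsigned int need, ws_pool *pool)
{
ws_block *rws = *cur;
ws_block *nb = rws->next;
if (nb == NULL)
  {
  size_t limit = (1024/sizeof(int)) * pool->heap_limit_kib;  /* KiB -> ints */
  size_t left = (limit > pool->heap_used_ints)? limit - pool->heap_used_ints : 0;
  size_t newsize = (size_t)rws->size * 2;
  if (newsize > left) newsize = left;
  if (newsize < (size_t)need + WS_ANCHOR_INTS) return WS_ERR_HEAPLIMIT;
  nb = malloc(newsize * sizeof(int));
  if (nb == NULL) return WS_ERR_NOMEMORY;
  pool->heap_used_ints += newsize;
  nb->next = NULL;
  nb->size = (unsigned int)newsize;
  rws->next = nb;
  }
nb->free = nb->size - WS_ANCHOR_INTS;
*cur = nb;
return WS_OK;
}

/* Hand out n ints from the current block, growing the chain if needed. */
static int *ws_take(ws_block **cur, unsigned int n, ws_pool *pool)
{
ws_block *b;
if ((*cur)->free < n && ws_more(cur, n, pool) != WS_OK) return NULL;
b = *cur;
if (b->free < n) return NULL;   /* a reused block may still be too small */
b->free -= n;
return (int *)b + (b->size - b->free - n);
}

/* Return the most recently taken n ints (strict last-in, first-out use). */
static void ws_give(ws_block *b, unsigned int n) { b->free += n; }
```

Because space is taken before a nested call and given back straight after it, a simple free counter per block is enough; no general-purpose allocator is needed. The sketch does not free the heap blocks, which a real caller would do once matching ends.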

Modified: code/trunk/src/pcre2_dfa_match.c
===================================================================
--- code/trunk/src/pcre2_dfa_match.c    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/src/pcre2_dfa_match.c    2018-04-27 16:48:35 UTC (rev 932)
@@ -292,7 +292,36 @@
 #define INTS_PER_STATEBLOCK  (int)(sizeof(stateblock)/sizeof(int))

+/* Before version 10.32 the recursive calls of internal_dfa_match() were passed
+local working space and output vectors that were created on the stack. This has
+caused issues for some patterns, especially in small-stack environments such as
+Windows. A new scheme is now in use which sets up a vector on the stack, but if
+this is too small, heap memory is used, up to the heap_limit. The main
+parameters are all numbers of ints because the workspace is a vector of ints.

+The size of the starting stack vector, DFA_START_RWS_SIZE, is in bytes, and is
+defined in pcre2_internal.h so as to be available to pcre2test when it is
+finding the minimum heap requirement for a match. */
+
+#define OVEC_UNIT  (sizeof(PCRE2_SIZE)/sizeof(int))
+
+#define RWS_BASE_SIZE   (DFA_START_RWS_SIZE/sizeof(int))  /* Stack vector */
+#define RWS_RSIZE       1000                    /* Work size for recursion */
+#define RWS_OVEC_RSIZE  (1000*OVEC_UNIT)        /* Ovector for recursion */
+#define RWS_OVEC_OSIZE  (2*OVEC_UNIT)           /* Ovector in other cases */
+
+/* This structure is at the start of each workspace block. */
+
+typedef struct RWS_anchor {
+  struct RWS_anchor *next;
+  unsigned int size;  /* Number of ints */
+  unsigned int free;  /* Number of ints */
+} RWS_anchor;
+
+#define RWS_ANCHOR_SIZE (sizeof(RWS_anchor)/sizeof(int))
+
+
+
 /*************************************************
 *               Process a callout                *
 *************************************************/
@@ -354,6 +383,61 @@

 /*************************************************
+*         Expand local workspace memory          *
+*************************************************/
+
+/* This function is called when internal_dfa_match() is about to be called
+recursively and there is insufficient working space left in the current
+workspace block. If there's an existing next block, use it; otherwise get a new
+block unless the heap limit is reached.
+
+Arguments:
+  rwsptr     pointer to block pointer (updated)
+  ovecsize   space needed for an ovector
+  mb         the match block
+
+Returns:     0 rwsptr has been updated
+            !0 an error code
+*/
+
+static int
+more_workspace(RWS_anchor **rwsptr, unsigned int ovecsize, dfa_match_block *mb)
+{
+RWS_anchor *rws = *rwsptr;
+RWS_anchor *new;
+
+if (rws->next != NULL)
+  {
+  new = rws->next;
+  }
+
+/* All sizes are in units of sizeof(int), except for mb->heap_limit, which is
+in kilobytes. */
+
+else
+  {
+  unsigned int newsize = rws->size * 2;
+  unsigned int heapleft = (unsigned int)
+    (((1024/sizeof(int))*mb->heap_limit - mb->heap_used));
+  if (newsize > heapleft) newsize = heapleft;
+  if (newsize < RWS_RSIZE + ovecsize + RWS_ANCHOR_SIZE)
+    return PCRE2_ERROR_HEAPLIMIT;
+  new = mb->memctl.malloc(newsize*sizeof(int), mb->memctl.memory_data);
+  if (new == NULL) return PCRE2_ERROR_NOMEMORY;
+  mb->heap_used += newsize;
+  new->next = NULL;
+  new->size = newsize;
+  rws->next = new;
+  }
+
+new->free = new->size - RWS_ANCHOR_SIZE;
+*rwsptr = new;
+return 0;
+}
+
+
+
+/*************************************************
 *     Match a Regular Expression - DFA engine    *
 *************************************************/

@@ -431,7 +515,8 @@
   uint32_t offsetcount,
   int *workspace,
   int wscount,
-  uint32_t  rlevel)
+  uint32_t rlevel,
+  int *RWS)
 {
 stateblock *active_states, *new_states, *temp_states;
 stateblock *next_active_state, *next_new_state;
@@ -2587,11 +2672,23 @@
       case OP_ASSERTBACK:
       case OP_ASSERTBACK_NOT:
         {
+        int rc;
+        int *local_workspace;
+        PCRE2_SIZE *local_offsets;
         PCRE2_SPTR endasscode = code + GET(code, 1);
-        PCRE2_SIZE local_offsets[2];
-        int rc;
-        int local_workspace[1000];
+        RWS_anchor *rws = (RWS_anchor *)RWS;

+        if (rws->free < RWS_RSIZE + RWS_OVEC_OSIZE)
+          {
+          rc = more_workspace(&rws, RWS_OVEC_OSIZE, mb);
+          if (rc != 0) return rc;
+          RWS = (int *)rws;
+          }
+
+        local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free);
+        local_workspace = ((int *)local_offsets) + RWS_OVEC_OSIZE;
+        rws->free -= RWS_RSIZE + RWS_OVEC_OSIZE;
+
         while (*endasscode == OP_ALT) endasscode += GET(endasscode, 1);

         rc = internal_dfa_match(
@@ -2600,11 +2697,14 @@
           ptr,                                  /* where we currently are */
           (PCRE2_SIZE)(ptr - start_subject),    /* start offset */
           local_offsets,                        /* offset vector */
-          sizeof(local_offsets)/sizeof(PCRE2_SIZE), /* size of same */
+          RWS_OVEC_OSIZE/OVEC_UNIT,             /* size of same */
           local_workspace,                      /* workspace vector */
-          sizeof(local_workspace)/sizeof(int),  /* size of same */
-          rlevel);                              /* function recursion level */
+          RWS_RSIZE,                            /* size of same */
+          rlevel,                               /* function recursion level */
+          RWS);                                 /* recursion workspace */

+        rws->free += RWS_RSIZE + RWS_OVEC_OSIZE;
+
         if (rc < 0 && rc != PCRE2_ERROR_NOMATCH) return rc;
         if ((rc >= 0) == (codevalue == OP_ASSERT || codevalue == OP_ASSERTBACK))
             { ADD_ACTIVE((int)(endasscode + LINK_SIZE + 1 - start_code), 0); }
@@ -2670,12 +2770,24 @@

         else
           {
-          PCRE2_SIZE local_offsets[2];
-          int local_workspace[1000];
           int rc;
+          int *local_workspace;
+          PCRE2_SIZE *local_offsets;
           PCRE2_SPTR asscode = code + LINK_SIZE + 1;
           PCRE2_SPTR endasscode = asscode + GET(asscode, 1);
+          RWS_anchor *rws = (RWS_anchor *)RWS;

+          if (rws->free < RWS_RSIZE + RWS_OVEC_OSIZE)
+            {
+            rc = more_workspace(&rws, RWS_OVEC_OSIZE, mb);
+            if (rc != 0) return rc;
+            RWS = (int *)rws;
+            }
+
+          local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free);
+          local_workspace = ((int *)local_offsets) + RWS_OVEC_OSIZE;
+          rws->free -= RWS_RSIZE + RWS_OVEC_OSIZE;
+
           while (*endasscode == OP_ALT) endasscode += GET(endasscode, 1);

           rc = internal_dfa_match(
@@ -2684,11 +2796,14 @@
             ptr,                                  /* where we currently are */
             (PCRE2_SIZE)(ptr - start_subject),    /* start offset */
             local_offsets,                        /* offset vector */
-            sizeof(local_offsets)/sizeof(PCRE2_SIZE), /* size of same */
+            RWS_OVEC_OSIZE/OVEC_UNIT,             /* size of same */
             local_workspace,                      /* workspace vector */
-            sizeof(local_workspace)/sizeof(int),  /* size of same */
-            rlevel);                              /* function recursion level */
+            RWS_RSIZE,                            /* size of same */
+            rlevel,                               /* function recursion level */
+            RWS);                                 /* recursion work space */

+          rws->free += RWS_RSIZE + RWS_OVEC_OSIZE;
+
           if (rc < 0 && rc != PCRE2_ERROR_NOMATCH) return rc;
           if ((rc >= 0) ==
                 (condcode == OP_ASSERT || condcode == OP_ASSERTBACK))
@@ -2702,14 +2817,26 @@
       /*-----------------------------------------------------------------*/
       case OP_RECURSE:
         {
+        int rc;
+        int *local_workspace;
+        PCRE2_SIZE *local_offsets;
+        RWS_anchor *rws = (RWS_anchor *)RWS;
         dfa_recursion_info *ri;
-        PCRE2_SIZE local_offsets[1000];
-        int local_workspace[1000];
         PCRE2_SPTR callpat = start_code + GET(code, 1);
         uint32_t recno = (callpat == mb->start_code)? 0 :
           GET2(callpat, 1 + LINK_SIZE);
-        int rc;

+        if (rws->free < RWS_RSIZE + RWS_OVEC_RSIZE)
+          {
+          rc = more_workspace(&rws, RWS_OVEC_RSIZE, mb);
+          if (rc != 0) return rc;
+          RWS = (int *)rws;
+          }
+
+        local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free);
+        local_workspace = ((int *)local_offsets) + RWS_OVEC_RSIZE;
+        rws->free -= RWS_RSIZE + RWS_OVEC_RSIZE;
+
         /* Check for repeating a recursion without advancing the subject
         pointer. This should catch convoluted mutual recursions. (Some simple
         cases are caught at compile time.) */
@@ -2732,11 +2859,13 @@
           ptr,                                  /* where we currently are */
           (PCRE2_SIZE)(ptr - start_subject),    /* start offset */
           local_offsets,                        /* offset vector */
-          sizeof(local_offsets)/sizeof(PCRE2_SIZE), /* size of same */
+          RWS_OVEC_RSIZE/OVEC_UNIT,             /* size of same */
           local_workspace,                      /* workspace vector */
-          sizeof(local_workspace)/sizeof(int),  /* size of same */
-          rlevel);                              /* function recursion level */
+          RWS_RSIZE,                            /* size of same */
+          rlevel,                               /* function recursion level */
+          RWS);                                 /* recursion workspace */

+        rws->free += RWS_RSIZE + RWS_OVEC_RSIZE;
         mb->recursive = new_recursive.prevrec;  /* Done this recursion */

         /* Ran out of internal offsets */
@@ -2782,10 +2911,25 @@
       case OP_SCBRAPOS:
       case OP_BRAPOSZERO:
         {
+        int rc;
+        int *local_workspace;
+        PCRE2_SIZE *local_offsets;
         PCRE2_SIZE charcount, matched_count;
         PCRE2_SPTR local_ptr = ptr;
+        RWS_anchor *rws = (RWS_anchor *)RWS;
         BOOL allow_zero;

+        if (rws->free < RWS_RSIZE + RWS_OVEC_OSIZE)
+          {
+          rc = more_workspace(&rws, RWS_OVEC_OSIZE, mb);
+          if (rc != 0) return rc;
+          RWS = (int *)rws;
+          }
+
+        local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free);
+        local_workspace = ((int *)local_offsets) + RWS_OVEC_OSIZE;
+        rws->free -= RWS_RSIZE + RWS_OVEC_OSIZE;
+
         if (codevalue == OP_BRAPOSZERO)
           {
           allow_zero = TRUE;
@@ -2798,19 +2942,17 @@

         for (matched_count = 0;; matched_count++)
           {
-          PCRE2_SIZE local_offsets[2];
-          int local_workspace[1000];
-
-          int rc = internal_dfa_match(
+          rc = internal_dfa_match(
             mb,                                   /* fixed match data */
             code,                                 /* this subexpression's code */
             local_ptr,                            /* where we currently are */
             (PCRE2_SIZE)(ptr - start_subject),    /* start offset */
             local_offsets,                        /* offset vector */
-            sizeof(local_offsets)/sizeof(PCRE2_SIZE), /* size of same */
+            RWS_OVEC_OSIZE/OVEC_UNIT,             /* size of same */
             local_workspace,                      /* workspace vector */
-            sizeof(local_workspace)/sizeof(int),  /* size of same */
-            rlevel);                              /* function recursion level */
+            RWS_RSIZE,                            /* size of same */
+            rlevel,                               /* function recursion level */
+            RWS);                                 /* recursion workspace */

           /* Failed to match */

@@ -2827,6 +2969,8 @@
           local_ptr += charcount;    /* Advance temporary position ptr */
           }

+        rws->free += RWS_RSIZE + RWS_OVEC_OSIZE;
+
         /* At this point we have matched the subpattern matched_count
         times, and local_ptr is pointing to the character after the end of the
         last match. */
@@ -2869,20 +3013,36 @@
       /*-----------------------------------------------------------------*/
       case OP_ONCE:
         {
-        PCRE2_SIZE local_offsets[2];
-        int local_workspace[1000];
+        int rc;
+        int *local_workspace;
+        PCRE2_SIZE *local_offsets;
+        RWS_anchor *rws = (RWS_anchor *)RWS;

-        int rc = internal_dfa_match(
+        if (rws->free < RWS_RSIZE + RWS_OVEC_OSIZE)
+          {
+          rc = more_workspace(&rws, RWS_OVEC_OSIZE, mb);
+          if (rc != 0) return rc;
+          RWS = (int *)rws;
+          }
+
+        local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free);
+        local_workspace = ((int *)local_offsets) + RWS_OVEC_OSIZE;
+        rws->free -= RWS_RSIZE + RWS_OVEC_OSIZE;
+
+        rc = internal_dfa_match(
           mb,                                   /* fixed match data */
           code,                                 /* this subexpression's code */
           ptr,                                  /* where we currently are */
           (PCRE2_SIZE)(ptr - start_subject),    /* start offset */
           local_offsets,                        /* offset vector */
-          sizeof(local_offsets)/sizeof(PCRE2_SIZE), /* size of same */
+          RWS_OVEC_OSIZE/OVEC_UNIT,             /* size of same */
           local_workspace,                      /* workspace vector */
-          sizeof(local_workspace)/sizeof(int),  /* size of same */
-          rlevel);                              /* function recursion level */
+          RWS_RSIZE,                            /* size of same */
+          rlevel,                               /* function recursion level */
+          RWS);                                 /* recursion workspace */

+        rws->free += RWS_RSIZE + RWS_OVEC_OSIZE;
+
         if (rc >= 0)
           {
           PCRE2_SPTR end_subpattern = code;
@@ -3063,6 +3223,7 @@
   PCRE2_SIZE start_offset, uint32_t options, pcre2_match_data *match_data,
   pcre2_match_context *mcontext, int *workspace, PCRE2_SIZE wscount)
 {
+int rc;
 const pcre2_real_code *re = (const pcre2_real_code *)code;

 PCRE2_SPTR start_match;
@@ -3071,9 +3232,9 @@
 PCRE2_SPTR req_cu_ptr;

 BOOL utf, anchored, startline, firstline;
-
 BOOL has_first_cu = FALSE;
 BOOL has_req_cu = FALSE;
+
 PCRE2_UCHAR first_cu = 0;
 PCRE2_UCHAR first_cu2 = 0;
 PCRE2_UCHAR req_cu = 0;
@@ -3088,6 +3249,17 @@
 dfa_match_block actual_match_block;
 dfa_match_block *mb = &actual_match_block;

+/* Set up a starting block of memory for use during recursive calls to
+internal_dfa_match(). By putting this on the stack, it minimizes resource use
+in the case when it is not needed. If this is too small, more memory is
+obtained from the heap. At the start of each block is an anchor structure.*/
+
+int base_recursion_workspace[RWS_BASE_SIZE];
+RWS_anchor *rws = (RWS_anchor *)base_recursion_workspace;
+rws->next = NULL;
+rws->size = RWS_BASE_SIZE;
+rws->free = RWS_BASE_SIZE - RWS_ANCHOR_SIZE;
+
/* A length equal to PCRE2_ZERO_TERMINATED implies a zero-terminated
subject string. */

@@ -3184,6 +3356,7 @@
   mb->memctl = re->memctl;
   mb->match_limit = PRIV(default_match_context).match_limit;
   mb->match_limit_depth = PRIV(default_match_context).depth_limit;
+  mb->heap_limit = PRIV(default_match_context).heap_limit;
   }
 else
   {
@@ -3198,6 +3371,7 @@
   mb->memctl = mcontext->memctl;
   mb->match_limit = mcontext->match_limit;
   mb->match_limit_depth = mcontext->depth_limit;
+  mb->heap_limit = mcontext->heap_limit;
   }

 if (mb->match_limit > re->limit_match)
@@ -3206,6 +3380,9 @@
 if (mb->match_limit_depth > re->limit_depth)
   mb->match_limit_depth = re->limit_depth;

+if (mb->heap_limit > re->limit_heap)
+  mb->heap_limit = re->limit_heap;
+
 mb->start_code = (PCRE2_UCHAR *)((uint8_t *)re + sizeof(pcre2_real_code)) +
   re->name_count * re->name_entry_size;
 mb->tables = re->tables;
@@ -3215,6 +3392,7 @@
 mb->moptions = options;
 mb->poptions = re->overall_options;
 mb->match_call_count = 0;
+mb->heap_used = 0;

/* Process the \R and newline settings. */

@@ -3351,8 +3529,6 @@

 for (;;)
   {
-  int rc;
-
   /* ----------------- Start of match optimizations ---------------- */

   /* There are some optimizations that avoid running the match if a known
@@ -3544,7 +3720,7 @@
       in characters, we treat it as code units to avoid spending too much time
       in this optimization. */

-      if (end_subject - start_match < re->minlength) return PCRE2_ERROR_NOMATCH;
+      if (end_subject - start_match < re->minlength) goto NOMATCH_EXIT;

       /* If req_cu is set, we know that that code unit must appear in the
       subject for the match to succeed. If the first code unit is set, req_cu
@@ -3621,7 +3797,8 @@
     (uint32_t)match_data->oveccount * 2,  /* actual size of same */
     workspace,                    /* workspace vector */
     (int)wscount,                 /* size of same */
-    0);                           /* function recurse level */
+    0,                            /* function recurse level */
+    base_recursion_workspace);    /* initial workspace for recursion */

   /* Anything other than "no match" means we are done, always; otherwise, carry
   on only if not anchored. */
@@ -3637,7 +3814,7 @@
     match_data->rightchar = (PCRE2_SIZE)( mb->last_used_ptr - subject);
     match_data->startchar = (PCRE2_SIZE)(start_match - subject);
     match_data->rc = rc;
-    return rc;
+    goto EXIT;
     }

/* Advance to the next subject character unless we are at the end of a line
@@ -3668,8 +3845,18 @@

} /* "Bumpalong" loop */

+NOMATCH_EXIT:
+rc = PCRE2_ERROR_NOMATCH;

-return PCRE2_ERROR_NOMATCH;
+EXIT:
+while (rws->next != NULL)
+  {
+  RWS_anchor *next = rws->next;
+  rws->next = next->next;
+  mb->memctl.free(next, mb->memctl.memory_data);
+  }
+
+return rc;
 }

/* End of pcre2_dfa_match.c */
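
The pcre2_dfa_match.c hunks above replace per-call stack arrays with slices carved from a chained workspace: take a slice before the recursive call, hand it back afterwards, and free any heap blocks at exit. The following is a minimal standalone sketch of that pattern, not the PCRE2 code itself. All names (ws_block, ws_take, ws_give, ws_release) are hypothetical, the block layout differs from RWS_anchor, and the doubling growth policy and byte-based heap budget are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical block descriptor; PCRE2 instead keeps an anchor at the start
of each int vector. */
typedef struct ws_block {
  struct ws_block *next;   /* next heap block in the chain, or NULL */
  size_t size;             /* capacity of data[] in ints */
  size_t free;             /* unused ints at the end of data[] */
  int *data;               /* storage (stack array for the base block) */
} ws_block;

/* Carve n ints from *cur, moving along (or extending) the chain when the
current block is full. Returns NULL if the assumed byte budget heap_limit
would be exceeded or if malloc() fails. */
static int *ws_take(ws_block **cur, size_t n, size_t *heap_used,
  size_t heap_limit)
{
ws_block *b = *cur;
while (b->free < n)
  {
  if (b->next == NULL)
    {
    size_t size = (b->size * 2 > n)? b->size * 2 : n;   /* assumed policy */
    size_t bytes = sizeof(ws_block) + size * sizeof(int);
    ws_block *nb;
    if (bytes > heap_limit - *heap_used) return NULL;
    nb = malloc(bytes);
    if (nb == NULL) return NULL;
    nb->next = NULL;
    nb->size = nb->free = size;
    nb->data = (int *)(nb + 1);
    *heap_used += bytes;
    b->next = nb;
    }
  b = b->next;
  }
*cur = b;
b->free -= n;
return b->data + (b->size - b->free - n);
}

/* Hand back the most recently taken n ints of block b (strictly LIFO). */
static void ws_give(ws_block *b, size_t n)
{
assert(b->free + n <= b->size);
b->free += n;
}

/* Free every heap block chained after the base block. */
static void ws_release(ws_block *base)
{
while (base->next != NULL)
  {
  ws_block *next = base->next;
  base->next = next->next;
  free(next);
  }
}
```

Each nesting level keeps its own copy of the current block pointer, so after an inner call returns the caller gives its slice back to the block it took it from. The base block lives on the stack, so ws_release() only ever frees blocks obtained from the heap.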

Modified: code/trunk/src/pcre2_internal.h
===================================================================
--- code/trunk/src/pcre2_internal.h    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/src/pcre2_internal.h    2018-04-27 16:48:35 UTC (rev 932)
@@ -253,6 +253,11 @@

#define START_FRAMES_SIZE 20480

+/* Similarly, for DFA matching, an initial internal workspace vector is
+allocated on the stack. */
+
+#define DFA_START_RWS_SIZE 30720
+
/* Define the default BSR convention. */

#ifdef BSR_ANYCRLF

Modified: code/trunk/src/pcre2_intmodedep.h
===================================================================
--- code/trunk/src/pcre2_intmodedep.h    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/src/pcre2_intmodedep.h    2018-04-27 16:48:35 UTC (rev 932)
@@ -896,6 +896,8 @@
   PCRE2_SPTR last_used_ptr;       /* Latest consulted character */
   const uint8_t *tables;          /* Character tables */
   PCRE2_SIZE start_offset;        /* The start offset value */
+  PCRE2_SIZE heap_limit;          /* As it says */
+  PCRE2_SIZE heap_used;           /* As it says */ 
   uint32_t match_limit;           /* As it says */
   uint32_t match_limit_depth;     /* As it says */
   uint32_t match_call_count;      /* Number of calls of internal function */

Modified: code/trunk/src/pcre2test.c
===================================================================
--- code/trunk/src/pcre2test.c    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/src/pcre2test.c    2018-04-27 16:48:35 UTC (rev 932)
@@ -5760,6 +5760,8 @@

 for (;;)
   {
+  uint32_t stack_start = 0;
+
   if (errnumber == PCRE2_ERROR_HEAPLIMIT)
     {
     PCRE2_SET_HEAP_LIMIT(dat_context, mid);
@@ -5775,6 +5777,7 @@

   if ((dat_datctl.control & CTL_DFA) != 0)
     {
+    stack_start = DFA_START_RWS_SIZE/1024;
     if (dfa_workspace == NULL)
       dfa_workspace = (int *)malloc(DFA_WS_DIMENSION*sizeof(int));
     if (dfa_matched++ == 0)
@@ -5789,11 +5792,21 @@
       dat_datctl.options, match_data, PTR(dat_context));

   else
+    {
+    stack_start = START_FRAMES_SIZE/1024;
     PCRE2_MATCH(capcount, compiled_code, pp, ulen, dat_datctl.offset,
       dat_datctl.options, match_data, PTR(dat_context));
+    }

   if (capcount == errnumber)
     {
+    if ((mid & 0x80000000u) != 0)
+      {
+      fprintf(outfile, "Can't find minimum %s limit: check pattern for "
+        "restriction\n", msg);
+      break;
+      }
+
     min = mid;
     mid = (mid == max - 1)? max : (max != UINT32_MAX)? (min + max)/2 : mid*2;
     }
@@ -5802,11 +5815,12 @@
            capcount == PCRE2_ERROR_PARTIAL)
     {
     /* If we've not hit the error with a heap limit less than the size of the
-    initial stack frame vector, the heap is not being used, so the minimum
-    limit is zero; there's no need to go on. The other limits are always
-    greater than zero. */
+    initial stack frame vector (for pcre2_match()) or the initial stack
+    workspace vector (for pcre2_dfa_match()), the heap is not being used, so
+    the minimum limit is zero; there's no need to go on. The other limits are
+    always greater than zero. */

-    if (errnumber == PCRE2_ERROR_HEAPLIMIT && mid < START_FRAMES_SIZE/1024)
+    if (errnumber == PCRE2_ERROR_HEAPLIMIT && mid < stack_start)
       {
       fprintf(outfile, "Minimum %s limit = 0\n", msg);
       break;
@@ -6771,7 +6785,7 @@
         PCRE2_SIZE end = pmatch[i].rm_eo;
         for (j = last_printed + 1; j < i; j++)
           fprintf(outfile, "%2d: <unset>\n", (int)j);
-        last_printed = i; 
+        last_printed = i;
         if (start > end)
           {
           start = pmatch[i].rm_eo;
@@ -7139,18 +7153,16 @@
         (double)CLOCKS_PER_SEC);
     }

- /* Find the heap, match and depth limits if requested. The match and heap
- limits are not relevant for DFA matching and the depth and heap limits are
- not relevant for JIT. The return from check_match_limit() is the return from
- the final call to pcre2_match() or pcre2_dfa_match(). */
+ /* Find the heap, match and depth limits if requested. The depth and heap
+ limits are not relevant for JIT. The return from check_match_limit() is the
+ return from the final call to pcre2_match() or pcre2_dfa_match(). */

   if ((dat_datctl.control & CTL_FINDLIMITS) != 0)
     {
     capcount = 0;  /* This stops compiler warnings */

-    if ((dat_datctl.control & CTL_DFA) == 0 &&
-        (FLD(compiled_code, executable_jit) == NULL ||
-          (dat_datctl.options & PCRE2_NO_JIT) != 0))
+    if (FLD(compiled_code, executable_jit) == NULL ||
+          (dat_datctl.options & PCRE2_NO_JIT) != 0)
       {
       (void)check_match_limit(pp, arg_ulen, PCRE2_ERROR_HEAPLIMIT, "heap");
       }
@@ -7165,6 +7177,12 @@
       capcount = check_match_limit(pp, arg_ulen, PCRE2_ERROR_DEPTHLIMIT,
         "depth");
       }
+       
+    if (capcount == 0)
+      {
+      fprintf(outfile, "Matched, but offsets vector is too small to show all matches\n");
+      capcount = dat_datctl.oveccount;
+      }
     }

   /* Otherwise just run a single match, setting up a callout if required (the
@@ -7877,7 +7895,7 @@
 (void)PCRE2_CONFIG(PCRE2_CONFIG_NEWLINE, &optval);
 print_newline_config(optval, FALSE);
 (void)PCRE2_CONFIG(PCRE2_CONFIG_BSR, &optval);
-printf("  \\R matches %s\n", 
+printf("  \\R matches %s\n",
   (optval == PCRE2_BSR_ANYCRLF)? "CR, LF, or CRLF only" :
                                  "all Unicode newlines");
 (void)PCRE2_CONFIG(PCRE2_CONFIG_NEVER_BACKSLASH_C, &optval);
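
The check_match_limit() hunks above adjust a search for the smallest limit at which a match no longer fails with the limit error. The update expression for mid and the top-bit guard are taken from the diff. The rest of the following sketch is an assumption made for illustration: the starting values, the termination test, and the names find_min_limit, limit_probe and below_threshold are hypothetical. The probe callback stands in for running the match with a given limit and reporting whether the limit was exceeded.

```c
#include <stdint.h>

/* Hypothetical probe: returns nonzero if the limit was exceeded. */
typedef int (*limit_probe)(uint32_t limit, void *ctx);

/* Example probe for experiments: "fails" while limit is below *ctx. */
static int below_threshold(uint32_t limit, void *ctx)
{
return limit < *(const uint32_t *)ctx;
}

/* Find the smallest limit for which probe() reports success. Returns 0 and
stores the limit in *result, or -1 if doubling reaches the top bit without a
success (the "can't find minimum" case in the diff). */
static int find_min_limit(limit_probe probe, void *ctx, uint32_t *result)
{
uint32_t min = 0;
uint32_t mid = 64;             /* assumed starting guess */
uint32_t max = UINT32_MAX;

for (;;)
  {
  if (probe(mid, ctx))
    {
    /* Limit exceeded: give up if doubling would overflow, else go up. */
    if ((mid & 0x80000000u) != 0) return -1;
    min = mid;
    mid = (mid == max - 1)? max : (max != UINT32_MAX)? (min + max)/2 : mid*2;
    }
  else
    {
    /* Succeeded: done if the bracket cannot shrink further, else go down. */
    if (mid == min + 1) { *result = mid; return 0; }
    max = mid;
    mid = (min + max)/2;
    }
  }
}
```

While no upper bound is known the guess doubles, and once one success has been seen the search bisects between the largest failing and smallest passing values. The sketch never reports zero. The diff handles that case separately, by comparing mid with stack_start when the error is PCRE2_ERROR_HEAPLIMIT.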

Modified: code/trunk/testdata/testinput6
===================================================================
--- code/trunk/testdata/testinput6    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/testdata/testinput6    2018-04-27 16:48:35 UTC (rev 932)
@@ -4874,6 +4874,14 @@
 \= Expect depth limit exceeded
     a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]

+/(*LIMIT_HEAP=0)^((.)(?1)|.)$/
+\= Expect heap limit exceeded
+    a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
+
+/(*LIMIT_HEAP=50000)^((.)(?1)|.)$/
+\= Expect success
+    a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
+
 /(02-)?[0-9]{3}-[0-9]{3}/
     02-123-123

Modified: code/trunk/testdata/testoutput6
===================================================================
--- code/trunk/testdata/testoutput6    2018-04-21 16:43:49 UTC (rev 931)
+++ code/trunk/testdata/testoutput6    2018-04-27 16:48:35 UTC (rev 932)
@@ -7667,6 +7667,16 @@
     a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
 Failed: error -53: matching depth limit exceeded

+/(*LIMIT_HEAP=0)^((.)(?1)|.)$/
+\= Expect heap limit exceeded
+    a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
+Failed: error -63: heap limit exceeded
+
+/(*LIMIT_HEAP=50000)^((.)(?1)|.)$/
+\= Expect success
+    a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
+ 0: a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
+
 /(02-)?[0-9]{3}-[0-9]{3}/
     02-123-123
  0: 02-123-123
@@ -7673,6 +7683,7 @@

 /^(a(?2))(b)(?1)/
     abbab\=find_limits 
+Minimum heap limit = 0
 Minimum match limit = 4
 Minimum depth limit = 2
  0: abbab
