[Pcre-svn] [716] code/trunk/doc: Documentation update.

Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [716] code/trunk/doc: Documentation update.
Revision: 716
          http://www.exim.org/viewvc/pcre2?view=rev&revision=716
Author:   ph10
Date:     2017-03-29 18:18:08 +0100 (Wed, 29 Mar 2017)
Log Message:
-----------
Documentation update.


Modified Paths:
--------------
    code/trunk/doc/html/pcre2build.html
    code/trunk/doc/html/pcre2callout.html
    code/trunk/doc/pcre2.txt
    code/trunk/doc/pcre2build.3
    code/trunk/doc/pcre2callout.3


Modified: code/trunk/doc/html/pcre2build.html
===================================================================
--- code/trunk/doc/html/pcre2build.html    2017-03-29 08:12:32 UTC (rev 715)
+++ code/trunk/doc/html/pcre2build.html    2017-03-29 17:18:08 UTC (rev 716)
@@ -23,18 +23,18 @@
 <li><a name="TOC8" href="#SEC8">NEWLINE RECOGNITION</a>
 <li><a name="TOC9" href="#SEC9">WHAT \R MATCHES</a>
 <li><a name="TOC10" href="#SEC10">HANDLING VERY LARGE PATTERNS</a>
-<li><a name="TOC11" href="#SEC11">AVOIDING EXCESSIVE STACK USAGE</a>
-<li><a name="TOC12" href="#SEC12">LIMITING PCRE2 RESOURCE USAGE</a>
-<li><a name="TOC13" href="#SEC13">CREATING CHARACTER TABLES AT BUILD TIME</a>
-<li><a name="TOC14" href="#SEC14">USING EBCDIC CODE</a>
-<li><a name="TOC15" href="#SEC15">PCRE2GREP SUPPORT FOR EXTERNAL SCRIPTS</a>
-<li><a name="TOC16" href="#SEC16">PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT</a>
-<li><a name="TOC17" href="#SEC17">PCRE2GREP BUFFER SIZE</a>
-<li><a name="TOC18" href="#SEC18">PCRE2TEST OPTION FOR LIBREADLINE SUPPORT</a>
-<li><a name="TOC19" href="#SEC19">INCLUDING DEBUGGING CODE</a>
-<li><a name="TOC20" href="#SEC20">DEBUGGING WITH VALGRIND SUPPORT</a>
-<li><a name="TOC21" href="#SEC21">CODE COVERAGE REPORTING</a>
-<li><a name="TOC22" href="#SEC22">SUPPORT FOR FUZZERS</a>
+<li><a name="TOC11" href="#SEC11">LIMITING PCRE2 RESOURCE USAGE</a>
+<li><a name="TOC12" href="#SEC12">CREATING CHARACTER TABLES AT BUILD TIME</a>
+<li><a name="TOC13" href="#SEC13">USING EBCDIC CODE</a>
+<li><a name="TOC14" href="#SEC14">PCRE2GREP SUPPORT FOR EXTERNAL SCRIPTS</a>
+<li><a name="TOC15" href="#SEC15">PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT</a>
+<li><a name="TOC16" href="#SEC16">PCRE2GREP BUFFER SIZE</a>
+<li><a name="TOC17" href="#SEC17">PCRE2TEST OPTION FOR LIBREADLINE SUPPORT</a>
+<li><a name="TOC18" href="#SEC18">INCLUDING DEBUGGING CODE</a>
+<li><a name="TOC19" href="#SEC19">DEBUGGING WITH VALGRIND SUPPORT</a>
+<li><a name="TOC20" href="#SEC20">CODE COVERAGE REPORTING</a>
+<li><a name="TOC21" href="#SEC21">SUPPORT FOR FUZZERS</a>
+<li><a name="TOC22" href="#SEC22">OBSOLETE OPTION</a>
 <li><a name="TOC23" href="#SEC23">SEE ALSO</a>
 <li><a name="TOC24" href="#SEC24">AUTHOR</a>
 <li><a name="TOC25" href="#SEC25">REVISION</a>
@@ -78,11 +78,11 @@
 <pre>
   ./configure --help
 </pre>
-The following sections include descriptions of options whose names begin with
---enable or --disable. These settings specify changes to the defaults for the
-<b>configure</b> command. Because of the way that <b>configure</b> works,
---enable and --disable always come in pairs, so the complementary option always
-exists as well, but as it specifies the default, it is not described.
+The following sections include descriptions of "on/off" options whose names
+begin with --enable or --disable. Because of the way that <b>configure</b>
+works, --enable and --disable always come in pairs, so the complementary option
+always exists as well, but as it specifies the default, it is not described.
+Options that specify values have names that start with --with.
 </P>
 <br><a name="SEC3" href="#TOC1">BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br>
 <P>
@@ -138,10 +138,10 @@
 </P>
 <P>
 UTF support allows the libraries to process character code points up to
-0x10ffff in the strings that they handle. It also provides support for
-accessing the Unicode properties of such characters, using pattern escapes such
-as \P, \p, and \X. Only the general category properties such as <i>Lu</i> and
-<i>Nd</i> are supported. Details are given in the
+0x10ffff in the strings that they handle. Unicode support also gives access to
+the Unicode properties of characters, using pattern escapes such as \P, \p,
+and \X. Only the general category properties such as <i>Lu</i> and <i>Nd</i> are
+supported. Details are given in the
 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
 documentation.
 </P>
@@ -165,7 +165,7 @@
 </P>
 <br><a name="SEC7" href="#TOC1">JUST-IN-TIME COMPILER SUPPORT</a><br>
 <P>
-Just-in-time compiler support is included in the build by specifying
+Just-in-time (JIT) compiler support is included in the build by specifying
 <pre>
   --enable-jit
 </pre>
@@ -227,7 +227,7 @@
 </pre>
 the default is changed so that \R matches only CR, LF, or CRLF. Whatever is
 selected when PCRE2 is built can be overridden by applications that use the
-called.
+library.
 </P>
 <br><a name="SEC10" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
 <P>
@@ -248,36 +248,12 @@
 additional data when handling them. For the 32-bit library the value is always
 4 and cannot be overridden; the value of --with-link-size is ignored.
 </P>
-<br><a name="SEC11" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
+<br><a name="SEC11" href="#TOC1">LIMITING PCRE2 RESOURCE USAGE</a><br>
 <P>
-When matching with the <b>pcre2_match()</b> function, PCRE2 implements
-backtracking by making recursive calls to an internal function called
-<b>match()</b>. In environments where the size of the stack is limited, this can
-severely limit PCRE2's operation. (The Unix environment does not usually suffer
-from this problem, but it may sometimes be necessary to increase the maximum
-stack size. There is a discussion in the
-<a href="pcre2stack.html"><b>pcre2stack</b></a>
-documentation.) An alternative approach to recursion that uses memory from the
-heap to remember data, instead of using recursive function calls, has been
-implemented to work round the problem of limited stack size. If you want to
-build a version of PCRE2 that works this way, add
-<pre>
-  --disable-stack-for-recursion
-</pre>
-to the <b>configure</b> command. By default, the system functions <b>malloc()</b>
-and <b>free()</b> are called to manage the heap memory that is required, but
-custom memory management functions can be called instead. PCRE2 runs noticeably
-more slowly when built in this way. This option affects only the
-<b>pcre2_match()</b> function; it is not relevant for <b>pcre2_dfa_match()</b>.
-</P>
-<br><a name="SEC12" href="#TOC1">LIMITING PCRE2 RESOURCE USAGE</a><br>
-<P>
-Internally, PCRE2 has a function called <b>match()</b>, which it calls
-repeatedly (sometimes recursively) when matching a pattern with the
-<b>pcre2_match()</b> function. By controlling the maximum number of times this
-function may be called during a single matching operation, a limit can be
-placed on the resources used by a single call to <b>pcre2_match()</b>. The limit
-can be changed at run time, as described in the
+The <b>pcre2_match()</b> function increments a counter each time it goes round
+its main loop. Putting a limit on this counter controls the amount of computing
+resource used by a single call to <b>pcre2_match()</b>. The limit can be changed
+at run time, as described in the
 <a href="pcre2api.html"><b>pcre2api</b></a>
 documentation. The default is 10 million, but this can be changed by adding a
 setting such as
@@ -285,21 +261,23 @@
   --with-match-limit=500000
 </pre>
 to the <b>configure</b> command. This setting has no effect on the
-<b>pcre2_dfa_match()</b> matching function.
+<b>pcre2_dfa_match()</b> matching function, but it does also limit JIT matching 
+(though the counting is done differently).
 </P>
 <P>
-In some environments it is desirable to limit the depth of recursive calls of
-<b>match()</b> more strictly than the total number of calls, in order to
-restrict the maximum amount of stack (or heap, if --disable-stack-for-recursion
-is specified) that is used. A second limit controls this; it defaults to the
-value that is set for --with-match-limit, which imposes no additional
-constraints. However, you can set a lower limit by adding, for example,
+In some environments it is desirable to limit the depth of nested backtracking
+in order to restrict the maximum amount of heap memory that is used. A second
+limit controls this; it defaults to the value that is set for
+--with-match-limit. You can set a lower default limit by adding, for example,
 <pre>
-  --with-match-limit-recursion=10000
+  --with-match-limit-depth=10000
 </pre>
 to the <b>configure</b> command. This value can also be overridden at run time.
+As well as applying to <b>pcre2_match()</b>, this limit also controls the depth 
+of recursive function calls in <b>pcre2_dfa_match()</b>. These are used for 
+lookaround assertions and recursion within patterns.
 </P>
-<br><a name="SEC13" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br>
+<br><a name="SEC12" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br>
 <P>
 PCRE2 uses fixed tables for processing characters whose code points are less
 than 256. By default, PCRE2 is built with a set of tables that are distributed
@@ -311,12 +289,12 @@
 to the <b>configure</b> command, the distributed tables are no longer used.
 Instead, a program called <b>dftables</b> is compiled and run. This outputs the
 source for a new set of tables, created in the default locale of your C run-time
-system. (This method of replacing the tables does not work if you are cross
+system. This method of replacing the tables does not work if you are cross
 compiling, because <b>dftables</b> is run on the local host. If you need to
 create alternative tables when cross compiling, you will have to do so "by
-hand".)
+hand".
 </P>
-<br><a name="SEC14" href="#TOC1">USING EBCDIC CODE</a><br>
+<br><a name="SEC13" href="#TOC1">USING EBCDIC CODE</a><br>
 <P>
 PCRE2 assumes by default that it will run in an environment where the character
 code is ASCII or Unicode, which is a superset of ASCII. This is the case for
@@ -351,7 +329,7 @@
 and equivalent run-time options, refer to these character values in an EBCDIC
 environment.
 </P>
-<br><a name="SEC15" href="#TOC1">PCRE2GREP SUPPORT FOR EXTERNAL SCRIPTS</a><br>
+<br><a name="SEC14" href="#TOC1">PCRE2GREP SUPPORT FOR EXTERNAL SCRIPTS</a><br>
 <P>
 By default, on non-Windows systems, <b>pcre2grep</b> supports the use of
 callouts with string arguments within the patterns it is matching, in order to
@@ -360,7 +338,7 @@
 documentation. This support can be disabled by adding
 --disable-pcre2grep-callout to the <b>configure</b> command.
 </P>
-<br><a name="SEC16" href="#TOC1">PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT</a><br>
+<br><a name="SEC15" href="#TOC1">PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT</a><br>
 <P>
 By default, <b>pcre2grep</b> reads all files as plain text. You can build it so
 that it recognizes files whose names end in <b>.gz</b> or <b>.bz2</b>, and reads
@@ -373,7 +351,7 @@
 relevant libraries are installed on your system. Configuration will fail if
 they are not.
 </P>
-<br><a name="SEC17" href="#TOC1">PCRE2GREP BUFFER SIZE</a><br>
+<br><a name="SEC16" href="#TOC1">PCRE2GREP BUFFER SIZE</a><br>
 <P>
 <b>pcre2grep</b> uses an internal buffer to hold a "window" on the file it is
 scanning, in order to be able to output "before" and "after" lines when it
@@ -391,7 +369,7 @@
 to the <b>configure</b> command. The caller of <b>pcre2grep</b> can override
 these values by using --buffer-size and --max-buffer-size on the command line.
 </P>
-<br><a name="SEC18" href="#TOC1">PCRE2TEST OPTION FOR LIBREADLINE SUPPORT</a><br>
+<br><a name="SEC17" href="#TOC1">PCRE2TEST OPTION FOR LIBREADLINE SUPPORT</a><br>
 <P>
 If you add one of
 <pre>
@@ -425,7 +403,7 @@
 </pre>
 immediately before the <b>configure</b> command.
 </P>
-<br><a name="SEC19" href="#TOC1">INCLUDING DEBUGGING CODE</a><br>
+<br><a name="SEC18" href="#TOC1">INCLUDING DEBUGGING CODE</a><br>
 <P>
 If you add
 <pre>
@@ -434,7 +412,7 @@
 to the <b>configure</b> command, additional debugging code is included in the
 build. This feature is intended for use by the PCRE2 maintainers.
 </P>
-<br><a name="SEC20" href="#TOC1">DEBUGGING WITH VALGRIND SUPPORT</a><br>
+<br><a name="SEC19" href="#TOC1">DEBUGGING WITH VALGRIND SUPPORT</a><br>
 <P>
 If you add
 <pre>
@@ -444,7 +422,7 @@
 certain memory regions as unaddressable. This allows it to detect invalid
 memory accesses, and is mostly useful for debugging PCRE2 itself.
 </P>
-<br><a name="SEC21" href="#TOC1">CODE COVERAGE REPORTING</a><br>
+<br><a name="SEC20" href="#TOC1">CODE COVERAGE REPORTING</a><br>
 <P>
 If your C compiler is gcc, you can build a version of PCRE2 that can generate a
 code coverage report for its test suite. To enable this, you must install
@@ -501,7 +479,7 @@
 information about code coverage, see the <b>gcov</b> and <b>lcov</b>
 documentation.
 </P>
-<br><a name="SEC22" href="#TOC1">SUPPORT FOR FUZZERS</a><br>
+<br><a name="SEC21" href="#TOC1">SUPPORT FOR FUZZERS</a><br>
 <P>
 There is a special option for use by people who want to run fuzzing tests on
 PCRE2:
@@ -514,14 +492,29 @@
 a pointer to a string and the length of the string. When called, this function
 tries to compile the string as a pattern, and if that succeeds, to match it.
 This is done both with no options and with some random options bits that are
-generated from the string. Setting --enable-fuzz-support also causes a binary
-called <b>pcre2fuzzcheck</b> to be created. This is normally run under valgrind
-or used when PCRE2 is compiled with address sanitizing enabled. It calls the
-fuzzing function and outputs information about it is doing. The input strings
-are specified by arguments: if an argument starts with "=" the rest of it is a
-literal input string. Otherwise, it is assumed to be a file name, and the
-contents of the file are the test string.
+generated from the string. 
 </P>
+<P>
+Setting --enable-fuzz-support also causes a binary called <b>pcre2fuzzcheck</b>
+to be created. This is normally run under valgrind or used when PCRE2 is
+compiled with address sanitizing enabled. It calls the fuzzing function and
+outputs information about what it is doing. The input strings are specified by
+arguments: if an argument starts with "=" the rest of it is a literal input
+string. Otherwise, it is assumed to be a file name, and the contents of the
+file are the test string.
+</P>
+<br><a name="SEC22" href="#TOC1">OBSOLETE OPTION</a><br>
+<P>
+In versions of PCRE2 prior to 10.30, there were two ways of handling 
+backtracking in the <b>pcre2_match()</b> function. The default was to use the 
+system stack, but if
+<pre>
+  --disable-stack-for-recursion
+</pre>
+was set, memory on the heap was used. From release 10.30 onwards this has 
+changed (the stack is no longer used) and this option now does nothing except
+give a warning.
+</P>
 <br><a name="SEC23" href="#TOC1">SEE ALSO</a><br>
 <P>
 <b>pcre2api</b>(3), <b>pcre2-config</b>(3).
@@ -537,9 +530,9 @@
 </P>
 <br><a name="SEC25" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 01 November 2016
+Last updated: 29 March 2017
 <br>
-Copyright &copy; 1997-2016 University of Cambridge.
+Copyright &copy; 1997-2017 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.


Modified: code/trunk/doc/html/pcre2callout.html
===================================================================
--- code/trunk/doc/html/pcre2callout.html    2017-03-29 08:12:32 UTC (rev 715)
+++ code/trunk/doc/html/pcre2callout.html    2017-03-29 17:18:08 UTC (rev 716)
@@ -57,8 +57,8 @@
 </pre>
 If the PCRE2_AUTO_CALLOUT option bit is set when a pattern is compiled, PCRE2
 automatically inserts callouts, all with number 255, before each item in the
-pattern except for immediately before or after a callout item in the pattern.
-For example, if PCRE2_AUTO_CALLOUT is used with the pattern
+pattern except for immediately before or after an explicit callout. For
+example, if PCRE2_AUTO_CALLOUT is used with the pattern
 <pre>
   A(?C3)B
 </pre>
@@ -71,11 +71,9 @@
   A(\d{2}|--)
 </pre>
 With PCRE2_AUTO_CALLOUT, this pattern is processed as if it were
-<br>
-<br>
-(?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
-<br>
-<br>
+<pre>
+  (?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
+</pre>
 Notice that there is a callout before and after each parenthesis and
 alternation bar. If the pattern contains a conditional group whose condition is
 an assertion, an automatic callout is inserted immediately before the
@@ -140,12 +138,16 @@
 a pattern. If PCRE2_DOTALL is set, so that the dot can match any character, the
 pattern is automatically anchored. If PCRE2_DOTALL is not set, a match can
 start only after an internal newline or at the beginning of the subject, and
-<b>pcre2_compile()</b> remembers this. This optimization is disabled, however,
-if .* is in an atomic group or if there is a back reference to the capturing
-group in which it appears. It is also disabled if the pattern contains (*PRUNE)
-or (*SKIP). However, the presence of callouts does not affect it.
+<b>pcre2_compile()</b> remembers this. If a pattern has more than one top-level
+branch, automatic anchoring occurs if all branches are anchorable.
 </P>
 <P>
+This optimization is disabled, however, if .* is in an atomic group or if there
+is a back reference to the capturing group in which it appears. It is also
+disabled if the pattern contains (*PRUNE) or (*SKIP). However, the presence of
+callouts does not affect it.
+</P>
+<P>
 For example, if the pattern .*\d is compiled with PCRE2_AUTO_CALLOUT and
 applied to the string "aa", the <b>pcre2test</b> output is:
 <pre>
@@ -175,10 +177,6 @@
 Another optimization, described in the next section, means that there is no
 subsequent attempt to match with an empty subject.
 </P>
-<P>
-If a pattern has more than one top-level branch, automatic anchoring occurs if
-all branches are anchorable.
-</P>
 <br><b>
 Other optimizations
 </b><br>
@@ -194,9 +192,10 @@
 result is still no match, the callout is obeyed.
 </P>
 <P>
-PCRE2 also knows the minimum length of a matching string, and will immediately
-give a "no match" return without actually running a match if the subject is not
-long enough, or, for unanchored patterns, if it has been scanned far enough.
+For most patterns PCRE2 also knows the minimum length of a matching string, and
+will immediately give a "no match" return without actually running a match if
+the subject is not long enough, or, for unanchored patterns, if it has been
+scanned far enough.
 </P>
 <P>
 You can disable these optimizations by passing the PCRE2_NO_START_OPTIMIZE
@@ -276,14 +275,43 @@
 callout.
 </P>
 <P>
-The <i>offset_vector</i> field is a pointer to the vector of capturing offsets
-(the "ovector") that was passed to the matching function in the match data
-block. When <b>pcre2_match()</b> is used, the contents can be inspected in
+The <i>offset_vector</i> field is a pointer to a vector of capturing offsets
+(the "ovector"). You may read certain elements in this vector, but you must not
+change any of them.
+</P>
+<P>
+For calls to <b>pcre2_match()</b>, the <i>offset_vector</i> field is not (since
+release 10.30) a pointer to the actual ovector that was passed to the matching
+function in the match data block. Instead it points to an internal ovector of a
+size large enough to hold all possible captured substrings in the pattern. Note
+that whenever a recursion or subroutine call within a pattern completes, the
+capturing state is reset to what it was before.
+</P>
+<P>
+The <i>capture_last</i> field contains the number of the most recently captured
+substring, and the <i>capture_top</i> field contains one more than the number of
+the highest numbered captured substring so far. If no substrings have yet been
+captured, the value of <i>capture_last</i> is 0 and the value of
+<i>capture_top</i> is 1. The values of these fields do not always differ by one;
+for example, when the callout in the pattern ((a)(b))(?C2) is taken,
+<i>capture_last</i> is 1 but <i>capture_top</i> is 4.
+</P>
+<P>
+The contents of ovector[2] to ovector[&#60;capture_top&#62;*2-1] can be inspected in
 order to extract substrings that have been matched so far, in the same way as
-for extracting substrings after a match has completed. For the DFA matching
-function, this field is not useful.
+extracting substrings after a match has completed. The values in ovector[0] and 
+ovector[1] are undefined and should not be used in any way. Substrings that 
+have not been captured (but whose numbers are less than <i>capture_top</i>) have 
+both of their ovector slots set to PCRE2_UNSET.
 </P>
 <P>
+For DFA matching, the <i>offset_vector</i> field points to the ovector that was
+passed to the matching function in the match data block, but it holds no useful
+information at callout time because <b>pcre2_dfa_match()</b> does not support
+substring capturing. The value of <i>capture_top</i> is always 1 and the value
+of <i>capture_last</i> is always 0 for DFA matching.
+</P>
+<P>
 The <i>subject</i> and <i>subject_length</i> fields contain copies of the values
 that were passed to the matching function.
 </P>
@@ -300,20 +328,6 @@
 current match pointer.
 </P>
 <P>
-When the <b>pcre2_match()</b> is used, the <i>capture_top</i> field contains one
-more than the number of the highest numbered captured substring so far. If no
-substrings have been captured, the value of <i>capture_top</i> is one. This is
-always the case when the DFA functions are used, because they do not support
-captured substrings.
-</P>
-<P>
-The <i>capture_last</i> field contains the number of the most recently captured
-substring. However, when a recursion exits, the value reverts to what it was
-outside the recursion, as do the values of all captured substrings. If no
-substrings have been captured, the value of <i>capture_last</i> is 0. This is
-always the case for the DFA matching functions.
-</P>
-<P>
 The <i>pattern_position</i> field contains the offset in the pattern string to
 the next item to be matched.
 </P>
@@ -413,9 +427,9 @@
 </P>
 <br><a name="SEC8" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 29 September 2016
+Last updated: 29 March 2017
 <br>
-Copyright &copy; 1997-2016 University of Cambridge.
+Copyright &copy; 1997-2017 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.


Modified: code/trunk/doc/pcre2.txt
===================================================================
--- code/trunk/doc/pcre2.txt    2017-03-29 08:12:32 UTC (rev 715)
+++ code/trunk/doc/pcre2.txt    2017-03-29 17:18:08 UTC (rev 716)
@@ -3221,12 +3221,12 @@


          ./configure --help


-       The  following  sections  include  descriptions  of options whose names
-       begin with --enable or --disable. These settings specify changes to the
-       defaults  for  the configure command. Because of the way that configure
-       works, --enable and --disable always come in pairs, so  the  complemen-
-       tary  option always exists as well, but as it specifies the default, it
-       is not described.
+       The  following  sections include descriptions of "on/off" options whose
+       names begin with --enable or --disable. Because of the way that config-
+       ure  works, --enable and --disable always come in pairs, so the comple-
+       mentary option always exists as well, but as it specifies the  default,
+       it is not described.  Options that specify values have names that start
+       with --with.



 BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES
@@ -3283,11 +3283,11 @@
        application has locked this out by setting PCRE2_NEVER_UTF.


        UTF support allows the libraries to process character code points up to
-       0x10ffff in the strings that they handle. It also provides support  for
-       accessing  the  Unicode  properties  of  such characters, using pattern
-       escapes such as \P, \p, and \X. Only the  general  category  properties
-       such  as Lu and Nd are supported. Details are given in the pcre2pattern
-       documentation.
+       0x10ffff in the strings that they handle. Unicode  support  also  gives
+       access  to  the Unicode properties of characters, using pattern escapes
+       such as \P, \p, and \X. Only the general category properties such as Lu
+       and  Nd are supported. Details are given in the pcre2pattern documenta-
+       tion.


        Pattern escapes such as \d and \w do not by default make use of Unicode
        properties.  The  application  can  request that they do by setting the
@@ -3310,14 +3310,15 @@


JUST-IN-TIME COMPILER SUPPORT

-       Just-in-time compiler support is included in the build by specifying
+       Just-in-time  (JIT) compiler support is included in the build by speci-
+       fying


          --enable-jit


-       This  support  is available only for certain hardware architectures. If
-       this option is set for an unsupported architecture,  a  building  error
-       occurs.   See the pcre2jit documentation for a discussion of JIT usage.
-       When JIT support is enabled, pcre2grep automatically makes use  of  it,
+       This support is available only for certain hardware  architectures.  If
+       this  option  is  set for an unsupported architecture, a building error
+       occurs.  See the pcre2jit documentation for a discussion of JIT  usage.
+       When  JIT  support is enabled, pcre2grep automatically makes use of it,
        unless you add


          --disable-pcre2grep-jit
@@ -3327,14 +3328,14 @@


NEWLINE RECOGNITION

-       By  default, PCRE2 interprets the linefeed (LF) character as indicating
-       the end of a line. This is the normal newline  character  on  Unix-like
-       systems.  You can compile PCRE2 to use carriage return (CR) instead, by
+       By default, PCRE2 interprets the linefeed (LF) character as  indicating
+       the  end  of  a line. This is the normal newline character on Unix-like
+       systems. You can compile PCRE2 to use carriage return (CR) instead,  by
        adding


          --enable-newline-is-cr


-       to the configure  command.  There  is  also  an  --enable-newline-is-lf
+       to  the  configure  command.  There  is  also an --enable-newline-is-lf
        option, which explicitly specifies linefeed as the newline character.


        Alternatively, you can specify that line endings are to be indicated by
@@ -3347,108 +3348,84 @@


          --enable-newline-is-anycrlf


-       which  causes  PCRE2 to recognize any of the three sequences CR, LF, or
+       which causes PCRE2 to recognize any of the three sequences CR,  LF,  or
        CRLF as indicating a line ending. Finally, a fifth option, specified by


          --enable-newline-is-any


-       causes PCRE2 to recognize any Unicode  newline  sequence.  The  Unicode
+       causes  PCRE2  to  recognize  any Unicode newline sequence. The Unicode
        newline sequences are the three just mentioned, plus the single charac-
        ters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next line,
-       U+0085),  LS  (line  separator,  U+2028),  and PS (paragraph separator,
+       U+0085), LS (line separator,  U+2028),  and  PS  (paragraph  separator,
        U+2029).


        Whatever default line ending convention is selected when PCRE2 is built
-       can  be  overridden by applications that use the library. At build time
+       can be overridden by applications that use the library. At  build  time
        it is conventional to use the standard for your operating system.



WHAT \R MATCHES

-       By default, the sequence \R in a pattern matches  any  Unicode  newline
-       sequence,  independently  of  what has been selected as the line ending
+       By  default,  the  sequence \R in a pattern matches any Unicode newline
+       sequence, independently of what has been selected as  the  line  ending
        sequence. If you specify


          --enable-bsr-anycrlf


-       the default is changed so that \R matches only CR, LF, or  CRLF.  What-
-       ever  is selected when PCRE2 is built can be overridden by applications
-       that use the called.
+       the  default  is changed so that \R matches only CR, LF, or CRLF. What-
+       ever is selected when PCRE2 is built can be overridden by  applications
+       that use the library.



HANDLING VERY LARGE PATTERNS

-       Within a compiled pattern, offset values are used  to  point  from  one
-       part  to another (for example, from an opening parenthesis to an alter-
-       nation metacharacter). By default, in the 8-bit and  16-bit  libraries,
-       two-byte  values  are used for these offsets, leading to a maximum size
-       for a compiled pattern of around 64K code units. This is sufficient  to
+       Within  a  compiled  pattern,  offset values are used to point from one
+       part to another (for example, from an opening parenthesis to an  alter-
+       nation  metacharacter).  By default, in the 8-bit and 16-bit libraries,
+       two-byte values are used for these offsets, leading to a  maximum  size
+       for  a compiled pattern of around 64K code units. This is sufficient to
        handle all but the most gigantic patterns. Nevertheless, some people do
-       want to process truly enormous patterns, so it is possible  to  compile
-       PCRE2  to  use three-byte or four-byte offsets by adding a setting such
+       want  to  process truly enormous patterns, so it is possible to compile
+       PCRE2 to use three-byte or four-byte offsets by adding a  setting  such
        as


          --with-link-size=3


-       to the configure command. The value given must be 2, 3, or 4.  For  the
-       16-bit  library,  a  value of 3 is rounded up to 4. In these libraries,
-       using longer offsets slows down the operation of PCRE2 because  it  has
-       to  load additional data when handling them. For the 32-bit library the
-       value is always 4 and cannot be overridden; the value  of  --with-link-
+       to  the  configure command. The value given must be 2, 3, or 4. For the
+       16-bit library, a value of 3 is rounded up to 4.  In  these  libraries,
+       using  longer  offsets slows down the operation of PCRE2 because it has
+       to load additional data when handling them. For the 32-bit library  the
+       value  is  always 4 and cannot be overridden; the value of --with-link-
        size is ignored.



-AVOIDING EXCESSIVE STACK USAGE
-
-       When  matching  with the pcre2_match() function, PCRE2 implements back-
-       tracking by making recursive  calls  to  an  internal  function  called
-       match().  In  environments where the size of the stack is limited, this
-       can severely limit PCRE2's operation. (The Unix  environment  does  not
-       usually  suffer from this problem, but it may sometimes be necessary to
-       increase  the  maximum  stack  size.  There  is  a  discussion  in  the
-       pcre2stack  documentation.)  An  alternative approach to recursion that
-       uses memory from the heap to remember data, instead of using  recursive
-       function  calls, has been implemented to work round the problem of lim-
-       ited stack size. If you want to build a version  of  PCRE2  that  works
-       this way, add
-
-         --disable-stack-for-recursion
-
-       to the configure command. By default, the system functions malloc() and
-       free() are called to manage the heap memory that is required, but  cus-
-       tom  memory  management  functions  can  be  called instead. PCRE2 runs
-       noticeably more slowly when built in this way. This option affects only
-       the pcre2_match() function; it is not relevant for pcre2_dfa_match().
-
-
 LIMITING PCRE2 RESOURCE USAGE


-       Internally, PCRE2 has a function called match(), which it calls repeat-
-       edly  (sometimes  recursively)  when  matching  a  pattern   with   the
-       pcre2_match() function. By controlling the maximum number of times this
-       function may be called during a single matching operation, a limit  can
-       be  placed on the resources used by a single call to pcre2_match(). The
-       limit can be changed at run time, as described in the pcre2api documen-
-       tation.  The default is 10 million, but this can be changed by adding a
-       setting such as
+       The pcre2_match() function increments a counter each time it goes round
+       its  main  loop. Putting a limit on this counter controls the amount of
+       computing resource used by a single call to  pcre2_match().  The  limit
+       can be changed at run time, as described in the pcre2api documentation.
+       The default is 10 million, but this can be changed by adding a  setting
+       such as


          --with-match-limit=500000


-       to  the  configure  command.  This  setting  has  no  effect   on   the
-       pcre2_dfa_match() matching function.
+       to   the   configure  command.  This  setting  has  no  effect  on  the
+       pcre2_dfa_match() matching function, but it also limits JIT matching
+       (though the counting is done differently).


-       In  some  environments  it is desirable to limit the depth of recursive
-       calls of match() more strictly than the total number of calls, in order
-       to  restrict  the maximum amount of stack (or heap, if --disable-stack-
-       for-recursion is specified) that is used. A second limit controls this;
-       it  defaults  to  the  value  that is set for --with-match-limit, which
-       imposes no additional constraints. However, you can set a  lower  limit
-       by adding, for example,
+       In some environments it is desirable to limit the depth of nested back-
+       tracking in order to restrict the maximum amount of heap memory that is
+       used.  A  second  limit controls this; it defaults to the value that is
+       set for --with-match-limit. You  can  set  a  lower  default  limit  by
+       adding, for example,


-         --with-match-limit-recursion=10000
+         --with-match-limit-depth=10000


        to  the  configure  command.  This  value can also be overridden at run
-       time.
+       time.  As well as applying to pcre2_match(), this limit  also  controls
+       the  depth  of recursive function calls in pcre2_dfa_match(). These are
+       used for lookaround assertions and recursion within patterns.



 CREATING CHARACTER TABLES AT BUILD TIME
@@ -3463,10 +3440,10 @@
        to  the  configure  command, the distributed tables are no longer used.
        Instead, a program called dftables is compiled and  run.  This  outputs
        the source for a new set of tables, created in the default locale of your
-       C run-time system. (This method of replacing the tables does  not  work
-       if  you are cross compiling, because dftables is run on the local host.
-       If you need to create alternative tables when cross compiling, you will
-       have to do so "by hand".)
+       C run-time system. This method of replacing the tables does not work if
+       you  are cross compiling, because dftables is run on the local host. If
+       you need to create alternative tables when cross  compiling,  you  will
+       have to do so "by hand".



 USING EBCDIC CODE
@@ -3672,15 +3649,30 @@
        string. When called, this function tries to compile  the  string  as  a
        pattern,  and if that succeeds, to match it.  This is done both with no
        options and with some random option bits that are generated from the
-       string.  Setting  --enable-fuzz-support  also  causes  a  binary called
-       pcre2fuzzcheck to be created. This is normally run  under  valgrind  or
-       used  when  PCRE2 is compiled with address sanitizing enabled. It calls
-       the fuzzing function and outputs information about  it  is  doing.  The
-       input  strings  are  specified by arguments: if an argument starts with
-       "=" the rest of it is a literal input string. Otherwise, it is  assumed
-       to be a file name, and the contents of the file are the test string.
+       string.


+       Setting --enable-fuzz-support also causes a binary called
+       pcre2fuzzcheck to be created. This is normally run under valgrind or
+       used when PCRE2 is compiled with address sanitizing enabled. It calls
+       the fuzzing function and outputs information about what it is doing.
+       The input strings are specified by arguments: if an argument starts
+       with "=" the rest of it is a literal input string. Otherwise, it is
+       assumed to be a file name, and the contents of the file are the test
+       string.


+
+OBSOLETE OPTION
+
+       In  versions  of  PCRE2 prior to 10.30, there were two ways of handling
+       backtracking in the pcre2_match() function. The default was to use  the
+       system stack, but if
+
+         --disable-stack-for-recursion
+
+       was  set,  memory on the heap was used. From release 10.30 onwards this
+       has changed (the stack is no longer used) and this option now does
+       nothing except give a warning.
+
+
 SEE ALSO


        pcre2api(3), pcre2-config(3).
@@ -3695,8 +3687,8 @@


REVISION

-       Last updated: 01 November 2016
-       Copyright (c) 1997-2016 University of Cambridge.
+       Last updated: 29 March 2017
+       Copyright (c) 1997-2017 University of Cambridge.
 ------------------------------------------------------------------------------
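
The two limits described under LIMITING PCRE2 RESOURCE USAGE above can be illustrated with a toy backtracking matcher. The C sketch below is not how pcre2_match() is implemented: it is a hypothetical anchored matcher for literals, "." and "c*" (in the style of the Pike/Kernighan matcher), in which every call counts as one step (standing in for the match limit) and each nested "c*" backtracking point adds one level (standing in for the depth limit). The names toy_match, toy_limits and the TOY_* result codes are invented for illustration; a real application sets the run-time limits in a match context, for example with pcre2_set_match_limit(), as described in the pcre2api documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this toy; they are not PCRE2's values. */
enum { TOY_NOMATCH = 0, TOY_MATCH = 1, TOY_MATCHLIMIT = -1, TOY_DEPTHLIMIT = -2 };

typedef struct {
  unsigned long steps;        /* work done so far (the "main loop" counter) */
  unsigned long match_limit;  /* cap on steps, like --with-match-limit */
  unsigned long depth_limit;  /* cap on nested backtracking points */
} toy_limits;

static int toy_here(toy_limits *lim, const char *re, const char *text,
                    unsigned long depth);

/* Handle "c*": every possible repeat count is a backtracking point, and
   trying the rest of the pattern nests one level deeper. */
static int toy_star(toy_limits *lim, char c, const char *re, const char *text,
                    unsigned long depth)
{
  if (depth > lim->depth_limit) return TOY_DEPTHLIMIT;
  do {
    int rc = toy_here(lim, re, text, depth + 1);
    if (rc != TOY_NOMATCH) return rc;   /* a match or a limit error */
  } while (*text != '\0' && (*text++ == c || c == '.'));
  return TOY_NOMATCH;
}

/* Anchored matcher for literals, '.', and "c*"; each call is one step. */
static int toy_here(toy_limits *lim, const char *re, const char *text,
                    unsigned long depth)
{
  if (++lim->steps > lim->match_limit) return TOY_MATCHLIMIT;
  if (re[0] == '\0') return TOY_MATCH;
  if (re[1] == '*') return toy_star(lim, re[0], re + 2, text, depth);
  if (*text != '\0' && (re[0] == '.' || re[0] == *text))
    return toy_here(lim, re + 1, text + 1, depth);
  return TOY_NOMATCH;
}

/* Returns TOY_MATCH if re matches a prefix of text, else another TOY_* code. */
int toy_match(const char *re, const char *text,
              unsigned long match_limit, unsigned long depth_limit)
{
  toy_limits lim = { 0, match_limit, depth_limit };
  assert(re != NULL && text != NULL);
  return toy_here(&lim, re, text, 0);
}
```

With the pattern a*a*a*a*a*b and a subject of twenty "a" characters, the unconstrained search takes tens of thousands of steps before reporting no match. A step cap of 1000 stops it early with TOY_MATCHLIMIT, and a depth cap of 3 stops it with TOY_DEPTHLIMIT as soon as the fifth nested star is reached, before much work has been done.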



@@ -3740,9 +3732,8 @@

        If the PCRE2_AUTO_CALLOUT option bit is set when a pattern is compiled,
        PCRE2 automatically inserts callouts, all with number 255, before  each
-       item  in  the  pattern except for immediately before or after a callout
-       item in the pattern.  For example, if PCRE2_AUTO_CALLOUT is  used  with
-       the pattern
+       item  in the pattern except for immediately before or after an explicit
+       callout. For example, if PCRE2_AUTO_CALLOUT is used with the pattern


          A(?C3)B


@@ -3756,30 +3747,30 @@

        With PCRE2_AUTO_CALLOUT, this pattern is processed as if it were


-       (?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
+         (?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)


-       Notice  that  there  is a callout before and after each parenthesis and
+       Notice that there is a callout before and after  each  parenthesis  and
        alternation bar. If the pattern contains a conditional group whose con-
-       dition  is  an  assertion, an automatic callout is inserted immediately
-       before the condition. Such a callout may also be  inserted  explicitly,
+       dition is an assertion, an automatic callout  is  inserted  immediately
+       before  the  condition. Such a callout may also be inserted explicitly,
        for example:


          (?(?C9)(?=a)ab|de)  (?(?C%text%)(?!=d)ab|de)


-       This  applies only to assertion conditions (because they are themselves
+       This applies only to assertion conditions (because they are  themselves
        independent groups).


-       Callouts can be useful for tracking the progress of  pattern  matching.
+       Callouts  can  be useful for tracking the progress of pattern matching.
        The pcre2test program has a pattern qualifier (/auto_callout) that sets
-       automatic callouts.  When any callouts are  present,  the  output  from
-       pcre2test  indicates  how  the pattern is being matched. This is useful
-       information when you are trying to optimize the performance of  a  par-
+       automatic  callouts.   When  any  callouts are present, the output from
+       pcre2test indicates how the pattern is being matched.  This  is  useful
+       information  when  you are trying to optimize the performance of a par-
        ticular pattern.



MISSING CALLOUTS

-       You  should  be  aware  that, because of optimizations in the way PCRE2
+       You should be aware that, because of optimizations  in  the  way  PCRE2
        compiles and matches patterns, callouts sometimes do not happen exactly
        as you might expect.


@@ -3786,8 +3777,8 @@
    Auto-possessification


        At compile time, PCRE2 "auto-possessifies" repeated items when it knows
-       that what follows cannot be part of the repeat. For example, a+[bc]  is
-       compiled  as if it were a++[bc]. The pcre2test output when this pattern
+       that  what follows cannot be part of the repeat. For example, a+[bc] is
+       compiled as if it were a++[bc]. The pcre2test output when this  pattern
        is compiled with PCRE2_ANCHORED and PCRE2_AUTO_CALLOUT and then applied
        to the string "aaaa" is:


@@ -3796,11 +3787,11 @@
           +2 ^   ^    [bc]
          No match


-       This  indicates that when matching [bc] fails, there is no backtracking
+       This indicates that when matching [bc] fails, there is no  backtracking
        into a+ (because it is being treated as a++) and therefore the callouts
-       that  would  be  taken for the backtracks do not occur. You can disable
-       the  auto-possessify  feature  by  passing   PCRE2_NO_AUTO_POSSESS   to
-       pcre2_compile(),  or  starting  the pattern with (*NO_AUTO_POSSESS). In
+       that would be taken for the backtracks do not occur.  You  can  disable
+       the   auto-possessify   feature  by  passing  PCRE2_NO_AUTO_POSSESS  to
+       pcre2_compile(), or starting the pattern  with  (*NO_AUTO_POSSESS).  In
        this case, the output changes to this:


          --->aaaa
@@ -3817,15 +3808,18 @@
    Automatic .* anchoring


        By default, an optimization is applied when .* is the first significant
-       item in a pattern. If PCRE2_DOTALL is set, so that the  dot  can  match
-       any  character,  the pattern is automatically anchored. If PCRE2_DOTALL
-       is not set, a match can start only after an internal newline or at  the
-       beginning  of  the  subject,  and  pcre2_compile() remembers this. This
-       optimization is disabled, however, if .* is in an atomic  group  or  if
-       there  is  a back reference to the capturing group in which it appears.
-       It is also disabled if the pattern contains (*PRUNE) or  (*SKIP).  How-
-       ever, the presence of callouts does not affect it.
+       item  in  a  pattern. If PCRE2_DOTALL is set, so that the dot can match
+       any character, the pattern is automatically anchored.  If  PCRE2_DOTALL
+       is  not set, a match can start only after an internal newline or at the
+       beginning of the subject, and pcre2_compile() remembers this. If a pat-
+       tern  has more than one top-level branch, automatic anchoring occurs if
+       all branches are anchorable.


+       This optimization is disabled, however, if .* is in an atomic group  or
+       if  there  is  a  back  reference  to  the  capturing group in which it
+       appears. It is also  disabled  if  the  pattern  contains  (*PRUNE)  or
+       (*SKIP). However, the presence of callouts does not affect it.
+
        For  example,  if  the pattern .*\d is compiled with PCRE2_AUTO_CALLOUT
        and applied to the string "aa", the pcre2test output is:


@@ -3856,39 +3850,36 @@
        ter.   Another  optimization, described in the next section, means that
        there is no subsequent attempt to match with an empty subject.


-       If a pattern has more than one top-level  branch,  automatic  anchoring
-       occurs if all branches are anchorable.
-
    Other optimizations


-       Other  optimizations  that  provide fast "no match" results also affect
+       Other optimizations that provide fast "no match"  results  also  affect
        callouts.  For example, if the pattern is


          ab(?C4)cd


-       PCRE2 knows that any matching string must contain the  letter  "d".  If
-       the  subject  string  is  "abyz",  the  lack of "d" means that matching
-       doesn't ever start, and the callout is  never  reached.  However,  with
+       PCRE2  knows  that  any matching string must contain the letter "d". If
+       the subject string is "abyz", the  lack  of  "d"  means  that  matching
+       doesn't  ever  start,  and  the callout is never reached. However, with
        "abyd", though the result is still no match, the callout is obeyed.


-       PCRE2  also  knows  the  minimum  length of a matching string, and will
-       immediately give a "no match" return without actually running  a  match
-       if  the  subject is not long enough, or, for unanchored patterns, if it
-       has been scanned far enough.
+       For most patterns PCRE2 also knows the minimum  length  of  a  matching
+       string,  and will immediately give a "no match" return without actually
+       running a match if the subject is not long enough, or,  for  unanchored
+       patterns, if it has been scanned far enough.


        You can disable these optimizations by passing the PCRE2_NO_START_OPTI-
-       MIZE  option  to  pcre2_compile(),  or  by  starting  the  pattern with
-       (*NO_START_OPT). This slows down the matching process, but does  ensure
+       MIZE option  to  pcre2_compile(),  or  by  starting  the  pattern  with
+       (*NO_START_OPT).  This slows down the matching process, but does ensure
        that callouts such as the example above are obeyed.



THE CALLOUT INTERFACE

-       During  matching,  when  PCRE2  reaches a callout point, if an external
-       function is set in the match context, it is  called.  This  applies  to
-       both  normal  and DFA matching. The first argument to the callout func-
-       tion is a pointer to a pcre2_callout block. The second argument is  the
-       void  *  callout  data that was supplied when the callout was set up by
+       During matching, when PCRE2 reaches a callout  point,  if  an  external
+       function  is  set  in  the match context, it is called. This applies to
+       both normal and DFA matching. The first argument to the  callout  func-
+       tion  is a pointer to a pcre2_callout block. The second argument is the
+       void * callout data that was supplied when the callout was  set  up  by
        calling pcre2_set_callout() (see the pcre2api documentation). The call-
        out block structure contains the following fields:


@@ -3908,51 +3899,78 @@
          PCRE2_SIZE    callout_string_length;
          PCRE2_SPTR    callout_string;


-       The  version field contains the version number of the block format. The
-       current version is 1; the three callout string fields  were  added  for
-       this  version. If you are writing an application that might use an ear-
-       lier release of PCRE2, you  should  check  the  version  number  before
-       accessing  any  of  these  fields.  The version number will increase in
-       future if more fields are added, but the intention is never  to  remove
+       The version field contains the version number of the block format.  The
+       current  version  is  1; the three callout string fields were added for
+       this version. If you are writing an application that might use an  ear-
+       lier  release  of  PCRE2,  you  should  check the version number before
+       accessing any of these fields. The  version  number  will  increase  in
+       future  if  more fields are added, but the intention is never to remove
        any of the existing fields.


    Fields for numerical callouts


-       For  a  numerical  callout,  callout_string is NULL, and callout_number
-       contains the number of the callout, in the range  0-255.  This  is  the
-       number  that  follows  (?C for callouts that part of the pattern; it is
+       For a numerical callout, callout_string  is  NULL,  and  callout_number
+       contains  the  number  of  the callout, in the range 0-255. This is the
+       number that follows (?C for callouts that are part of the pattern; it is
        255 for automatically generated callouts.


    Fields for string callouts


-       For callouts with string arguments, callout_number is always zero,  and
-       callout_string  points  to the string that is contained within the com-
+       For  callouts with string arguments, callout_number is always zero, and
+       callout_string points to the string that is contained within  the  com-
        piled pattern. Its length is given by callout_string_length. Duplicated
        ending delimiters that were present in the original pattern string have
        been turned into single characters, but there is no other processing of
-       the  callout string argument. An additional code unit containing binary
-       zero is present after the string, but is not included  in  the  length.
-       The  delimiter  that was used to start the string is also stored within
-       the pattern, immediately before the string itself. You can access  this
+       the callout string argument. An additional code unit containing  binary
+       zero  is  present  after the string, but is not included in the length.
+       The delimiter that was used to start the string is also  stored  within
+       the  pattern, immediately before the string itself. You can access this
        delimiter as callout_string[-1] if you need it.


        The callout_string_offset field is the code unit offset to the start of
        the callout argument string within the original pattern string. This is
-       provided  for the benefit of applications such as script languages that
+       provided for the benefit of applications such as script languages  that
        might need to report errors in the callout string within the pattern.


    Fields for all callouts


-       The remaining fields in the callout block are the same for  both  kinds
+       The  remaining  fields in the callout block are the same for both kinds
        of callout.


-       The offset_vector field is a pointer to the vector of capturing offsets
-       (the "ovector") that was passed to the matching function in  the  match
-       data  block.  When pcre2_match() is used, the contents can be inspected
-       in order to extract substrings that have been matched so  far,  in  the
-       same  way as for extracting substrings after a match has completed. For
-       the DFA matching function, this field is not useful.
+       The offset_vector field is a pointer to a vector of  capturing  offsets
+       (the  "ovector"). You may read certain elements in this vector, but you
+       must not change any of them.


+       For calls to pcre2_match(),  the  offset_vector  field  is  not  (since
+       release  10.30)  a pointer to the actual ovector that was passed to the
+       matching function in the match data block.  Instead  it  points  to  an
+       internal  ovector  of a size large enough to hold all possible captured
+       substrings in the pattern. Note that whenever a recursion or subroutine
+       call  within  a pattern completes, the capturing state is reset to what
+       it was before.
+
+       The capture_last field contains the number of the  most  recently  cap-
+       tured  substring,  and the capture_top field contains one more than the
+       number of the highest numbered captured substring so far.  If  no  sub-
+       strings  have yet been captured, the value of capture_last is 0 and the
+       value of capture_top is 1. The values of these  fields  do  not  always
+       differ   by   one;  for  example,  when  the  callout  in  the  pattern
+       ((a)(b))(?C2) is taken, capture_last is 1 but capture_top is 4.
+
+       The  contents  of  ovector[2]  to  ovector[<capture_top>*2-1]  can   be
+       inspected in order to extract substrings that have been matched so far,
+       in the same way as extracting substrings after a match  has  completed.
+       The values in ovector[0] and ovector[1] are undefined and should not be
+       used in any way. Substrings that have not been captured (but whose num-
+       bers are less than capture_top) have both of their ovector slots set to
+       PCRE2_UNSET.
+
+       For DFA matching, the offset_vector field points to  the  ovector  that
+       was  passed  to  the  matching function in the match data block, but it
+       holds no useful information at callout time  because  pcre2_dfa_match()
+       does  not  support  substring  capturing.  The  value of capture_top is
+       always 1 and the value of capture_last is always 0 for DFA matching.
+
        The subject and subject_length fields contain copies of the values that
        were passed to the matching function.


@@ -3966,18 +3984,6 @@
        The current_position field contains the offset within  the  subject  of
        the current match pointer.


-       When the pcre2_match() is used, the capture_top field contains one more
-       than the number of the highest numbered captured substring so  far.  If
-       no substrings have been captured, the value of capture_top is one. This
-       is always the case when the DFA functions are used, because they do not
-       support captured substrings.
-
-       The  capture_last  field  contains the number of the most recently cap-
-       tured substring. However, when a recursion exits, the value reverts  to
-       what  it  was  outside  the recursion, as do the values of all captured
-       substrings. If no substrings have been  captured,  the  value  of  cap-
-       ture_last is 0. This is always the case for the DFA matching functions.
-
        The pattern_position field contains the offset in the pattern string to
        the next item to be matched.


@@ -4075,8 +4081,8 @@

REVISION

-       Last updated: 29 September 2016
-       Copyright (c) 1997-2016 University of Cambridge.
+       Last updated: 29 March 2017
+       Copyright (c) 1997-2017 University of Cambridge.
 ------------------------------------------------------------------------------
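
The rules given in the pcre2callout text above for reading the ovector during a callout (skip slots 0 and 1, read pairs only for groups below capture_top, treat PCRE2_UNSET pairs as not captured, never write to the vector) can be sketched without linking PCRE2. The structure below is a hypothetical, cut-down stand-in for pcre2_callout_block that keeps only the fields used, with size_t in place of PCRE2_SIZE and a local MOCK_UNSET constant in place of PCRE2_UNSET. In a real callout function the block comes from pcre2.h and the subject's code unit type depends on the library width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Local stand-in for PCRE2_UNSET (pcre2.h defines it as ~(PCRE2_SIZE)0). */
#define MOCK_UNSET (~(size_t)0)

/* Hypothetical cut-down version of pcre2_callout_block: only the fields
   this sketch needs, with plain C types. */
typedef struct {
  const size_t *offset_vector;  /* read-only at callout time */
  const char *subject;
  unsigned int capture_top;     /* one more than highest captured group */
  unsigned int capture_last;    /* most recently captured group */
} mock_callout_block;

/* Print each group numbered below capture_top and return how many are set.
   ovector[0] and ovector[1] are undefined during a callout, so group 0 is
   skipped; the vector is only read, never modified. */
static int show_captures(const mock_callout_block *cb)
{
  int set = 0;
  assert(cb->capture_top >= 1);
  for (unsigned int i = 1; i < cb->capture_top; i++) {
    size_t start = cb->offset_vector[2 * i];
    size_t end = cb->offset_vector[2 * i + 1];
    if (start == MOCK_UNSET) {
      printf("%u: <unset>\n", i);
      continue;
    }
    printf("%u: %.*s\n", i, (int)(end - start), cb->subject + start);
    set++;
  }
  return set;
}
```

As mock data, the offsets a callout in ((a)(b))(?C2) could see after matching "ab" are { -, -, 0, 2, 0, 1, 1, 2 } with capture_last 1 and capture_top 4; for those values the loop reports groups 1 to 3 as "ab", "a" and "b", even though capture_last and capture_top differ by more than one.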




Modified: code/trunk/doc/pcre2build.3
===================================================================
--- code/trunk/doc/pcre2build.3    2017-03-29 08:12:32 UTC (rev 715)
+++ code/trunk/doc/pcre2build.3    2017-03-29 17:18:08 UTC (rev 716)
@@ -1,4 +1,4 @@
-.TH PCRE2BUILD 3 "01 November 2016" "PCRE2 10.23"
+.TH PCRE2BUILD 3 "29 March 2017" "PCRE2 10.30"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .
@@ -55,11 +55,11 @@
 .sp
   ./configure --help
 .sp
-The following sections include descriptions of options whose names begin with
---enable or --disable. These settings specify changes to the defaults for the
-\fBconfigure\fP command. Because of the way that \fBconfigure\fP works,
---enable and --disable always come in pairs, so the complementary option always
-exists as well, but as it specifies the default, it is not described.
+The following sections include descriptions of "on/off" options whose names
+begin with --enable or --disable. Because of the way that \fBconfigure\fP
+works, --enable and --disable always come in pairs, so the complementary option
+always exists as well, but as it specifies the default, it is not described.
+Options that specify values have names that start with --with.
 .
 .
 .SH "BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES"
@@ -119,10 +119,10 @@
 locked this out by setting PCRE2_NEVER_UTF.
 .P
 UTF support allows the libraries to process character code points up to
-0x10ffff in the strings that they handle. It also provides support for
-accessing the Unicode properties of such characters, using pattern escapes such
-as \eP, \ep, and \eX. Only the general category properties such as \fILu\fP and
-\fINd\fP are supported. Details are given in the
+0x10ffff in the strings that they handle. Unicode support also gives access to
+the Unicode properties of characters, using pattern escapes such as \eP, \ep,
+and \eX. Only the general category properties such as \fILu\fP and \fINd\fP are
+supported. Details are given in the
 .\" HREF
 \fBpcre2pattern\fP
 .\"
@@ -151,7 +151,7 @@
 .SH "JUST-IN-TIME COMPILER SUPPORT"
 .rs
 .sp
-Just-in-time compiler support is included in the build by specifying
+Just-in-time (JIT) compiler support is included in the build by specifying
 .sp
   --enable-jit
 .sp
@@ -217,7 +217,7 @@
 .sp
 the default is changed so that \eR matches only CR, LF, or CRLF. Whatever is
 selected when PCRE2 is built can be overridden by applications that use the
-called.
+library.
 .
 .
 .SH "HANDLING VERY LARGE PATTERNS"
@@ -241,41 +241,13 @@
 4 and cannot be overridden; the value of --with-link-size is ignored.
 .
 .
-.SH "AVOIDING EXCESSIVE STACK USAGE"
-.rs
-.sp
-When matching with the \fBpcre2_match()\fP function, PCRE2 implements
-backtracking by making recursive calls to an internal function called
-\fBmatch()\fP. In environments where the size of the stack is limited, this can
-severely limit PCRE2's operation. (The Unix environment does not usually suffer
-from this problem, but it may sometimes be necessary to increase the maximum
-stack size. There is a discussion in the
-.\" HREF
-\fBpcre2stack\fP
-.\"
-documentation.) An alternative approach to recursion that uses memory from the
-heap to remember data, instead of using recursive function calls, has been
-implemented to work round the problem of limited stack size. If you want to
-build a version of PCRE2 that works this way, add
-.sp
-  --disable-stack-for-recursion
-.sp
-to the \fBconfigure\fP command. By default, the system functions \fBmalloc()\fP
-and \fBfree()\fP are called to manage the heap memory that is required, but
-custom memory management functions can be called instead. PCRE2 runs noticeably
-more slowly when built in this way. This option affects only the
-\fBpcre2_match()\fP function; it is not relevant for \fBpcre2_dfa_match()\fP.
-.
-.
 .SH "LIMITING PCRE2 RESOURCE USAGE"
 .rs
 .sp
-Internally, PCRE2 has a function called \fBmatch()\fP, which it calls
-repeatedly (sometimes recursively) when matching a pattern with the
-\fBpcre2_match()\fP function. By controlling the maximum number of times this
-function may be called during a single matching operation, a limit can be
-placed on the resources used by a single call to \fBpcre2_match()\fP. The limit
-can be changed at run time, as described in the
+The \fBpcre2_match()\fP function increments a counter each time it goes round
+its main loop. Putting a limit on this counter controls the amount of computing
+resource used by a single call to \fBpcre2_match()\fP. The limit can be changed
+at run time, as described in the
 .\" HREF
 \fBpcre2api\fP
 .\"
@@ -285,18 +257,20 @@
   --with-match-limit=500000
 .sp
 to the \fBconfigure\fP command. This setting has no effect on the
-\fBpcre2_dfa_match()\fP matching function.
+\fBpcre2_dfa_match()\fP matching function, but it also limits JIT matching
+(though the counting is done differently).
 .P
-In some environments it is desirable to limit the depth of recursive calls of
-\fBmatch()\fP more strictly than the total number of calls, in order to
-restrict the maximum amount of stack (or heap, if --disable-stack-for-recursion
-is specified) that is used. A second limit controls this; it defaults to the
-value that is set for --with-match-limit, which imposes no additional
-constraints. However, you can set a lower limit by adding, for example,
+In some environments it is desirable to limit the depth of nested backtracking
+in order to restrict the maximum amount of heap memory that is used. A second
+limit controls this; it defaults to the value that is set for
+--with-match-limit. You can set a lower default limit by adding, for example,
 .sp
-  --with-match-limit-recursion=10000
+  --with-match-limit-depth=10000
 .sp
 to the \fBconfigure\fP command. This value can also be overridden at run time.
+As well as applying to \fBpcre2_match()\fP, this limit also controls the depth
+of recursive function calls in \fBpcre2_dfa_match()\fP. These are used for
+lookaround assertions and recursion within patterns.
 .
 .
 .SH "CREATING CHARACTER TABLES AT BUILD TIME"
@@ -312,10 +286,10 @@
 to the \fBconfigure\fP command, the distributed tables are no longer used.
 Instead, a program called \fBdftables\fP is compiled and run. This outputs the
 source for a new set of tables, created in the default locale of your C run-time
-system. (This method of replacing the tables does not work if you are cross
+system. This method of replacing the tables does not work if you are cross
 compiling, because \fBdftables\fP is run on the local host. If you need to
 create alternative tables when cross compiling, you will have to do so "by
-hand".)
+hand".
 .
 .
 .SH "USING EBCDIC CODE"
@@ -529,14 +503,30 @@
 a pointer to a string and the length of the string. When called, this function
 tries to compile the string as a pattern, and if that succeeds, to match it.
 This is done both with no options and with some random option bits that are
-generated from the string. Setting --enable-fuzz-support also causes a binary
-called \fBpcre2fuzzcheck\fP to be created. This is normally run under valgrind
-or used when PCRE2 is compiled with address sanitizing enabled. It calls the
-fuzzing function and outputs information about it is doing. The input strings
-are specified by arguments: if an argument starts with "=" the rest of it is a
-literal input string. Otherwise, it is assumed to be a file name, and the
-contents of the file are the test string.
+generated from the string.
+.P
+Setting --enable-fuzz-support also causes a binary called \fBpcre2fuzzcheck\fP
+to be created. This is normally run under valgrind or used when PCRE2 is
+compiled with address sanitizing enabled. It calls the fuzzing function and
+outputs information about what it is doing. The input strings are specified by
+arguments: if an argument starts with "=" the rest of it is a literal input
+string. Otherwise, it is assumed to be a file name, and the contents of the
+file are the test string.
 .
+.
+.SH "OBSOLETE OPTION"
+.rs
+.sp
+In versions of PCRE2 prior to 10.30, there were two ways of handling
+backtracking in the \fBpcre2_match()\fP function. The default was to use the
+system stack, but if
+.sp
+  --disable-stack-for-recursion
+.sp
+was set, memory on the heap was used. From release 10.30 onwards this has
+changed (the stack is no longer used) and this option now does nothing except
+give a warning.
+.
 .SH "SEE ALSO"
 .rs
 .sp
@@ -557,6 +547,6 @@
 .rs
 .sp
 .nf
-Last updated: 01 November 2016
-Copyright (c) 1997-2016 University of Cambridge.
+Last updated: 29 March 2017
+Copyright (c) 1997-2017 University of Cambridge.
 .fi
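
The pcre2fuzzcheck argument convention described in the pcre2build text above (a leading "=" marks a literal input string, anything else names a file whose contents are the test string) amounts to a small loader. This is a hypothetical C sketch of that convention only, not the pcre2fuzzcheck source; the function name fuzz_input_from_arg is invented, and a caller would pass the returned buffer and length on to the fuzzing function.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd, NUL-terminated copy of the test string named by one
   command-line argument, storing its length in *lenptr; NULL on failure.
   "=text" means the literal string "text"; anything else is a file name
   whose whole contents are the test string. The caller frees the result. */
static char *fuzz_input_from_arg(const char *arg, size_t *lenptr)
{
  assert(arg != NULL && lenptr != NULL);

  if (arg[0] == '=') {                      /* literal input string */
    size_t len = strlen(arg + 1);
    char *buf = malloc(len + 1);
    if (buf == NULL) return NULL;
    memcpy(buf, arg + 1, len + 1);          /* copies the terminator too */
    *lenptr = len;
    return buf;
  }

  FILE *f = fopen(arg, "rb");               /* otherwise: a file name */
  if (f == NULL) return NULL;
  if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
  long size = ftell(f);
  if (size < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }
  char *buf = malloc((size_t)size + 1);
  if (buf == NULL) { fclose(f); return NULL; }
  size_t got = fread(buf, 1, (size_t)size, f);
  fclose(f);
  buf[got] = '\0';
  *lenptr = got;
  return buf;
}
```

Returning an explicit length, rather than relying on the terminator, matters because file contents used as fuzz input may themselves contain binary zeros.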


Modified: code/trunk/doc/pcre2callout.3
===================================================================
--- code/trunk/doc/pcre2callout.3    2017-03-29 08:12:32 UTC (rev 715)
+++ code/trunk/doc/pcre2callout.3    2017-03-29 17:18:08 UTC (rev 716)
@@ -1,4 +1,4 @@
-.TH PCRE2CALLOUT 3 "29 September 2016" "PCRE2 10.23"
+.TH PCRE2CALLOUT 3 "29 March 2017" "PCRE2 10.30"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -40,8 +40,8 @@
 .sp
 If the PCRE2_AUTO_CALLOUT option bit is set when a pattern is compiled, PCRE2
 automatically inserts callouts, all with number 255, before each item in the
-pattern except for immediately before or after a callout item in the pattern.
-For example, if PCRE2_AUTO_CALLOUT is used with the pattern
+pattern except for immediately before or after an explicit callout. For
+example, if PCRE2_AUTO_CALLOUT is used with the pattern
 .sp
   A(?C3)B
 .sp
@@ -55,7 +55,7 @@
 .sp
 With PCRE2_AUTO_CALLOUT, this pattern is processed as if it were
 .sp
-(?C255)A(?C255)((?C255)\ed{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
+  (?C255)A(?C255)((?C255)\ed{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
 .sp
 Notice that there is a callout before and after each parenthesis and
 alternation bar. If the pattern contains a conditional group whose condition is
@@ -124,11 +124,14 @@
 a pattern. If PCRE2_DOTALL is set, so that the dot can match any character, the
 pattern is automatically anchored. If PCRE2_DOTALL is not set, a match can
 start only after an internal newline or at the beginning of the subject, and
-\fBpcre2_compile()\fP remembers this. This optimization is disabled, however,
-if .* is in an atomic group or if there is a back reference to the capturing
-group in which it appears. It is also disabled if the pattern contains (*PRUNE)
-or (*SKIP). However, the presence of callouts does not affect it.
+\fBpcre2_compile()\fP remembers this. If a pattern has more than one top-level
+branch, automatic anchoring occurs if all branches are anchorable.
 .P
+This optimization is disabled, however, if .* is in an atomic group or if there
+is a back reference to the capturing group in which it appears. It is also
+disabled if the pattern contains (*PRUNE) or (*SKIP). However, the presence of
+callouts does not affect it.
+.P
 For example, if the pattern .*\ed is compiled with PCRE2_AUTO_CALLOUT and
 applied to the string "aa", the \fBpcre2test\fP output is:
 .sp
@@ -157,9 +160,6 @@
 This shows more match attempts, starting at the second subject character.
 Another optimization, described in the next section, means that there is no
 subsequent attempt to match with an empty subject.
-.P
-If a pattern has more than one top-level branch, automatic anchoring occurs if
-all branches are anchorable.
 .
 .
 .SS "Other optimizations"
@@ -175,9 +175,10 @@
 start, and the callout is never reached. However, with "abyd", though the
 result is still no match, the callout is obeyed.
 .P
-PCRE2 also knows the minimum length of a matching string, and will immediately
-give a "no match" return without actually running a match if the subject is not
-long enough, or, for unanchored patterns, if it has been scanned far enough.
+For most patterns PCRE2 also knows the minimum length of a matching string, and
+will immediately give a "no match" return without actually running a match if
+the subject is not long enough, or, for unanchored patterns, if it has been
+scanned far enough.
 .P
 You can disable these optimizations by passing the PCRE2_NO_START_OPTIMIZE
 option to \fBpcre2_compile()\fP, or by starting the pattern with
@@ -259,13 +260,38 @@
 The remaining fields in the callout block are the same for both kinds of
 callout.
 .P
-The \fIoffset_vector\fP field is a pointer to the vector of capturing offsets
-(the "ovector") that was passed to the matching function in the match data
-block. When \fBpcre2_match()\fP is used, the contents can be inspected in
+The \fIoffset_vector\fP field is a pointer to a vector of capturing offsets
+(the "ovector"). You may read certain elements in this vector, but you must not
+change any of them.
+.P
+Since release 10.30, the \fIoffset_vector\fP field for calls to
+\fBpcre2_match()\fP is not a pointer to the actual ovector that was passed to
+the matching function in the match data block. Instead it points to an
+internal ovector large enough to hold all possible captured substrings in the
+pattern. Note that whenever a recursion or subroutine call within a pattern
+completes, the capturing state is reset to what it was before.
+.P
+The \fIcapture_last\fP field contains the number of the most recently captured
+substring, and the \fIcapture_top\fP field contains one more than the number of
+the highest numbered captured substring so far. If no substrings have yet been
+captured, the value of \fIcapture_last\fP is 0 and the value of
+\fIcapture_top\fP is 1. The values of these fields do not always differ by one;
+for example, when the callout in the pattern ((a)(b))(?C2) is taken,
+\fIcapture_last\fP is 1 but \fIcapture_top\fP is 4.
+.P
+The contents of ovector[2] to ovector[<capture_top>*2-1] can be inspected in
 order to extract substrings that have been matched so far, in the same way as
-for extracting substrings after a match has completed. For the DFA matching
-function, this field is not useful.
+extracting substrings after a match has completed. The values in ovector[0] and
+ovector[1] are undefined and should not be used in any way. Substrings that
+have not been captured (but whose numbers are less than \fIcapture_top\fP) have
+both of their ovector slots set to PCRE2_UNSET.
 .P
+For DFA matching, the \fIoffset_vector\fP field points to the ovector that was
+passed to the matching function in the match data block, but it holds no useful
+information at callout time because \fBpcre2_dfa_match()\fP does not support
+substring capturing. The value of \fIcapture_top\fP is always 1 and the value
+of \fIcapture_last\fP is always 0 for DFA matching.
+.P
 The \fIsubject\fP and \fIsubject_length\fP fields contain copies of the values
 that were passed to the matching function.
 .P
@@ -279,18 +305,6 @@
 The \fIcurrent_position\fP field contains the offset within the subject of the
 current match pointer.
 .P
-When the \fBpcre2_match()\fP is used, the \fIcapture_top\fP field contains one
-more than the number of the highest numbered captured substring so far. If no
-substrings have been captured, the value of \fIcapture_top\fP is one. This is
-always the case when the DFA functions are used, because they do not support
-captured substrings.
-.P
-The \fIcapture_last\fP field contains the number of the most recently captured
-substring. However, when a recursion exits, the value reverts to what it was
-outside the recursion, as do the values of all captured substrings. If no
-substrings have been captured, the value of \fIcapture_last\fP is 0. This is
-always the case for the DFA matching functions.
-.P
 The \fIpattern_position\fP field contains the offset in the pattern string to
 the next item to be matched.
 .P
@@ -396,6 +410,6 @@
 .rs
 .sp
 .nf
-Last updated: 29 September 2016
-Copyright (c) 1997-2016 University of Cambridge.
+Last updated: 29 March 2017
+Copyright (c) 1997-2017 University of Cambridge.
 .fi