[Pcre-svn] [147] code/trunk: Further substitution tests (cod…

Author: Subversion repository
To: pcre-svn
Subject: [Pcre-svn] [147] code/trunk: Further substitution tests (code and data), and more documentation.
Revision: 147
          http://www.exim.org/viewvc/pcre2?view=rev&revision=147
Author:   ph10
Date:     2014-11-14 18:41:20 +0000 (Fri, 14 Nov 2014)


Log Message:
-----------
Further substitution tests (code and data), and more documentation.

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/doc/html/pcre2.html
    code/trunk/doc/html/pcre2api.html
    code/trunk/doc/html/pcre2jit.html
    code/trunk/doc/html/pcre2pattern.html
    code/trunk/doc/html/pcre2syntax.html
    code/trunk/doc/html/pcre2test.html
    code/trunk/doc/pcre2.3
    code/trunk/doc/pcre2.txt
    code/trunk/doc/pcre2api.3
    code/trunk/doc/pcre2jit.3
    code/trunk/doc/pcre2pattern.3
    code/trunk/doc/pcre2syntax.3
    code/trunk/doc/pcre2test.1
    code/trunk/doc/pcre2test.txt
    code/trunk/src/pcre2_substitute.c
    code/trunk/src/pcre2test.c
    code/trunk/testdata/testinput2
    code/trunk/testdata/testinput5
    code/trunk/testdata/testoutput2
    code/trunk/testdata/testoutput5


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2014-11-12 17:46:02 UTC (rev 146)
+++ code/trunk/ChangeLog    2014-11-14 18:41:20 UTC (rev 147)
@@ -51,4 +51,6 @@
 whatever was previously there. An example is the pattern /(x)|((*ACCEPT))/ when
 matched against "abcd".


+8. The pcre2_substitute() function has been implemented.
+
****

Modified: code/trunk/doc/html/pcre2.html
===================================================================
--- code/trunk/doc/html/pcre2.html    2014-11-12 17:46:02 UTC (rev 146)
+++ code/trunk/doc/html/pcre2.html    2014-11-14 18:41:20 UTC (rev 147)
@@ -135,7 +135,7 @@
 listing), and the short pages for individual functions, are concatenated in
 <b>pcre2.txt</b>, for ease of searching. The sections are as follows:
 <pre>
-  pcre2              this document FIXME CHECK THIS LIST
+  pcre2              this document
   pcre2-config       show PCRE2 installation configuration information
   pcre2api           details of PCRE2's native C API
   pcre2build         building PCRE2


Modified: code/trunk/doc/html/pcre2api.html
===================================================================
--- code/trunk/doc/html/pcre2api.html    2014-11-12 17:46:02 UTC (rev 146)
+++ code/trunk/doc/html/pcre2api.html    2014-11-14 18:41:20 UTC (rev 147)
@@ -1089,7 +1089,7 @@
 Which characters are interpreted as newlines can be specified by a setting in
 the compile context that is passed to <b>pcre2_compile()</b> or by a special
 sequence at the start of the pattern, as described in the section entitled
-<a href="pcrepattern.html#newlines">"Newline conventions"</a>
+<a href="pcre2pattern.html#newlines">"Newline conventions"</a>
 in the <b>pcre2pattern</b> documentation. A default is defined when PCRE2 is
 built.
 <pre>
@@ -1243,7 +1243,7 @@
 \w, and some of the POSIX character classes. By default, only ASCII characters
 are recognized, but if PCRE2_UCP is set, Unicode properties are used instead to
 classify characters. More details are given in the section on
-<a href="pcre2.html#genericchartypes">generic character types</a>
+<a href="pcre2pattern.html#genericchartypes">generic character types</a>
 in the
 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
 page. If you set PCRE2_UCP, matching one of the items it affects takes much
@@ -1924,11 +1924,8 @@
 <P>
 When PCRE2 is built, a default newline convention is set; this is usually the
 standard convention for the operating system. The default can be overridden in
-either a
-<a href="#compilecontext">compile context</a>
-or a
-<a href="#matchcontext">match context.</a>
-However, changing the newline convention at match time disables JIT matching.
+a
+<a href="#compilecontext">compile context.</a>
 During matching, the newline choice affects the behaviour of the dot,
 circumflex, and dollar metacharacters. It may also alter the way the match
 position is advanced after a match failure for an unanchored pattern.
@@ -2290,7 +2287,7 @@
 can be distinguished from a genuine zero-length substring by inspecting the
 appropriate offset in the ovector, which contains PCRE2_UNSET for unset
 substrings.
-<a name="extractbynname"></a></P>
+<a name="extractbyname"></a></P>
 <br><a name="SEC27" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br>
 <P>
 <b>int pcre2_substring_number_from_name(const pcre2_code *<i>code</i>,</b>
@@ -2358,7 +2355,8 @@
 be given as PCRE2_ZERO_TERMINATED for a zero-terminated string.
 </P>
 <P>
-In the replacement string, which is interpreted as a UTF string in UTF mode, a
+In the replacement string, which is interpreted as a UTF string in UTF mode,
+and is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set, a
 dollar character is an escape character that can specify the insertion of
 characters from capturing groups in the pattern. The following forms are
 recognized:
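
Note on the hunk above: the forms it refers to are listed in the pcre2.txt part of this commit as $$, $<n> and ${<n>}, where <n> is a group number or name. The following is a minimal Python sketch of that expansion rule, not PCRE2 code. It uses Python's re module for the match itself, and the function names expand_replacement and substitute_first are hypothetical. How a bare (unbracketed) reference is delimited, and the handling of unset groups, are assumptions made for illustration.

```python
import re

# Assumption: a bare reference is a maximal run of digits, or a name that
# starts with a letter or underscore and continues with word characters.
_BARE_REF = re.compile(r"\d+|[A-Za-z_]\w*")


def expand_replacement(match, replacement):
    """Expand $$, $n / $name and ${n} / ${name} against a re.Match object."""
    out = []
    i = 0
    while i < len(replacement):
        ch = replacement[i]
        if ch != "$":
            out.append(ch)
            i += 1
            continue
        i += 1  # step over the dollar
        if i >= len(replacement):
            raise ValueError("dollar at end of replacement string")
        if replacement[i] == "$":  # "$$" inserts one dollar character
            out.append("$")
            i += 1
            continue
        if replacement[i] == "{":  # "${n}" form
            end = replacement.find("}", i)
            if end < 0:
                raise ValueError("missing closing curly bracket")
            ref = replacement[i + 1:end]
            i = end + 1
        else:  # "$n" form
            m = _BARE_REF.match(replacement, i)
            if m is None:
                raise ValueError("bad group reference after dollar")
            ref = m.group()
            i = m.end()
        if not ref:
            raise ValueError("empty group reference")
        group = int(ref) if ref.isdigit() else ref
        # Simplification: an unset group expands to the empty string here.
        out.append(match.group(group) or "")
    return "".join(out)


def substitute_first(pattern, subject, replacement):
    """Replace only the first match, as when no global option is set."""
    m = re.search(pattern, subject)
    if m is None:
        return subject
    return subject[:m.start()] + expand_replacement(m, replacement) + subject[m.end():]


# The example from the documentation: a(b)c against "[abc]".
print(substitute_first(r"a(b)c", "[abc]", "+$1$0$1+"))  # [+babcb+]
```

The curly-bracket form matters when the text after the reference could be read as part of it: "${1}0" inserts group 1 followed by a literal 0, whereas "$10" would be read as group 10.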


Modified: code/trunk/doc/html/pcre2jit.html
===================================================================
--- code/trunk/doc/html/pcre2jit.html    2014-11-12 17:46:02 UTC (rev 146)
+++ code/trunk/doc/html/pcre2jit.html    2014-11-14 18:41:20 UTC (rev 147)
@@ -51,11 +51,12 @@
 you want to use JIT. The support is limited to the following hardware
 platforms:
 <pre>
-  ARM v5, v7, and Thumb2
+  ARM 32-bit (v5, v7, and Thumb2)
+  ARM 64-bit
   Intel x86 32-bit and 64-bit
-  MIPS 32-bit
+  MIPS 32-bit and 64-bit
   Power PC 32-bit and 64-bit
-  SPARC 32-bit (experimental)
+  SPARC 32-bit
 </pre>
 If --enable-jit is set on an unsupported platform, compilation fails.
 </P>
@@ -73,11 +74,11 @@
 call <b>pcre2_jit_compile()</b> after successfully compiling a pattern with
 <b>pcre2_compile()</b>. This function has two arguments: the first is the
 compiled pattern pointer that was returned by <b>pcre2_compile()</b>, and the
-second is a set of option bits, which must include at least one of
-PCRE2_JIT_COMPLETE, PCRE2_JIT_PARTIAL_HARD, or PCRE2_JIT_PARTIAL_SOFT.
+second is zero or more of the following option bits: PCRE2_JIT_COMPLETE,
+PCRE2_JIT_PARTIAL_HARD, or PCRE2_JIT_PARTIAL_SOFT.
 </P>
 <P>
-If JIT support is not available, a call to <b>pcre2_jit_comple()</b> does
+If JIT support is not available, a call to <b>pcre2_jit_compile()</b> does
 nothing and returns PCRE2_ERROR_JIT_BADOPTION. Otherwise, the compiled pattern
 is passed to the JIT compiler, which turns it into machine code that executes
 much faster than the normal interpretive code, but yields exactly the same
@@ -95,6 +96,20 @@
 using interpretive code.
 </P>
 <P>
+You can call <b>pcre2_jit_compile()</b> multiple times for the same compiled
+pattern. It does nothing if it has previously compiled code for any of the
+option bits. For example, you can call it once with PCRE2_JIT_COMPLETE and
+(perhaps later, when you find you need partial matching) again with
+PCRE2_JIT_COMPLETE and PCRE2_JIT_PARTIAL_HARD. This time it will ignore
+PCRE2_JIT_COMPLETE and just compile code for partial matching. If
+<b>pcre2_jit_compile()</b> is called with no option bits set, it immediately
+returns zero. This is an alternative way of testing if JIT is available.
+</P>
+<P>
+At present, it is not possible to free JIT compiled code except when the entire
+compiled pattern is freed by calling <b>pcre2_free_code()</b>.
+</P>
+<P>
 In some circumstances you may need to call additional functions. These are
 described in the section entitled
 <a href="#stackcontrol">"Controlling the JIT stack"</a>
@@ -167,7 +182,7 @@
 pointer to an opaque structure of type <b>pcre2_jit_stack</b>, or NULL if there
 is an error. The <b>pcre2_jit_stack_free()</b> function is used to free a stack
 that is no longer needed. (For the technically minded: the address space is
-allocated by mmap or VirtualAlloc.)  FIXME Is this right?
+allocated by mmap or VirtualAlloc.)
 </P>
 <P>
 JIT uses far less memory for recursion than the interpretive code,
@@ -187,7 +202,8 @@
 used. There are three cases for the values of the other two options:
 <pre>
   (1) If <i>callback</i> is NULL and <i>data</i> is NULL, an internal 32K block
-      on the machine stack is used.
+      on the machine stack is used. This is the default when a match
+      context is created.


   (2) If <i>callback</i> is NULL and <i>data</i> is not NULL, <i>data</i> must be
       a pointer to a valid JIT stack, the result of calling
@@ -402,7 +418,7 @@
 </P>
 <br><a name="SEC13" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 08 November 2014
+Last updated: 12 November 2014
 <br>
 Copyright &copy; 1997-2014 University of Cambridge.
 <br>
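
Note on the pcre2_jit_compile() paragraph added above: the "compile only what is missing" behaviour can be pictured with a small Python model. This is a sketch of the documented behaviour only, not the JIT implementation. The class CompiledPatternModel is hypothetical, the bit values are chosen for illustration, and the return value (the bits newly compiled by the call) is a modelling convenience rather than the real function's return convention.

```python
# Illustrative bit values; the real constants are defined in pcre2.h.
PCRE2_JIT_COMPLETE = 0x1
PCRE2_JIT_PARTIAL_SOFT = 0x2
PCRE2_JIT_PARTIAL_HARD = 0x4


class CompiledPatternModel:
    """Tracks which JIT modes already have compiled code for one pattern."""

    def __init__(self):
        self.jit_modes = 0  # no JIT code yet

    def jit_compile(self, options):
        """Compile code for the requested modes that are not yet present.

        Returns the bits that were newly compiled by this call (zero if
        nothing had to be done, including the no-options case).
        """
        if options == 0:
            return 0  # no option bits: nothing is compiled
        new_modes = options & ~self.jit_modes  # skip modes already compiled
        self.jit_modes |= new_modes
        return new_modes


pattern = CompiledPatternModel()
first = pattern.jit_compile(PCRE2_JIT_COMPLETE)
second = pattern.jit_compile(PCRE2_JIT_COMPLETE | PCRE2_JIT_PARTIAL_HARD)
print(first == PCRE2_JIT_COMPLETE)       # True
print(second == PCRE2_JIT_PARTIAL_HARD)  # True: COMPLETE was ignored
```

In the model, as in the documentation, JIT code is never dropped for a single mode; it goes away only with the whole pattern object.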


Modified: code/trunk/doc/html/pcre2pattern.html
===================================================================
--- code/trunk/doc/html/pcre2pattern.html    2014-11-12 17:46:02 UTC (rev 146)
+++ code/trunk/doc/html/pcre2pattern.html    2014-11-14 18:41:20 UTC (rev 147)
@@ -100,8 +100,8 @@
 <P>
 Some applications that allow their users to supply patterns may wish to
 restrict them to non-UTF data for security reasons. If the PCRE2_NEVER_UTF
-option is set at compile time, (*UTF) is not allowed, and its appearance causes
-an error.
+option is passed to <b>pcre2_compile()</b>, (*UTF) is not allowed, and its
+appearance in a pattern causes an error.
 </P>
 <br><b>
 Unicode property support
@@ -113,7 +113,23 @@
 instead of recognizing only characters with codes less than 128 via a lookup
 table.
 </P>
+<P>
+Some applications that allow their users to supply patterns may wish to
+restrict them for security reasons. If the PCRE2_NEVER_UCP option is passed to
+<b>pcre2_compile()</b>, (*UCP) is not allowed, and its appearance in a pattern
+causes an error.
+</P>
 <br><b>
+Locking out empty string matching
+</b><br>
+<P>
+Starting a pattern with (*NOTEMPTY) or (*NOTEMPTY_ATSTART) has the same effect
+as passing the PCRE2_NOTEMPTY or PCRE2_NOTEMPTY_ATSTART option to whichever
+matching function is subsequently called to match the pattern. These options
+lock out the matching of empty strings, either entirely, or only at the start
+of the subject.
+</P>
+<br><b>
 Disabling auto-possessification
 </b><br>
 <P>
@@ -133,6 +149,28 @@
 reaching "no match" results. For more details, see the
 <a href="pcre2api.html"><b>pcre2api</b></a>
 documentation.
+</P>
+<br><b>
+Setting match and recursion limits
+</b><br>
+<P>
+The caller of <b>pcre2_match()</b> can set a limit on the number of times the
+internal <b>match()</b> function is called and on the maximum depth of
+recursive calls. These facilities are provided to catch runaway matches that
+are provoked by patterns with huge matching trees (a typical example is a
+pattern with nested unlimited repeats) and to avoid running out of system stack
+by too much recursion. When one of these limits is reached, <b>pcre2_match()</b>
+gives an error return. The limits can also be set by items at the start of the
+pattern of the form
+<pre>
+  (*LIMIT_MATCH=d)
+  (*LIMIT_RECURSION=d)
+</pre>
+where d is any number of decimal digits. However, the value of the setting must
+be less than the value set (or defaulted) by the caller of <b>pcre2_match()</b>
+for it to have any effect. In other words, the pattern writer can lower the
+limits set by the programmer, but not raise them. If there is more than one
+setting of one of these limits, the lower value is used.
 <a name="newlines"></a></P>
 <br><b>
 Newline conventions
@@ -179,26 +217,14 @@
 convention.
 </P>
 <br><b>
-Setting match and recursion limits
+Specifying what \R matches
 </b><br>
 <P>
-The caller of <b>pcre2_match()</b> can set a limit on the number of times the
-internal <b>match()</b> function is called and on the maximum depth of
-recursive calls. These facilities are provided to catch runaway matches that
-are provoked by patterns with huge matching trees (a typical example is a
-pattern with nested unlimited repeats) and to avoid running out of system stack
-by too much recursion. When one of these limits is reached, <b>pcre2_match()</b>
-gives an error return. The limits can also be set by items at the start of the
-pattern of the form
-<pre>
-  (*LIMIT_MATCH=d)
-  (*LIMIT_RECURSION=d)
-</pre>
-where d is any number of decimal digits. However, the value of the setting must
-be less than the value set (or defaulted) by the caller of <b>pcre2_match()</b>
-for it to have any effect. In other words, the pattern writer can lower the
-limits set by the programmer, but not raise them. If there is more than one
-setting of one of these limits, the lower value is used.
+It is possible to restrict \R to match only CR, LF, or CRLF (instead of the
+complete set of Unicode line endings) by setting the option PCRE2_BSR_ANYCRLF
+at compile time. This effect can also be achieved by starting a pattern with
+(*BSR_ANYCRLF). For completeness, (*BSR_UNICODE) is also recognized,
+corresponding to PCRE2_BSR_UNICODE.
 </P>
 <br><a name="SEC3" href="#TOC1">EBCDIC CHARACTER CODES</a><br>
 <P>
@@ -2280,8 +2306,8 @@
 </PRE>
 </P>
 <P>
-There are four kinds of condition: references to subpatterns, references to
-recursion, a pseudo-condition called DEFINE, and assertions.
+There are five kinds of condition: references to subpatterns, references to
+recursion, two pseudo-conditions called DEFINE and VERSION, and assertions.
 </P>
 <br><b>
 Checking for a used subpattern by number
@@ -2389,6 +2415,23 @@
 components of an IPv4 address, insisting on a word boundary at each end.
 </P>
 <br><b>
+Checking the PCRE2 version
+</b><br>
+<P>
+Programs that link with a PCRE2 library can check the version by calling
+<b>pcre2_config()</b> with appropriate arguments. Users of applications that do
+not have access to the underlying code cannot do this. A special "condition"
+called VERSION exists to allow such users to discover which version of PCRE2
+they are dealing with by using this condition to match a string such as
+"yesno". VERSION must be followed either by "=" or "&#62;=" and a version number.
+For example:
+<pre>
+  (?(VERSION&#62;=10.4)yes|no)
+</pre>
+This pattern matches "yes" if the PCRE2 version is greater or equal to 10.4, or
+"no" otherwise.
+</P>
+<br><b>
 Assertion conditions
 </b><br>
 <P>
@@ -3180,7 +3223,7 @@
 <br><a name="SEC28" href="#TOC1">SEE ALSO</a><br>
 <P>
 <b>pcre2api</b>(3), <b>pcre2callout</b>(3), <b>pcre2matching</b>(3),
-<b>pcre2syntax</b>(3), <b>pcre2</b>(3), <b>pcre216(3)</b>, <b>pcre232(3)</b>.
+<b>pcre2syntax</b>(3), <b>pcre2</b>(3).
 </P>
 <br><a name="SEC29" href="#TOC1">AUTHOR</a><br>
 <P>
@@ -3193,7 +3236,7 @@
 </P>
 <br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 03 November 2014
+Last updated: 14 November 2014
 <br>
 Copyright &copy; 1997-2014 University of Cambridge.
 <br>
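
Note on the "Setting match and recursion limits" section moved in the diff above: the rule that a pattern can lower the caller's limits but never raise them amounts to taking a minimum. The Python sketch below illustrates that rule under stated simplifications. The function effective_limits is hypothetical, it recognizes only (*LIMIT_MATCH=d) and (*LIMIT_RECURSION=d) items placed consecutively at the very start of the pattern, and the numbers used are arbitrary rather than PCRE2 defaults.

```python
import re

# One leading limit item, e.g. "(*LIMIT_MATCH=1000)".
_LIMIT_ITEM = re.compile(r"\(\*LIMIT_(MATCH|RECURSION)=(\d+)\)")


def effective_limits(pattern, caller_match_limit, caller_recursion_limit):
    """Return (match_limit, recursion_limit) after applying pattern items.

    A pattern setting takes effect only if it is lower than the current
    value, so repeated settings resolve to the lowest one and the caller's
    limit is never exceeded.
    """
    limits = {"MATCH": caller_match_limit, "RECURSION": caller_recursion_limit}
    pos = 0
    while True:
        item = _LIMIT_ITEM.match(pattern, pos)
        if item is None:
            break  # first thing that is not a limit item ends the scan
        kind, value = item.group(1), int(item.group(2))
        limits[kind] = min(limits[kind], value)
        pos = item.end()
    return limits["MATCH"], limits["RECURSION"]


# The pattern lowers the match limit; the higher second setting is ignored.
print(effective_limits("(*LIMIT_MATCH=1000)(*LIMIT_MATCH=5000)a+", 3000, 200))
# (1000, 200)

# An attempt to raise the recursion limit has no effect.
print(effective_limits("(*LIMIT_RECURSION=50)x", 100, 10))
# (100, 10)
```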


Modified: code/trunk/doc/html/pcre2syntax.html
===================================================================
--- code/trunk/doc/html/pcre2syntax.html    2014-11-12 17:46:02 UTC (rev 146)
+++ code/trunk/doc/html/pcre2syntax.html    2014-11-14 18:41:20 UTC (rev 147)
@@ -493,17 +493,18 @@
   (?(condition)yes-pattern)
   (?(condition)yes-pattern|no-pattern)


-  (?(n)...        absolute reference condition
-  (?(+n)...       relative reference condition
-  (?(-n)...       relative reference condition
-  (?(&#60;name&#62;)...   named reference condition (Perl)
-  (?('name')...   named reference condition (Perl)
-  (?(name)...     named reference condition (PCRE2)
-  (?(R)...        overall recursion condition
-  (?(Rn)...       specific group recursion condition
-  (?(R&name)...   specific recursion condition
-  (?(DEFINE)...   define subpattern for reference
-  (?(assert)...   assertion condition
+  (?(n)               absolute reference condition
+  (?(+n)              relative reference condition
+  (?(-n)              relative reference condition
+  (?(&#60;name&#62;)          named reference condition (Perl)
+  (?('name')          named reference condition (Perl)
+  (?(name)            named reference condition (PCRE2)
+  (?(R)               overall recursion condition
+  (?(Rn)              specific group recursion condition
+  (?(R&name)          specific recursion condition
+  (?(DEFINE)          define subpattern for reference
+  (?(VERSION[&#62;]=n.m)  test PCRE2 version
+  (?(assert)          assertion condition
 </PRE>
 </P>
 <br><a name="SEC23" href="#TOC1">BACKTRACKING CONTROL</a><br>
@@ -552,7 +553,7 @@
 </P>
 <br><a name="SEC27" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 20 October 2014
+Last updated: 14 November 2014
 <br>
 Copyright &copy; 1997-2014 University of Cambridge.
 <br>


Modified: code/trunk/doc/html/pcre2test.html
===================================================================
--- code/trunk/doc/html/pcre2test.html    2014-11-12 17:46:02 UTC (rev 146)
+++ code/trunk/doc/html/pcre2test.html    2014-11-14 18:41:20 UTC (rev 147)
@@ -201,10 +201,11 @@
 <P>
 <b>-t</b>
 Run each compile and match many times with a timer, and output the resulting
-times per compile or match. You can control the number of iterations that are
-used for timing by following <b>-t</b> with a number (as a separate item on the
-command line). For example, "-t 1000" iterates 1000 times. The default is to
-iterate 500,000 times.
+times per compile or match. When JIT is used, separate times are given for the
+initial compile and the JIT compile. You can control the number of iterations
+that are used for timing by following <b>-t</b> with a number (as a separate
+item on the command line). For example, "-t 1000" iterates 1000 times. The
+default is to iterate 500,000 times.
 </P>
 <P>
 <b>-tm</b>
@@ -490,7 +491,6 @@
       tables=[0|1|2]            select internal tables
 </pre>
 The effects of these modifiers are described in the following sections.
-FIXME: Give more examples.
 </P>
 <br><b>
 Newline and \R handling
@@ -528,7 +528,31 @@
 <P>
 The <b>info</b> modifier requests information about the compiled pattern
 (whether it is anchored, has a fixed first character, and so on). The
-information is obtained from the <b>pcre2_pattern_info()</b> function.
+information is obtained from the <b>pcre2_pattern_info()</b> function. Here are
+some typical examples:
+<pre>
+    re&#62; /(?i)(^a|^b)/m,info
+  Capturing subpattern count = 1
+  Compile options: multiline
+  Overall options: caseless multiline
+  First code unit at start or follows newline
+  Subject length lower bound = 1
+
+    re&#62; /(?i)abc/info
+  Capturing subpattern count = 0
+  Compile options: &#60;none&#62;
+  Overall options: caseless
+  First code unit = 'a' (caseless)
+  Last code unit = 'c' (caseless)
+  Subject length lower bound = 3
+</pre>
+"Compile options" are those specified to the compile function; "overall
+options" have added options that are taken or deduced from the pattern. If both
+sets of options are the same, just a single "options" line is output. "First
+code unit" is where any match must start; if there is more than one they are
+listed as "starting code units". "Last code unit" is the last literal code unit
+that must be present in any match. This is not necessarily the last character.
+These lines are omitted if no starting or ending code units are recorded.
 </P>
 <br><b>
 Specifying a pattern in hex
@@ -543,8 +567,8 @@
 This feature is provided as a way of creating patterns that contain binary zero
 characters. By default, <b>pcre2test</b> passes patterns as zero-terminated
 strings to <b>pcre2_compile()</b>, giving the length as PCRE2_ZERO_TERMINATED.
-However, for patterns specified in hexadecimal, the length of the pattern is
-passed.
+However, for patterns specified in hexadecimal, the actual length of the
+pattern is passed.
 </P>
 <br><b>
 JIT compilation
@@ -571,7 +595,7 @@
 </P>
 <P>
 If the <b>jitfast</b> modifier is specified, matching is done using the JIT
-"fast path" interface (\fBpcre2_jit_match()), which skips some of the sanity
+"fast path" interface, \fBpcre2_jit_match(), which skips some of the sanity
 checks that are done by <b>pcre2_match()</b>, and of course does not work when
 JIT is not supported. If <b>jitfast</b> is specified without <b>jit</b>, jit=7 is
 assumed.
@@ -604,11 +628,17 @@
 Showing pattern memory
 </b><br>
 <P>
-The <b>/memory</b> modifier causes the size in bytes of the memory block used to
-hold the compiled pattern to be output. This does not include the size of the
+The <b>/memory</b> modifier causes the size in bytes of the memory used to hold
+the compiled pattern to be output. This does not include the size of the
 <b>pcre2_code</b> block; it is just the actual compiled data. If the pattern is
 subsequently passed to the JIT compiler, the size of the JIT compiled code is
-also output.
+also output. Here is an example:
+<pre>
+    re&#62; /a(b)c/jit,memory
+  Memory allocation (code space): 21
+  Memory allocation (JIT code): 1910
+
+</PRE>
 </P>
 <br><b>
 Limiting nested parentheses
@@ -650,8 +680,8 @@
 <a href="pcre2api.html"><b>pcre2api</b></a>
 documentation for details). If the number specified by the modifier is greater
 than zero, <b>pcre2_set_compile_recursion_guard()</b> is called to set up
-callback from <b>pcre2_compile()</b> to a local function. The argument it is
-passed is the current nesting parenthesis depth; if this is greater than the
+callback from <b>pcre2_compile()</b> to a local function. The argument it
+receives is the current nesting parenthesis depth; if this is greater than the
 value given by the modifier, non-zero is returned, causing the compilation to
 be aborted.
 </P>
@@ -688,6 +718,7 @@
       allusedtext         show all consulted text
   /g  global              global matching
       mark                show mark values
+      replace=&#60;string&#62;    specify a replacement string
       startchar           show starting character when relevant
 </pre>
 These modifiers may not appear in a <b>#pattern</b> command. If you want them as
@@ -759,11 +790,11 @@
       offset=&#60;n&#62;                set starting offset
       ovector=&#60;n&#62;               set size of output vector
       recursion_limit=&#60;n&#62;       set a recursion limit
+      replace=&#60;string&#62;          specify a replacement string
       startchar                 show startchar when relevant
       zero_terminate            pass the subject as zero-terminated
 </pre>
 The effects of these modifiers are described in the following sections.
-FIXME: Give more examples.
 </P>
 <br><b>
 Showing more text
@@ -841,6 +872,30 @@
 function.
 </P>
 <br><b>
+Finding all matches in a string
+</b><br>
+<P>
+Searching for all possible matches within a subject can be requested by the
+<b>global</b> or <b>/altglobal</b> modifier. After finding a match, the matching
+function is called again to search the remainder of the subject. The difference
+between <b>global</b> and <b>altglobal</b> is that the former uses the
+<i>start_offset</i> argument to <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>
+to start searching at a new point within the entire string (which is what Perl
+does), whereas the latter passes over a shortened substring. This makes a
+difference to the matching process if the pattern begins with a lookbehind
+assertion (including \b or \B).
+</P>
+<P>
+If an empty string is matched, the next match is done with the
+PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search for
+another, non-empty, match at the same point in the subject. If this match
+fails, the start offset is advanced, and the normal match is retried. This
+imitates the way Perl handles such cases when using the <b>/g</b> modifier or
+the <b>split()</b> function. Normally, the start offset is advanced by one
+character, but if the newline convention recognizes CRLF as a newline, and the
+current character is CR followed by LF, an advance of two is used.
+</P>
+<br><b>
 Testing substring extraction functions
 </b><br>
 <P>
@@ -867,28 +922,46 @@
 parentheses after each substring.
 </P>
 <br><b>
-Finding all matches in a string
+Testing the substitution function
 </b><br>
 <P>
-Searching for all possible matches within a subject can be requested by the
-<b>global</b> or <b>/altglobal</b> modifier. After finding a match, the matching
-function is called again to search the remainder of the subject. The difference
-between <b>global</b> and <b>altglobal</b> is that the former uses the
-<i>start_offset</i> argument to <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>
-to start searching at a new point within the entire string (which is what Perl
-does), whereas the latter passes over a shortened substring. This makes a
-difference to the matching process if the pattern begins with a lookbehind
-assertion (including \b or \B).
+If the <b>replace</b> modifier is set, the <b>pcre2_substitute()</b> function is
+called instead of one of the matching functions. Unlike subject strings,
+<b>pcre2test</b> does not process replacement strings for escape sequences. In
+UTF mode, a replacement string is checked to see if it is a valid UTF-8 string.
+If so, it is correctly converted to a UTF string of the appropriate code unit
+width. If it is not a valid UTF-8 string, the individual code units are copied
+directly. This provides a means of passing an invalid UTF-8 string for testing
+purposes.
 </P>
 <P>
-If an empty string is matched, the next match is done with the
-PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search for
-another, non-empty, match at the same point in the subject. If this match
-fails, the start offset is advanced, and the normal match is retried. This
-imitates the way Perl handles such cases when using the <b>/g</b> modifier or
-the <b>split()</b> function. Normally, the start offset is advanced by one
-character, but if the newline convention recognizes CRLF as a newline, and the
-current character is CR followed by LF, an advance of two is used.
+If the <b>global</b> modifier is set, PCRE2_SUBSTITUTE_GLOBAL is passed to
+<b>pcre2_substitute()</b>. After a successful substitution, the modified string
+is output, preceded by the number of replacements. This may be zero if there
+were no matches. Here is a simple example of a substitution test:
+<pre>
+  /abc/replace=xxx
+      =abc=abc=
+   1: =xxx=abc=
+      =abc=abc=\=global
+   2: =xxx=xxx=
+</pre>
+Subject and replacement strings should be kept relatively short for
+substitution tests, as fixed-size buffers are used. To make it easy to test for
+buffer overflow, if the replacement string starts with a number in square
+brackets, that number is passed to <b>pcre2_substitute()</b> as the size of the
+output buffer, with the replacement string starting at the next character. Here
+is an example that tests the edge case:
+<pre>
+  /abc/
+      123abc123\=replace=[10]XYZ
+   1: 123XYZ123
+      123abc123\=replace=[9]XYZ
+  Failed: error -47: no more memory
+</pre>
+A replacement string is ignored with POSIX and DFA matching. Specifying partial
+matching provokes an error return ("bad option value") from
+<b>pcre2_substitute()</b>.
 </P>
 <br><b>
 Setting the JIT stack size
@@ -969,10 +1042,10 @@
 A value of zero is useful when testing the POSIX API because it causes
 <b>regexec()</b> to be called with a NULL capture vector. When not testing the
 POSIX API, a value of zero is used to cause
-<b>pcre2_match_data_create_from_pattern</b> to be called, in order to create a
+<b>pcre2_match_data_create_from_pattern()</b> to be called, in order to create a
 match block of exactly the right size for the pattern. (It is not possible to
-create a match block with a zero-length ovector; there is always one pair of
-offsets.)
+create a match block with a zero-length ovector; there is always at least one
+pair of offsets.)
 </P>
 <br><b>
 Passing the subject as zero-terminated
@@ -985,7 +1058,7 @@
 this modifier has no effect, as there is no facility for passing a length.)
 </P>
 <P>
-When testing <b>pcre2_substitute</b>, this modifier also has the effect of
+When testing <b>pcre2_substitute()</b>, this modifier also has the effect of
 passing the replacement string as zero-terminated.
 </P>
 <br><a name="SEC12" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>
@@ -1233,7 +1306,7 @@
 </P>
 <br><a name="SEC20" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 09 November 2014
+Last updated: 14 November 2014
 <br>
 Copyright &copy; 1997-2014 University of Cambridge.
 <br>
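
Note on the "Testing the substitution function" section added above: the two pcre2test examples can be reproduced with a small Python model. This is a sketch of the documented test behaviour, not of pcre2test or pcre2_substitute(). The names parse_replace_modifier, substitute_test and NoMemory are hypothetical, and default_size is an arbitrary stand-in for the fixed-size buffer. The model treats the replacement as literal text (no $ expansion) and assumes that the output buffer must also hold a terminating zero, which is consistent with the 9-character result fitting in a buffer of 10 but not of 9.

```python
import re


class NoMemory(Exception):
    """Stands in for the 'no more memory' error shown in the example."""


def parse_replace_modifier(value, default_size=256):
    """Split an optional leading "[n]" buffer size off a replace= value."""
    m = re.match(r"\[(\d+)\]", value)
    if m is None:
        return default_size, value
    return int(m.group(1)), value[m.end():]


def substitute_test(pattern, subject, replace_value, global_=False):
    """Return (number of replacements, modified string) or raise NoMemory."""
    bufsize, replacement = parse_replace_modifier(replace_value)
    # A function keeps the replacement literal; a plain string would have
    # its backslash escapes interpreted by re.subn.
    result, count = re.subn(pattern, lambda m: replacement, subject,
                            count=0 if global_ else 1)
    # Assumption: one extra code unit is needed for the terminating zero.
    if len(result) + 1 > bufsize:
        raise NoMemory("no more memory")
    return count, result


print(substitute_test("abc", "=abc=abc=", "xxx"))                # (1, '=xxx=abc=')
print(substitute_test("abc", "=abc=abc=", "xxx", global_=True))  # (2, '=xxx=xxx=')
print(substitute_test("abc", "123abc123", "[10]XYZ"))            # (1, '123XYZ123')
try:
    substitute_test("abc", "123abc123", "[9]XYZ")
except NoMemory as err:
    print("Failed:", err)                                        # Failed: no more memory
```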


Modified: code/trunk/doc/pcre2.3
===================================================================
--- code/trunk/doc/pcre2.3    2014-11-12 17:46:02 UTC (rev 146)
+++ code/trunk/doc/pcre2.3    2014-11-14 18:41:20 UTC (rev 147)
@@ -132,7 +132,7 @@
 listing), and the short pages for individual functions, are concatenated in
 \fBpcre2.txt\fP, for ease of searching. The sections are as follows:
 .sp
-  pcre2              this document FIXME CHECK THIS LIST
+  pcre2              this document
   pcre2-config       show PCRE2 installation configuration information
   pcre2api           details of PCRE2's native C API
   pcre2build         building PCRE2


Modified: code/trunk/doc/pcre2.txt
===================================================================
--- code/trunk/doc/pcre2.txt    2014-11-12 17:46:02 UTC (rev 146)
+++ code/trunk/doc/pcre2.txt    2014-11-14 18:41:20 UTC (rev 147)
@@ -116,7 +116,7 @@
        tions,  are  concatenated in pcre2.txt, for ease of searching. The sec-
        tions are as follows:


-         pcre2              this document FIXME CHECK THIS LIST
+         pcre2              this document
          pcre2-config       show PCRE2 installation configuration information
          pcre2api           details of PCRE2's native C API
          pcre2build         building PCRE2
@@ -1928,12 +1928,10 @@


        When  PCRE2 is built, a default newline convention is set; this is usu-
        ally the standard convention for the operating system. The default  can
-       be overridden in either a compile context or a match context.  However,
-       changing the newline convention at match time  disables  JIT  matching.
-       During  matching,  the newline choice affects the behaviour of the dot,
-       circumflex, and dollar metacharacters. It may also alter  the  way  the
-       match position is advanced after a match failure for an unanchored pat-
-       tern.
+       be  overridden  in  a  compile  context.   During matching, the newline
+       choice affects  the  behaviour  of  the  dot,  circumflex,  and  dollar
+       metacharacters.  It  may  also  alter  the  way  the  match position is
+       advanced after a match failure for an unanchored pattern.


        When PCRE2_NEWLINE_CRLF, PCRE2_NEWLINE_ANYCRLF, or PCRE2_NEWLINE_ANY is
        set,  and a match attempt for an unanchored pattern fails when the cur-
@@ -2320,46 +2318,47 @@
        given as PCRE2_ZERO_TERMINATED for a zero-terminated string.


        In  the replacement string, which is interpreted as a UTF string in UTF
-       mode, a dollar character is an escape character that  can  specify  the
-       insertion  of characters from capturing groups in the pattern. The fol-
-       lowing forms are recognized:
+       mode, and is checked for UTF  validity  unless  the  PCRE2_NO_UTF_CHECK
+       option is set, a dollar character is an escape character that can spec-
+       ify the insertion of characters from capturing groups in  the  pattern.
+       The following forms are recognized:


          $$      insert a dollar character
          $<n>    insert the contents of group <n>
          ${<n>}  insert the contents of group <n>


-       Either a group number or a group name  can  be  given  for  <n>.  Curly
-       brackets  are  required only if the following character would be inter-
+       Either  a  group  number  or  a  group name can be given for <n>. Curly
+       brackets are required only if the following character would  be  inter-
        preted as part of the number or name. The number may be zero to include
-       the  entire  matched  string.   For  example,  if  the pattern a(b)c is
-       matched with "[abc]" and the replacement string "+$1$0$1+", the  result
-       is  "[+babcb+]". Group insertion is done by calling pcre2_copy_byname()
+       the entire matched string.   For  example,  if  the  pattern  a(b)c  is
+       matched  with "[abc]" and the replacement string "+$1$0$1+", the result
+       is "[+babcb+]". Group insertion is done by calling  pcre2_copy_byname()
        or pcre2_copy_bynumber() as appropriate.


-       The first seven arguments of pcre2_substitute() are  the  same  as  for
+       The  first  seven  arguments  of pcre2_substitute() are the same as for
        pcre2_match(), except that the partial matching options are not permit-
-       ted, and match_data may be passed as NULL, in which case a  match  data
-       block  is obtained and freed within this function, using memory manage-
-       ment functions from the match context, if provided, or else those  that
+       ted,  and  match_data may be passed as NULL, in which case a match data
+       block is obtained and freed within this function, using memory  manage-
+       ment  functions from the match context, if provided, or else those that
        were used to allocate memory for the compiled code.


-       There  is  one additional option, PCRE2_SUBSTITUTE_GLOBAL, which causes
+       There is one additional option, PCRE2_SUBSTITUTE_GLOBAL,  which  causes
        the function to iterate over the subject string, replacing every match-
        ing substring. If this is not set, only the first matching substring is
        replaced.


-       The outlengthptr argument must point to a variable  that  contains  the
-       length,  in  code units, of the output buffer. It is updated to contain
+       The  outlengthptr  argument  must point to a variable that contains the
+       length, in code units, of the output buffer. It is updated  to  contain
        the length of the new string, excluding the trailing zero that is auto-
        matically added.


-       The  function  returns  the number of replacements that were made. This
-       may be zero if no matches were found,  and  is  never  greater  than  1
+       The function returns the number of replacements that  were  made.  This
+       may  be  zero  if  no  matches  were found, and is never greater than 1
        unless PCRE2_SUBSTITUTE_GLOBAL is set. In the event of an error, a neg-
-       ative error code is returned. Except for PCRE2_ERROR_NOMATCH (which  is
+       ative  error code is returned. Except for PCRE2_ERROR_NOMATCH (which is
        never returned), any errors from pcre2_match() or the substring copying
        functions  are  passed  straight  back.  PCRE2_ERROR_BADREPLACEMENT  is
-       returned  for an invalid replacement string (unrecognized sequence fol-
+       returned for an invalid replacement string (unrecognized sequence  fol-
        lowing a dollar sign), and PCRE2_ERROR_NOMEMORY is returned if the out-
        put buffer is not big enough.


@@ -2369,54 +2368,54 @@
        int pcre2_substring_nametable_scan(const pcre2_code *code,
          PCRE2_SPTR name, PCRE2_SPTR *first, PCRE2_SPTR *last);


-       When  a  pattern  is compiled with the PCRE2_DUPNAMES option, names for
-       subpatterns are not required to be unique. Duplicate names  are  always
-       allowed  for subpatterns with the same number, created by using the (?|
-       feature. Indeed, if such subpatterns are named, they  are  required  to
+       When a pattern is compiled with the PCRE2_DUPNAMES  option,  names  for
+       subpatterns  are  not required to be unique. Duplicate names are always
+       allowed for subpatterns with the same number, created by using the  (?|
+       feature.  Indeed,  if  such subpatterns are named, they are required to
        use the same names.


        Normally, patterns with duplicate names are such that in any one match,
-       only one of the named subpatterns participates. An example is shown  in
+       only  one of the named subpatterns participates. An example is shown in
        the pcre2pattern documentation.


-       When   duplicates   are   present,   pcre2_substring_copy_byname()  and
-       pcre2_substring_get_byname() return the first  substring  corresponding
+       When  duplicates   are   present,   pcre2_substring_copy_byname()   and
+       pcre2_substring_get_byname()  return  the first substring corresponding
        to the given name that is set. If none are set, PCRE2_ERROR_NOSUBSTRING
-       is returned. The  pcre2_substring_number_from_name()  function  returns
-       one  of  the  numbers  that are associated with the name, but it is not
+       is  returned.  The  pcre2_substring_number_from_name() function returns
+       one of the numbers that are associated with the name,  but  it  is  not
        defined which it is.


-       If you want to get full details of all captured substrings for a  given
-       name,  you  must use the pcre2_substring_nametable_scan() function. The
-       first argument is the compiled pattern, and the second is the name.  If
-       the  third  and fourth arguments are NULL, the function returns a group
+       If  you want to get full details of all captured substrings for a given
+       name, you must use the pcre2_substring_nametable_scan()  function.  The
+       first  argument is the compiled pattern, and the second is the name. If
+       the third and fourth arguments are NULL, the function returns  a  group
        number (it is not defined which). Otherwise, the third and fourth argu-
-       ments  must  be pointers to variables that are updated by the function.
+       ments must be pointers to variables that are updated by  the  function.
        After it has run, they point to the first and last entries in the name-
        to-number table for the given name, and the function returns the length
-       of each entry. In both cases, PCRE2_ERROR_NOSUBSTRING  is  returned  if
+       of  each  entry.  In both cases, PCRE2_ERROR_NOSUBSTRING is returned if
        there are no entries for the given name.


        The format of the name table is described above in the section entitled
-       Information about a pattern above.  Given all the relevant entries  for
+       Information  about a pattern above.  Given all the relevant entries for
        the name, you can extract each of their numbers, and hence the captured
        data.



FINDING ALL POSSIBLE MATCHES

-       The traditional matching function uses a  similar  algorithm  to  Perl,
+       The  traditional  matching  function  uses a similar algorithm to Perl,
        which stops when it finds the first match, starting at a given point in
-       the subject. If you want to find all possible matches, or  the  longest
-       possible  match  at  a  given  position, consider using the alternative
-       matching function (see below) instead.  If you cannot use the  alterna-
+       the  subject.  If you want to find all possible matches, or the longest
+       possible match at a given  position,  consider  using  the  alternative
+       matching  function (see below) instead.  If you cannot use the alterna-
        tive function, you can kludge it up by making use of the callout facil-
        ity, which is described in the pcre2callout documentation.


        What you have to do is to insert a callout right at the end of the pat-
-       tern.   When your callout function is called, extract and save the cur-
-       rent matched substring. Then return 1, which  forces  pcre2_match()  to
-       backtrack  and  try other alternatives. Ultimately, when it runs out of
+       tern.  When your callout function is called, extract and save the  cur-
+       rent  matched  substring.  Then return 1, which forces pcre2_match() to
+       backtrack and try other alternatives. Ultimately, when it runs  out  of
        matches, pcre2_match() will yield PCRE2_ERROR_NOMATCH.



@@ -2428,26 +2427,26 @@
          pcre2_match_context *mcontext,
          int *workspace, PCRE2_SIZE wscount);


-       The function pcre2_dfa_match() is called  to  match  a  subject  string
-       against  a  compiled pattern, using a matching algorithm that scans the
-       subject string just once, and does not backtrack.  This  has  different
-       characteristics  to  the  normal  algorithm, and is not compatible with
-       Perl. Some of the features of PCRE2 patterns are not supported.  Never-
-       theless,  there are times when this kind of matching can be useful. For
-       a discussion of the two matching algorithms, and  a  list  of  features
+       The  function  pcre2_dfa_match()  is  called  to match a subject string
+       against a compiled pattern, using a matching algorithm that  scans  the
+       subject  string  just  once, and does not backtrack. This has different
+       characteristics to the normal algorithm, and  is  not  compatible  with
+       Perl.  Some of the features of PCRE2 patterns are not supported. Never-
+       theless, there are times when this kind of matching can be useful.  For
+       a  discussion  of  the  two matching algorithms, and a list of features
        that pcre2_dfa_match() does not support, see the pcre2matching documen-
        tation.


-       The arguments for the pcre2_dfa_match() function are the  same  as  for
+       The  arguments  for  the pcre2_dfa_match() function are the same as for
        pcre2_match(), plus two extras. The ovector within the match data block
        is used in a different way, and this is described below. The other com-
-       mon  arguments  are used in the same way as for pcre2_match(), so their
+       mon arguments are used in the same way as for pcre2_match(),  so  their
        description is not repeated here.


-       The two additional arguments provide workspace for  the  function.  The
-       workspace  vector  should  contain at least 20 elements. It is used for
+       The  two  additional  arguments provide workspace for the function. The
+       workspace vector should contain at least 20 elements. It  is  used  for
        keeping  track  of  multiple  paths  through  the  pattern  tree.  More
-       workspace  is needed for patterns and subjects where there are a lot of
+       workspace is needed for patterns and subjects where there are a lot  of
        potential matches.


        Here is an example of a simple call to pcre2_dfa_match():
@@ -2467,45 +2466,45 @@


   Option bits for pcre2_dfa_match()


-       The unused bits of the options argument for pcre2_dfa_match()  must  be
-       zero.  The  only bits that may be set are PCRE2_ANCHORED, PCRE2_NOTBOL,
+       The  unused  bits of the options argument for pcre2_dfa_match() must be
+       zero. The only bits that may be set are  PCRE2_ANCHORED,  PCRE2_NOTBOL,
        PCRE2_NOTEOL,          PCRE2_NOTEMPTY,          PCRE2_NOTEMPTY_ATSTART,
        PCRE2_NO_UTF_CHECK,       PCRE2_PARTIAL_HARD,       PCRE2_PARTIAL_SOFT,
-       PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but  the  last  four  of
-       these  are  exactly the same as for pcre2_match(), so their description
+       PCRE2_DFA_SHORTEST,  and  PCRE2_DFA_RESTART.  All  but the last four of
+       these are exactly the same as for pcre2_match(), so  their  description
        is not repeated here.


          PCRE2_PARTIAL_HARD
          PCRE2_PARTIAL_SOFT


-       These have the same general effect as they do  for  pcre2_match(),  but
-       the  details are slightly different. When PCRE2_PARTIAL_HARD is set for
-       pcre2_dfa_match(), it returns PCRE2_ERROR_PARTIAL if  the  end  of  the
+       These  have  the  same general effect as they do for pcre2_match(), but
+       the details are slightly different. When PCRE2_PARTIAL_HARD is set  for
+       pcre2_dfa_match(),  it  returns  PCRE2_ERROR_PARTIAL  if the end of the
        subject is reached and there is still at least one matching possibility
        that requires additional characters. This happens even if some complete
-       matches  have  already  been found. When PCRE2_PARTIAL_SOFT is set, the
-       return code PCRE2_ERROR_NOMATCH is converted  into  PCRE2_ERROR_PARTIAL
-       if  the  end  of  the  subject  is reached, there have been no complete
+       matches have already been found. When PCRE2_PARTIAL_SOFT  is  set,  the
+       return  code  PCRE2_ERROR_NOMATCH is converted into PCRE2_ERROR_PARTIAL
+       if the end of the subject is  reached,  there  have  been  no  complete
        matches, but there is still at least one matching possibility. The por-
-       tion  of  the  string that was inspected when the longest partial match
+       tion of the string that was inspected when the  longest  partial  match
        was found is set as the first matching string in both cases. There is a
-       more  detailed  discussion  of partial and multi-segment matching, with
+       more detailed discussion of partial and  multi-segment  matching,  with
        examples, in the pcre2partial documentation.


          PCRE2_DFA_SHORTEST


-       Setting the PCRE2_DFA_SHORTEST option causes the matching algorithm  to
+       Setting  the PCRE2_DFA_SHORTEST option causes the matching algorithm to
        stop as soon as it has found one match. Because of the way the alterna-
-       tive algorithm works, this is necessarily the shortest  possible  match
+       tive  algorithm  works, this is necessarily the shortest possible match
        at the first possible matching point in the subject string.


          PCRE2_DFA_RESTART


-       When  pcre2_dfa_match() returns a partial match, it is possible to call
+       When pcre2_dfa_match() returns a partial match, it is possible to  call
        it again, with additional subject characters, and have it continue with
        the same match. The PCRE2_DFA_RESTART option requests this action; when
-       it is set, the workspace and wscount options must  reference  the  same
-       vector  as  before  because data about the match so far is left in them
+       it  is  set,  the workspace and wscount options must reference the same
+       vector as before because data about the match so far is  left  in  them
        after a partial match. There is more discussion of this facility in the
        pcre2partial documentation.


@@ -2513,8 +2512,8 @@

        When pcre2_dfa_match() succeeds, it may have matched more than one sub-
        string in the subject. Note, however, that all the matches from one run
-       of  the  function  start  at the same point in the subject. The shorter
-       matches are all initial substrings of the longer matches. For  example,
+       of the function start at the same point in  the  subject.  The  shorter
+       matches  are all initial substrings of the longer matches. For example,
        if the pattern


          <.*>
@@ -2529,66 +2528,66 @@
          <something> <something else>
          <something> <something else> <something further>


-       On  success,  the  yield of the function is a number greater than zero,
-       which is the number of matched substrings.  The  offsets  of  the  sub-
-       strings  are  returned in the ovector, and can be extracted in the same
-       way as for pcre2_match().   They  are  returned  in  reverse  order  of
-       length;  that  is, the longest matching string is given first. If there
-       were too many matches to fit into the ovector, the yield of  the  func-
+       On success, the yield of the function is a number  greater  than  zero,
+       which  is  the  number  of  matched substrings. The offsets of the sub-
+       strings are returned in the ovector, and can be extracted in  the  same
+       way  as  for  pcre2_match().   They  are  returned  in reverse order of
+       length; that is, the longest matching string is given first.  If  there
+       were  too  many matches to fit into the ovector, the yield of the func-
        tion is zero, and the vector is filled with the longest matches.


-       NOTE:  PCRE2's  "auto-possessification" optimization usually applies to
-       character repeats at the end of a pattern (as well as internally).  For
-       example,  the  pattern "a\d+" is compiled as if it were "a\d++" because
-       there is no point in backtracking into the  repeated  digits.  For  DFA
-       matching,  this  means  that  only  one possible match is found. If you
-       really do want multiple matches in such cases, either use  an  ungreedy
-       repeat  ("a\d+?")  or set the PCRE2_NO_AUTO_POSSESS option when compil-
+       NOTE: PCRE2's "auto-possessification" optimization usually  applies  to
+       character  repeats at the end of a pattern (as well as internally). For
+       example, the pattern "a\d+" is compiled as if it were  "a\d++"  because
+       there  is  no  point  in backtracking into the repeated digits. For DFA
+       matching, this means that only one possible  match  is  found.  If  you
+       really  do  want multiple matches in such cases, either use an ungreedy
+       repeat ("a\d+?") or set the PCRE2_NO_AUTO_POSSESS option  when  compil-
        ing.


    Error returns from pcre2_dfa_match()


        The pcre2_dfa_match() function returns a negative number when it fails.
-       Many  of  the  errors  are  the same as for pcre2_match(), as described
+       Many of the errors are the same  as  for  pcre2_match(),  as  described
        above.  There are in addition the following errors that are specific to
        pcre2_dfa_match():


          PCRE2_ERROR_DFA_UITEM


-       This  return  is  given  if pcre2_dfa_match() encounters an item in the
+       This return is given if pcre2_dfa_match() encounters  an  item  in  the
        pattern that it does not support, for instance, the use of \C or a back
        reference.


          PCRE2_ERROR_DFA_UCOND


-       This  return  is given if pcre2_dfa_match() encounters a condition item
-       that uses a back reference for the condition, or a test  for  recursion
+       This return is given if pcre2_dfa_match() encounters a  condition  item
+       that  uses  a back reference for the condition, or a test for recursion
        in a specific group. These are not supported.


          PCRE2_ERROR_DFA_WSSIZE


-       This  return  is  given  if  pcre2_dfa_match() runs out of space in the
+       This return is given if pcre2_dfa_match() runs  out  of  space  in  the
        workspace vector.


          PCRE2_ERROR_DFA_RECURSE


-       When a recursive subpattern is processed, the matching  function  calls
+       When  a  recursive subpattern is processed, the matching function calls
        itself recursively, using private memory for the ovector and workspace.
-       This error is given if the internal ovector is not large  enough.  This
+       This  error  is given if the internal ovector is not large enough. This
        should be extremely rare, as a vector of size 1000 is used.


          PCRE2_ERROR_DFA_BADRESTART


-       When  pcre2_dfa_match()  is  called  with the PCRE2_DFA_RESTART option,
-       some plausibility checks are made on the  contents  of  the  workspace,
-       which  should  contain data about the previous partial match. If any of
+       When pcre2_dfa_match() is called  with  the  PCRE2_DFA_RESTART  option,
+       some  plausibility  checks  are  made on the contents of the workspace,
+       which should contain data about the previous partial match. If  any  of
        these checks fail, this error is given.



SEE ALSO

-       pcre2build(3),   pcre2libs(3),    pcre2callout(3),    pcre2matching(3),
-       pcre2partial(3),     pcre2posix(3),    pcre2demo(3),    pcre2sample(3),
+       pcre2build(3),    pcre2libs(3),    pcre2callout(3),   pcre2matching(3),
+       pcre2partial(3),    pcre2posix(3),    pcre2demo(3),     pcre2sample(3),
        pcre2stack(3).



@@ -3508,11 +3507,12 @@
        built if you want to use JIT. The support is limited to  the  following
        hardware platforms:


-         ARM v5, v7, and Thumb2
+         ARM 32-bit (v5, v7, and Thumb2)
+         ARM 64-bit
          Intel x86 32-bit and 64-bit
-         MIPS 32-bit
+         MIPS 32-bit and 64-bit
          Power PC 32-bit and 64-bit
-         SPARC 32-bit (experimental)
+         SPARC 32-bit


        If --enable-jit is set on an unsupported platform, compilation fails.


@@ -3531,10 +3531,10 @@
        is to call pcre2_jit_compile() after successfully compiling  a  pattern
        with pcre2_compile(). This function has two arguments: the first is the
        compiled pattern pointer that was returned by pcre2_compile(), and  the
-       second  is  a  set  of  option bits, which must include at least one of
-       PCRE2_JIT_COMPLETE, PCRE2_JIT_PARTIAL_HARD, or PCRE2_JIT_PARTIAL_SOFT.
+       second  is  zero  or  more of the following option bits: PCRE2_JIT_COM-
+       PLETE, PCRE2_JIT_PARTIAL_HARD, or PCRE2_JIT_PARTIAL_SOFT.


-       If JIT support is not available,  a  call  to  pcre2_jit_comple()  does
+       If JIT support is not available, a  call  to  pcre2_jit_compile()  does
        nothing  and returns PCRE2_ERROR_JIT_BADOPTION. Otherwise, the compiled
        pattern is passed to the JIT compiler, which turns it into machine code
        that executes much faster than the normal interpretive code, but yields
@@ -3550,81 +3550,94 @@
        pcre2_match()  is  called,  the appropriate code is run if it is avail-
        able. Otherwise, the pattern is matched using interpretive code.


-       In some circumstances you may need to call additional functions.  These
-       are  described  in  the  section  entitled  "Controlling the JIT stack"
+       You can call pcre2_jit_compile() multiple times for the  same  compiled
+       pattern.  It does nothing if it has previously compiled code for any of
+       the option bits. For example, you can call it once with  PCRE2_JIT_COM-
+       PLETE  and  (perhaps  later,  when  you find you need partial matching)
+       again with PCRE2_JIT_COMPLETE and PCRE2_JIT_PARTIAL_HARD. This time  it
+       will ignore PCRE2_JIT_COMPLETE and just compile code for partial match-
+       ing. If pcre2_jit_compile() is called with no option bits set, it imme-
+       diately  returns  zero. This is an alternative way of testing if JIT is
+       available.
+
+       At present, it is not possible to free JIT compiled  code  except  when
+       the entire compiled pattern is freed by calling pcre2_free_code().
+
+       In  some circumstances you may need to call additional functions. These
+       are described in the  section  entitled  "Controlling  the  JIT  stack"
        below.


        There are some pcre2_match() options that are not supported by JIT, and
-       there  are  also some pattern items that JIT cannot handle. Details are
-       given below. In both cases, matching automatically falls  back  to  the
-       interpretive  code.  If  you want to know whether JIT was actually used
-       for a particular match, you should arrange for a JIT callback  function
-       to  be set up as described in the section entitled "Controlling the JIT
-       stack" below, even if you do not  need  to  supply  a  non-default  JIT
+       there are also some pattern items that JIT cannot handle.  Details  are
+       given  below.  In  both cases, matching automatically falls back to the
+       interpretive code. If you want to know whether JIT  was  actually  used
+       for  a particular match, you should arrange for a JIT callback function
+       to be set up as described in the section entitled "Controlling the  JIT
+       stack"  below,  even  if  you  do  not need to supply a non-default JIT
        stack. Such a callback function is called whenever JIT code is about to
-       be obeyed. If the match-time options are not right for  JIT  execution,
+       be  obeyed.  If the match-time options are not right for JIT execution,
        the callback function is not obeyed.


-       If  the  JIT  compiler finds an unsupported item, no JIT data is gener-
-       ated. You can find out if JIT matching is available after  compiling  a
+       If the JIT compiler finds an unsupported item, no JIT  data  is  gener-
+       ated.  You  can find out if JIT matching is available after compiling a
        pattern by calling pcre2_pattern_info() with the PCRE2_INFO_JIT option.
-       A result of 1 means that JIT compilation was successful. A result of  0
-       means  that  JIT  support is not available, or the pattern was not pro-
+       A  result of 1 means that JIT compilation was successful. A result of 0
+       means that JIT support is not available, or the pattern  was  not  pro-
        cessed by pcre2_jit_compile(), or the JIT compiler was not able to han-
        dle the pattern.



UNSUPPORTED OPTIONS AND PATTERN ITEMS

-       The  pcre2_match()  options  that  are  supported  for JIT matching are
-       PCRE2_NOTBOL,  PCRE2_NOTEOL,  PCRE2_NOTEMPTY,   PCRE2_NOTEMPTY_ATSTART,
+       The pcre2_match() options that  are  supported  for  JIT  matching  are
+       PCRE2_NOTBOL,   PCRE2_NOTEOL,  PCRE2_NOTEMPTY,  PCRE2_NOTEMPTY_ATSTART,
        PCRE2_NO_UTF_CHECK,  PCRE2_PARTIAL_HARD,  and  PCRE2_PARTIAL_SOFT.  The
        PCRE2_ANCHORED option is not supported at match time.


-       The only unsupported pattern items are \C (match a  single  data  unit)
-       when  running in a UTF mode, and a callout immediately before an asser-
+       The  only  unsupported  pattern items are \C (match a single data unit)
+       when running in a UTF mode, and a callout immediately before an  asser-
        tion condition in a conditional group.



RETURN VALUES FROM JIT MATCHING

        When a pattern is matched using JIT matching, the return values are the
-       same  as  those  given by the interpretive pcre2_match() code, with the
-       addition of one new error code: PCRE2_ERROR_JIT_STACKLIMIT. This  means
-       that  the memory used for the JIT stack was insufficient. See "Control-
+       same as those given by the interpretive pcre2_match()  code,  with  the
+       addition  of one new error code: PCRE2_ERROR_JIT_STACKLIMIT. This means
+       that the memory used for the JIT stack was insufficient. See  "Control-
        ling the JIT stack" below for a discussion of JIT stack usage.


-       The error code PCRE2_ERROR_MATCHLIMIT is returned by the  JIT  code  if
-       searching  a  very large pattern tree goes on for too long, as it is in
-       the same circumstance when JIT is not used, but the details of  exactly
-       what  is counted are not the same. The PCRE2_ERROR_RECURSIONLIMIT error
+       The  error  code  PCRE2_ERROR_MATCHLIMIT is returned by the JIT code if
+       searching a very large pattern tree goes on for too long, as it  is  in
+       the  same circumstance when JIT is not used, but the details of exactly
+       what is counted are not the same. The PCRE2_ERROR_RECURSIONLIMIT  error
        code is never returned when JIT matching is used.



CONTROLLING THE JIT STACK

        When the compiled JIT code runs, it needs a block of memory to use as a
-       stack.   By  default,  it  uses 32K on the machine stack. However, some
-       large  or  complicated  patterns  need  more  than  this.   The   error
-       PCRE2_ERROR_JIT_STACKLIMIT  is  given  when  there is not enough stack.
-       Three functions are provided for managing blocks of memory for  use  as
-       JIT  stacks. There is further discussion about the use of JIT stacks in
+       stack.  By default, it uses 32K on the  machine  stack.  However,  some
+       large   or   complicated  patterns  need  more  than  this.  The  error
+       PCRE2_ERROR_JIT_STACKLIMIT is given when there  is  not  enough  stack.
+       Three  functions  are provided for managing blocks of memory for use as
+       JIT stacks. There is further discussion about the use of JIT stacks  in
        the section entitled "JIT stack FAQ" below.


-       The pcre2_jit_stack_create() function creates a JIT  stack.  Its  argu-
-       ments  are  a general context (for memory allocation functions, or NULL
-       for standard memory allocation), a starting size and  a  maximum  size,
-       and   it   returns   a   pointer   to   an  opaque  structure  of  type
-       pcre2_jit_stack,   or   NULL   if    there    is    an    error.    The
-       pcre2_jit_stack_free()  function  is  used  to  free a stack that is no
-       longer needed. (For the technically minded: the address space is  allo-
-       cated by mmap or VirtualAlloc.)  FIXME Is this right?
+       The  pcre2_jit_stack_create()  function  creates a JIT stack. Its argu-
+       ments are a general context (for memory allocation functions,  or  NULL
+       for  standard  memory  allocation), a starting size and a maximum size,
+       and  it  returns  a  pointer   to   an   opaque   structure   of   type
+       pcre2_jit_stack,    or    NULL    if    there    is   an   error.   The
+       pcre2_jit_stack_free() function is used to free  a  stack  that  is  no
+       longer  needed. (For the technically minded: the address space is allo-
+       cated by mmap or VirtualAlloc.)


-       JIT  uses far less memory for recursion than the interpretive code, and
-       a maximum stack size of 512K to 1M should be more than enough  for  any
+       JIT uses far less memory for recursion than the interpretive code,  and
+       a  maximum  stack size of 512K to 1M should be more than enough for any
        pattern.


-       The  pcre2_jit_stack_assign()  function  specifies which stack JIT code
+       The pcre2_jit_stack_assign() function specifies which  stack  JIT  code
        should use. Its arguments are as follows:


          pcre2_match_context  *mcontext
@@ -3633,11 +3646,12 @@


        The first argument is a pointer to a match context. When this is subse-
        quently passed to a matching function, its information determines which
-       JIT stack is used. There are three cases for the values  of  the  other
+       JIT  stack  is  used. There are three cases for the values of the other
        two options:


          (1) If callback is NULL and data is NULL, an internal 32K block
-             on the machine stack is used.
+             on the machine stack is used. This is the default when a match
+             context is created.


          (2) If callback is NULL and data is not NULL, data must be
              a pointer to a valid JIT stack, the result of calling
@@ -3650,30 +3664,30 @@
              return value must be a valid JIT stack, the result of calling
              pcre2_jit_stack_create().


-       A  callback function is obeyed whenever JIT code is about to be run; it
+       A callback function is obeyed whenever JIT code is about to be run;  it
        is not obeyed when pcre2_match() is called with options that are incom-
-       patible  for JIT matching. A callback function can therefore be used to
-       determine whether a match operation was  executed  by  JIT  or  by  the
+       patible for JIT matching. A callback function can therefore be used  to
+       determine  whether  a  match  operation  was  executed by JIT or by the
        interpreter.


        You may safely use the same JIT stack for more than one pattern (either
-       by assigning directly or by callback), as long as the patterns are  all
-       matched  sequentially in the same thread. In a multithread application,
-       if you do not specify a JIT stack, or if you assign or pass  back  NULL
-       from  a  callback, that is thread-safe, because each thread has its own
-       machine stack. However, if you assign  or  pass  back  a  non-NULL  JIT
-       stack,  this  must  be  a  different  stack for each thread so that the
+       by  assigning directly or by callback), as long as the patterns are all
+       matched sequentially in the same thread. In a multithread  application,
+       if  you  do not specify a JIT stack, or if you assign or pass back NULL
+       from a callback, that is thread-safe, because each thread has  its  own
+       machine  stack.  However,  if  you  assign  or pass back a non-NULL JIT
+       stack, this must be a different stack  for  each  thread  so  that  the
        application is thread-safe.


-       Strictly speaking, even more is allowed. You can assign the  same  non-
-       NULL  stack  to a match context that is used by any number of patterns,
-       as long as they are not used for matching by multiple  threads  at  the
-       same  time.  For  example, you could use the same stack in all compiled
-       patterns, with a global mutex in the callback to wait until  the  stack
+       Strictly  speaking,  even more is allowed. You can assign the same non-
+       NULL stack to a match context that is used by any number  of  patterns,
+       as  long  as  they are not used for matching by multiple threads at the
+       same time. For example, you could use the same stack  in  all  compiled
+       patterns,  with  a global mutex in the callback to wait until the stack
        is available for use. However, this is an inefficient solution, and not
        recommended.


-       This is a suggestion for how a multithreaded program that needs to  set
+       This  is a suggestion for how a multithreaded program that needs to set
        up non-default JIT stacks might operate:


          During thread initialization
@@ -3685,7 +3699,7 @@
          Use a one-line callback function
            return thread_local_var


-       All  the  functions  described in this section do nothing if JIT is not
+       All the functions described in this section do nothing if  JIT  is  not
        available.



@@ -3694,20 +3708,20 @@
        (1) Why do we need JIT stacks?


        PCRE2 (and JIT) is a recursive, depth-first engine, so it needs a stack
-       where  the local data of the current node is pushed before checking its
+       where the local data of the current node is pushed before checking  its
        child nodes.  Allocating real machine stack on some platforms is diffi-
        cult. For example, the stack chain needs to be updated every time if we
-       extend the stack on PowerPC.  Although it  is  possible,  its  updating
+       extend  the  stack  on  PowerPC.  Although it is possible, its updating
        time overhead decreases performance. So we do the recursion in memory.


        (2) Why don't we simply allocate blocks of memory with malloc()?


-       Modern  operating  systems  have  a  nice  feature: they can reserve an
+       Modern operating systems have a  nice  feature:  they  can  reserve  an
        address space instead of allocating memory. We can safely allocate mem-
-       ory  pages  inside  this address space, so the stack could grow without
+       ory pages inside this address space, so the stack  could  grow  without
        moving memory data (this is important because of pointers). Thus we can
-       allocate  1M  address space, and use only a single memory page (usually
-       4K) if that is enough. However, we can still grow up to 1M  anytime  if
+       allocate 1M address space, and use only a single memory  page  (usually
+       4K)  if  that is enough. However, we can still grow up to 1M anytime if
        needed.


        (3) Who "owns" a JIT stack?
@@ -3715,8 +3729,8 @@
        The owner of the stack is the user program, not the JIT studied pattern
        or anything else. The user program must ensure that if a stack is being
        used by pcre2_match(), (that is, it is assigned to a match context that
-       is passed to the pattern currently running), that  stack  must  not  be
-       used  by any other threads (to avoid overwriting the same memory area).
+       is  passed  to  the  pattern currently running), that stack must not be
+       used by any other threads (to avoid overwriting the same memory  area).
        The best practice for multithreaded programs is to allocate a stack for
        each thread, and return this stack through the JIT callback function.


@@ -3724,36 +3738,36 @@

        You can free a JIT stack at any time, as long as it will not be used by
        pcre2_match() again. When you assign the stack to a match context, only
-       a  pointer  is  set. There is no reference counting or any other magic.
+       a pointer is set. There is no reference counting or  any  other  magic.
        You can free compiled patterns, contexts, and stacks in any order, any-
-       time.  Just  do not call pcre2_match() with a match context pointing to
+       time. Just do not call pcre2_match() with a match context  pointing  to
        an already freed stack, as that will cause SEGFAULT. (Also, do not free
-       a  stack  currently  used  by pcre2_match() in another thread). You can
-       also replace the stack in a context at any time when it is not in  use.
+       a stack currently used by pcre2_match() in  another  thread).  You  can
+       also  replace the stack in a context at any time when it is not in use.
        You can also free the previous stack before assigning a replacement.


-       (5)  Should  I  allocate/free  a  stack every time before/after calling
+       (5) Should I allocate/free a  stack  every  time  before/after  calling
        pcre2_match()?


-       No, because this is too costly in  terms  of  resources.  However,  you
-       could  implement  some clever idea which releases the stack if it is not
-       used in let's say two minutes. The JIT callback  can  help  to  achieve
+       No,  because  this  is  too  costly in terms of resources. However, you
+       could implement some clever idea which releases the stack if it  is  not
+       used  in  let's  say  two minutes. The JIT callback can help to achieve
        this without keeping a list of patterns.


-       (6)  OK, the stack is for long term memory allocation. But what happens
-       if a pattern causes stack overflow with a stack of 1M? Is that 1M  kept
+       (6) OK, the stack is for long term memory allocation. But what  happens
+       if  a pattern causes stack overflow with a stack of 1M? Is that 1M kept
        until the stack is freed?


-       Especially  on embedded systems, it might be a good idea to release mem-
-       ory sometimes without freeing the stack. There is no API  for  this  at
-       the  moment.  Probably a function call which returns with the currently
-       allocated memory for any stack and another which allows releasing  mem-
+       Especially on embedded systems, it might be a good idea to release  mem-
+       ory  sometimes  without  freeing the stack. There is no API for this at
+       the moment.  Probably a function call which returns with the  currently
+       allocated  memory for any stack and another which allows releasing mem-
        ory (shrinking the stack) would be a good idea if someone needs this.


        (7) This is too much of a headache. Isn't there any better solution for
        JIT stack handling?


-       No, thanks to Windows. If POSIX threads were used everywhere, we  could
+       No,  thanks to Windows. If POSIX threads were used everywhere, we could
        throw out this complicated API.



@@ -3762,18 +3776,18 @@
        void pcre2_jit_free_unused_memory(pcre2_general_context *gcontext);


        The JIT executable allocator does not free all memory when it is possi-
-       ble.  It expects new allocations, and keeps some free memory around  to
-       improve  allocation  speed. However, in low memory conditions, it might
-       be better to free all possible memory. You can cause this to happen  by
-       calling  pcre2_jit_free_unused_memory(). Its argument is a general con-
+       ble.   It expects new allocations, and keeps some free memory around to
+       improve allocation speed. However, in low memory conditions,  it  might
+       be  better to free all possible memory. You can cause this to happen by
+       calling pcre2_jit_free_unused_memory(). Its argument is a general  con-
        text, for custom memory management, or NULL for standard memory manage-
        ment.



EXAMPLE CODE

-       This  is  a  single-threaded example that specifies a JIT stack without
-       using a callback. A real program should include  error  checking  after
+       This is a single-threaded example that specifies a  JIT  stack  without
+       using  a  callback.  A real program should include error checking after
        all the function calls.


          int rc;
@@ -3801,28 +3815,28 @@
 JIT FAST PATH API


        Because the API described above falls back to interpreted matching when
-       JIT is not available, it is convenient for programs  that  are  written
+       JIT  is  not  available, it is convenient for programs that are written
        for  general  use  in  many  environments.  However,  calling  JIT  via
        pcre2_match() does have a performance impact. Programs that are written
-       for  use  where  JIT  is known to be available, and which need the best
-       possible performance, can instead use a "fast path"  API  to  call  JIT
-       matching  directly instead of calling pcre2_match() (obviously only for
+       for use where JIT is known to be available, and  which  need  the  best
+       possible  performance,  can  instead  use a "fast path" API to call JIT
+       matching directly instead of calling pcre2_match() (obviously only  for
        patterns that have been successfully processed by pcre2_jit_compile()).


-       The fast path  function  is  called  pcre2_jit_match(),  and  it  takes
+       The  fast  path  function  is  called  pcre2_jit_match(),  and it takes
        exactly the same arguments as pcre2_match(). The return values are also
        the same, plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or
-       complete)  is  requested that was not compiled. Unsupported option bits
+       complete) is requested that was not compiled. Unsupported  option  bits
        (for example, PCRE2_ANCHORED) are ignored.


-       When you call pcre2_match(), as well as testing for invalid options,  a
+       When  you call pcre2_match(), as well as testing for invalid options, a
        number of other sanity checks are performed on the arguments. For exam-
        ple, if the subject pointer is NULL, an immediate error is given. Also,
-       unless  PCRE2_NO_UTF_CHECK  is  set, a UTF subject string is tested for
-       validity. In the interests of speed, these checks do not happen on  the
+       unless PCRE2_NO_UTF_CHECK is set, a UTF subject string  is  tested  for
+       validity.  In the interests of speed, these checks do not happen on the
        JIT fast path, and if invalid data is passed, the result is undefined.


-       Bypassing  the  sanity  checks  and the pcre2_match() wrapping can give
+       Bypassing the sanity checks and the  pcre2_match()  wrapping  can  give
        speedups of more than 10%.



@@ -3840,7 +3854,7 @@

REVISION

-       Last updated: 08 November 2014
+       Last updated: 12 November 2014
        Copyright (c) 1997-2014 University of Cambridge.
 ------------------------------------------------------------------------------



Modified: code/trunk/doc/pcre2api.3
===================================================================
--- code/trunk/doc/pcre2api.3    2014-11-12 17:46:02 UTC (rev 146)
+++ code/trunk/doc/pcre2api.3    2014-11-14 18:41:20 UTC (rev 147)
@@ -1063,7 +1063,7 @@
 Which characters are interpreted as newlines can be specified by a setting in
 the compile context that is passed to \fBpcre2_compile()\fP or by a special
 sequence at the start of the pattern, as described in the section entitled
-.\" HTML <a href="pcrepattern.html#newlines">
+.\" HTML <a href="pcre2pattern.html#newlines">
 .\" </a>
 "Newline conventions"
 .\"
@@ -1226,7 +1226,7 @@
 \ew, and some of the POSIX character classes. By default, only ASCII characters
 are recognized, but if PCRE2_UCP is set, Unicode properties are used instead to
 classify characters. More details are given in the section on
-.\" HTML <a href="pcre2.html#genericchartypes">
+.\" HTML <a href="pcre2pattern.html#genericchartypes">
 .\" </a>
 generic character types
 .\"
@@ -1939,17 +1939,11 @@
 .sp
 When PCRE2 is built, a default newline convention is set; this is usually the
 standard convention for the operating system. The default can be overridden in
-either a
+a
 .\" HTML <a href="#compilecontext">
 .\" </a>
-compile context
+compile context.
 .\"
-or a
-.\" HTML <a href="#matchcontext">
-.\" </a>
-match context.
-.\"
-However, changing the newline convention at match time disables JIT matching.
 During matching, the newline choice affects the behaviour of the dot,
 circumflex, and dollar metacharacters. It may also alter the way the match
 position is advanced after a match failure for an unanchored pattern.
@@ -2322,7 +2316,7 @@
 substrings.
 .
 .
-.\" HTML <a name="extractbynname"></a>
+.\" HTML <a name="extractbyname"></a>
 .SH "EXTRACTING CAPTURED SUBSTRINGS BY NAME"
 .rs
 .sp


Modified: code/trunk/doc/pcre2jit.3
===================================================================
--- code/trunk/doc/pcre2jit.3    2014-11-12 17:46:02 UTC (rev 146)
+++ code/trunk/doc/pcre2jit.3    2014-11-14 18:41:20 UTC (rev 147)
@@ -28,7 +28,7 @@
 platforms:
 .sp
   ARM 32-bit (v5, v7, and Thumb2)
-  ARM 64-bit 
+  ARM 64-bit
   Intel x86 32-bit and 64-bit
   MIPS 32-bit and 64-bit
   Power PC 32-bit and 64-bit
@@ -79,7 +79,7 @@
 \fBpcre2_jit_compile()\fP is called with no option bits set, it immediately
 returns zero. This is an alternative way of testing if JIT is available.
 .P
-At present, it is not possible to free JIT compiled code except when the entire 
+At present, it is not possible to free JIT compiled code except when the entire
 compiled pattern is freed by calling \fBpcre2_free_code()\fP.
 .P
 In some circumstances you may need to call additional functions. These are
@@ -186,8 +186,8 @@
 used. There are three cases for the values of the other two options:
 .sp
   (1) If \fIcallback\fP is NULL and \fIdata\fP is NULL, an internal 32K block
-      on the machine stack is used. This is the default when a match 
-      context is created. 
+      on the machine stack is used. This is the default when a match
+      context is created.
 .sp
   (2) If \fIcallback\fP is NULL and \fIdata\fP is not NULL, \fIdata\fP must be
       a pointer to a valid JIT stack, the result of calling


Modified: code/trunk/doc/pcre2pattern.3
===================================================================
--- code/trunk/doc/pcre2pattern.3    2014-11-12 17:46:02 UTC (rev 146)
+++ code/trunk/doc/pcre2pattern.3    2014-11-14 18:41:20 UTC (rev 147)
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "03 November 2014" "PCRE2 10.00"
+.TH PCRE2PATTERN 3 "14 November 2014" "PCRE2 10.00"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -63,8 +63,8 @@
 .P
 Some applications that allow their users to supply patterns may wish to
 restrict them to non-UTF data for security reasons. If the PCRE2_NEVER_UTF
-option is set at compile time, (*UTF) is not allowed, and its appearance causes
-an error.
+option is passed to \fBpcre2_compile()\fP, (*UTF) is not allowed, and its
+appearance in a pattern causes an error.
 .
 .
 .SS "Unicode property support"
@@ -75,8 +75,23 @@
 such as \ed and \ew to use Unicode properties to determine character types,
 instead of recognizing only characters with codes less than 128 via a lookup
 table.
+.P
+Some applications that allow their users to supply patterns may wish to
+restrict them for security reasons. If the PCRE2_NEVER_UCP option is passed to
+\fBpcre2_compile()\fP, (*UCP) is not allowed, and its appearance in a pattern
+causes an error.
 .
 .
+.SS "Locking out empty string matching"
+.rs
+.sp
+Starting a pattern with (*NOTEMPTY) or (*NOTEMPTY_ATSTART) has the same effect
+as passing the PCRE2_NOTEMPTY or PCRE2_NOTEMPTY_ATSTART option to whichever
+matching function is subsequently called to match the pattern. These options
+lock out the matching of empty strings, either entirely, or only at the start
+of the subject.
+.
+.
 .SS "Disabling auto-possessification"
 .rs
 .sp
@@ -102,6 +117,28 @@
 documentation.
 .
 .
+.SS "Setting match and recursion limits"
+.rs
+.sp
+The caller of \fBpcre2_match()\fP can set a limit on the number of times the
+internal \fBmatch()\fP function is called and on the maximum depth of
+recursive calls. These facilities are provided to catch runaway matches that
+are provoked by patterns with huge matching trees (a typical example is a
+pattern with nested unlimited repeats) and to avoid running out of system stack
+by too much recursion. When one of these limits is reached, \fBpcre2_match()\fP
+gives an error return. The limits can also be set by items at the start of the
+pattern of the form
+.sp
+  (*LIMIT_MATCH=d)
+  (*LIMIT_RECURSION=d)
+.sp
+where d is any number of decimal digits. However, the value of the setting must
+be less than the value set (or defaulted) by the caller of \fBpcre2_match()\fP
+for it to have any effect. In other words, the pattern writer can lower the
+limits set by the programmer, but not raise them. If there is more than one
+setting of one of these limits, the lower value is used.
+.
+.
 .\" HTML <a name="newlines"></a>
 .SS "Newline conventions"
 .rs
@@ -153,26 +190,14 @@
 convention.
 .
 .
-.SS "Setting match and recursion limits"
+.SS "Specifying what \eR matches"
 .rs
 .sp
-The caller of \fBpcre2_match()\fP can set a limit on the number of times the
-internal \fBmatch()\fP function is called and on the maximum depth of
-recursive calls. These facilities are provided to catch runaway matches that
-are provoked by patterns with huge matching trees (a typical example is a
-pattern with nested unlimited repeats) and to avoid running out of system stack
-by too much recursion. When one of these limits is reached, \fBpcre2_match()\fP
-gives an error return. The limits can also be set by items at the start of the
-pattern of the form
-.sp
-  (*LIMIT_MATCH=d)
-  (*LIMIT_RECURSION=d)
-.sp
-where d is any number of decimal digits. However, the value of the setting must
-be less than the value set (or defaulted) by the caller of \fBpcre2_match()\fP
-for it to have any effect. In other words, the pattern writer can lower the
-limits set by the programmer, but not raise them. If there is more than one
-setting of one of these limits, the lower value is used.
+It is possible to restrict \eR to match only CR, LF, or CRLF (instead of the
+complete set of Unicode line endings) by setting the option PCRE2_BSR_ANYCRLF
+at compile time. This effect can also be achieved by starting a pattern with
+(*BSR_ANYCRLF). For completeness, (*BSR_UNICODE) is also recognized,
+corresponding to PCRE2_BSR_UNICODE.
 .
 .
 .SH "EBCDIC CHARACTER CODES"
@@ -2302,8 +2327,8 @@
   (?(1) (A|B|C) | (D | (?(2)E|F) | E) )
 .sp
 .P
-There are four kinds of condition: references to subpatterns, references to
-recursion, a pseudo-condition called DEFINE, and assertions.
+There are five kinds of condition: references to subpatterns, references to
+recursion, two pseudo-conditions called DEFINE and VERSION, and assertions.
 .
 .
 .SS "Checking for a used subpattern by number"
@@ -2418,6 +2443,23 @@
 components of an IPv4 address, insisting on a word boundary at each end.
 .
 .
+.SS "Checking the PCRE2 version"
+.rs
+.sp
+Programs that link with a PCRE2 library can check the version by calling
+\fBpcre2_config()\fP with appropriate arguments. Users of applications that do
+not have access to the underlying code cannot do this. A special "condition"
+called VERSION exists to allow such users to discover which version of PCRE2
+they are dealing with by using this condition to match a string such as
+"yesno". VERSION must be followed either by "=" or ">=" and a version number.
+For example:
+.sp
+  (?(VERSION>=10.4)yes|no)
+.sp
+This pattern matches "yes" if the PCRE2 version is greater than or equal to 10.4, or
+"no" otherwise.
+.
+.
 .SS "Assertion conditions"
 .rs
 .sp
@@ -3219,7 +3261,7 @@
 .rs
 .sp
 \fBpcre2api\fP(3), \fBpcre2callout\fP(3), \fBpcre2matching\fP(3),
-\fBpcre2syntax\fP(3), \fBpcre2\fP(3), \fBpcre216(3)\fP, \fBpcre232(3)\fP.
+\fBpcre2syntax\fP(3), \fBpcre2\fP(3).
 .
 .
 .SH AUTHOR
@@ -3236,6 +3278,6 @@
 .rs
 .sp
 .nf
-Last updated: 03 November 2014
+Last updated: 14 November 2014
 Copyright (c) 1997-2014 University of Cambridge.
 .fi
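
The limit rule described in the pcre2pattern.3 hunk above (a pattern can lower the caller's limit but not raise it, and the lowest of several settings is used) can be sketched in plain C. This only illustrates the arithmetic and is not PCRE2's own code: effective_limit() and its parameters are hypothetical names introduced for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the PCRE2 API: combine the limit set
   (or defaulted) by the caller of pcre2_match() with any (*LIMIT_MATCH=d)
   or (*LIMIT_RECURSION=d) values found at the start of a pattern. */
uint32_t effective_limit(uint32_t caller_limit,
                         const uint32_t *pattern_limits,
                         size_t count)
{
    uint32_t limit = caller_limit;
    for (size_t i = 0; i < count; i++) {
        /* A pattern setting has an effect only when it is lower. */
        if (pattern_limits[i] < limit)
            limit = pattern_limits[i];
    }
    return limit;
}
```

With a caller limit of 1000 and pattern settings of 5000 and 200, the value in force under this sketch would be 200; the setting of 5000 is simply ignored.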


Modified: code/trunk/doc/pcre2syntax.3
===================================================================
--- code/trunk/doc/pcre2syntax.3    2014-11-12 17:46:02 UTC (rev 146)
+++ code/trunk/doc/pcre2syntax.3    2014-11-14 18:41:20 UTC (rev 147)
@@ -1,4 +1,4 @@
-.TH PCRE2SYNTAX 3 "20 October 2014" "PCRE2 10.00"
+.TH PCRE2SYNTAX 3 "14 November 2014" "PCRE2 10.00"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@@ -470,17 +470,18 @@
   (?(condition)yes-pattern)
   (?(condition)yes-pattern|no-pattern)
 .sp
-  (?(n)...        absolute reference condition
-  (?(+n)...       relative reference condition
-  (?(-n)...       relative reference condition
-  (?(<name>)...   named reference condition (Perl)
-  (?('name')...   named reference condition (Perl)
-  (?(name)...     named reference condition (PCRE2)
-  (?(R)...        overall recursion condition
-  (?(Rn)...       specific group recursion condition
-  (?(R&name)...   specific recursion condition
-  (?(DEFINE)...   define subpattern for reference
-  (?(assert)...   assertion condition
+  (?(n)               absolute reference condition
+  (?(+n)              relative reference condition
+  (?(-n)              relative reference condition
+  (?(<name>)          named reference condition (Perl)
+  (?('name')          named reference condition (Perl)
+  (?(name)            named reference condition (PCRE2)
+  (?(R)               overall recursion condition
+  (?(Rn)              specific group recursion condition
+  (?(R&name)          specific recursion condition
+  (?(DEFINE)          define subpattern for reference
+  (?(VERSION[>]=n.m)  test PCRE2 version
+  (?(assert)          assertion condition
 .
 .
 .SH "BACKTRACKING CONTROL"
@@ -535,6 +536,6 @@
 .rs
 .sp
 .nf
-Last updated: 20 October 2014
+Last updated: 14 November 2014
 Copyright (c) 1997-2014 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcre2test.1
===================================================================
--- code/trunk/doc/pcre2test.1    2014-11-12 17:46:02 UTC (rev 146)
+++ code/trunk/doc/pcre2test.1    2014-11-14 18:41:20 UTC (rev 147)
@@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "12 November 2014" "PCRE 10.00"
+.TH PCRE2TEST 1 "14 November 2014" "PCRE 10.00"
 .SH NAME
 pcre2test - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -450,7 +450,6 @@
       tables=[0|1|2]            select internal tables
 .sp
 The effects of these modifiers are described in the following sections.
-FIXME: Give more examples.
 .
 .
 .SS "Newline and \eR handling"
@@ -484,7 +483,31 @@
 .P
 The \fBinfo\fP modifier requests information about the compiled pattern
 (whether it is anchored, has a fixed first character, and so on). The
-information is obtained from the \fBpcre2_pattern_info()\fP function.
+information is obtained from the \fBpcre2_pattern_info()\fP function. Here are
+some typical examples:
+.sp
+    re> /(?i)(^a|^b)/m,info
+  Capturing subpattern count = 1
+  Compile options: multiline
+  Overall options: caseless multiline
+  First code unit at start or follows newline
+  Subject length lower bound = 1
+.sp
+    re> /(?i)abc/info
+  Capturing subpattern count = 0
+  Compile options: <none>
+  Overall options: caseless
+  First code unit = 'a' (caseless)
+  Last code unit = 'c' (caseless)
+  Subject length lower bound = 3
+.sp
+"Compile options" are those specified to the compile function; "overall
+options" have added options that are taken or deduced from the pattern. If both
+sets of options are the same, just a single "options" line is output. "First
+code unit" is where any match must start; if there is more than one they are
+listed as "starting code units". "Last code unit" is the last literal code unit
+that must be present in any match. This is not necessarily the last character.
+These lines are omitted if no starting or ending code units are recorded.
 .
 .
 .SS "Specifying a pattern in hex"
@@ -499,8 +522,8 @@
 This feature is provided as a way of creating patterns that contain binary zero
 characters. By default, \fBpcre2test\fP passes patterns as zero-terminated
 strings to \fBpcre2_compile()\fP, giving the length as PCRE2_ZERO_TERMINATED.
-However, for patterns specified in hexadecimal, the length of the pattern is
-passed.
+However, for patterns specified in hexadecimal, the actual length of the
+pattern is passed.
 .
 .
 .SS "JIT compilation"
@@ -528,7 +551,7 @@
 setting the size of the JIT stack.
 .P
 If the \fBjitfast\fP modifier is specified, matching is done using the JIT
-"fast path" interface (\fBpcre2_jit_match()), which skips some of the sanity
+"fast path" interface, \fBpcre2_jit_match()\fP, which skips some of the sanity
 checks that are done by \fBpcre2_match()\fP, and of course does not work when
 JIT is not supported. If \fBjitfast\fP is specified without \fBjit\fP, jit=7 is
 assumed.
@@ -560,11 +583,16 @@
 .SS "Showing pattern memory"
 .rs
 .sp
-The \fB/memory\fP modifier causes the size in bytes of the memory block used to
-hold the compiled pattern to be output. This does not include the size of the
+The \fB/memory\fP modifier causes the size in bytes of the memory used to hold
+the compiled pattern to be output. This does not include the size of the
 \fBpcre2_code\fP block; it is just the actual compiled data. If the pattern is
 subsequently passed to the JIT compiler, the size of the JIT compiled code is
-also output.
+also output. Here is an example:
+.sp
+    re> /a(b)c/jit,memory
+  Memory allocation (code space): 21
+  Memory allocation (JIT code): 1910
+.sp
 .
 .
 .SS "Limiting nested parentheses"
@@ -608,8 +636,8 @@
 .\"
 documentation for details). If the number specified by the modifier is greater
 than zero, \fBpcre2_set_compile_recursion_guard()\fP is called to set up
-callback from \fBpcre2_compile()\fP to a local function. The argument it is
-passed is the current nesting parenthesis depth; if this is greater than the
+a callback from \fBpcre2_compile()\fP to a local function. The argument it
+receives is the current nesting parenthesis depth; if this is greater than the
 value given by the modifier, non-zero is returned, causing the compilation to
 be aborted.
 .
@@ -646,7 +674,7 @@
       allusedtext         show all consulted text
   /g  global              global matching
       mark                show mark values
-      replace=<string>    specify a replacement string 
+      replace=<string>    specify a replacement string
       startchar           show starting character when relevant
 .sp
 These modifiers may not appear in a \fB#pattern\fP command. If you want them as
@@ -721,12 +749,11 @@
       offset=<n>                set starting offset
       ovector=<n>               set size of output vector
       recursion_limit=<n>       set a recursion limit
-      replace=<string>          specify a replacement string 
+      replace=<string>          specify a replacement string
       startchar                 show startchar when relevant
       zero_terminate            pass the subject as zero-terminated
 .sp
 The effects of these modifiers are described in the following sections.
-FIXME: Give more examples.
 .
 .
 .SS "Showing more text"
@@ -850,14 +877,14 @@
 .SS "Testing the substitution function"
 .rs
 .sp
-If the \fBreplace\fP modifier is set, the \fBpcre2_substitute()\fP function is 
+If the \fBreplace\fP modifier is set, the \fBpcre2_substitute()\fP function is
 called instead of one of the matching functions. Unlike subject strings,
 \fBpcre2test\fP does not process replacement strings for escape sequences. In
 UTF mode, a replacement string is checked to see if it is a valid UTF-8 string.
 If so, it is correctly converted to a UTF string of the appropriate code unit
 width. If it is not a valid UTF-8 string, the individual code units are copied
 directly. This provides a means of passing an invalid UTF-8 string for testing
-purposes. 
+purposes.
 .P
 If the \fBglobal\fP modifier is set, PCRE2_SUBSTITUTE_GLOBAL is passed to
 \fBpcre2_substitute()\fP. After a successful substitution, the modified string
@@ -867,16 +894,23 @@
   /abc/replace=xxx
       =abc=abc=
    1: =xxx=abc=
-      =abc=abc=\=global
+      =abc=abc=\e=global
    2: =xxx=xxx=
 .sp
-Subject and replacement strings should be kept relatively short for 
+Subject and replacement strings should be kept relatively short for
 substitution tests, as fixed-size buffers are used. To make it easy to test for
-buffer overflow, if the replacement string starts with a number in square 
-brackets, that number is passed to \fBpcre2_substitute()\fP as the size of the 
-output buffer, with the replacement string starting at the next character.
-.P
-A replacement string is ignored with POSIX and DFA matching. Specifying partial 
+buffer overflow, if the replacement string starts with a number in square
+brackets, that number is passed to \fBpcre2_substitute()\fP as the size of the
+output buffer, with the replacement string starting at the next character. Here
+is an example that tests the edge case:
+.sp
+  /abc/
+      123abc123\e=replace=[10]XYZ
+   1: 123XYZ123
+      123abc123\e=replace=[9]XYZ
+  Failed: error -47: no more memory
+.sp
+A replacement string is ignored with POSIX and DFA matching. Specifying partial
 matching provokes an error return ("bad option value") from
 \fBpcre2_substitute()\fP.
 .
@@ -957,10 +991,10 @@
 A value of zero is useful when testing the POSIX API because it causes
 \fBregexec()\fP to be called with a NULL capture vector. When not testing the
 POSIX API, a value of zero is used to cause
-\fBpcre2_match_data_create_from_pattern\fP to be called, in order to create a
+\fBpcre2_match_data_create_from_pattern()\fP to be called, in order to create a
 match block of exactly the right size for the pattern. (It is not possible to
-create a match block with a zero-length ovector; there is always one pair of
-offsets.)
+create a match block with a zero-length ovector; there is always at least one
+pair of offsets.)
 .
 .
 .SS "Passing the subject as zero-terminated"
@@ -972,7 +1006,7 @@
 be passed as PCRE2_ZERO_TERMINATED. (When matching via the POSIX interface,
 this modifier has no effect, as there is no facility for passing a length.)
 .P
-When testing \fBpcre2_substitute\fP, this modifier also has the effect of
+When testing \fBpcre2_substitute()\fP, this modifier also has the effect of
 passing the replacement string as zero-terminated.
 .
 .
@@ -1237,6 +1271,6 @@
 .rs
 .sp
 .nf
-Last updated: 12 November 2014
+Last updated: 14 November 2014
 Copyright (c) 1997-2014 University of Cambridge.
 .fi
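
The buffer-size edge case in the substitution example above comes down to simple arithmetic. "123abc123" with "abc" replaced by "XYZ" is 9 code units long; the [10] case succeeds and the [9] case fails, which is consistent with one extra code unit being needed for a terminating zero. The C sketch below makes that explicit. The function names are hypothetical, it covers a single replacement only, and the terminating-zero requirement is an assumption inferred from the example rather than stated in the text.

```c
#include <stddef.h>

/* Hypothetical helpers, not part of the PCRE2 API. */

/* Length of the result when one match of match_len code units inside a
   subject of subject_len code units is replaced by repl_len code units. */
size_t substituted_length(size_t subject_len, size_t match_len,
                          size_t repl_len)
{
    return subject_len - match_len + repl_len;
}

/* Returns 1 if an output buffer of buffer_size code units can hold the
   result plus a terminating zero (assumed), and 0 otherwise. */
int output_fits(size_t buffer_size, size_t result_len)
{
    return buffer_size >= result_len + 1;
}
```

For the example in the hunk, substituted_length(9, 3, 3) gives 9, so a buffer of 10 fits and a buffer of 9 does not.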


Modified: code/trunk/doc/pcre2test.txt
===================================================================
--- code/trunk/doc/pcre2test.txt    2014-11-12 17:46:02 UTC (rev 146)
+++ code/trunk/doc/pcre2test.txt    2014-11-14 18:41:20 UTC (rev 147)
@@ -150,17 +150,18 @@
                  Behave as if each subject line contains the given modifiers.


        -t        Run each compile and match many times with a timer, and  out-
-                 put the resulting times per compile or match. You can control
-                 the number of iterations that are used for timing by  follow-
-                 ing  -t  with  a  number  (as  a separate item on the command
-                 line). For  example,  "-t  1000"  iterates  1000  times.  The
-                 default is to iterate 500,000 times.
+                 put  the  resulting  times  per compile or match. When JIT is
+                 used, separate times are given for the  initial  compile  and
+                 the  JIT  compile.  You  can control the number of iterations
+                 that are used for timing by following -t with a number (as  a
+                 separate  item  on  the command line). For example, "-t 1000"
+                 iterates 1000 times. The default is to iterate 500,000 times.


        -tm       This is like -t except that it times only the matching phase,
                  not the compile phase.


-       -T -TM    These behave like -t and -tm, but in addition, at the end  of
-                 a  run, the total times for all compiles and matches are out-
+       -T -TM    These  behave like -t and -tm, but in addition, at the end of
+                 a run, the total times for all compiles and matches are  out-
                  put.


        -version  Output the PCRE2 version number and then exit.
@@ -168,139 +169,139 @@


DESCRIPTION

-       If pcre2test is given two filename arguments, it reads from  the  first
+       If  pcre2test  is given two filename arguments, it reads from the first
        and writes to the second. If the first name is "-", input is taken from
-       the standard input. If pcre2test is given only one argument,  it  reads
+       the  standard  input. If pcre2test is given only one argument, it reads
        from that file and writes to stdout. Otherwise, it reads from stdin and
-       writes to stdout. When the input is a terminal,  it  prompts  for  each
-       line  of  input, using "re>" to prompt for regular expression patterns,
+       writes  to  stdout.  When  the input is a terminal, it prompts for each
+       line of input, using "re>" to prompt for regular  expression  patterns,
        and "data>" to prompt for subject lines.


-       When pcre2test is built, a configuration option  can  specify  that  it
-       should  be linked with the libreadline or libedit library. When this is
-       done, if the input is from a terminal, it is read using the  readline()
+       When  pcre2test  is  built,  a configuration option can specify that it
+       should be linked with the libreadline or libedit library. When this  is
+       done,  if the input is from a terminal, it is read using the readline()
        function. This provides line-editing and history facilities. The output
        from the -help option states whether or not readline() will be used.


-       The program handles any number of tests, each of which  consists  of  a
-       set  of input lines. Each set starts with a regular expression pattern,
+       The  program  handles  any number of tests, each of which consists of a
+       set of input lines. Each set starts with a regular expression  pattern,
        followed by any number of subject lines to be matched against that pat-
-       tern.  In  between  sets  of test data, command lines that begin with a
-       hash (#) character may appear. This file  format,  with  some  restric-
+       tern. In between sets of test data, command lines  that  begin  with  a
+       hash  (#)  character  may  appear. This file format, with some restric-
        tions, can also be processed by the perltest.pl script that is distrib-
-       uted with PCRE2 as a means of checking that the behaviour of PCRE2  and
+       uted  with PCRE2 as a means of checking that the behaviour of PCRE2 and
        Perl is the same.


-       Each  subject line is matched separately and independently. If you want
+       Each subject line is matched separately and independently. If you  want
        to do multi-line matches, you have to use the \n escape sequence (or \r
-       or  \r\n,  etc.,  depending on the newline setting) in a single line of
-       input to encode the newline sequences. There is no limit on the  length
-       of  subject  lines; the input buffer is automatically extended if it is
-       too small. There is a replication feature that  makes  it  possible  to
+       or \r\n, etc., depending on the newline setting) in a  single  line  of
+       input  to encode the newline sequences. There is no limit on the length
+       of subject lines; the input buffer is automatically extended if  it  is
+       too  small.  There  is  a replication feature that makes it possible to
        generate long subject lines without having to supply them explicitly.


-       An  empty  line  or  the end of the file signals the end of the subject
-       lines for a test, at which point a  new  pattern  or  command  line  is
+       An empty line or the end of the file signals the  end  of  the  subject
+       lines  for  a  test,  at  which  point a new pattern or command line is
        expected if there is still input to be read.



COMMAND LINES

-       In  between sets of test data, a line that begins with a hash (#) char-
-       acter is interpreted as a command line. If the first character is  fol-
-       lowed  by  white space or an exclamation mark, the line is treated as a
-       comment, and ignored.  Otherwise, the  following  commands  are  recog-
+       In between sets of test data, a line that begins with a hash (#)  char-
+       acter  is interpreted as a command line. If the first character is fol-
+       lowed by white space or an exclamation mark, the line is treated  as  a
+       comment,  and  ignored.   Otherwise,  the following commands are recog-
        nized:


          #forbid_utf


-       Subsequent   patterns   automatically   have  the  PCRE2_NEVER_UTF  and
+       Subsequent  patterns  automatically  have   the   PCRE2_NEVER_UTF   and
        PCRE2_NEVER_UCP options set, which locks out the use of UTF and Unicode
-       property  features.  This is a trigger guard that is used in test files
-       to ensure that UTF/Unicode tests are not accidentally  added  to  files
-       that  are  used  when  UTF support is not included in the library. This
-       effect can also be obtained by the use of #pattern; the  difference  is
-       that  #forbid_utf  cannot  be  unset, and the automatic options are not
+       property features. This is a trigger guard that is used in  test  files
+       to  ensure  that  UTF/Unicode tests are not accidentally added to files
+       that are used when UTF support is not included  in  the  library.  This
+       effect  can  also be obtained by the use of #pattern; the difference is
+       that #forbid_utf cannot be unset, and the  automatic  options  are  not
        displayed in pattern information, to avoid cluttering up test output.


          #pattern <modifier-list>


-       This command sets a default modifier list that applies  to  all  subse-
+       This  command  sets  a default modifier list that applies to all subse-
        quent patterns. Modifiers on a pattern can change these settings.


          #perltest


-       The  appearance of this line causes all subsequent modifier settings to
+       The appearance of this line causes all subsequent modifier settings  to
        be checked for compatibility with the perltest.pl script, which is used
-       to  confirm that Perl gives the same results as PCRE2. Also, apart from
-       comment lines, none of the other command lines are  permitted,  because
-       they  and  many  of the modifiers are specific to pcre2test, and should
-       not be used in test files that are also processed by  perltest.pl.  The
-       #perltest  command  helps detect tests that are accidentally put in the
+       to confirm that Perl gives the same results as PCRE2. Also, apart  from
+       comment  lines,  none of the other command lines are permitted, because
+       they and many of the modifiers are specific to  pcre2test,  and  should
+       not  be  used in test files that are also processed by perltest.pl. The
+       #perltest command helps detect tests that are accidentally put  in  the
        wrong file.


          #subject <modifier-list>


-       This command sets a default modifier list that applies  to  all  subse-
-       quent  subject lines. Modifiers on a subject line can change these set-
+       This  command  sets  a default modifier list that applies to all subse-
+       quent subject lines. Modifiers on a subject line can change these  set-
        tings.



MODIFIER SYNTAX

        Modifier lists are used with both pattern and subject lines. Items in a
-       list  are  separated by commas and optional white space. Some modifiers
-       may be given for both patterns and subject lines,  whereas  others  are
-       valid  for  one  or  the other only. Each modifier has a long name, for
+       list are separated by commas and optional white space.  Some  modifiers
+       may  be  given  for both patterns and subject lines, whereas others are
+       valid for one or the other only. Each modifier has  a  long  name,  for
        example "anchored", and some of them must be followed by an equals sign
        and a value, for example, "offset=12".  Modifiers that do not take val-
        ues may be preceded by a minus sign to turn off a previous default set-
        ting.


        A few of the more common modifiers can also be specified as single let-
-       ters, for example "i" for "caseless". In documentation,  following  the
+       ters,  for  example "i" for "caseless". In documentation, following the
        Perl convention, these are written with a slash ("the /i modifier") for
-       clarity. Abbreviated modifiers must all be concatenated  in  the  first
-       item  of a modifier list. If the first item is not recognized as a long
-       modifier name, it is interpreted as a sequence of these  abbreviations.
+       clarity.  Abbreviated  modifiers  must all be concatenated in the first
+       item of a modifier list. If the first item is not recognized as a  long
+       modifier  name, it is interpreted as a sequence of these abbreviations.
        For example:


          /abc/ig,newline=cr,jit=3


-       This  is  a pattern line whose modifier list starts with two one-letter
-       modifiers (/i and /g). The lower-case  abbreviated  modifiers  are  the
+       This is a pattern line whose modifier list starts with  two  one-letter
+       modifiers  (/i  and  /g).  The lower-case abbreviated modifiers are the
        same as used in Perl.



PATTERN SYNTAX

-       A  pattern line must start with one of the following characters (common
+       A pattern line must start with one of the following characters  (common
        symbols, excluding pattern meta-characters):


          / ! " ' ` - = _ : ; , % & @ ~


-       This is interpreted as the pattern's delimiter.  A  regular  expression
-       may  be  continued  over several input lines, in which case the newline
+       This  is  interpreted  as the pattern's delimiter. A regular expression
+       may be continued over several input lines, in which  case  the  newline
        characters are included within it. It is possible to include the delim-
        iter within the pattern by escaping it with a backslash, for example


          /abc\/def/


-       If  you do this, the escape and the delimiter form part of the pattern,
+       If you do this, the escape and the delimiter form part of the  pattern,
        but since the delimiters are all non-alphanumeric, this does not affect
-       its  interpretation.  If  the terminating delimiter is immediately fol-
+       its interpretation. If the terminating delimiter  is  immediately  fol-
        lowed by a backslash, for example,


          /abc/\


-       then a backslash is added to the end of the pattern. This  is  done  to
-       provide  a  way of testing the error condition that arises if a pattern
+       then  a  backslash  is added to the end of the pattern. This is done to
+       provide a way of testing the error condition that arises if  a  pattern
        finishes with a backslash, because


          /abc\/


-       is interpreted as the first line of a pattern that starts with  "abc/",
-       causing  pcre2test to read the next line as a continuation of the regu-
+       is  interpreted as the first line of a pattern that starts with "abc/",
+       causing pcre2test to read the next line as a continuation of the  regu-
        lar expression.


        A pattern can be followed by a modifier list (details below).
@@ -308,7 +309,7 @@


SUBJECT LINE SYNTAX

-       Before   each   subject   line   is   passed   to   pcre2_match()    or
+       Before    each   subject   line   is   passed   to   pcre2_match()   or
        pcre2_dfa_match(), leading and trailing white space is removed, and the
        line is scanned for backslash escapes. The following provide a means of
        encoding non-printing characters in a visible way:
@@ -328,23 +329,23 @@
          \x{hh...}  hexadecimal character (any number of hex digits)


        The use of \x{hh...} is not dependent on the use of the utf modifier on
-       the pattern. It is recognized always. There may be any number of  hexa-
-       decimal  digits  inside  the  braces; invalid values provoke error mes-
+       the  pattern. It is recognized always. There may be any number of hexa-
+       decimal digits inside the braces; invalid  values  provoke  error  mes-
        sages.


-       Note that \xhh specifies one byte rather than one  character  in  UTF-8
-       mode;  this  makes it possible to construct invalid UTF-8 sequences for
-       testing purposes. On the other hand, \x{hh} is interpreted as  a  UTF-8
-       character  in UTF-8 mode, generating more than one byte if the value is
-       greater than 127.  When testing the 8-bit library not  in  UTF-8  mode,
+       Note  that  \xhh  specifies one byte rather than one character in UTF-8
+       mode; this makes it possible to construct invalid UTF-8  sequences  for
+       testing  purposes.  On the other hand, \x{hh} is interpreted as a UTF-8
+       character in UTF-8 mode, generating more than one byte if the value  is
+       greater  than  127.   When testing the 8-bit library not in UTF-8 mode,
        \x{hh} generates one byte for values less than 256, and causes an error
        for greater values.


        In UTF-16 mode, all 4-digit \x{hhhh} values are accepted. This makes it
        possible to construct invalid UTF-16 sequences for testing purposes.


-       In  UTF-32  mode,  all  4- to 8-digit \x{...} values are accepted. This
-       makes it possible to construct invalid  UTF-32  sequences  for  testing
+       In UTF-32 mode, all 4- to 8-digit \x{...}  values  are  accepted.  This
+       makes  it  possible  to  construct invalid UTF-32 sequences for testing
        purposes.


        There is a special backslash sequence that specifies replication of one
@@ -352,38 +353,38 @@


          \[<characters>]{<count>}


-       This makes it possible to test long strings without having  to  provide
+       This  makes  it possible to test long strings without having to provide
        them as part of the file. For example:


          \[abc]{4}


-       is  converted to "abcabcabcabc". This feature does not support nesting.
+       is converted to "abcabcabcabc". This feature does not support  nesting.
        To include a closing square bracket in the characters, code it as \x5D.


-       A backslash followed by an equals sign marke the  end  of  the  subject
+       A  backslash  followed  by  an equals sign marke the end of the subject
        string and the start of a modifier list. For example:


          abc\=notbol,notempty


-       A  backslash  followed  by  any  other  non-alphanumeric character just
+       A backslash followed  by  any  other  non-alphanumeric  character  just
        escapes that character. A backslash followed by anything else causes an
-       error.  However,  if the very last character in the line is a backslash
-       (and there is no modifier list), it is ignored. This  gives  a  way  of
-       passing  an  empty line as data, since a real empty line terminates the
+       error. However, if the very last character in the line is  a  backslash
+       (and  there  is  no  modifier list), it is ignored. This gives a way of
+       passing an empty line as data, since a real empty line  terminates  the
        data input.



PATTERN MODIFIERS

        There are three types of modifier that can appear in pattern lines, two
-       of  which  may also be used in a #pattern command. A pattern's modifier
+       of which may also be used in a #pattern command. A  pattern's  modifier
        list can add to or override default modifiers that were set by a previ-
        ous #pattern command.


    Setting compilation options


-       The  following modifiers set options for pcre2_compile(). The most com-
-       mon ones have single-letter abbreviations. See pcreapi for  a  descrip-
+       The following modifiers set options for pcre2_compile(). The most  com-
+       mon  ones  have single-letter abbreviations. See pcreapi for a descrip-
        tion of their effects.


              allow_empty_class         set PCRE2_ALLOW_EMPTY_CLASS
@@ -409,13 +410,13 @@
              utf                       set PCRE2_UTF


        As well as turning on the PCRE2_UTF option, the utf modifier causes all
-       non-printing characters in output  strings  to  be  printed  using  the
-       \x{hh...}  notation. Otherwise, those less than 0x100 are output in hex
+       non-printing  characters  in  output  strings  to  be printed using the
+       \x{hh...} notation. Otherwise, those less than 0x100 are output in  hex
        without the curly brackets.


    Setting compilation controls


-       The following modifiers  affect  the  compilation  process  or  request
+       The  following  modifiers  affect  the  compilation  process or request
        information about the pattern:


              bsr=[anycrlf|unicode]     specify \R handling
@@ -437,7 +438,6 @@
              tables=[0|1|2]            select internal tables


        The effects of these modifiers are described in the following sections.
-       FIXME: Give more examples.


    Newline and \R handling


@@ -468,8 +468,33 @@

        The  info  modifier  requests  information  about  the compiled pattern
        (whether it is anchored, has a fixed first character, and so  on).  The
-       information is obtained from the pcre2_pattern_info() function.
+       information  is  obtained  from the pcre2_pattern_info() function. Here
+       are some typical examples:


+           re> /(?i)(^a|^b)/m,info
+         Capturing subpattern count = 1
+         Compile options: multiline
+         Overall options: caseless multiline
+         First code unit at start or follows newline
+         Subject length lower bound = 1
+
+           re> /(?i)abc/info
+         Capturing subpattern count = 0
+         Compile options: <none>
+         Overall options: caseless
+         First code unit = 'a' (caseless)
+         Last code unit = 'c' (caseless)
+         Subject length lower bound = 3
+
+       "Compile options" are those specified to the compile function; "overall
+       options" have added options that are taken or deduced from the pattern.
+       If both sets of options are the same, just a single "options"  line  is
+       output.  "First  code  unit" is where any match must start; if there is
+       more than one they are listed as  "starting  code  units".  "Last  code
+       unit"  is the last literal code unit that must be present in any match.
+       This is not necessarily the last character.  These lines are omitted if
+       no starting or ending code units are recorded.
+
    Specifying a pattern in hex


        The hex modifier specifies that the characters of the pattern are to be
@@ -482,7 +507,7 @@
        binary zero characters. By default, pcre2test passes patterns as  zero-
        terminated   strings   to   pcre2_compile(),   giving   the  length  as
        PCRE2_ZERO_TERMINATED.  However, for patterns specified in hexadecimal,
-       the length of the pattern is passed.
+       the actual length of the pattern is passed.


    JIT compilation


@@ -505,7 +530,7 @@
        size of the JIT stack.


        If  the  jitfast  modifier is specified, matching is done using the JIT
-       "fast path" interface (pcre2_jit_match()), which skips some of the san-
+       "fast path" interface, pcre2_jit_match(), which skips some of the  san-
        ity  checks that are done by pcre2_match(), and of course does not work
        when JIT is not supported. If jitfast is specified without  jit,  jit=7
        is assumed.
@@ -533,12 +558,17 @@


    Showing pattern memory


-       The /memory modifier causes the size in bytes of the memory block  used
-       to  hold  the  compiled pattern to be output. This does not include the
-       size of the pcre2_code block; it is just the actual compiled  data.  If
-       the pattern is subsequently passed to the JIT compiler, the size of the
-       JIT compiled code is also output.
+       The /memory modifier causes the size in bytes of  the  memory  used  to
+       hold  the compiled pattern to be output. This does not include the size
+       of the pcre2_code block; it is just the actual compiled  data.  If  the
+       pattern is subsequently passed to the JIT compiler, the size of the JIT
+       compiled code is also output. Here is an example:


+           re> /a(b)c/jit,memory
+         Memory allocation (code space): 21
+         Memory allocation (JIT code): 1910
+
+
    Limiting nested parentheses


        The parens_nest_limit modifier sets a limit  on  the  depth  of  nested
@@ -573,7 +603,7 @@
        mentation  for  details).  If  the  number specified by the modifier is
        greater than zero, pcre2_set_compile_recursion_guard() is called to set
        up  callback  from pcre2_compile() to a local function. The argument it
-       is passed is the current nesting parenthesis depth; if this is  greater
+       receives is the current nesting parenthesis depth; if this  is  greater
        than the value given by the modifier, non-zero is returned, causing the
        compilation to be aborted.


@@ -606,6 +636,7 @@
              allusedtext         show all consulted text
          /g  global              global matching
              mark                show mark values
+             replace=<string>    specify a replacement string
              startchar           show starting character when relevant


        These modifiers may not appear in a #pattern command. If you want  them
@@ -671,31 +702,31 @@
              offset=<n>                set starting offset
              ovector=<n>               set size of output vector
              recursion_limit=<n>       set a recursion limit
+             replace=<string>          specify a replacement string
              startchar                 show startchar when relevant
              zero_terminate            pass the subject as zero-terminated


        The effects of these modifiers are described in the following sections.
-       FIXME: Give more examples.


    Showing more text


-       The aftertext modifier requests that as well  as  outputting  the  sub-
-       string  that  matched  the entire pattern, pcre2test should in addition
-       output the remainder of the subject string. This is  useful  for  tests
-       where  the  subject contains multiple copies of the same substring. The
-       allaftertext modifier requests the same action for captured  substrings
-       as  well  as  the main matched substring. In each case the remainder is
-       output on the following line with a plus character following  the  cap-
+       The  aftertext  modifier  requests  that as well as outputting the sub-
+       string that matched the entire pattern, pcre2test  should  in  addition
+       output  the  remainder  of the subject string. This is useful for tests
+       where the subject contains multiple copies of the same  substring.  The
+       allaftertext  modifier requests the same action for captured substrings
+       as well as the main matched substring. In each case  the  remainder  is
+       output  on  the following line with a plus character following the cap-
        ture number.


-       The  allusedtext modifier requests that all the text that was consulted
-       during a successful pattern match by the interpreter should  be  shown.
-       This  feature  is not supported for JIT matching, and if requested with
-       JIT it is ignored (with  a  warning  message).  Setting  this  modifier
+       The allusedtext modifier requests that all the text that was  consulted
+       during  a  successful pattern match by the interpreter should be shown.
+       This feature is not supported for JIT matching, and if  requested  with
+       JIT  it  is  ignored  (with  a  warning message). Setting this modifier
        affects the output if there is a lookbehind at the start of a match, or
-       a lookahead at the end, or if \K is used  in  the  pattern.  Characters
-       that  precede or follow the start and end of the actual match are indi-
-       cated in the output by '<' or '>' characters underneath them.  Here  is
+       a  lookahead  at  the  end, or if \K is used in the pattern. Characters
+       that precede or follow the start and end of the actual match are  indi-
+       cated  in  the output by '<' or '>' characters underneath them. Here is
        an example:


            re> /(?<=pqr)abc(?=xyz)/
@@ -703,15 +734,15 @@
           0: pqrabcxyz
              <<<   >>>


-       This  shows  that  the  matched string is "abc", with the preceding and
+       This shows that the matched string is "abc",  with  the  preceding  and
        following strings "pqr" and "xyz" also consulted during the match.


-       The startchar modifier requests that the  starting  character  for  the
-       match  be  indicated,  if  it  is different to the start of the matched
+       The  startchar  modifier  requests  that the starting character for the
+       match be indicated, if it is different to  the  start  of  the  matched
        string. The only time when this occurs is when \K has been processed as
        part of the match. In this situation, the output for the matched string
-       is displayed from the starting character  instead  of  from  the  match
-       point,  with  circumflex  characters  under the earlier characters. For
+       is  displayed  from  the  starting  character instead of from the match
+       point, with circumflex characters under  the  earlier  characters.  For
        example:


            re> /abc\Kxyz/
@@ -719,7 +750,7 @@
           0: abcxyz
              ^^^


-       Unlike allusedtext, the startchar modifier can be used with JIT.   How-
+       Unlike  allusedtext, the startchar modifier can be used with JIT.  How-
        ever, these two modifiers are mutually exclusive.


    Showing the value of all capture groups
@@ -727,183 +758,223 @@
        The allcaptures modifier requests that the values of all potential cap-
        tured parentheses be output after a match. By default, only those up to
        the highest one actually used in the match are output (corresponding to
-       the return code from pcre2_match()). Groups that did not take  part  in
+       the  return  code from pcre2_match()). Groups that did not take part in
        the match are output as "<unset>".


    Testing callouts


-       A  callout function is supplied when pcre2test calls the library match-
-       ing functions, unless callout_none is specified. If callout_capture  is
+       A callout function is supplied when pcre2test calls the library  match-
+       ing  functions, unless callout_none is specified. If callout_capture is
        set, the current captured groups are output when a callout occurs.


-       The  callout_fail modifier can be given one or two numbers. If there is
+       The callout_fail modifier can be given one or two numbers. If there  is
        only one number, 1 is returned instead of 0 when a callout of that num-
-       ber  is  reached.  If two numbers are given, 1 is returned when callout
+       ber is reached. If two numbers are given, 1 is  returned  when  callout
        <n> is reached for the <m>th time.


-       The callout_data modifier can be given an unsigned or a  negative  num-
-       ber.   Any  value  other than zero is used as a return from pcre2test's
+       The  callout_data  modifier can be given an unsigned or a negative num-
+       ber.  Any value other than zero is used as a  return  from  pcre2test's
        callout function.


+   Finding all matches in a string
+
+       Searching for all possible matches within a subject can be requested by
+       the global or /altglobal modifier. After finding a match, the  matching
+       function  is  called  again to search the remainder of the subject. The
+       difference between global and altglobal is that  the  former  uses  the
+       start_offset  argument  to  pcre2_match() or pcre2_dfa_match() to start
+       searching at a new point within the entire string (which is  what  Perl
+       does), whereas the latter passes over a shortened substring. This makes
+       a difference to the matching process if the pattern begins with a look-
+       behind assertion (including \b or \B).
+
+       If  an  empty  string  is  matched,  the  next  match  is done with the
+       PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
+       for another, non-empty, match at the same point in the subject. If this
+       match fails, the start offset is advanced,  and  the  normal  match  is
+       retried.  This  imitates the way Perl handles such cases when using the
+       /g modifier or the split() function.  Normally,  the  start  offset  is
+       advanced  by  one  character,  but if the newline convention recognizes
+       CRLF as a newline, and the current character is CR followed by  LF,  an
+       advance of two is used.
+
    Testing substring extraction functions


-       The copy  and  get  modifiers  can  be  used  to  test  the  pcre2_sub-
+       The  copy  and  get  modifiers  can  be  used  to  test  the pcre2_sub-
        string_copy_xxx() and pcre2_substring_get_xxx() functions.  They can be
-       given more than once, and each can specify a group name or number,  for
+       given  more than once, and each can specify a group name or number, for
        example:


           abcd\=copy=1,copy=3,get=G1


-       If  the  #subject  command  is  used to set default copy and get lists,
-       these can be unset by specifying a negative number for numbered  groups
+       If the #subject command is used to set  default  copy  and  get  lists,
+       these  can be unset by specifying a negative number for numbered groups
        and an empty name for named groups.


-       The  getall  modifier  tests pcre2_substring_list_get(), which extracts
+       The getall modifier tests  pcre2_substring_list_get(),  which  extracts
        all captured substrings.


-       If the subject line is successfully matched, the  substrings  extracted
-       by  the  convenience  functions  are  output  with C, G, or L after the
-       string number instead of a colon. This is in  addition  to  the  normal
-       full  list.  The string length (that is, the return from the extraction
+       If  the  subject line is successfully matched, the substrings extracted
+       by the convenience functions are output with  C,  G,  or  L  after  the
+       string  number  instead  of  a colon. This is in addition to the normal
+       full list. The string length (that is, the return from  the  extraction
        function) is given in parentheses after each substring.


- Finding all matches in a string
+ Testing the substitution function

-       Searching for all possible matches within a subject can be requested by
-       the  global or /altglobal modifier. After finding a match, the matching
-       function is called again to search the remainder of  the  subject.  The
-       difference  between  global  and  altglobal is that the former uses the
-       start_offset argument to pcre2_match() or  pcre2_dfa_match()  to  start
-       searching  at  a new point within the entire string (which is what Perl
-       does), whereas the latter passes over a shortened substring. This makes
-       a difference to the matching process if the pattern begins with a look-
-       behind assertion (including \b or \B).
+       If  the  replace  modifier  is  set, the pcre2_substitute() function is
+       called instead  of  one  of  the  matching  functions.  Unlike  subject
+       strings,  pcre2test  does  not  process  replacement strings for escape
+       sequences. In UTF mode, a replacement string is checked to see if it is
+       a valid UTF-8 string.  If so, it is correctly converted to a UTF string
+       of the appropriate code unit width. If it is not a valid UTF-8  string,
+       the individual code units are copied directly. This provides a means of
+       passing an invalid UTF-8 string for testing purposes.


-       If an empty string  is  matched,  the  next  match  is  done  with  the
-       PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
-       for another, non-empty, match at the same point in the subject. If this
-       match  fails,  the  start  offset  is advanced, and the normal match is
-       retried. This imitates the way Perl handles such cases when  using  the
-       /g  modifier  or  the  split()  function. Normally, the start offset is
-       advanced by one character, but if  the  newline  convention  recognizes
-       CRLF  as  a newline, and the current character is CR followed by LF, an
-       advance of two is used.
+       If the global modifier is set,  PCRE2_SUBSTITUTE_GLOBAL  is  passed  to
+       pcre2_substitute().  After  a  successful  substitution,  the  modified
+       string is output, preceded by the number of replacements. This  may  be
+       zero  if there were no matches. Here is a simple example of a substitu-
+       tion test:


+         /abc/replace=xxx
+             =abc=abc=
+          1: =xxx=abc=
+             =abc=abc=\=global
+          2: =xxx=xxx=
+
+       Subject and replacement strings should be  kept  relatively  short  for
+       substitution  tests, as fixed-size buffers are used. To make it easy to
+       test for buffer overflow, if the replacement string starts with a  num-
+       ber  in square brackets, that number is passed to pcre2_substitute() as
+       the size of the output buffer, with the replacement string starting  at
+       the next character. Here is an example that tests the edge case:
+
+         /abc/
+             123abc123\=replace=[10]XYZ
+          1: 123XYZ123
+             123abc123\=replace=[9]XYZ
+         Failed: error -47: no more memory
+
+       A replacement string is ignored with POSIX and DFA matching. Specifying
+       partial matching provokes an error return  ("bad  option  value")  from
+       pcre2_substitute().
+
    Setting the JIT stack size


-       The jitstack modifier provides a way of setting the maximum stack  size
-       that  is  used  by the just-in-time optimization code. It is ignored if
+       The  jitstack modifier provides a way of setting the maximum stack size
+       that is used by the just-in-time optimization code. It  is  ignored  if
        JIT optimization is not being used. The value is a number of kilobytes.
        Providing a stack that is larger than the default 32K is necessary only
        for very complicated patterns.


    Setting match and recursion limits


-       The match_limit and recursion_limit modifiers set the appropriate  lim-
+       The  match_limit and recursion_limit modifiers set the appropriate lim-
        its in the match context. These values are ignored when the find_limits
        modifier is specified.


    Finding minimum limits


-       If the find_limits modifier is present, pcre2test  calls  pcre2_match()
-       several  times,  setting  different  values  in  the  match context via
-       pcre2_set_match_limit() and pcre2_set_recursion_limit() until it  finds
-       the  minimum values for each parameter that allow pcre2_match() to com-
+       If  the  find_limits modifier is present, pcre2test calls pcre2_match()
+       several times, setting  different  values  in  the  match  context  via
+       pcre2_set_match_limit()  and pcre2_set_recursion_limit() until it finds
+       the minimum values for each parameter that allow pcre2_match() to  com-
        plete without error.


        If JIT is being used, only the match limit is relevant. If DFA matching
-       is  being used, neither limit is relevant, and this modifier is ignored
+       is being used, neither limit is relevant, and this modifier is  ignored
        (with a warning message).


-       The match_limit number is a measure of the amount of backtracking  that
-       takes  place,  and  learning  the minimum value can be instructive. For
-       most simple matches, the number is quite small, but for  patterns  with
-       very  large numbers of matching possibilities, it can become large very
-       quickly   with   increasing   length    of    subject    string.    The
-       match_limit_recursion  number  is  a  measure of how much stack (or, if
-       PCRE2 is compiled with NO_RECURSE, how much heap) memory is  needed  to
+       The  match_limit number is a measure of the amount of backtracking that
+       takes place, and learning the minimum value  can  be  instructive.  For
+       most  simple  matches, the number is quite small, but for patterns with
+       very large numbers of matching possibilities, it can become large  very
+       quickly    with    increasing    length    of   subject   string.   The
+       match_limit_recursion number is a measure of how  much  stack  (or,  if
+       PCRE2  is  compiled with NO_RECURSE, how much heap) memory is needed to
        complete the match attempt.


    Showing MARK names



        The mark modifier causes the names from backtracking control verbs that
-       are returned from calls to pcre2_match() to be displayed. If a mark  is
-       returned  for a match, non-match, or partial match, pcre2test shows it.
-       For a match, it is on a line by itself, tagged with  "MK:".  Otherwise,
+       are  returned from calls to pcre2_match() to be displayed. If a mark is
+       returned for a match, non-match, or partial match, pcre2test shows  it.
+       For  a  match, it is on a line by itself, tagged with "MK:". Otherwise,
        it is added to the non-match message.


    Showing memory usage


-       The  memory  modifier causes pcre2test to log all memory allocation and
+       The memory modifier causes pcre2test to log all memory  allocation  and
        freeing calls that occur during a match operation.


    Setting a starting offset


-       The offset modifier sets an offset  in  the  subject  string  at  which
+       The  offset  modifier  sets  an  offset  in the subject string at which
        matching starts. Its value is a number of code units, not characters.


    Setting the size of the output vector


-       The  ovector  modifier  applies  only  to  the subject line in which it
-       appears, though of course it can also be used to set  a  default  in  a
-       #subject  command. It specifies the number of pairs of offsets that are
+       The ovector modifier applies only to  the  subject  line  in  which  it
+       appears,  though  of  course  it can also be used to set a default in a
+       #subject command. It specifies the number of pairs of offsets that  are
        available for storing matching information. The default is 15.


-       A value of zero is useful when testing the POSIX API because it  causes
+       A  value of zero is useful when testing the POSIX API because it causes
        regexec() to be called with a NULL capture vector. When not testing the
-       POSIX API, a value of  zero  is  used  to  cause  pcre2_match_data_cre-
-       ate_from_pattern  to  be  called,  in  order to create a match block of
+       POSIX  API,  a  value  of  zero  is used to cause pcre2_match_data_cre-
+       ate_from_pattern() to be called, in order to create a  match  block  of
        exactly the right size for the pattern. (It is not possible to create a
-       match  block  with  a  zero-length ovector; there is always one pair of
-       offsets.)
+       match block with a zero-length ovector; there is always  at  least  one
+       pair of offsets.)


    Passing the subject as zero-terminated


        By default, the subject string is passed to a native API matching func-
        tion with its correct length. In order to test the facility for passing
-       a zero-terminated string, the zero_terminate modifier is  provided.  It
+       a  zero-terminated  string, the zero_terminate modifier is provided. It
        causes the length to be passed as PCRE2_ZERO_TERMINATED. (When matching
-       via the POSIX interface, this modifier has no effect, as  there  is  no
+       via  the  POSIX  interface, this modifier has no effect, as there is no
        facility for passing a length.)


-       When  testing  pcre2_substitute,  this  modifier also has the effect of
+       When testing pcre2_substitute(), this modifier also has the  effect  of
        passing the replacement string as zero-terminated.



THE ALTERNATIVE MATCHING FUNCTION

-       By default,  pcre2test  uses  the  standard  PCRE2  matching  function,
+       By  default,  pcre2test  uses  the  standard  PCRE2  matching function,
        pcre2_match() to match each subject line. PCRE2 also supports an alter-
-       native matching function, pcre2_dfa_match(), which operates in  a  dif-
-       ferent  way, and has some restrictions. The differences between the two
+       native  matching  function, pcre2_dfa_match(), which operates in a dif-
+       ferent way, and has some restrictions. The differences between the  two
        functions are described in the pcre2matching documentation.


-       If the dfa modifier is set, the alternative matching function is  used.
-       This  function  finds all possible matches at a given point in the sub-
-       ject. If, however, the dfa_shortest modifier is set,  processing  stops
-       after  the  first  match is found. This is always the shortest possible
+       If  the dfa modifier is set, the alternative matching function is used.
+       This function finds all possible matches at a given point in  the  sub-
+       ject.  If,  however, the dfa_shortest modifier is set, processing stops
+       after the first match is found. This is always  the  shortest  possible
        match.



DEFAULT OUTPUT FROM pcre2test

-       This section describes the output when the  normal  matching  function,
+       This  section  describes  the output when the normal matching function,
        pcre2_match(), is being used.


-       When  a  match  succeeds,  pcre2test  outputs the list of captured sub-
-       strings, starting with number 0 for the string that matched  the  whole
-       pattern.    Otherwise,  it  outputs  "No  match"  when  the  return  is
-       PCRE2_ERROR_NOMATCH, or "Partial  match:"  followed  by  the  partially
-       matching  substring  when the return is PCRE2_ERROR_PARTIAL. (Note that
-       this is the entire substring that  was  inspected  during  the  partial
-       match;  it  may  include  characters before the actual match start if a
+       When a match succeeds, pcre2test outputs  the  list  of  captured  sub-
+       strings,  starting  with number 0 for the string that matched the whole
+       pattern.   Otherwise,  it  outputs  "No  match"  when  the  return   is
+       PCRE2_ERROR_NOMATCH,  or  "Partial  match:"  followed  by the partially
+       matching substring when the return is PCRE2_ERROR_PARTIAL.  (Note  that
+       this  is  the  entire  substring  that was inspected during the partial
+       match; it may include characters before the actual  match  start  if  a
        lookbehind assertion, \K, \b, or \B was involved.)


        For any other return, pcre2test outputs the PCRE2 negative error number
-       and  a  short  descriptive  phrase. If the error is a failed UTF string
-       check, the offset of the start of the failing character and the  reason
-       code  are  also  output. Here is an example of an interactive pcre2test
+       and a short descriptive phrase. If the error is  a  failed  UTF  string
+       check,  the offset of the start of the failing character and the reason
+       code are also output. Here is an example of  an  interactive  pcre2test
        run.


          $ pcre2test
@@ -917,10 +988,10 @@
          No match


        Unset capturing substrings that are not followed by one that is set are
-       not  returned  by pcre2_match(), and are not shown by pcre2test. In the
-       following example, there are two capturing  substrings,  but  when  the
-       first  data  line is matched, the second, unset substring is not shown.
-       An "internal" unset substring is shown as "<unset>", as for the  second
+       not returned by pcre2_match(), and are not shown by pcre2test.  In  the
+       following  example,  there  are  two capturing substrings, but when the
+       first data line is matched, the second, unset substring is  not  shown.
+       An  "internal" unset substring is shown as "<unset>", as for the second
        data line.


            re> /(a)|(b)/
@@ -932,11 +1003,11 @@
           1: <unset>
           2: b


-       If  the strings contain any non-printing characters, they are output as
-       \xhh escapes if the value is less than 256 and UTF  mode  is  not  set.
+       If the strings contain any non-printing characters, they are output  as
+       \xhh  escapes  if  the  value is less than 256 and UTF mode is not set.
        Otherwise they are output as \x{hh...} escapes. See below for the defi-
-       nition of non-printing characters. If the /aftertext modifier  is  set,
-       the  output  for substring 0 is followed by the the rest of the subject
+       nition  of  non-printing characters. If the /aftertext modifier is set,
+       the output for substring 0 is followed by the rest of  the  subject
        string, identified by "0+" like this:


            re> /cat/aftertext
@@ -944,7 +1015,7 @@
           0: cat
           0+ aract


-       If global matching is requested, the  results  of  successive  matching
+       If  global  matching  is  requested, the results of successive matching
        attempts are output in sequence, like this:


            re> /\Bi(\w\w)/g
@@ -956,8 +1027,8 @@
           0: ipp
           1: pp


-       "No  match" is output only if the first match attempt fails. Here is an
-       example of a failure message (the offset 4 that is specified by \>4  is
+       "No match" is output only if the first match attempt fails. Here is  an
+       example  of a failure message (the offset 4 that is specified by \>4 is
        past the end of the subject string):


            re> /xyz/
@@ -965,7 +1036,7 @@
          Error -24 (bad offset value)


        Note that whereas patterns can be continued over several lines (a plain
-       ">" prompt is used for continuations), subject lines may  not.  However
+       ">"  prompt  is used for continuations), subject lines may not. However
        newlines can be included in a subject by means of the \n escape (or \r,
        \r\n, etc., depending on the newline sequence setting).


@@ -973,7 +1044,7 @@
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION

        When the alternative matching function, pcre2_dfa_match(), is used, the
-       output  consists  of  a list of all the matches that start at the first
+       output consists of a list of all the matches that start  at  the  first
        point in the subject where there is at least one match. For example:


            re> /(tang|tangerine|tan)/
@@ -982,11 +1053,11 @@
           1: tang
           2: tan


-       (Using the normal matching function on this data  finds  only  "tang".)
-       The  longest matching string is always given first (and numbered zero).
-       After a PCRE2_ERROR_PARTIAL return, the  output  is  "Partial  match:",
-       followed  by  the  partially matching substring. (Note that this is the
-       entire substring that was inspected during the partial  match;  it  may
+       (Using  the  normal  matching function on this data finds only "tang".)
+       The longest matching string is always given first (and numbered  zero).
+       After  a  PCRE2_ERROR_PARTIAL  return,  the output is "Partial match:",
+       followed by the partially matching substring. (Note that  this  is  the
+       entire  substring  that  was inspected during the partial match; it may
        include characters before the actual match start if a lookbehind asser-
        tion, \K, \b, or \B was involved.)


@@ -1002,16 +1073,16 @@
           1: tan
           0: tan


-       The  alternative  matching function does not support substring capture,
-       so the modifiers that are concerned with captured  substrings  are  not
+       The alternative matching function does not support  substring  capture,
+       so  the  modifiers  that are concerned with captured substrings are not
        relevant.



RESTARTING AFTER A PARTIAL MATCH

-       When  the  alternative matching function has given the PCRE2_ERROR_PAR-
+       When the alternative matching function has given  the  PCRE2_ERROR_PAR-
        TIAL return, indicating that the subject partially matched the pattern,
-       you  can restart the match with additional subject data by means of the
+       you can restart the match with additional subject data by means of  the
        dfa_restart modifier. For example:


            re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
@@ -1020,29 +1091,29 @@
          data> n05\=dfa,dfa_restart
           0: n05


-       For further information about partial matching,  see  the  pcre2partial
+       For  further  information  about partial matching, see the pcre2partial
        documentation.



CALLOUTS

        If the pattern contains any callout requests, pcre2test's callout func-
-       tion is called during matching. This works  with  both  matching  func-
+       tion  is  called  during  matching. This works with both matching func-
        tions. By default, the called function displays the callout number, the
-       start and current positions in the text at the callout  time,  and  the
+       start  and  current  positions in the text at the callout time, and the
        next pattern item to be tested. For example:


          --->pqrabcdef
            0    ^  ^     \d


-       This  output  indicates  that  callout  number  0  occurred for a match
-       attempt starting at the fourth character of the  subject  string,  when
-       the  pointer  was  at  the seventh character, and when the next pattern
-       item was \d. Just one circumflex is output if  the  start  and  current
+       This output indicates that  callout  number  0  occurred  for  a  match
+       attempt  starting  at  the fourth character of the subject string, when
+       the pointer was at the seventh character, and  when  the  next  pattern
+       item  was  \d.  Just  one circumflex is output if the start and current
        positions are the same.


        Callouts numbered 255 are assumed to be automatic callouts, inserted as
-       a result of the /auto_callout pattern modifier. In this  case,  instead
+       a  result  of the /auto_callout pattern modifier. In this case, instead
        of showing the callout number, the offset in the pattern, preceded by a
        plus, is output. For example:


@@ -1056,7 +1127,7 @@
           0: E*


        If a pattern contains (*MARK) items, an additional line is output when-
-       ever  a  change  of  latest mark is passed to the callout function. For
+       ever a change of latest mark is passed to  the  callout  function.  For
        example:


            re> /a(*MARK:X)bc/auto_callout
@@ -1070,30 +1141,30 @@
          +12 ^  ^
           0: abc


-       The mark changes between matching "a" and "b", but stays the  same  for
-       the  rest  of  the match, so nothing more is output. If, as a result of
-       backtracking, the mark reverts to being unset, the  text  "<unset>"  is
+       The  mark  changes between matching "a" and "b", but stays the same for
+       the rest of the match, so nothing more is output. If, as  a  result  of
+       backtracking,  the  mark  reverts to being unset, the text "<unset>" is
        output.


-       The  callout  function in pcre2test returns zero (carry on matching) by
-       default, but you can use a callout_fail modifier in a subject line  (as
+       The callout function in pcre2test returns zero (carry on  matching)  by
+       default,  but you can use a callout_fail modifier in a subject line (as
        described above) to change this and other parameters of the callout.


        Inserting callouts can be helpful when using pcre2test to check compli-
-       cated regular expressions. For further information about callouts,  see
+       cated  regular expressions. For further information about callouts, see
        the pcre2callout documentation.



NON-PRINTING CHARACTERS

        When pcre2test is outputting text in the compiled version of a pattern,
-       bytes other than 32-126 are always treated as  non-printing  characters
+       bytes  other  than 32-126 are always treated as non-printing characters
        and are therefore shown as hex escapes.


-       When  pcre2test  is outputting text that is a matched part of a subject
-       string, it behaves in the same way, unless a different locale has  been
-       set  for  the  pattern  (using the /locale modifier). In this case, the
-       isprint() function is used to  distinguish  printing  and  non-printing
+       When pcre2test is outputting text that is a matched part of  a  subject
+       string,  it behaves in the same way, unless a different locale has been
+       set for the pattern (using the /locale modifier).  In  this  case,  the
+       isprint()  function  is  used  to distinguish printing and non-printing
        characters.



@@ -1112,5 +1183,5 @@

REVISION

-       Last updated: 09 November 2014
+       Last updated: 14 November 2014
        Copyright (c) 1997-2014 University of Cambridge.
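
The replace=[10]XYZ and replace=[9]XYZ example in the pcre2test
documentation above turns on one detail: the output buffer must hold the
substituted text plus a terminating zero. "123XYZ123" is 9 code units, so
10 fits and 9 does not. The following is a minimal sketch of that size
check only. It assumes plain 8-bit strings and a literal (non-regex)
search; replace_first() is a hypothetical helper written for illustration
and is not part of PCRE2 or pcre2test.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a PCRE2 function): replace the first literal
occurrence of `find` in `subject` with `repl`, writing a zero-terminated
result into `out`, which has room for `outsize` chars.

Returns:  1 or 0  number of replacements made
          -1      the result plus its terminating zero does not fit
*/

static int replace_first(const char *subject, const char *find,
  const char *repl, char *out, size_t outsize)
{
  const char *hit;
  size_t before, flen, rlen, after, needed;

  assert(subject != NULL && find != NULL && repl != NULL && out != NULL);

  /* An empty search string is treated as "no match" in this sketch. */
  hit = (*find == 0) ? NULL : strstr(subject, find);

  if (hit == NULL)
    {
    needed = strlen(subject) + 1;          /* +1 for the terminating zero */
    if (needed > outsize) return -1;
    memcpy(out, subject, needed);
    return 0;
    }

  before = (size_t)(hit - subject);        /* text before the match */
  flen = strlen(find);
  rlen = strlen(repl);
  after = strlen(hit + flen);              /* text after the match */

  needed = before + rlen + after + 1;      /* +1 for the terminating zero */
  if (needed > outsize) return -1;

  memcpy(out, subject, before);
  memcpy(out + before, repl, rlen);
  memcpy(out + before + rlen, hit + flen, after + 1);  /* copies the zero */
  return 1;
}
```

Called with subject "123abc123", search string "abc", and replacement
"XYZ", this sketch succeeds with a buffer size of 10 and reports no room
with a size of 9, which mirrors the two cases in the documented test.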


Modified: code/trunk/src/pcre2_substitute.c
===================================================================
--- code/trunk/src/pcre2_substitute.c    2014-11-12 17:46:02 UTC (rev 146)
+++ code/trunk/src/pcre2_substitute.c    2014-11-14 18:41:20 UTC (rev 147)
@@ -69,7 +69,7 @@


 Returns:          >= 0 number of substitutions made
                   < 0 an error code
-                  PCRE2_ERROR_BADREPLACEMENT means invalid use of $ 
+                  PCRE2_ERROR_BADREPLACEMENT means invalid use of $
 */


PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
@@ -84,14 +84,14 @@
 uint32_t goptions = 0;
 BOOL match_data_created = FALSE;
 BOOL global = FALSE;
-PCRE2_SIZE buff_offset, lengthleft, endlength;
+PCRE2_SIZE buff_offset, lengthleft, fraglength;
 PCRE2_SIZE *ovector;

 /* Partial matching is not valid. */

 if ((options & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0)
   return PCRE2_ERROR_BADOPTION;
-
+
/* If no match data block is provided, create one. */

 if (match_data == NULL)
@@ -120,7 +120,7 @@
     }
   }
 #endif  /* SUPPORT_UNICODE */
- 
+
 /* Notice the global option and remove it from the options that are passed to
 pcre2_match(). */


@@ -151,17 +151,20 @@

   rc = pcre2_match(code, subject, length, start_offset, options|goptions,
     match_data, mcontext);
-    
-  /* Any error other than no match returns the error code. No match when not 
-  doing the special after-empty-match global rematch, or when at the end of the 
-  subject, breaks the global loop. Otherwise, advance the starting point and 
-  try again. */ 


+  /* Any error other than no match returns the error code. No match when not
+  doing the special after-empty-match global rematch, or when at the end of the
+  subject, breaks the global loop. Otherwise, advance the starting point by one
+  character, copying it to the output, and try again. */
+
   if (rc < 0)
     {
+    PCRE2_SIZE save_start;
+
     if (rc != PCRE2_ERROR_NOMATCH) goto EXIT;
     if (goptions == 0 || start_offset >= length) break;
-    start_offset++;
+
+    save_start = start_offset++;
     if ((code->overall_options & PCRE2_UTF) != 0)
       {
 #if PCRE2_CODE_UNIT_WIDTH == 8
@@ -173,20 +176,28 @@
         start_offset++;
 #endif
       }
+
+    fraglength = start_offset - save_start;
+    if (lengthleft < fraglength) goto NOROOM;
+    memcpy(buffer + buff_offset, subject + save_start,
+      fraglength*(PCRE2_CODE_UNIT_WIDTH/8));
+    buff_offset += fraglength;
+    lengthleft -= fraglength;
+
     goptions = 0;
     continue;
     }
-    
+
   /* Handle a successful match. */


   subs++;
   if (rc == 0) rc = ovector_count;
-  endlength = ovector[0] - start_offset;
-  if (endlength >= lengthleft) goto NOROOM;
-  memcpy(buffer + buff_offset, subject + start_offset, 
-    endlength*(PCRE2_CODE_UNIT_WIDTH/8));
-  buff_offset += endlength;
-  lengthleft -= endlength;
+  fraglength = ovector[0] - start_offset;
+  if (fraglength >= lengthleft) goto NOROOM;
+  memcpy(buffer + buff_offset, subject + start_offset,
+    fraglength*(PCRE2_CODE_UNIT_WIDTH/8));
+  buff_offset += fraglength;
+  lengthleft -= fraglength;


   for (i = 0; i < rlength; i++)
     {
@@ -196,11 +207,11 @@
       BOOL inparens;
       PCRE2_SIZE sublength;
       PCRE2_UCHAR next;
-      PCRE2_UCHAR name[33];    
- 
+      PCRE2_UCHAR name[33];
+
       if (++i == rlength) goto BAD;
       if ((next = replacement[i]) == CHAR_DOLLAR_SIGN) goto LITERAL;
- 
+
       group = -1;
       n = 0;
       inparens = FALSE;
@@ -232,7 +243,7 @@
           if (i == rlength) break;
           next = replacement[++i];
           }
-        if (n == 0) goto BAD;   
+        if (n == 0) goto BAD;
         name[n] = 0;
         }


@@ -241,7 +252,7 @@
         if (i == rlength || next != CHAR_RIGHT_CURLY_BRACKET) goto BAD;
         }
       else i--;   /* Last code unit of name/number */
-      
+
       /* Have found a syntactically correct group number or name. */


       sublength = lengthleft;
@@ -251,8 +262,8 @@
       else
         rc = pcre2_substring_copy_bynumber(match_data, group,
           buffer + buff_offset, &sublength);
-          
-      if (rc < 0) goto EXIT;    
+
+      if (rc < 0) goto EXIT;
       buff_offset += sublength;
       lengthleft -= sublength;
       }
@@ -279,17 +290,17 @@
 /* Copy the rest of the subject and return the number of substitutions. */


 rc = subs;
-endlength = length - start_offset;
-if (endlength + 1 > lengthleft) goto NOROOM;
+fraglength = length - start_offset;
+if (fraglength + 1 > lengthleft) goto NOROOM;
 memcpy(buffer + buff_offset, subject + start_offset,
-  endlength*(PCRE2_CODE_UNIT_WIDTH/8));
-buff_offset += endlength;
+  fraglength*(PCRE2_CODE_UNIT_WIDTH/8));
+buff_offset += fraglength;
 buffer[buff_offset] = 0;
 *blength = buff_offset;

 EXIT:
 if (match_data_created) pcre2_match_data_free(match_data);
-  else match_data->rc = rc;
+  else match_data->rc = rc;
 return rc;

NOROOM:
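
The main behavioural change in the pcre2_substitute.c hunks above is that
when the after-empty-match retry fails, the character that is skipped is
now also copied to the output buffer. The sketch below illustrates that
step in isolation for 8-bit code units. It is not the library code:
copy_one_char() and its parameter list are hypothetical, and the UTF-8
continuation-byte test (top two bits equal to binary 10) is an assumption
about what the elided part of the hunk does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: advance *start past one character of `subject` and
copy that character to `out` at *out_offset. In UTF mode, a character is a
lead byte plus any following continuation bytes (10xxxxxx).

Returns:  0   the character was copied
          -1  fewer than the required code units remain in the output

Note that *start has already been advanced when -1 is returned. */

static int copy_one_char(const unsigned char *subject, size_t length,
  size_t *start, unsigned char *out, size_t *out_offset,
  size_t *lengthleft, int utf)
{
  size_t save_start, fraglength;

  assert(*start < length);

  save_start = (*start)++;                 /* always skip one code unit */
  if (utf)
    {
    while (*start < length && (subject[*start] & 0xc0) == 0x80)
      (*start)++;                          /* skip continuation bytes */
    }

  fraglength = *start - save_start;
  if (*lengthleft < fraglength) return -1;

  memcpy(out + *out_offset, subject + save_start, fraglength);
  *out_offset += fraglength;
  *lengthleft -= fraglength;
  return 0;
}
```

Keeping the saved start position and the fragment length separate, as the
patch does with save_start and fraglength, means the same copy logic works
whether the skipped character is one code unit or several.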

Modified: code/trunk/src/pcre2test.c
===================================================================
--- code/trunk/src/pcre2test.c    2014-11-12 17:46:02 UTC (rev 146)
+++ code/trunk/src/pcre2test.c    2014-11-14 18:41:20 UTC (rev 147)
@@ -164,11 +164,12 @@
 #define DFA_WS_DIMENSION 1000   /* Size of DFA workspace */
 #define DEFAULT_OVECCOUNT 15    /* Default ovector count */
 #define JUNK_OFFSET 0xdeadbeef  /* For initializing ovector */
+#define LOCALESIZE 32           /* Size of locale name */
 #define LOOPREPEAT 500000       /* Default loop count for timing */
 #define REPLACE_MODSIZE 96      /* Field for reading 8-bit replacement */
 #define VERSION_SIZE 64         /* Size of buffer for the version strings */


-/* Make sure the buffer into which replacement strings are copied is big enough
+/* Make sure the buffer into which replacement strings are copied is big enough
to hold them as 32-bit code units. */

#define REPLACE_BUFFSIZE (4*REPLACE_MODSIZE)
@@ -263,9 +264,9 @@

#define PCRE2_SUFFIX(a) a

-/* We need to be able to check input text for UTF-8 validity, whatever code
-widths are actually available, because the input to pcre2test is always in
-8-bit code units. So we include the UTF validity checking function for 8-bit
+/* We need to be able to check input text for UTF-8 validity, whatever code
+widths are actually available, because the input to pcre2test is always in
+8-bit code units. So we include the UTF validity checking function for 8-bit
code units. */

 extern int valid_utf(PCRE2_SPTR8, PCRE2_SIZE, PCRE2_SIZE *);
@@ -388,10 +389,10 @@
                     CTL_MARK|\
                     CTL_MEMORY|\
                     CTL_STARTCHAR)
-                    
-/* Structures for holding modifier information for patterns and subject strings 
-(data). Fields containing modifiers that can be set either for a pattern or a 
-subject must be at the start and in the same order in both cases so that the 
+
+/* Structures for holding modifier information for patterns and subject strings
+(data). Fields containing modifiers that can be set either for a pattern or a
+subject must be at the start and in the same order in both cases so that the
 same offset in the big table below works for both. */


 typedef struct patctl {    /* Structure for pattern modifiers. */
@@ -401,7 +402,7 @@
   uint32_t  jit;
   uint32_t  stackguard_test;
   uint32_t  tables_id;
-  uint8_t   locale[32];
+  uint8_t   locale[LOCALESIZE];
 } patctl;


 #define MAXCPYGET 10
@@ -486,7 +487,7 @@
   { "jitfast",             MOD_PAT,  MOD_CTL, CTL_JITFAST,               PO(control) },
   { "jitstack",            MOD_DAT,  MOD_INT, 0,                         DO(jitstack) },
   { "jitverify",           MOD_PAT,  MOD_CTL, CTL_JITVERIFY,             PO(control) },
-  { "locale",              MOD_PAT,  MOD_STR, 0,                         PO(locale) },
+  { "locale",              MOD_PAT,  MOD_STR, LOCALESIZE,                PO(locale) },
   { "mark",                MOD_PNDP, MOD_CTL, CTL_MARK,                  PO(control) },
   { "match_limit",         MOD_CTM,  MOD_INT, 0,                         MO(match_limit) },
   { "match_unset_backref", MOD_PAT,  MOD_OPT, PCRE2_MATCH_UNSET_BACKREF, PO(options) },
@@ -512,7 +513,7 @@
   { "posix",               MOD_PAT,  MOD_CTL, CTL_POSIX,                 PO(control) },
   { "ps",                  MOD_DAT,  MOD_OPT, PCRE2_PARTIAL_SOFT,        DO(options) },
   { "recursion_limit",     MOD_CTM,  MOD_INT, 0,                         MO(recursion_limit) },
-  { "replace",             MOD_PND,  MOD_STR, 0,                         PO(replacement) },
+  { "replace",             MOD_PND,  MOD_STR, REPLACE_MODSIZE,           PO(replacement) },
   { "stackguard",          MOD_PAT,  MOD_INT, 0,                         PO(stackguard_test) },
   { "startchar",           MOD_PND,  MOD_CTL, CTL_STARTCHAR,             PO(control) },
   { "tables",              MOD_PAT,  MOD_INT, 0,                         PO(tables_id) },
@@ -3141,6 +3142,12 @@
     break;


     case MOD_STR:
+    if (len + 1 > m->value)
+      {
+      fprintf(outfile, "** Overlong value for '%s' (max %d code units)\n",
+        m->name, m->value - 1);
+      return FALSE;
+      }
     memcpy(field, pp, len);
     ((uint8_t *)field)[len] = 0;
     pp = ep;
@@ -3974,8 +3981,8 @@
 if (pattern_info(PCRE2_INFO_MAXLOOKBEHIND, &maxlookbehind, FALSE) != 0)
   return PR_ABEND;


-/* Call the JIT compiler if requested. When timing, we must free and recompile
-the pattern each time because that is the only way to free the JIT compiled
+/* Call the JIT compiler if requested. When timing, we must free and recompile
+the pattern each time because that is the only way to free the JIT compiled
 code. We know that compilation will always succeed. */

 if (pat_patctl.jit != 0)
@@ -3992,7 +3999,7 @@
         pat_patctl.options|forbid_utf, &errorcode, &erroroffset, pat_context);
       start_time = clock();
       PCRE2_JIT_COMPILE(compiled_code, pat_patctl.jit);
-      time_taken += clock() - start_time; 
+      time_taken += clock() - start_time;
       }
     total_jit_compile_time += time_taken;
     fprintf(outfile, "JIT compile  %.4f milliseconds\n",
@@ -4000,9 +4007,9 @@
         (double)CLOCKS_PER_SEC);
     }
   else
-    { 
+    {
     PCRE2_JIT_COMPILE(compiled_code, pat_patctl.jit);
-    } 
+    }
   }


 /* Output code size and other information if requested. */
@@ -4765,9 +4772,9 @@
   PCRE2_MATCH_DATA_FREE(match_data);
   PCRE2_MATCH_DATA_CREATE(match_data, max_oveccount, NULL);
   }
-
-/* Replacement processing is ignored for DFA matching. */

+/* Replacement processing is ignored for DFA matching. */
+
 if (dat_datctl.replacement[0] != 0 && (dat_datctl.control & CTL_DFA) != 0)
   {
   fprintf(outfile, "** Ignored for DFA matching: replace\n");
@@ -4799,7 +4806,7 @@
 #endif

   if (timeitm)
-    fprintf(outfile, "** Timing is not supported with replace: ignored\n"); 
+    fprintf(outfile, "** Timing is not supported with replace: ignored\n");


   goption = ((dat_datctl.control & CTL_GLOBAL) == 0)? 0 :
     PCRE2_SUBSTITUTE_GLOBAL;
@@ -4828,21 +4835,21 @@
     nsize = n;
     }


- /* Now copy the replacement string to a buffer of the appropriate width. No
- escape processing is done for replacements. In UTF mode, check for an invalid
- UTF-8 input string, and if it is invalid, just copy its code units without
- UTF interpretation. This provides a means of checking that an invalid string
- is detected. Otherwise, UTF-8 can be used to include wide characters in a
+ /* Now copy the replacement string to a buffer of the appropriate width. No
+ escape processing is done for replacements. In UTF mode, check for an invalid
+ UTF-8 input string, and if it is invalid, just copy its code units without
+ UTF interpretation. This provides a means of checking that an invalid string
+ is detected. Otherwise, UTF-8 can be used to include wide characters in a
  replacement. */
-
+
   if (utf) badutf = valid_utf(pr, strlen((const char *)pr), &erroroffset);

   /* Not UTF or invalid UTF-8: just copy the code units. */
-  
+
   if (!utf || badutf)
     {
     while ((c = *pr++) != 0)
-      { 
+      {
 #ifdef SUPPORT_PCRE2_8
       if (test_mode == PCRE8_MODE) *r8++ = c;
 #endif
@@ -4854,9 +4861,9 @@
 #endif
       }
     }
-    
+
   /* Valid UTF-8 replacement string */
-        
+
   else while ((c = *pr++) != 0)
     {
     if (HASUTF8EXTRALEN(c)) { GETUTF8INC(c, pr); }
@@ -6314,7 +6321,7 @@


 if (showtotaltimes)
   {
-  const char *pad = ""; 
+  const char *pad = "";
   fprintf(outfile, "--------------------------------------\n");
   if (timeit > 0)
     {
@@ -6325,7 +6332,7 @@
       fprintf(outfile, "Total JIT compile  %.4f milliseconds\n",
         (((double)total_jit_compile_time * 1000.0) / (double)timeit) /
           (double)CLOCKS_PER_SEC);
-    pad = "  ";       
+    pad = "  ";
     }
   fprintf(outfile, "Total match time %s%.4f milliseconds\n", pad,
     (((double)total_match_time * 1000.0) / (double)timeitm) /


Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2    2014-11-12 17:46:02 UTC (rev 146)
+++ code/trunk/testdata/testinput2    2014-11-14 18:41:20 UTC (rev 147)
@@ -4073,6 +4073,9 @@
     123abc456abc789
     123abc456abc789\=g


+/(?<=abc)(|def)/g,replace=<$0>
+    123abcxyzabcdef789abcpqr
+
 # End of substitute tests 


 # End of testinput2

Modified: code/trunk/testdata/testinput5
===================================================================
--- code/trunk/testdata/testinput5    2014-11-12 17:46:02 UTC (rev 146)
+++ code/trunk/testdata/testinput5    2014-11-14 18:41:20 UTC (rev 147)
@@ -1635,4 +1635,7 @@
 /ábc/utf,replace=XሴZ
     123ábc123


+/(?<=abc)(|def)/g,utf,replace=<$0>
+      123abcáyzabcdef789abcሴqr
+
 # End of testinput5 


Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2014-11-12 17:46:02 UTC (rev 146)
+++ code/trunk/testdata/testoutput2    2014-11-14 18:41:20 UTC (rev 147)
@@ -13699,6 +13699,10 @@
     123abc456abc789\=g
  2: 123xyz456xyz789


+/(?<=abc)(|def)/g,replace=<$0>
+    123abcxyzabcdef789abcpqr
+ 4: 123abc<>xyzabc<><def>789abc<>pqr
+
 # End of substitute tests 


 # End of testinput2

Modified: code/trunk/testdata/testoutput5
===================================================================
--- code/trunk/testdata/testoutput5    2014-11-12 17:46:02 UTC (rev 146)
+++ code/trunk/testdata/testoutput5    2014-11-14 18:41:20 UTC (rev 147)
@@ -4004,4 +4004,8 @@
     123ábc123
  1: 123X\x{1234}Z123


+/(?<=abc)(|def)/g,utf,replace=<$0>
+      123abcáyzabcdef789abcሴqr
+ 4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr
+
 # End of testinput5