[Pcre-svn] [288] code/trunk: Source and document file tidies…

Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [288] code/trunk: Source and document file tidies for 10.20-RC1.
Revision: 288
          http://www.exim.org/viewvc/pcre2?view=rev&revision=288
Author:   ph10
Date:     2015-06-18 17:39:25 +0100 (Thu, 18 Jun 2015)
Log Message:
-----------
Source and document file tidies for 10.20-RC1.


Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/HACKING
    code/trunk/NEWS
    code/trunk/README
    code/trunk/RunTest
    code/trunk/configure.ac
    code/trunk/doc/html/README.txt
    code/trunk/doc/html/pcre2.html
    code/trunk/doc/html/pcre2_callout_enumerate.html
    code/trunk/doc/html/pcre2_compile.html
    code/trunk/doc/html/pcre2api.html
    code/trunk/doc/html/pcre2build.html
    code/trunk/doc/html/pcre2callout.html
    code/trunk/doc/html/pcre2pattern.html
    code/trunk/doc/html/pcre2syntax.html
    code/trunk/doc/html/pcre2test.html
    code/trunk/doc/pcre2.3
    code/trunk/doc/pcre2.txt
    code/trunk/doc/pcre2_callout_enumerate.3
    code/trunk/doc/pcre2_compile.3
    code/trunk/doc/pcre2api.3
    code/trunk/doc/pcre2build.3
    code/trunk/doc/pcre2callout.3
    code/trunk/doc/pcre2pattern.3
    code/trunk/doc/pcre2syntax.3
    code/trunk/doc/pcre2test.1
    code/trunk/doc/pcre2test.txt
    code/trunk/src/config.h.generic
    code/trunk/src/pcre2.h.generic
    code/trunk/src/pcre2.h.in
    code/trunk/src/pcre2_auto_possess.c
    code/trunk/src/pcre2_dfa_match.c
    code/trunk/src/pcre2_error.c
    code/trunk/src/pcre2_internal.h
    code/trunk/src/pcre2_intmodedep.h
    code/trunk/src/pcre2_jit_compile.c
    code/trunk/src/pcre2_match.c
    code/trunk/src/pcre2_pattern_info.c
    code/trunk/src/pcre2_tables.c
    code/trunk/src/pcre2test.c


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/ChangeLog    2015-06-18 16:39:25 UTC (rev 288)
@@ -1,8 +1,8 @@
 Change Log for PCRE2
 --------------------


-Version 10.20 xx-xx-2015
-------------------------
+Version 10.20 16-June-2015
+--------------------------

1. Callouts with string arguments have been added.

@@ -123,27 +123,27 @@
current group, for example in this pattern: /(?|(\k'Pm')|(?'Pm'))/, caused a
buffer overflow at compile time. This bug was discovered by the LLVM fuzzer.

-31. Fix -fsanitize=undefined warnings for left shifts of 1 by 31 (it treats 1
+31. Fix -fsanitize=undefined warnings for left shifts of 1 by 31 (it treats 1
as an int; fixed by writing it as 1u).

-32. Fix pcre2grep compile when -std=c99 is used with gcc, though it still gives
+32. Fix pcre2grep compile when -std=c99 is used with gcc, though it still gives
a warning for "fileno" unless -std=gnu99 us used.

-33. A lookbehind assertion within a set of mutually recursive subpatterns could
+33. A lookbehind assertion within a set of mutually recursive subpatterns could
provoke a buffer overflow. This bug was discovered by the LLVM fuzzer.

34. Give an error for an empty subpattern name such as (?'').

-35. Make pcre2test give an error if a pattern that follows #forbud_utf contains
+35. Make pcre2test give an error if a pattern that follows #forbud_utf contains
\P, \p, or \X.

-36. The way named subpatterns are handled has been refactored. There is now a
+36. The way named subpatterns are handled has been refactored. There is now a
pre-pass over the regex which does nothing other than identify named
subpatterns and count the total captures. This means that information about
-named patterns is known before the rest of the compile. In particular, it means
-that forward references can be checked as they are encountered. Previously, the
-code for handling forward references was contorted and led to several errors in
-computing the memory requirements for some patterns, leading to buffer
+named patterns is known before the rest of the compile. In particular, it means
+that forward references can be checked as they are encountered. Previously, the
+code for handling forward references was contorted and led to several errors in
+computing the memory requirements for some patterns, leading to buffer
overflows.

37. There was no check for integer overflow in subroutine calls such as (?123).
@@ -152,11 +152,11 @@
being treated as a literal 'l' instead of causing an error.

39. If a non-capturing group containing a conditional group that could match
-an empty string was repeated, it was not identified as matching an empty string
+an empty string was repeated, it was not identified as matching an empty string
itself. For example: /^(?:(?(1)x|)+)+$()/.

40. In an EBCDIC environment, pcretest was mishandling the escape sequences
-\a and \e in test subject lines.
+\a and \e in test subject lines.

41. In an EBCDIC environment, \a in a pattern was converted to the ASCII
instead of the EBCDIC value.
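
Editorial note on ChangeLog item 31 above: the item says that shifting the int
constant 1 left by 31 drew -fsanitize=undefined warnings and that the fix was
to write the constant as 1u. The following is a minimal sketch of that idea,
not code from the PCRE2 sources; the helper name bit_mask is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of PCRE2): return a 32-bit mask with only
   bit n set, for 0 <= n <= 31. */
uint32_t bit_mask(unsigned n)
{
  /* The operand is unsigned, so moving a 1 into bit 31 is well defined.
     Writing (1 << 31) instead would shift a signed int into its sign bit
     on platforms with 32-bit int, which is undefined behaviour. */
  return (uint32_t)1u << n;
}
```

Calling bit_mask(31) gives the top bit of a 32-bit word without tripping the
sanitizer, which is the effect the ChangeLog entry describes.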

Modified: code/trunk/HACKING
===================================================================
--- code/trunk/HACKING    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/HACKING    2015-06-18 16:39:25 UTC (rev 288)
@@ -104,7 +104,22 @@
 for nested parenthesized groups. This is a safety feature for environments with
 small stacks where the patterns are provided by users.


+History repeated itself for release 10.20. A number of bugs relating to named
+subpatterns had been discovered by fuzzers. Most of these were related to the
+handling of forward references when it was not known if the named pattern was
+unique. (References to non-unique names use a different opcode and more
+memory.) The use of duplicate group numbers (the (?| facility) also caused
+issues.

+To get around these problems I adopted a new approach by adding a third pass,
+really a "pre-pass", over the pattern, which does nothing other than identify
+all the named subpatterns and their corresponding group numbers. This means
+that the actual compile (both pre-pass and real compile) have full knowledge of
+group names and numbers throughout. Several dozen lines of messy code were
+eliminated, though the new pre-pass is not short (skipping over [] classes is
+complicated).
+
+
Traditional matching function
-----------------------------

@@ -343,8 +358,9 @@

For classes containing characters with values greater than 255 or that contain
\p or \P, OP_XCLASS is used. It optionally uses a bit map if any acceptable
-code points are less than 256, followed by a list of pairs (for a range) and
-single characters. In caseless mode, both cases are explicitly listed.
+code points are less than 256, followed by a list of pairs (for a range) and/or
+single characters and/or properties. In caseless mode, both cases are
+explicitly listed.

OP_XCLASS is followed by a LINK_SIZE value containing the total length of the
opcode and its data. This is followed by a code unit containing flag bits:
@@ -431,7 +447,7 @@
If a subpattern is quantified such that it is permitted to match zero times, it
is preceded by one of OP_BRAZERO, OP_BRAMINZERO, or OP_SKIPZERO. These are
single-unit opcodes that tell the matcher that skipping the following
-subpattern entirely is a valid branch. In the case of the first two, not
+subpattern entirely is a valid match. In the case of the first two, not
skipping the pattern is also valid (greedy and non-greedy). The third is used
when a pattern has the quantifier {0,0}. It cannot be entirely discarded,
because it may be called as a subroutine from elsewhere in the pattern.
@@ -487,9 +503,9 @@
of the opcodes OP_ASSERT or OP_ASSERT_NOT. Backward assertions use the opcodes
OP_ASSERTBACK and OP_ASSERTBACK_NOT, and the first opcode inside the assertion
is OP_REVERSE, followed by a count of the number of characters to move back the
-pointer in the subject string. In ASCII or UTF-32 mode, the count is a number
-of code units, but in UTF-8/16 mode each character may occupy more than one
-code unit. A separate count is present in each alternative of a lookbehind
+pointer in the subject string. In ASCII or UTF-32 mode, the count is also the
+number of code units, but in UTF-8/16 mode each character may occupy more than
+one code unit. A separate count is present in each alternative of a lookbehind
assertion, allowing them to have different (but fixed) lengths.


@@ -585,4 +601,4 @@
correct length, in order to catch updating errors.

Philip Hazel
-March 2015
+June 2015
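
Editorial note on the HACKING text added above: it describes a pre-pass that
only identifies named subpatterns and their group numbers, so that later passes
know every name in advance. The sketch below illustrates that idea under heavy
simplifying assumptions and is not the PCRE2 implementation: all names
(prepass, prepass_result, named_group, the MAX_ limits) are hypothetical, and
it ignores (?| duplicate numbering, (?#...) comments, \Q...\E, extended-mode
comments, and POSIX classes inside [] (the part HACKING calls complicated).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAMES    16   /* illustrative limits, not PCRE2 values */
#define MAX_NAME_LEN 32

typedef struct { char name[MAX_NAME_LEN + 1]; int number; } named_group;

typedef struct {
  named_group names[MAX_NAMES];
  int name_count;      /* named groups recorded */
  int capture_count;   /* all capturing groups seen */
} prepass_result;

/* Scan the pattern once, recording group names and numbers only. */
void prepass(const char *p, prepass_result *r)
{
  r->name_count = 0;
  r->capture_count = 0;
  for (size_t i = 0; p[i] != '\0'; i++) {
    if (p[i] == '\\') {                 /* skip an escaped character */
      if (p[i + 1] != '\0') i++;
      continue;
    }
    if (p[i] == '[') {                  /* skip a character class (simplified) */
      i++;
      if (p[i] == '^') i++;
      if (p[i] == ']') i++;             /* a leading ] is a literal */
      while (p[i] != '\0' && p[i] != ']') {
        if (p[i] == '\\' && p[i + 1] != '\0') i++;
        i++;
      }
      if (p[i] == '\0') break;          /* unterminated class: stop */
      continue;
    }
    if (p[i] != '(') continue;
    if (p[i + 1] == '*') continue;      /* treat (*...) as a verb, not a group */
    if (p[i + 1] != '?') { r->capture_count++; continue; }

    size_t j = i + 2;                   /* first character after "(?" */
    char close;
    if (p[j] == 'P' && p[j + 1] == '<') { j += 2; close = '>'; }
    else if (p[j] == '<' && p[j + 1] != '=' && p[j + 1] != '!') { j += 1; close = '>'; }
    else if (p[j] == '\'') { j += 1; close = '\''; }
    else continue;                      /* (?:...), lookarounds, options, ... */

    size_t len = 0;
    while (p[j + len] != '\0' && p[j + len] != close) len++;
    if (p[j + len] != close) continue;  /* malformed name: ignored here */

    r->capture_count++;
    if (len > 0 && len <= MAX_NAME_LEN && r->name_count < MAX_NAMES) {
      named_group *g = &r->names[r->name_count++];
      memcpy(g->name, p + j, len);
      g->name[len] = '\0';
      g->number = r->capture_count;
    }
  }
}
```

With a table like this built first, a reference such as \k'x' that appears
before the group (?'x'...) can be resolved as soon as it is encountered, which
is the benefit the HACKING paragraph attributes to the new pre-pass.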

Modified: code/trunk/NEWS
===================================================================
--- code/trunk/NEWS    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/NEWS    2015-06-18 16:39:25 UTC (rev 288)
@@ -1,6 +1,26 @@
 News about PCRE2 releases
 -------------------------


+Version 10.20 16-June-2015
+--------------------------
+
+1. Callouts with string arguments and the pcre2_callout_enumerate() function
+have been implemented.
+
+2. The PCRE2_NEVER_BACKSLASH_C option, which locks out the use of \C, is added.
+
+3. The PCRE2_ALT_CIRCUMFLEX option lets ^ match after a newline at the end of a
+subject in multiline mode.
+
+4. The way named subpatterns are handled has been refactored. The previous
+approach had several bugs.
+
+5. The handling of \c in EBCDIC environments has been changed to conform to the
+perlebcdic document. This is an incompatible change.
+
+6. Bugs have been mended, many of them discovered by fuzzers.
+
+
Version 10.10 06-March-2015
---------------------------
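
Editorial note on NEWS item 3 above: in multiline mode, ^ matches at the start
of the subject and after internal newlines, but not after a newline that ends
the subject unless PCRE2_ALT_CIRCUMFLEX is set. The sketch below models only
that rule; it does not call the PCRE2 library. The function name
circumflex_matches is hypothetical, and it assumes a single-character "\n"
newline convention and that PCRE2_NOTBOL is not set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a multiline ^ test at offset pos in a subject of
   length len. alt_circumflex stands in for the PCRE2_ALT_CIRCUMFLEX option.
   Returns 1 if ^ may match at pos, otherwise 0. */
int circumflex_matches(const char *subject, size_t len, size_t pos,
                       int alt_circumflex)
{
  if (pos == 0) return 1;                        /* start of subject */
  if (pos > len || subject[pos - 1] != '\n') return 0;
  if (pos == len) return alt_circumflex ? 1 : 0; /* after terminating newline */
  return 1;                                      /* after an internal newline */
}
```

For the subject "a\nb\n" (length 4), offsets 0 and 2 match either way, while
offset 4 matches only when the alternative behaviour is requested.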


Modified: code/trunk/README
===================================================================
--- code/trunk/README    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/README    2015-06-18 16:39:25 UTC (rev 288)
@@ -293,9 +293,9 @@
   both EBCDIC and UTF-8/16/32. There is a second option, --enable-ebcdic-nl25,
   which specifies that the code value for the EBCDIC NL character is 0x25
   instead of the default 0x15.
-  
+
 . If you specify --enable-debug, additional debugging code is included in the
-  build. This option is intended for use by the PCRE2 maintainers. 
+  build. This option is intended for use by the PCRE2 maintainers.


. In environments where valgrind is installed, if you specify


Modified: code/trunk/RunTest
===================================================================
--- code/trunk/RunTest    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/RunTest    2015-06-18 16:39:25 UTC (rev 288)
@@ -24,8 +24,8 @@
 # example, if JIT support is not compiled, test 16 is skipped, whereas if JIT
 # support is compiled, test 15 is skipped.
 #
-# Other arguments can be one of the words "-valgrind", "-valgrind-log", or 
-# "-sim" followed by an argument to run cross-compiled executables under a 
+# Other arguments can be one of the words "-valgrind", "-valgrind-log", or
+# "-sim" followed by an argument to run cross-compiled executables under a
 # simulator, for example:
 #
 # RunTest 3 -sim "qemu-arm -s 8388608"


Modified: code/trunk/configure.ac
===================================================================
--- code/trunk/configure.ac    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/configure.ac    2015-06-18 16:39:25 UTC (rev 288)
@@ -11,15 +11,15 @@
 m4_define(pcre2_major, [10])
 m4_define(pcre2_minor, [20])
 m4_define(pcre2_prerelease, [-RC1])
-m4_define(pcre2_date, [2015-03-11])
+m4_define(pcre2_date, [2015-06-16])


# NOTE: The CMakeLists.txt file searches for the above variables in the first
# 50 lines of this file. Please update that if the variables above are moved.

 # Libtool shared library interface versions (current:revision:age)
-m4_define(libpcre2_8_version,     [1:0:1])
-m4_define(libpcre2_16_version,    [1:0:1])
-m4_define(libpcre2_32_version,    [1:0:1])
+m4_define(libpcre2_8_version,     [2:0:0])
+m4_define(libpcre2_16_version,    [2:0:0])
+m4_define(libpcre2_32_version,    [2:0:0])
 m4_define(libpcre2_posix_version, [0:0:0])


 AC_PREREQ(2.57)
@@ -134,7 +134,7 @@
 AC_ARG_ENABLE(debug,
               AS_HELP_STRING([--enable-debug],
                              [enable debugging code]),
-              , enable_debug=no)                 
+              , enable_debug=no)


 # Handle --enable-jit (disabled by default)
 AC_ARG_ENABLE(jit,
@@ -141,7 +141,7 @@
               AS_HELP_STRING([--enable-jit],
                              [enable Just-In-Time compiling support]),
               , enable_jit=no)
-              
+
 # Handle --disable-pcre2grep-jit (enabled by default)
 AC_ARG_ENABLE(pcre2grep-jit,
               AS_HELP_STRING([--disable-pcre2grep-jit],
@@ -514,7 +514,7 @@
 if test "$enable_debug" = "yes"; then
   AC_DEFINE([PCRE2_DEBUG], [], [
     Define to any value to include debugging code.])
-fi      
+fi


# Unless running under Windows, JIT support requires pthreads.

@@ -876,7 +876,7 @@
     Build 8-bit pcre2 library ....... : ${enable_pcre2_8}
     Build 16-bit pcre2 library ...... : ${enable_pcre2_16}
     Build 32-bit pcre2 library ...... : ${enable_pcre2_32}
-    Include debugging code .......... : ${enable_debug} 
+    Include debugging code .......... : ${enable_debug}
     Enable JIT compiling support .... : ${enable_jit}
     Enable Unicode support .......... : ${enable_unicode}
     Newline char/sequence ........... : ${enable_newline}


Modified: code/trunk/doc/html/README.txt
===================================================================
--- code/trunk/doc/html/README.txt    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/doc/html/README.txt    2015-06-18 16:39:25 UTC (rev 288)
@@ -294,6 +294,9 @@
   which specifies that the code value for the EBCDIC NL character is 0x25
   instead of the default 0x15.


+. If you specify --enable-debug, additional debugging code is included in the
+ build. This option is intended for use by the PCRE2 maintainers.
+
. In environments where valgrind is installed, if you specify

--enable-valgrind
@@ -829,4 +832,4 @@
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
-Last updated: 26 January 2015
+Last updated: 24 April 2015

Modified: code/trunk/doc/html/pcre2.html
===================================================================
--- code/trunk/doc/html/pcre2.html    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/doc/html/pcre2.html    2015-06-18 16:39:25 UTC (rev 288)
@@ -108,10 +108,16 @@
 <P>
 One way of guarding against this possibility is to use the
 <b>pcre2_pattern_info()</b> function to check the compiled pattern's options for
-UTF. Alternatively, you can set the PCRE2_NEVER_UTF option at compile time.
-This causes an compile time error if a pattern contains a UTF-setting sequence.
+PCRE2_UTF. Alternatively, you can set the PCRE2_NEVER_UTF option when calling
+<b>pcre2_compile()</b>. This causes an compile time error if a pattern contains
+a UTF-setting sequence.
 </P>
 <P>
+The use of Unicode properties for character types such as \d can also be
+enabled from within the pattern, by specifying "(*UCP)". This feature can be
+disallowed by setting the PCRE2_NEVER_UCP option.
+</P>
+<P>
 If your application is one that supports UTF, be aware that validity checking
 can take time. If the same data string is to be matched many times, you can use
 the PCRE2_NO_UTF_CHECK option for the second and subsequent matches to avoid
@@ -118,6 +124,12 @@
 running redundant checks.
 </P>
 <P>
+The use of the \C escape sequence in a UTF-8 or UTF-16 pattern can lead to
+problems, because it may leave the current matching point in the middle of a
+multi-code-unit character. The PCRE2_NEVER_BACKSLASH_C option can be used to
+lock out the use of \C, causing a compile-time error if it is encountered.
+</P>
+<P>
 Another way that performance can be hit is by running a pattern that has a very
 large search tree against a string that will never match. Nested unlimited
 repeats in a pattern are a common example. PCRE2 provides some protection
@@ -175,9 +187,9 @@
 </P>
 <br><a name="SEC5" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 18 November 2014
+Last updated: 13 April 2015
 <br>
-Copyright &copy; 1997-2014 University of Cambridge.
+Copyright &copy; 1997-2015 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.


Modified: code/trunk/doc/html/pcre2_callout_enumerate.html
===================================================================
--- code/trunk/doc/html/pcre2_callout_enumerate.html    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/doc/html/pcre2_callout_enumerate.html    2015-06-18 16:39:25 UTC (rev 288)
@@ -33,7 +33,7 @@
 <pre>
   <i>code</i>           Points to the compiled pattern
   <i>callback</i>       The callback function
-  <i>callout_data</i>   User data that is passed to the callback  
+  <i>callout_data</i>   User data that is passed to the callback
 </pre>
 The <i>callback()</i> function is passed a pointer to a data block containing
 the following fields:
@@ -46,9 +46,9 @@
   <i>callout_string_length</i>  Length of callout string
   <i>callout_string</i>         Points to callout string or is NULL
 </pre>
-The second argument is the callout data that was passed to 
-<b>pcre2_callout_enumerate()</b>. The <b>callback()</b> function must return zero 
-for success. Any other value causes the pattern scan to stop, with the value 
+The second argument is the callout data that was passed to
+<b>pcre2_callout_enumerate()</b>. The <b>callback()</b> function must return zero
+for success. Any other value causes the pattern scan to stop, with the value
 being passed back as the result of <b>pcre2_callout_enumerate()</b>.
 </P>
 <P>


Modified: code/trunk/doc/html/pcre2_compile.html
===================================================================
--- code/trunk/doc/html/pcre2_compile.html    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/doc/html/pcre2_compile.html    2015-06-18 16:39:25 UTC (rev 288)
@@ -49,6 +49,7 @@
 <pre>
   PCRE2_ANCHORED           Force pattern anchoring
   PCRE2_ALT_BSUX           Alternative handling of \u, \U, and \x
+  PCRE2_ALT_CIRCUMFLEX     Alternative handling of ^ in multiline mode
   PCRE2_AUTO_CALLOUT       Compile automatic callouts
   PCRE2_CASELESS           Do caseless matching
   PCRE2_DOLLAR_ENDONLY     $ not to match newline at end
@@ -58,6 +59,7 @@
   PCRE2_FIRSTLINE          Force matching to be before newline
   PCRE2_MATCH_UNSET_BACKREF  Match unset back references
   PCRE2_MULTILINE          ^ and $ match newlines within data
+  PCRE2_NEVER_BACKSLASH_C  Lock out the use of \C in patterns
   PCRE2_NEVER_UCP          Lock out PCRE2_UCP, e.g. via (*UCP)
   PCRE2_NEVER_UTF          Lock out PCRE2_UTF, e.g. via (*UTF)
   PCRE2_NO_AUTO_CAPTURE    Disable numbered capturing paren-


Modified: code/trunk/doc/html/pcre2api.html
===================================================================
--- code/trunk/doc/html/pcre2api.html    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/doc/html/pcre2api.html    2015-06-18 16:39:25 UTC (rev 288)
@@ -1075,6 +1075,15 @@
 \x, but it may have zero, one, or two digits (so, for example, \xz matches a
 binary zero character followed by z).
 <pre>
+  PCRE2_ALT_CIRCUMFLEX
+</pre>
+In multiline mode (when PCRE2_MULTILINE is set), the circumflex metacharacter
+matches at the start of the subject (unless PCRE2_NOTBOL is set), and also
+after any internal newline. However, it does not match after a newline at the
+end of the subject, for compatibility with Perl. If you want a multiline
+circumflex also to match after a terminating newline, you must set
+PCRE2_ALT_CIRCUMFLEX.
+<pre>
   PCRE2_AUTO_CALLOUT
 </pre>
 If this bit is set, <b>pcre2_compile()</b> automatically inserts callout items,
@@ -1174,9 +1183,20 @@
 constructs match immediately following or immediately before internal newlines
 in the subject string, respectively, as well as at the very start and end. This
 is equivalent to Perl's /m option, and it can be changed within a pattern by a
-(?m) option setting. If there are no newlines in a subject string, or no
-occurrences of ^ or $ in a pattern, setting PCRE2_MULTILINE has no effect.
+(?m) option setting. Note that the "start of line" metacharacter does not match
+after a newline at the end of the subject, for compatibility with Perl.
+However, you can change this by setting the PCRE2_ALT_CIRCUMFLEX option. If
+there are no newlines in a subject string, or no occurrences of ^ or $ in a
+pattern, setting PCRE2_MULTILINE has no effect.
 <pre>
+  PCRE2_NEVER_BACKSLASH_C
+</pre>
+This option locks out the use of \C in the pattern that is being compiled.
+This escape can cause unpredictable behaviour in UTF-8 or UTF-16 modes, because
+it may leave the current matching point in the middle of a multi-code-unit
+character. This option may be useful in applications that process patterns from
+external sources.
+<pre>
   PCRE2_NEVER_UCP
 </pre>
 This option locks out the use of Unicode properties for handling \B, \b, \D,
@@ -1183,8 +1203,8 @@
 \d, \S, \s, \W, \w, and some of the POSIX character classes, as described
 for the PCRE2_UCP option below. In particular, it prevents the creator of the
 pattern from enabling this facility by starting the pattern with (*UCP). This
-may be useful in applications that process patterns from external sources. The
-option combination PCRE_UCP and PCRE_NEVER_UCP causes an error.
+option may be useful in applications that process patterns from external
+sources. The option combination PCRE_UCP and PCRE_NEVER_UCP causes an error.
 <pre>
   PCRE2_NEVER_UTF
 </pre>
@@ -1191,9 +1211,9 @@
 This option locks out interpretation of the pattern as UTF-8, UTF-16, or
 UTF-32, depending on which library is in use. In particular, it prevents the
 creator of the pattern from switching to UTF interpretation by starting the
-pattern with (*UTF). This may be useful in applications that process patterns
-from external sources. The combination of PCRE2_UTF and PCRE2_NEVER_UTF causes
-an error.
+pattern with (*UTF). This option may be useful in applications that process
+patterns from external sources. The combination of PCRE2_UTF and
+PCRE2_NEVER_UTF causes an error.
 <pre>
   PCRE2_NO_AUTO_CAPTURE
 </pre>
@@ -1735,14 +1755,14 @@
 <b>  void *<i>user_data</i>);</b>
 <br>
 <br>
-A script language that supports the use of string arguments in callouts might 
-like to scan all the callouts in a pattern before running the match. This can 
-be done by calling <b>pcre2_callout_enumerate()</b>. The first argument is a 
+A script language that supports the use of string arguments in callouts might
+like to scan all the callouts in a pattern before running the match. This can
+be done by calling <b>pcre2_callout_enumerate()</b>. The first argument is a
 pointer to a compiled pattern, the second points to a callback function, and
 the third is arbitrary user data. The callback function is called for every
 callout in the pattern in the order in which they appear. Its first argument is
 a pointer to a callout enumeration block, and its second argument is the
-<i>user_data</i> value that was passed to <b>pcre2_callout_enumerate()</b>. The 
+<i>user_data</i> value that was passed to <b>pcre2_callout_enumerate()</b>. The
 contents of the callout enumeration block are described in the
 <a href="pcre2callout.html"><b>pcre2callout</b></a>
 documentation, which also gives further details about callouts.
@@ -2273,7 +2293,7 @@
   PCRE2_ERROR_CALLOUT
 </pre>
 This error is never generated by <b>pcre2_match()</b> itself. It is provided for
-use by callout functions that want to cause <b>pcre2_match()</b> or 
+use by callout functions that want to cause <b>pcre2_match()</b> or
 <b>pcre2_callout_enumerate()</b> to return a distinctive error code. See the
 <a href="pcre2callout.html"><b>pcre2callout</b></a>
 documentation for details.
@@ -2863,7 +2883,7 @@
 </P>
 <br><a name="SEC40" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 23 March 2015
+Last updated: 22 April 2015
 <br>
 Copyright &copy; 1997-2015 University of Cambridge.
 <br>


Modified: code/trunk/doc/html/pcre2build.html
===================================================================
--- code/trunk/doc/html/pcre2build.html    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/doc/html/pcre2build.html    2015-06-18 16:39:25 UTC (rev 288)
@@ -29,11 +29,12 @@
 <li><a name="TOC14" href="#SEC14">PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT</a>
 <li><a name="TOC15" href="#SEC15">PCRE2GREP BUFFER SIZE</a>
 <li><a name="TOC16" href="#SEC16">PCRE2TEST OPTION FOR LIBREADLINE SUPPORT</a>
-<li><a name="TOC17" href="#SEC17">DEBUGGING WITH VALGRIND SUPPORT</a>
-<li><a name="TOC18" href="#SEC18">CODE COVERAGE REPORTING</a>
-<li><a name="TOC19" href="#SEC19">SEE ALSO</a>
-<li><a name="TOC20" href="#SEC20">AUTHOR</a>
-<li><a name="TOC21" href="#SEC21">REVISION</a>
+<li><a name="TOC17" href="#SEC17">INCLUDING DEBUGGING CODE</a>
+<li><a name="TOC18" href="#SEC18">DEBUGGING WITH VALGRIND SUPPORT</a>
+<li><a name="TOC19" href="#SEC19">CODE COVERAGE REPORTING</a>
+<li><a name="TOC20" href="#SEC20">SEE ALSO</a>
+<li><a name="TOC21" href="#SEC21">AUTHOR</a>
+<li><a name="TOC22" href="#SEC22">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">BUILDING PCRE2</a><br>
 <P>
@@ -147,6 +148,12 @@
 option. Unless the application has set PCRE2_NEVER_UCP, a pattern may also
 request this by starting with (*UCP).
 </P>
+<P>
+The \C escape sequence, which matches a single code unit, even in a UTF mode,
+can cause unpredictable behaviour because it may leave the current matching
+point in the middle of a multi-code-unit character. It can be locked out by
+setting the PCRE2_NEVER_BACKSLASH_C option.
+</P>
 <br><a name="SEC6" href="#TOC1">JUST-IN-TIME COMPILER SUPPORT</a><br>
 <P>
 Just-in-time compiler support is included in the build by specifying
@@ -397,10 +404,19 @@
 </pre>
 immediately before the <b>configure</b> command.
 </P>
-<br><a name="SEC17" href="#TOC1">DEBUGGING WITH VALGRIND SUPPORT</a><br>
+<br><a name="SEC17" href="#TOC1">INCLUDING DEBUGGING CODE</a><br>
 <P>
 If you add
 <pre>
+  --enable-debug
+</pre>
+to the <b>configure</b> command, additional debugging code is included in the
+build. This feature is intended for use by the PCRE2 maintainers.
+</P>
+<br><a name="SEC18" href="#TOC1">DEBUGGING WITH VALGRIND SUPPORT</a><br>
+<P>
+If you add
+<pre>
   --enable-valgrind
 </pre>
 to the <b>configure</b> command, PCRE2 will use valgrind annotations to mark
@@ -407,7 +423,7 @@
 certain memory regions as unaddressable. This allows it to detect invalid
 memory accesses, and is mostly useful for debugging PCRE2 itself.
 </P>
-<br><a name="SEC18" href="#TOC1">CODE COVERAGE REPORTING</a><br>
+<br><a name="SEC19" href="#TOC1">CODE COVERAGE REPORTING</a><br>
 <P>
 If your C compiler is gcc, you can build a version of PCRE2 that can generate a
 code coverage report for its test suite. To enable this, you must install
@@ -464,11 +480,11 @@
 information about code coverage, see the <b>gcov</b> and <b>lcov</b>
 documentation.
 </P>
-<br><a name="SEC19" href="#TOC1">SEE ALSO</a><br>
+<br><a name="SEC20" href="#TOC1">SEE ALSO</a><br>
 <P>
 <b>pcre2api</b>(3), <b>pcre2-config</b>(3).
 </P>
-<br><a name="SEC20" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC21" href="#TOC1">AUTHOR</a><br>
 <P>
 Philip Hazel
 <br>
@@ -477,9 +493,9 @@
 Cambridge, England.
 <br>
 </P>
-<br><a name="SEC21" href="#TOC1">REVISION</a><br>
+<br><a name="SEC22" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 26 January 2015
+Last updated: 24 April 2015
 <br>
 Copyright &copy; 1997-2015 University of Cambridge.
 <br>


Modified: code/trunk/doc/html/pcre2callout.html
===================================================================
--- code/trunk/doc/html/pcre2callout.html    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/doc/html/pcre2callout.html    2015-06-18 16:39:25 UTC (rev 288)
@@ -219,11 +219,11 @@
   PCRE2_SIZE    <i>pattern_position</i>;
   PCRE2_SIZE    <i>next_item_length</i>;
   PCRE2_SIZE    <i>callout_string_offset</i>;
-  PCRE2_SIZE    <i>callout_string_length</i>; 
-  PCRE2_SPTR    <i>callout_string</i>; 
+  PCRE2_SIZE    <i>callout_string_length</i>;
+  PCRE2_SPTR    <i>callout_string</i>;
 </pre>
 The <i>version</i> field contains the version number of the block format. The
-current version is 1; the three callout string fields were added for this 
+current version is 1; the three callout string fields were added for this
 version. If you are writing an application that might use an earlier release of
 PCRE2, you should check the version number before accessing any of these
 fields. The version number will increase in future if more fields are added,
@@ -263,7 +263,7 @@
 Fields for all callouts
 </b><br>
 <P>
-The remaining fields in the callout block are the same for both kinds of 
+The remaining fields in the callout block are the same for both kinds of
 callout.
 </P>
 <P>
@@ -306,7 +306,7 @@
 </P>
 <P>
 The <i>pattern_position</i> field contains the offset in the pattern string to
-the next item to be matched. 
+the next item to be matched.
 </P>
 <P>
 The <i>next_item_length</i> field contains the length of the next item to be
@@ -318,8 +318,8 @@
 <P>
 The <i>pattern_position</i> and <i>next_item_length</i> fields are intended to
 help in distinguishing between different automatic callouts, which all have the
-same callout number. However, they are set for all callouts, and are used by 
-<b>pcre2test</b> to show the next item to be matched when displaying callout 
+same callout number. However, they are set for all callouts, and are used by
+<b>pcre2test</b> to show the next item to be matched when displaying callout
 information.
 </P>
 <P>
@@ -351,9 +351,9 @@
 <b>  void *<i>user_data</i>);</b>
 <br>
 <br>
-A script language that supports the use of string arguments in callouts might 
-like to scan all the callouts in a pattern before running the match. This can 
-be done by calling <b>pcre2_callout_enumerate()</b>. The first argument is a 
+A script language that supports the use of string arguments in callouts might
+like to scan all the callouts in a pattern before running the match. This can
+be done by calling <b>pcre2_callout_enumerate()</b>. The first argument is a
 pointer to a compiled pattern, the second points to a callback function, and
 the third is arbitrary user data. The callback function is called for every
 callout in the pattern in the order in which they appear. Its first argument is
@@ -369,7 +369,7 @@
   <i>callout_string_length</i>  Length of callout string
   <i>callout_string</i>         Points to callout string or is NULL
 </pre>
-The version number is currently 0. It will increase if new fields are ever 
+The version number is currently 0. It will increase if new fields are ever
 added to the block. The remaining fields are the same as their namesakes in the
 <b>pcre2_callout</b> block that is used for callouts during matching, as
 described
@@ -384,8 +384,8 @@
 with the same value for <i>pattern_position</i> in each case.
 </P>
 <P>
-The callback function should normally return zero. If it returns a non-zero 
-value, scanning the pattern stops, and that value is returned from 
+The callback function should normally return zero. If it returns a non-zero
+value, scanning the pattern stops, and that value is returned from
 <b>pcre2_callout_enumerate()</b>.
 </P>
 <br><a name="SEC7" href="#TOC1">AUTHOR</a><br>


Modified: code/trunk/doc/html/pcre2pattern.html
===================================================================
--- code/trunk/doc/html/pcre2pattern.html    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/doc/html/pcre2pattern.html    2015-06-18 16:39:25 UTC (rev 288)
@@ -357,10 +357,11 @@
 in patterns in a visible manner. There is no restriction on the appearance of
 non-printing characters in a pattern, but when a pattern is being prepared by
 text editing, it is often easier to use one of the following escape sequences
-than the binary character it represents:
+than the binary character it represents. In an ASCII or Unicode environment,
+these escapes are as follows:
 <pre>
   \a        alarm, that is, the BEL character (hex 07)
-  \cx       "control-x", where x is any ASCII character
+  \cx       "control-x", where x is any printable ASCII character
   \e        escape (hex 1B)
   \f        form feed (hex 0C)
   \n        linefeed (hex 0A)
@@ -377,23 +378,38 @@
 case letter, it is converted to upper case. Then bit 6 of the character (hex
 40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A (A is 41, Z is 5A),
 but \c{ becomes hex 3B ({ is 7B), and \c; becomes hex 7B (; is 3B). If the
-code unit following \c has a value greater than 127, a compile-time error
-occurs. This locks out non-ASCII characters in all modes.
+code unit following \c has a value less than 32 or greater than 126, a
+compile-time error occurs. This locks out non-printable ASCII characters in all
+modes.
 </P>
 <P>
-The \c facility was designed for use with ASCII characters, but with the
-extension to Unicode it is even less useful than it once was. It is, however,
-recognized when PCRE2 is compiled in EBCDIC mode, where data items are always
-bytes. In this mode, all values are valid after \c. If the next character is a
-lower case letter, it is converted to upper case. Then the 0xc0 bits of the
-byte are inverted. Thus \cA becomes hex 01, as in ASCII (A is C1), but because
-the EBCDIC letters are disjoint, \cZ becomes hex 29 (Z is E9), and other
-characters also generate different values.
+When PCRE2 is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t
+generate the appropriate EBCDIC code values. The \c escape is processed
+as specified for Perl in the <b>perlebcdic</b> document. The only characters
+that are allowed after \c are A-Z, a-z, or one of @, [, \, ], ^, _, or ?. Any
+other character provokes a compile-time error. The sequence \c@ encodes
+character code 0; the letters (in either case) encode characters 1-26 (hex 01
+to hex 1A); [, \, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and
+\c? becomes either 255 (hex FF) or 95 (hex 5F).
 </P>
 <P>
+Thus, apart from \c?, these escapes generate the same character code values as
+they do in an ASCII environment, though the meanings of the values mostly
+differ. For example, \cG always generates code value 7, which is BEL in ASCII
+but DEL in EBCDIC.
+</P>
+<P>
+The sequence \c? generates DEL (127, hex 7F) in an ASCII environment, but
+because 127 is not a control character in EBCDIC, Perl makes it generate the
+APC character. Unfortunately, there are several variants of EBCDIC. In most of
+them the APC character has the value 255 (hex FF), but in the one Perl calls
+POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
+values, PCRE2 makes \c? generate 95; otherwise it generates 255.
+</P>
+<P>
 After \0 up to two further octal digits are read. If there are fewer than two
-digits, just those that are present are used. Thus the sequence \0\x\07
-specifies two binary zeros followed by a BEL character (code value 7). Make
+digits, just those that are present are used. Thus the sequence \0\x\015
+specifies two binary zeros followed by a CR character (code value 13). Make
 sure you supply two digits after the initial zero if the pattern character that
 follows is itself an octal digit.
 </P>
@@ -412,21 +428,24 @@
 </P>
 <P>
 The handling of a backslash followed by a digit other than 0 is complicated,
-and Perl has changed in recent releases, causing PCRE2 also to change. Outside
-a character class, PCRE2 reads the digit and any following digits as a decimal
-number. If the number is less than 8, or if there have been at least that many
-previous capturing left parentheses in the expression, the entire sequence is
-taken as a <i>back reference</i>. A description of how this works is given
+and Perl has changed over time, causing PCRE2 also to change.
+</P>
+<P>
+Outside a character class, PCRE2 reads the digit and any following digits as a
+decimal number. If the number is less than 10, begins with the digit 8 or 9, or
+if there are at least that many previous capturing left parentheses in the
+expression, the entire sequence is taken as a <i>back reference</i>. A
+description of how this works is given
 <a href="#backreferences">later,</a>
 following the discussion of
 <a href="#subpattern">parenthesized subpatterns.</a>
+Otherwise, up to three octal digits are read to form a character code.
 </P>
 <P>
-Inside a character class, or if the decimal number following \ is greater than
-7 and there have not been that many capturing subpatterns, PCRE2 handles \8
-and \9 as the literal characters "8" and "9", and otherwise re-reads up to
-three octal digits following the backslash, using them to generate a data
-character. Any subsequent digits stand for themselves. For example:
+Inside a character class, PCRE2 handles \8 and \9 as the literal characters
+"8" and "9", and otherwise reads up to three octal digits following the
+backslash, using them to generate a data character. Any subsequent digits stand
+for themselves. For example, outside a character class:
 <pre>
   \040   is another way of writing an ASCII space
   \40    is the same, provided there are fewer than 40 previous capturing subpatterns
@@ -436,7 +455,7 @@
   \0113  is a tab followed by the character "3"
   \113   might be a back reference, otherwise the character with octal code 113
   \377   might be a back reference, otherwise the value 255 (decimal)
-  \81    is either a back reference, or the two characters "8" and "1"
+  \81    is always a back reference
 </pre>
 Note that octal values of 100 or greater that are specified using this syntax
 must not be introduced by a leading zero, because no more than three octal
@@ -1105,15 +1124,19 @@
 <P>
 The circumflex and dollar metacharacters are zero-width assertions. That is,
 they test for a particular condition being true without consuming any
-characters from the subject string.
+characters from the subject string. These two metacharacters are concerned with
+matching the starts and ends of lines. If the newline convention is set so that
+only the two-character sequence CRLF is recognized as a newline, isolated CR
+and LF characters are treated as ordinary data characters, and are not
+recognized as newlines.
 </P>
 <P>
 Outside a character class, in the default matching mode, the circumflex
 character is an assertion that is true only if the current matching point is at
 the start of the subject string. If the <i>startoffset</i> argument of
-<b>pcre2_match()</b> is non-zero, circumflex can never match if the
-PCRE2_MULTILINE option is unset. Inside a character class, circumflex has an
-entirely different meaning
+<b>pcre2_match()</b> is non-zero, or if PCRE2_NOTBOL is set, circumflex can
+never match if the PCRE2_MULTILINE option is unset. Inside a character class,
+circumflex has an entirely different meaning
 <a href="#characterclass">(see below).</a>
 </P>
 <P>
@@ -1128,10 +1151,11 @@
 <P>
 The dollar character is an assertion that is true only if the current matching
 point is at the end of the subject string, or immediately before a newline at
-the end of the string (by default). Note, however, that it does not actually
-match the newline. Dollar need not be the last character of the pattern if a
-number of alternatives are involved, but it should be the last item in any
-branch in which it appears. Dollar has no special meaning in a character class.
+the end of the string (by default), unless PCRE2_NOTEOL is set. Note, however,
+that it does not actually match the newline. Dollar need not be the last
+character of the pattern if a number of alternatives are involved, but it
+should be the last item in any branch in which it appears. Dollar has no
+special meaning in a character class.
 </P>
 <P>
 The meaning of dollar can be changed so that it matches only at the very end of
@@ -1139,13 +1163,13 @@
 does not affect the \Z assertion.
 </P>
 <P>
-The meanings of the circumflex and dollar characters are changed if the
-PCRE2_MULTILINE option is set. When this is the case, a circumflex matches
-immediately after internal newlines as well as at the start of the subject
-string. It does not match after a newline that ends the string. A dollar
-matches before any newlines in the string, as well as at the very end, when
-PCRE2_MULTILINE is set. When newline is specified as the two-character
-sequence CRLF, isolated CR and LF characters do not indicate newlines.
+The meanings of the circumflex and dollar metacharacters are changed if the
+PCRE2_MULTILINE option is set. When this is the case, a dollar character
+matches before any newlines in the string, as well as at the very end, and a
+circumflex matches immediately after internal newlines as well as at the start
+of the subject string. It does not match after a newline that ends the string,
+for compatibility with Perl. However, this can be changed by setting the
+PCRE2_ALT_CIRCUMFLEX option.
 </P>
 <P>
 For example, the pattern /^abc$/ matches the subject string "def\nabc" (where
@@ -1198,14 +1222,18 @@
 byte; in the 16-bit library it is a 16-bit unit; in the 32-bit library it is a
 32-bit unit. Unlike a dot, \C always matches line-ending characters. The
 feature is provided in Perl in order to match individual bytes in UTF-8 mode,
-but it is unclear how it can usefully be used. Because \C breaks up characters
-into individual code units, matching one unit with \C in a UTF mode means that
-the rest of the string may start with a malformed UTF character. This has
-undefined results, because PCRE2 assumes that it is dealing with valid UTF
-strings (and by default it checks this at the start of processing unless the
-PCRE2_NO_UTF_CHECK option is used).
+but it is unclear how it can usefully be used.
 </P>
 <P>
+Because \C breaks up characters into individual code units, matching one unit
+with \C in UTF-8 or UTF-16 mode means that the rest of the string may start
+with a malformed UTF character. This has undefined results, because PCRE2
+assumes that it is matching character by character in a valid UTF string (by
+default it checks the subject string's validity at the start of processing
+unless the PCRE2_NO_UTF_CHECK option is used). An application can lock out the
+use of \C by setting the PCRE2_NEVER_BACKSLASH_C option.
+</P>
+<P>
 PCRE2 does not allow \C to appear in lookbehind assertions
 <a href="#lookbehind">(described below)</a>
 in a UTF mode, because this would make it impossible to calculate the length of
@@ -1475,7 +1503,8 @@
 setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS and
 PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also
 permitted. If a letter appears both before and after the hyphen, the option is
-unset.
+unset. An empty options setting "(?)" is allowed. Needless to say, it has no
+effect.
 </P>
 <P>
 The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be changed in
@@ -1508,11 +1537,20 @@
 behaviour otherwise.
 </P>
 <P>
+As a convenient shorthand, if any option settings are required at the start of
+a non-capturing subpattern (see the next section), the option letters may
+appear between the "?" and the ":". Thus the two patterns
+<pre>
+  (?i:saturday|sunday)
+  (?:(?i)saturday|sunday)
+</pre>
+match exactly the same set of strings.
+</P>
+<P>
 <b>Note:</b> There are other PCRE2-specific options that can be set by the
-application when the compiling function is called.
-The pattern can contain special leading sequences such as (*CRLF) to override
-what the application has set or what has been defaulted. Details are given in
-the section entitled
+application when the compiling function is called. The pattern can contain
+special leading sequences such as (*CRLF) to override what the application has
+set or what has been defaulted. Details are given in the section entitled
 <a href="#newlineseq">"Newline sequences"</a>
 above. There are also the (*UTF) and (*UCP) leading sequences that can be used
 to set UTF and Unicode property modes; they are equivalent to setting the
@@ -2841,10 +2879,10 @@
 Callouts with string arguments
 </b><br>
 <P>
-A delimited string may be used instead of a number as a callout argument. The 
-starting delimiter must be one of ` ' " ^ % # $ { and the ending delimiter is 
-the same as the start, except for {, where the ending delimiter is }. If the 
-ending delimiter is needed within the string, it must be doubled. For 
+A delimited string may be used instead of a number as a callout argument. The
+starting delimiter must be one of ` ' " ^ % # $ { and the ending delimiter is
+the same as the start, except for {, where the ending delimiter is }. If the
+ending delimiter is needed within the string, it must be doubled. For
 example:
 <pre>
   (?C'ab ''c'' d')xyz(?C{any text})pqr
@@ -3285,7 +3323,7 @@
 </P>
 <br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 15 March 2015
+Last updated: 13 June 2015
 <br>
 Copyright &copy; 1997-2015 University of Cambridge.
 <br>
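
The ASCII rule for \c described in the pcre2pattern.html hunk above (fold lower case to upper case, invert bit 6, reject code units below 32 or above 126) can be sketched as a small C function. This is only an illustration of the documented rule, not PCRE2's own code; the function name control_escape_ascii is hypothetical, and it does not cover the EBCDIC behaviour.

```c
#include <assert.h>

/* Hypothetical helper: models the documented ASCII/Unicode rule for \cx.
   Returns the resulting code value, or -1 where the documentation says
   a compile-time error occurs (x is not a printable ASCII character). */
static int control_escape_ascii(int x)
{
    if (x < 32 || x > 126)
        return -1;                 /* non-printable or non-ASCII: rejected */
    if (x >= 'a' && x <= 'z')
        x -= 'a' - 'A';            /* lower case letter becomes upper case */
    return x ^ 0x40;               /* invert bit 6 (hex 40) */
}
```

With this sketch, control_escape_ascii('A') gives hex 01, control_escape_ascii('z') gives hex 1A, control_escape_ascii('{') gives hex 3B, and control_escape_ascii(';') gives hex 7B, matching the examples in the documentation text.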


Modified: code/trunk/doc/html/pcre2syntax.html
===================================================================
--- code/trunk/doc/html/pcre2syntax.html    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/doc/html/pcre2syntax.html    2015-06-18 16:39:25 UTC (rev 288)
@@ -15,7 +15,7 @@
 <ul>
 <li><a name="TOC1" href="#SEC1">PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY</a>
 <li><a name="TOC2" href="#SEC2">QUOTING</a>
-<li><a name="TOC3" href="#SEC3">CHARACTERS</a>
+<li><a name="TOC3" href="#SEC3">ESCAPED CHARACTERS</a>
 <li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
 <li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTIES FOR \p and \P</a>
 <li><a name="TOC6" href="#SEC6">PCRE2 SPECIAL CATEGORY PROPERTIES FOR \p and \P</a>
@@ -55,11 +55,12 @@
   \Q...\E    treat enclosed characters as literal
 </PRE>
 </P>
-<br><a name="SEC3" href="#TOC1">CHARACTERS</a><br>
+<br><a name="SEC3" href="#TOC1">ESCAPED CHARACTERS</a><br>
 <P>
+This table applies to ASCII and Unicode environments.
 <pre>
   \a         alarm, that is, the BEL character (hex 07)
-  \cx        "control-x", where x is any ASCII character
+  \cx        "control-x", where x is any ASCII printing character
   \e         escape (hex 1B)
   \f         form feed (hex 0C)
   \n         newline (hex 0A)
@@ -68,18 +69,32 @@
   \0dd       character with octal code 0dd
   \ddd       character with octal code ddd, or backreference
   \o{ddd..}  character with octal code ddd..
+  \U         "U" if PCRE2_ALT_BSUX is set (otherwise is an error)
+  \uhhhh     character with hex code hhhh (if PCRE2_ALT_BSUX is set)
   \xhh       character with hex code hh
   \x{hhh..}  character with hex code hhh..
 </pre>
-Note that \0dd is always an octal code, and that \8 and \9 are the literal
-characters "8" and "9".
+Note that \0dd is always an octal code. The treatment of backslash followed by
+a non-zero digit is complicated; for details see the section
+<a href="pcre2pattern.html#digitsafterbackslash">"Non-printing characters"</a>
+in the
+<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
+documentation, where details of escape processing in EBCDIC environments are
+also given.
 </P>
+<P>
+When \x is not followed by {, from zero to two hexadecimal digits are read,
+but if PCRE2_ALT_BSUX is set, \x must be followed by two hexadecimal digits to
+be recognized as a hexadecimal escape; otherwise it matches a literal "x".
+Likewise, if \u (in ALT_BSUX mode) is not followed by four hexadecimal digits,
+it matches a literal "u".
+</P>
 <br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
 <P>
 <pre>
   .          any character except newline;
                in dotall mode, any character whatsoever
-  \C         one data unit, even in UTF mode (best avoided)
+  \C         one code unit, even in UTF mode (best avoided)
   \d         a decimal digit
   \D         a character that is not a decimal digit
   \h         a horizontal white space character
@@ -96,6 +111,11 @@
   \W         a "non-word" character
   \X         a Unicode extended grapheme cluster
 </pre>
+The application can lock out the use of \C by setting the
+PCRE2_NEVER_BACKSLASH_C option. It is dangerous because it may leave the
+current matching point in the middle of a UTF-8 or UTF-16 character.
+</P>
+<P>
 By default, \d, \s, and \w match only ASCII characters, even in UTF-8 mode
 or in the 16-bit and 32-bit libraries. However, if locale-specific matching is
 happening, \s and \w may also match characters with code points in the range
@@ -348,13 +368,14 @@
   \b          word boundary
   \B          not a word boundary
   ^           start of subject
-               also after internal newline in multiline mode
+                also after an internal newline in multiline mode
+                (after any newline if PCRE2_ALT_CIRCUMFLEX is set)
   \A          start of subject
   $           end of subject
-               also before newline at end of subject
-               also before internal newline in multiline mode
+                also before newline at end of subject
+                also before internal newline in multiline mode
   \Z          end of subject
-               also before newline at end of subject
+                also before newline at end of subject
   \z          end of subject
   \G          first matching position in subject
 </PRE>
@@ -423,7 +444,9 @@
   (*UCP)          set PCRE2_UCP (use Unicode properties for \d etc)
 </pre>
 Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
-limits set by the caller of pcre2_match(), not increase them.
+limits set by the caller of pcre2_match(), not increase them. The application
+can lock out the use of (*UTF) and (*UCP) by setting the PCRE2_NEVER_UTF or
+PCRE2_NEVER_UCP options, respectively, at compile time.
 </P>
 <br><a name="SEC17" href="#TOC1">NEWLINE CONVENTION</a><br>
 <P>
@@ -539,9 +562,9 @@
   (?Cn)           callout with numerical data n
   (?C"text")      callout with string data
 </pre>
-The allowed string delimiters are ` ' " ^ % # $ (which are the same for the 
-start and the end), and the starting delimiter { matched with the ending 
-delimiter }. To encode the ending delimiter within the string, double it.   
+The allowed string delimiters are ` ' " ^ % # $ (which are the same for the
+start and the end), and the starting delimiter { matched with the ending
+delimiter }. To encode the ending delimiter within the string, double it.
 </P>
 <br><a name="SEC25" href="#TOC1">SEE ALSO</a><br>
 <P>
@@ -559,7 +582,7 @@
 </P>
 <br><a name="SEC27" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 15 March 2015
+Last updated: 13 June 2015
 <br>
 Copyright &copy; 1997-2015 University of Cambridge.
 <br>
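
The circumflex rules summarized in this revision (start of subject unless PCRE2_NOTBOL is set; after internal newlines in multiline mode; after a terminating newline only with PCRE2_ALT_CIRCUMFLEX) can be modelled in a few lines of C. This is a sketch under stated assumptions, not the library's implementation: the function name circumflex_can_match is hypothetical, the three options are passed as plain booleans, and the newline convention is assumed to be a single LF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of where ^ may match, assuming LF-only newlines.
   pos is an offset into subject, with 0 <= pos <= length. */
static bool circumflex_can_match(const char *subject, size_t length,
                                 size_t pos, bool multiline,
                                 bool alt_circumflex, bool notbol)
{
    if (pos == 0)
        return !notbol;            /* start of subject, unless NOTBOL */
    if (!multiline)
        return false;              /* default mode: start of subject only */
    if (subject[pos - 1] != '\n')
        return false;              /* must directly follow a newline */
    if (pos == length)
        return alt_circumflex;     /* after a newline that ends the subject */
    return true;                   /* after an internal newline */
}
```

For the subject "abc\n" (length 4), the model reports no match at offset 4 in plain multiline mode, and a match at offset 4 once alt_circumflex is true.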


Modified: code/trunk/doc/html/pcre2test.html
===================================================================
--- code/trunk/doc/html/pcre2test.html    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/doc/html/pcre2test.html    2015-06-18 16:39:25 UTC (rev 288)
@@ -94,7 +94,7 @@
 contain binary zeroes, even though in Unix-like environments, <b>fgets()</b>
 treats any bytes other than newline as data characters. In some Windows
 environments character 26 (hex 1A) causes an immediate end of file, and no
-further data is read. 
+further data is read.
 </P>
 <P>
 For maximum portability, therefore, it is safest to avoid non-printing
@@ -284,13 +284,20 @@
   #forbid_utf
 </pre>
 Subsequent patterns automatically have the PCRE2_NEVER_UTF and PCRE2_NEVER_UCP
-options set, which locks out the use of UTF and Unicode property features. This
-is a trigger guard that is used in test files to ensure that UTF or Unicode
-property tests are not accidentally added to files that are used when Unicode
-support is not included in the library. This effect can also be obtained by the
-use of <b>#pattern</b>; the difference is that <b>#forbid_utf</b> cannot be
-unset, and the automatic options are not displayed in pattern information, to
-avoid cluttering up test output.
+options set, which locks out the use of the PCRE2_UTF and PCRE2_UCP options and
+the use of (*UTF) and (*UCP) at the start of patterns. This command also forces
+an error if a subsequent pattern contains any occurrences of \P, \p, or \X,
+which are still supported when PCRE2_UTF is not set, but which require Unicode
+property support to be included in the library.
+</P>
+<P>
+This is a trigger guard that is used in test files to ensure that UTF or
+Unicode property tests are not accidentally added to files that are used when
+Unicode support is not included in the library. Setting PCRE2_NEVER_UTF and
+PCRE2_NEVER_UCP as a default can also be obtained by the use of <b>#pattern</b>;
+the difference is that <b>#forbid_utf</b> cannot be unset, and the automatic
+options are not displayed in pattern information, to avoid cluttering up test
+output.
 <pre>
   #load &#60;filename&#62;
 </pre>
@@ -471,6 +478,7 @@
 <pre>
       allow_empty_class         set PCRE2_ALLOW_EMPTY_CLASS
       alt_bsux                  set PCRE2_ALT_BSUX
+      alt_circumflex            set PCRE2_ALT_CIRCUMFLEX
       anchored                  set PCRE2_ANCHORED
       auto_callout              set PCRE2_AUTO_CALLOUT
   /i  caseless                  set PCRE2_CASELESS
@@ -481,6 +489,7 @@
       firstline                 set PCRE2_FIRSTLINE
       match_unset_backref       set PCRE2_MATCH_UNSET_BACKREF
   /m  multiline                 set PCRE2_MULTILINE
+      never_backslash_c         set PCRE2_NEVER_BACKSLASH_C
       never_ucp                 set PCRE2_NEVER_UCP
       never_utf                 set PCRE2_NEVER_UTF
       no_auto_capture           set PCRE2_NO_AUTO_CAPTURE
@@ -506,7 +515,7 @@
 <pre>
       bsr=[anycrlf|unicode]     specify \R handling
   /B  bincode                   show binary code without lengths
-      callout_info              show callout information 
+      callout_info              show callout information
       debug                     same as info,fullbincode
       fullbincode               show binary code with lengths
   /I  info                      show info about compiled pattern
@@ -589,9 +598,9 @@
 ending code units are recorded.
 </P>
 <P>
-The <b>callout_info</b> modifier requests information about all the callouts in 
-the pattern. A list of them is output at the end of any other information that 
-is requested. For each callout, either its number or string is given, followed 
+The <b>callout_info</b> modifier requests information about all the callouts in
+the pattern. A list of them is output at the end of any other information that
+is requested. For each callout, either its number or string is given, followed
 by the item that follows it in the pattern.
 </P>
 <br><b>
@@ -1460,7 +1469,7 @@
 </P>
 <br><a name="SEC21" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 22 March 2015
+Last updated: 20 May 2015
 <br>
 Copyright &copy; 1997-2015 University of Cambridge.
 <br>


Modified: code/trunk/doc/pcre2.3
===================================================================
--- code/trunk/doc/pcre2.3    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/doc/pcre2.3    2015-06-18 16:39:25 UTC (rev 288)
@@ -103,12 +103,12 @@
 .P
 One way of guarding against this possibility is to use the
 \fBpcre2_pattern_info()\fP function to check the compiled pattern's options for
-PCRE2_UTF. Alternatively, you can set the PCRE2_NEVER_UTF option when calling 
+PCRE2_UTF. Alternatively, you can set the PCRE2_NEVER_UTF option when calling
 \fBpcre2_compile()\fP. This causes a compile-time error if a pattern contains
 a UTF-setting sequence.
 .P
-The use of Unicode properties for character types such as \ed can also be 
-enabled from within the pattern, by specifying "(*UCP)". This feature can be 
+The use of Unicode properties for character types such as \ed can also be
+enabled from within the pattern, by specifying "(*UCP)". This feature can be
 disallowed by setting the PCRE2_NEVER_UCP option.
 .P
 If your application is one that supports UTF, be aware that validity checking


Modified: code/trunk/doc/pcre2.txt
===================================================================
--- code/trunk/doc/pcre2.txt    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/doc/pcre2.txt    2015-06-18 16:39:25 UTC (rev 288)
@@ -87,16 +87,26 @@
        mance.


        One  way  of guarding against this possibility is to use the pcre2_pat-
-       tern_info() function to check the compiled pattern's options  for  UTF.
-       Alternatively,  you can set the PCRE2_NEVER_UTF option at compile time.
-       This causes an compile time error if a pattern contains  a  UTF-setting
-       sequence.
+       tern_info() function  to  check  the  compiled  pattern's  options  for
+       PCRE2_UTF.  Alternatively,  you can set the PCRE2_NEVER_UTF option when
+       calling pcre2_compile(). This causes a compile-time error if a pattern
+       contains a UTF-setting sequence.


+       The  use  of Unicode properties for character types such as \d can also
+       be enabled from within the pattern, by specifying "(*UCP)".  This  fea-
+       ture can be disallowed by setting the PCRE2_NEVER_UCP option.
+
        If  your  application  is one that supports UTF, be aware that validity
        checking can take time. If the same data string is to be  matched  many
        times,  you  can  use  the PCRE2_NO_UTF_CHECK option for the second and
        subsequent matches to avoid running redundant checks.


+       The use of the \C escape sequence in a UTF-8 or UTF-16 pattern can lead
+       to  problems,  because  it  may leave the current matching point in the
+       middle of  a  multi-code-unit  character.  The  PCRE2_NEVER_BACKSLASH_C
+       option  can  be  used to lock out the use of \C, causing a compile-time
+       error if it is encountered.
+
        Another way that performance can be hit is by running  a  pattern  that
        has  a  very  large search tree against a string that will never match.
        Nested unlimited repeats in a pattern are a common example. PCRE2  pro-
@@ -155,11 +165,11 @@


REVISION

-       Last updated: 18 November 2014
-       Copyright (c) 1997-2014 University of Cambridge.
+       Last updated: 13 April 2015
+       Copyright (c) 1997-2015 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCRE2API(3)                Library Functions Manual                PCRE2API(3)



@@ -1109,34 +1119,43 @@
        always expected after \x, but it may have zero, one, or two digits (so,
        for example, \xz matches a binary zero character followed by z).


+         PCRE2_ALT_CIRCUMFLEX
+
+       In  multiline  mode  (when  PCRE2_MULTILINE  is  set),  the  circumflex
+       metacharacter  matches at the start of the subject (unless PCRE2_NOTBOL
+       is set), and also after any internal  newline.  However,  it  does  not
+       match after a newline at the end of the subject, for compatibility with
+       Perl. If you want a multiline circumflex also to match after  a  termi-
+       nating newline, you must set PCRE2_ALT_CIRCUMFLEX.
+
          PCRE2_AUTO_CALLOUT


-       If this bit  is  set,  pcre2_compile()  automatically  inserts  callout
+       If  this  bit  is  set,  pcre2_compile()  automatically inserts callout
        items, all with number 255, before each pattern item. For discussion of
        the callout facility, see the pcre2callout documentation.


          PCRE2_CASELESS


-       If this bit is set, letters in the pattern match both upper  and  lower
-       case  letters in the subject. It is equivalent to Perl's /i option, and
+       If  this  bit is set, letters in the pattern match both upper and lower
+       case letters in the subject. It is equivalent to Perl's /i option,  and
        it can be changed within a pattern by a (?i) option setting.


          PCRE2_DOLLAR_ENDONLY


-       If this bit is set, a dollar metacharacter in the pattern matches  only
-       at  the  end  of the subject string. Without this option, a dollar also
-       matches immediately before a newline at the end of the string (but  not
-       before  any other newlines). The PCRE2_DOLLAR_ENDONLY option is ignored
-       if PCRE2_MULTILINE is set. There is no equivalent  to  this  option  in
+       If  this bit is set, a dollar metacharacter in the pattern matches only
+       at the end of the subject string. Without this option,  a  dollar  also
+       matches  immediately before a newline at the end of the string (but not
+       before any other newlines). The PCRE2_DOLLAR_ENDONLY option is  ignored
+       if  PCRE2_MULTILINE  is  set.  There is no equivalent to this option in
        Perl, and no way to set it within a pattern.


          PCRE2_DOTALL


-       If  this  bit  is  set,  a dot metacharacter in the pattern matches any
-       character, including one that indicates a  newline.  However,  it  only
+       If this bit is set, a dot metacharacter  in  the  pattern  matches  any
+       character,  including  one  that  indicates a newline. However, it only
        ever matches one character, even if newlines are coded as CRLF. Without
        this option, a dot does not match when the current position in the sub-
-       ject  is  at  a newline. This option is equivalent to Perl's /s option,
+       ject is at a newline. This option is equivalent to  Perl's  /s  option,
        and it can be changed within a pattern by a (?s) option setting. A neg-
        ative class such as [^a] always matches newline characters, independent
        of the setting of this option.
@@ -1143,71 +1162,82 @@


          PCRE2_DUPNAMES


-       If this bit is set, names used to identify capturing  subpatterns  need
+       If  this  bit is set, names used to identify capturing subpatterns need
        not be unique. This can be helpful for certain types of pattern when it
-       is known that only one instance of the named  subpattern  can  ever  be
-       matched.  There  are  more details of named subpatterns below; see also
+       is  known  that  only  one instance of the named subpattern can ever be
+       matched. There are more details of named subpatterns  below;  see  also
        the pcre2pattern documentation.


          PCRE2_EXTENDED


-       If this bit is set, most white space  characters  in  the  pattern  are
-       totally  ignored  except when escaped or inside a character class. How-
-       ever, white space is not allowed within  sequences  such  as  (?>  that
+       If  this  bit  is  set,  most white space characters in the pattern are
+       totally ignored except when escaped or inside a character  class.  How-
+       ever,  white  space  is  not  allowed within sequences such as (?> that
        introduce various parenthesized subpatterns, nor within numerical quan-
-       tifiers such as {1,3}.  Ignorable white space is permitted  between  an
-       item  and a following quantifier and between a quantifier and a follow-
+       tifiers  such  as {1,3}.  Ignorable white space is permitted between an
+       item and a following quantifier and between a quantifier and a  follow-
        ing + that indicates possessiveness.


-       PCRE2_EXTENDED also causes characters between an unescaped # outside  a
-       character  class  and the next newline, inclusive, to be ignored, which
+       PCRE2_EXTENDED  also causes characters between an unescaped # outside a
+       character class and the next newline, inclusive, to be  ignored,  which
        makes it possible to include comments inside complicated patterns. Note
-       that  the  end of this type of comment is a literal newline sequence in
+       that the end of this type of comment is a literal newline  sequence  in
        the pattern; escape sequences that happen to represent a newline do not
-       count.  PCRE2_EXTENDED is equivalent to Perl's /x option, and it can be
+       count. PCRE2_EXTENDED is equivalent to Perl's /x option, and it can  be
        changed within a pattern by a (?x) option setting.


        Which characters are interpreted as newlines can be specified by a set-
-       ting  in  the compile context that is passed to pcre2_compile() or by a
-       special sequence at the start of the pattern, as described in the  sec-
-       tion  entitled "Newline conventions" in the pcre2pattern documentation.
+       ting in the compile context that is passed to pcre2_compile() or  by  a
+       special  sequence at the start of the pattern, as described in the sec-
+       tion entitled "Newline conventions" in the pcre2pattern  documentation.
        A default is defined when PCRE2 is built.


          PCRE2_FIRSTLINE


-       If this option is set, an  unanchored  pattern  is  required  to  match
-       before  or  at  the  first  newline  in  the subject string, though the
+       If  this  option  is  set,  an  unanchored pattern is required to match
+       before or at the first  newline  in  the  subject  string,  though  the
        matched text may continue over the newline.


          PCRE2_MATCH_UNSET_BACKREF


-       If this option is set, a back reference to an  unset  subpattern  group
-       matches  an  empty  string (by default this causes the current matching
-       alternative to fail).  A pattern such as  (\1)(a)  succeeds  when  this
-       option  is set (assuming it can find an "a" in the subject), whereas it
-       fails by default, for Perl compatibility.  Setting  this  option  makes
+       If  this  option  is set, a back reference to an unset subpattern group
+       matches an empty string (by default this causes  the  current  matching
+       alternative  to  fail).   A  pattern such as (\1)(a) succeeds when this
+       option is set (assuming it can find an "a" in the subject), whereas  it
+       fails  by  default,  for  Perl compatibility. Setting this option makes
        PCRE2 behave more like ECMAScript (aka JavaScript).


          PCRE2_MULTILINE


-       By  default,  for  the purposes of matching "start of line" and "end of
-       line", PCRE2 treats the subject string as consisting of a  single  line
-       of  characters,  even  if  it actually contains newlines. The "start of
-       line" metacharacter (^) matches only at the start of  the  string,  and
-       the  "end  of  line"  metacharacter  ($) matches only at the end of the
+       By default, for the purposes of matching "start of line"  and  "end  of
+       line",  PCRE2  treats the subject string as consisting of a single line
+       of characters, even if it actually contains  newlines.  The  "start  of
+       line"  metacharacter  (^)  matches only at the start of the string, and
+       the "end of line" metacharacter ($) matches only  at  the  end  of  the
        string,  or  before  a  terminating  newline  (except  when  PCRE2_DOL-
-       LAR_ENDONLY  is  set).  Note, however, that unless PCRE2_DOTALL is set,
+       LAR_ENDONLY is set). Note, however, that unless  PCRE2_DOTALL  is  set,
        the "any character" metacharacter (.) does not match at a newline. This
        behaviour (for ^, $, and dot) is the same as Perl.


-       When  PCRE2_MULTILINE  it is set, the "start of line" and "end of line"
-       constructs match immediately following or immediately  before  internal
-       newlines  in  the  subject string, respectively, as well as at the very
-       start and end. This is equivalent to Perl's /m option, and  it  can  be
-       changed within a pattern by a (?m) option setting. If there are no new-
-       lines in a subject string, or no occurrences of ^ or $  in  a  pattern,
-       setting PCRE2_MULTILINE has no effect.
+       When PCRE2_MULTILINE is set, the "start of line" and "end  of  line"
+       constructs  match  immediately following or immediately before internal
+       newlines in the subject string, respectively, as well as  at  the  very
+       start  and  end.  This is equivalent to Perl's /m option, and it can be
+       changed within a pattern by a (?m) option setting. Note that the "start
+       of line" metacharacter does not match after a newline at the end of the
+       subject, for compatibility with Perl.  However, you can change this  by
+       setting  the PCRE2_ALT_CIRCUMFLEX option. If there are no newlines in a
+       subject string, or no occurrences of ^  or  $  in  a  pattern,  setting
+       PCRE2_MULTILINE has no effect.


+         PCRE2_NEVER_BACKSLASH_C
+
+       This  option  locks out the use of \C in the pattern that is being com-
+       piled.  This escape can  cause  unpredictable  behaviour  in  UTF-8  or
+       UTF-16  modes,  because  it may leave the current matching point in the
+       middle of a multi-code-unit character. This option  may  be  useful  in
+       applications that process patterns from external sources.
+
          PCRE2_NEVER_UCP


        This  option  locks  out the use of Unicode properties for handling \B,
@@ -1214,9 +1244,9 @@
        \b, \D, \d, \S, \s, \W, \w, and some of the POSIX character classes, as
        described  for  the  PCRE2_UCP option below. In particular, it prevents
        the creator of the pattern from enabling this facility by starting  the
-       pattern  with  (*UCP).  This may be useful in applications that process
-       patterns from external sources. The  option  combination  PCRE_UCP  and
-       PCRE_NEVER_UCP causes an error.
+       pattern  with  (*UCP).  This  option may be useful in applications that
+       process patterns from external sources. The option combination PCRE2_UCP
+       and PCRE2_NEVER_UCP causes an error.


          PCRE2_NEVER_UTF


@@ -1223,9 +1253,9 @@
        This  option  locks out interpretation of the pattern as UTF-8, UTF-16,
        or UTF-32, depending on which library is in use. In particular, it pre-
        vents  the  creator of the pattern from switching to UTF interpretation
-       by starting the pattern with (*UTF). This may be useful in applications
-       that  process  patterns  from  external  sources.  The  combination  of
-       PCRE2_UTF and PCRE2_NEVER_UTF causes an error.
+       by starting the pattern with (*UTF).  This  option  may  be  useful  in
+       applications  that process patterns from external sources. The combina-
+       tion of PCRE2_UTF and PCRE2_NEVER_UTF causes an error.


          PCRE2_NO_AUTO_CAPTURE


@@ -2796,11 +2826,11 @@

REVISION

-       Last updated: 23 March 2015
+       Last updated: 22 April 2015
        Copyright (c) 1997-2015 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCRE2BUILD(3)              Library Functions Manual              PCRE2BUILD(3)



@@ -2916,7 +2946,12 @@
        PCRE2_UCP option. Unless the application  has  set  PCRE2_NEVER_UCP,  a
        pattern may also request this by starting with (*UCP).


+       The \C escape sequence, which matches a single code unit, even in a UTF
+       mode, can cause unpredictable behaviour because it may leave  the  cur-
+       rent  matching  point  in the middle of a multi-code-unit character. It
+       can be locked out by setting the PCRE2_NEVER_BACKSLASH_C option.


+
JUST-IN-TIME COMPILER SUPPORT

        Just-in-time compiler support is included in the build by specifying
@@ -2923,10 +2958,10 @@


          --enable-jit


-       This  support  is available only for certain hardware architectures. If
-       this option is set for an unsupported architecture,  a  building  error
-       occurs.   See the pcre2jit documentation for a discussion of JIT usage.
-       When JIT support is enabled, pcre2grep automatically makes use  of  it,
+       This support is available only for certain hardware  architectures.  If
+       this  option  is  set for an unsupported architecture, a building error
+       occurs.  See the pcre2jit documentation for a discussion of JIT  usage.
+       When  JIT  support is enabled, pcre2grep automatically makes use of it,
        unless you add


          --disable-pcre2grep-jit
@@ -2936,14 +2971,14 @@


NEWLINE RECOGNITION

-       By  default, PCRE2 interprets the linefeed (LF) character as indicating
-       the end of a line. This is the normal newline  character  on  Unix-like
-       systems.  You can compile PCRE2 to use carriage return (CR) instead, by
+       By default, PCRE2 interprets the linefeed (LF) character as  indicating
+       the  end  of  a line. This is the normal newline character on Unix-like
+       systems. You can compile PCRE2 to use carriage return (CR) instead,  by
        adding


          --enable-newline-is-cr


-       to the configure  command.  There  is  also  an  --enable-newline-is-lf
+       to  the  configure  command.  There  is  also an --enable-newline-is-lf
        option, which explicitly specifies linefeed as the newline character.


        Alternatively, you can specify that line endings are to be indicated by
@@ -2956,76 +2991,76 @@


          --enable-newline-is-anycrlf


-       which  causes  PCRE2 to recognize any of the three sequences CR, LF, or
+       which causes PCRE2 to recognize any of the three sequences CR,  LF,  or
        CRLF as indicating a line ending. Finally, a fifth option, specified by


          --enable-newline-is-any


-       causes PCRE2 to recognize any Unicode  newline  sequence.  The  Unicode
+       causes  PCRE2  to  recognize  any Unicode newline sequence. The Unicode
        newline sequences are the three just mentioned, plus the single charac-
        ters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next line,
-       U+0085),  LS  (line  separator,  U+2028),  and PS (paragraph separator,
+       U+0085), LS (line separator,  U+2028),  and  PS  (paragraph  separator,
        U+2029).


        Whatever default line ending convention is selected when PCRE2 is built
-       can  be  overridden by applications that use the library. At build time
+       can be overridden by applications that use the library. At  build  time
        it is conventional to use the standard for your operating system.



WHAT \R MATCHES

-       By default, the sequence \R in a pattern matches  any  Unicode  newline
-       sequence,  independently  of  what has been selected as the line ending
+       By  default,  the  sequence \R in a pattern matches any Unicode newline
+       sequence, independently of what has been selected as  the  line  ending
        sequence. If you specify


          --enable-bsr-anycrlf


-       the default is changed so that \R matches only CR, LF, or  CRLF.  What-
-       ever  is selected when PCRE2 is built can be overridden by applications
+       the  default  is changed so that \R matches only CR, LF, or CRLF. What-
+       ever is selected when PCRE2 is built can be overridden by  applications
        that use the library.



HANDLING VERY LARGE PATTERNS

-       Within a compiled pattern, offset values are used  to  point  from  one
-       part  to another (for example, from an opening parenthesis to an alter-
-       nation metacharacter). By default, in the 8-bit and  16-bit  libraries,
-       two-byte  values  are used for these offsets, leading to a maximum size
-       for a compiled pattern of around 64K code units. This is sufficient  to
+       Within  a  compiled  pattern,  offset values are used to point from one
+       part to another (for example, from an opening parenthesis to an  alter-
+       nation  metacharacter).  By default, in the 8-bit and 16-bit libraries,
+       two-byte values are used for these offsets, leading to a  maximum  size
+       for  a compiled pattern of around 64K code units. This is sufficient to
        handle all but the most gigantic patterns. Nevertheless, some people do
-       want to process truly enormous patterns, so it is possible  to  compile
-       PCRE2  to  use three-byte or four-byte offsets by adding a setting such
+       want  to  process truly enormous patterns, so it is possible to compile
+       PCRE2 to use three-byte or four-byte offsets by adding a  setting  such
        as


          --with-link-size=3


-       to the configure command. The value given must be 2, 3, or 4.  For  the
-       16-bit  library,  a  value of 3 is rounded up to 4. In these libraries,
-       using longer offsets slows down the operation of PCRE2 because  it  has
-       to  load additional data when handling them. For the 32-bit library the
-       value is always 4 and cannot be overridden; the value  of  --with-link-
+       to  the  configure command. The value given must be 2, 3, or 4. For the
+       16-bit library, a value of 3 is rounded up to 4.  In  these  libraries,
+       using  longer  offsets slows down the operation of PCRE2 because it has
+       to load additional data when handling them. For the 32-bit library  the
+       value  is  always 4 and cannot be overridden; the value of --with-link-
        size is ignored.



AVOIDING EXCESSIVE STACK USAGE

-       When  matching  with the pcre2_match() function, PCRE2 implements back-
-       tracking by making recursive  calls  to  an  internal  function  called
-       match().  In  environments where the size of the stack is limited, this
-       can severely limit PCRE2's operation. (The Unix  environment  does  not
-       usually  suffer from this problem, but it may sometimes be necessary to
+       When matching with the pcre2_match() function, PCRE2  implements  back-
+       tracking  by  making  recursive  calls  to  an internal function called
+       match(). In environments where the size of the stack is  limited,  this
+       can  severely  limit  PCRE2's operation. (The Unix environment does not
+       usually suffer from this problem, but it may sometimes be necessary  to
        increase  the  maximum  stack  size.  There  is  a  discussion  in  the
-       pcre2stack  documentation.)  An  alternative approach to recursion that
-       uses memory from the heap to remember data, instead of using  recursive
-       function  calls, has been implemented to work round the problem of lim-
-       ited stack size. If you want to build a version  of  PCRE2  that  works
+       pcre2stack documentation.) An alternative approach  to  recursion  that
+       uses  memory from the heap to remember data, instead of using recursive
+       function calls, has been implemented to work round the problem of  lim-
+       ited  stack  size.  If  you want to build a version of PCRE2 that works
        this way, add


          --disable-stack-for-recursion


        to the configure command. By default, the system functions malloc() and
-       free() are called to manage the heap memory that is required, but  cus-
-       tom  memory  management  functions  can  be  called instead. PCRE2 runs
+       free()  are called to manage the heap memory that is required, but cus-
+       tom memory management functions  can  be  called  instead.  PCRE2  runs
        noticeably more slowly when built in this way. This option affects only
        the pcre2_match() function; it is not relevant for pcre2_dfa_match().


@@ -3033,30 +3068,30 @@
LIMITING PCRE2 RESOURCE USAGE

        Internally, PCRE2 has a function called match(), which it calls repeat-
-       edly  (sometimes  recursively)  when  matching  a  pattern   with   the
+       edly   (sometimes   recursively)  when  matching  a  pattern  with  the
        pcre2_match() function. By controlling the maximum number of times this
-       function may be called during a single matching operation, a limit  can
-       be  placed on the resources used by a single call to pcre2_match(). The
+       function  may be called during a single matching operation, a limit can
+       be placed on the resources used by a single call to pcre2_match().  The
        limit can be changed at run time, as described in the pcre2api documen-
-       tation.  The default is 10 million, but this can be changed by adding a
+       tation. The default is 10 million, but this can be changed by adding  a
        setting such as


          --with-match-limit=500000


-       to  the  configure  command.  This  setting  has  no  effect   on   the
+       to   the   configure  command.  This  setting  has  no  effect  on  the
        pcre2_dfa_match() matching function.


-       In  some  environments  it is desirable to limit the depth of recursive
+       In some environments it is desirable to limit the  depth  of  recursive
        calls of match() more strictly than the total number of calls, in order
-       to  restrict  the maximum amount of stack (or heap, if --disable-stack-
+       to restrict the maximum amount of stack (or heap,  if  --disable-stack-
        for-recursion is specified) that is used. A second limit controls this;
-       it  defaults  to  the  value  that is set for --with-match-limit, which
-       imposes no additional constraints. However, you can set a  lower  limit
+       it defaults to the value that  is  set  for  --with-match-limit,  which
+       imposes  no  additional constraints. However, you can set a lower limit
        by adding, for example,


          --with-match-limit-recursion=10000


-       to  the  configure  command.  This  value can also be overridden at run
+       to the configure command. This value can  also  be  overridden  at  run
        time.



@@ -3064,16 +3099,16 @@

        PCRE2 uses fixed tables for processing characters whose code points are
        less than 256. By default, PCRE2 is built with a set of tables that are
-       distributed in the file src/pcre2_chartables.c.dist. These  tables  are
+       distributed  in  the file src/pcre2_chartables.c.dist. These tables are
        for ASCII codes only. If you add


          --enable-rebuild-chartables


-       to  the  configure  command, the distributed tables are no longer used.
-       Instead, a program called dftables is compiled and  run.  This  outputs
+       to the configure command, the distributed tables are  no  longer  used.
+       Instead,  a  program  called dftables is compiled and run. This outputs
        the source for a new set of tables, created in the default locale of your
-       C run-time system. (This method of replacing the tables does  not  work
-       if  you are cross compiling, because dftables is run on the local host.
+       C  run-time  system. (This method of replacing the tables does not work
+       if you are cross compiling, because dftables is run on the local  host.
        If you need to create alternative tables when cross compiling, you will
        have to do so "by hand".)


@@ -3080,8 +3115,8 @@

USING EBCDIC CODE

-       PCRE2  assumes  by default that it will run in an environment where the
-       character code is ASCII or Unicode, which is a superset of ASCII.  This
+       PCRE2 assumes by default that it will run in an environment  where  the
+       character  code is ASCII or Unicode, which is a superset of ASCII. This
        is the case for most computer operating systems. PCRE2 can, however, be
        compiled to run in an 8-bit EBCDIC environment by adding


@@ -3088,21 +3123,21 @@
          --enable-ebcdic --disable-unicode


        to the configure command. This setting implies --enable-rebuild-charta-
-       bles.  You  should  only  use  it if you know that you are in an EBCDIC
+       bles. You should only use it if you know that  you  are  in  an  EBCDIC
        environment (for example, an IBM mainframe operating system).


-       It is not possible to support both EBCDIC and UTF-8 codes in  the  same
-       version  of  the  library. Consequently, --enable-unicode and --enable-
+       It  is  not possible to support both EBCDIC and UTF-8 codes in the same
+       version of the library. Consequently,  --enable-unicode  and  --enable-
        ebcdic are mutually exclusive.


        The EBCDIC character that corresponds to an ASCII LF is assumed to have
-       the  value  0x15 by default. However, in some EBCDIC environments, 0x25
+       the value 0x15 by default. However, in some EBCDIC  environments,  0x25
        is used. In such an environment you should use


          --enable-ebcdic-nl25


        as well as, or instead of, --enable-ebcdic. The EBCDIC character for CR
-       has  the  same  value  as in ASCII, namely, 0x0d. Whichever of 0x15 and
+       has the same value as in ASCII, namely, 0x0d.  Whichever  of  0x15  and
        0x25 is not chosen as LF is made to correspond to the Unicode NEL char-
        acter (which, in Unicode, is 0x85).


@@ -3113,8 +3148,8 @@

PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT

-       By  default,  pcre2grep reads all files as plain text. You can build it
-       so that it recognizes files whose names end in .gz or .bz2,  and  reads
+       By default, pcre2grep reads all files as plain text. You can  build  it
+       so  that  it recognizes files whose names end in .gz or .bz2, and reads
        them with libz or libbz2, respectively, by adding one or both of


          --enable-pcre2grep-libz
@@ -3121,23 +3156,23 @@
          --enable-pcre2grep-libbz2


        to the configure command. These options naturally require that the rel-
-       evant libraries are installed on your system. Configuration  will  fail
+       evant  libraries  are installed on your system. Configuration will fail
        if they are not.



PCRE2GREP BUFFER SIZE

-       pcre2grep  uses an internal buffer to hold a "window" on the file it is
+       pcre2grep uses an internal buffer to hold a "window" on the file it  is
        scanning, in order to be able to output "before" and "after" lines when
-       it  finds  a match. The size of the buffer is controlled by a parameter
+       it finds a match. The size of the buffer is controlled by  a  parameter
        whose default value is 20K. The buffer itself is three times this size,
        but because of the way it is used for holding "before" lines, the long-
-       est line that is guaranteed to be processable is  the  parameter  size.
+       est  line  that  is guaranteed to be processable is the parameter size.
        You can change the default parameter value by adding, for example,


          --with-pcre2grep-bufsize=50K


-       to  the  configure  command.  The caller of pcre2grep can override this
+       to the configure command. The caller of  pcre2grep  can  override  this
        value by using --buffer-size on the command line.



@@ -3148,19 +3183,19 @@
          --enable-pcre2test-libreadline
          --enable-pcre2test-libedit


-       to the configure command, pcre2test  is  linked  with  the  libreadline
+       to  the  configure  command,  pcre2test  is linked with the libreadline
        or libedit library, respectively, and when its input is from a terminal,
-       it reads it using the readline() function. This  provides  line-editing
-       and  history  facilities.  Note that libreadline is GPL-licensed, so if
-       you distribute a binary of pcre2test linked in this way, there  may  be
+       it  reads  it using the readline() function. This provides line-editing
+       and history facilities. Note that libreadline is  GPL-licensed,  so  if
+       you  distribute  a binary of pcre2test linked in this way, there may be
        licensing issues. These can be avoided by linking instead with libedit,
        which has a BSD licence.


-       Setting --enable-pcre2test-libreadline causes the -lreadline option  to
-       be  added to the pcre2test build. In many operating environments with a
-       sytem-installed readline library this is sufficient. However,  in  some
+       Setting  --enable-pcre2test-libreadline causes the -lreadline option to
+       be added to the pcre2test build. In many operating environments with  a
+       system-installed readline  library this is sufficient. However, in some
        environments (e.g. if an unmodified distribution version of readline is
-       in use), some extra configuration may be necessary.  The  INSTALL  file
+       in  use),  some  extra configuration may be necessary. The INSTALL file
        for libreadline says this:


          "Readline uses the termcap functions, but does not link with
@@ -3167,7 +3202,7 @@
          the termcap or curses library itself, allowing applications
          which link with readline the to choose an appropriate library."


-       If  your environment has not been set up so that an appropriate library
+       If your environment has not been set up so that an appropriate  library
        is automatically included, you may need to add something like


          LIBS="-lncurses"
@@ -3175,6 +3210,16 @@
        immediately before the configure command.



+INCLUDING DEBUGGING CODE
+
+       If you add
+
+         --enable-debug
+
+       to  the configure command, additional debugging code is included in the
+       build. This feature is intended for use by the PCRE2 maintainers.
+
+
 DEBUGGING WITH VALGRIND SUPPORT


        If you add
@@ -3257,11 +3302,11 @@


REVISION

-       Last updated: 26 January 2015
+       Last updated: 24 April 2015
        Copyright (c) 1997-2015 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCRE2CALLOUT(3)            Library Functions Manual            PCRE2CALLOUT(3)



@@ -3624,8 +3669,8 @@
        Last updated: 23 March 2015
        Copyright (c) 1997-2015 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCRE2COMPAT(3)             Library Functions Manual             PCRE2COMPAT(3)



@@ -3809,8 +3854,8 @@
        Last updated: 15 March 2015
        Copyright (c) 1997-2015 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCRE2JIT(3)                Library Functions Manual                PCRE2JIT(3)



@@ -4192,8 +4237,8 @@
        Last updated: 27 November 2014
        Copyright (c) 1997-2014 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCRE2LIMITS(3)             Library Functions Manual             PCRE2LIMITS(3)



@@ -4264,8 +4309,8 @@
        Last updated: 25 November 2014
        Copyright (c) 1997-2014 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCRE2MATCHING(3)           Library Functions Manual           PCRE2MATCHING(3)



@@ -4483,8 +4528,8 @@
        Last updated: 29 September 2014
        Copyright (c) 1997-2014 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCRE2PARTIAL(3)            Library Functions Manual            PCRE2PARTIAL(3)



@@ -4923,8 +4968,8 @@
        Last updated: 22 December 2014
        Copyright (c) 1997-2014 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCRE2UNICODE(3)            Library Functions Manual            PCRE2UNICODE(3)



@@ -5150,5 +5195,5 @@
        Last updated: 23 November 2014
        Copyright (c) 1997-2014 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+


Modified: code/trunk/doc/pcre2_callout_enumerate.3
===================================================================
--- code/trunk/doc/pcre2_callout_enumerate.3    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/doc/pcre2_callout_enumerate.3    2015-06-18 16:39:25 UTC (rev 288)
@@ -21,7 +21,7 @@
 .sp
   \fIcode\fP           Points to the compiled pattern
   \fIcallback\fP       The callback function
-  \fIcallout_data\fP   User data that is passed to the callback  
+  \fIcallout_data\fP   User data that is passed to the callback
 .sp
 The \fIcallback()\fP function is passed a pointer to a data block containing
 the following fields:
@@ -34,9 +34,9 @@
   \fIcallout_string_length\fP  Length of callout string
   \fIcallout_string\fP         Points to callout string or is NULL
 .sp
-The second argument is the callout data that was passed to 
-\fBpcre2_callout_enumerate()\fP. The \fBcallback()\fP function must return zero 
-for success. Any other value causes the pattern scan to stop, with the value 
+The second argument is the callout data that was passed to
+\fBpcre2_callout_enumerate()\fP. The \fBcallback()\fP function must return zero
+for success. Any other value causes the pattern scan to stop, with the value
 being passed back as the result of \fBpcre2_callout_enumerate()\fP.
 .P
 There is a complete description of the PCRE2 native API in the

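Note (not part of the commit): the return-value contract described in the
pcre2_callout_enumerate.3 text above can be sketched in plain C without linking
PCRE2. Everything named demo_* below is a hypothetical stand-in, not the
library's API; the struct carries only the two callout-string fields visible in
the diff, and the loop is merely a model of "zero means continue, any other
value stops the scan and is passed back".

```c
#include <stddef.h>

/* Hypothetical stand-in for the callout enumeration block. Only the two
   fields shown in the diff are modelled; the real block has more. */
typedef struct {
  size_t callout_string_length;   /* length of callout string */
  const char *callout_string;     /* points to callout string, or NULL */
} demo_callout_block;

typedef int (*demo_callback)(const demo_callout_block *, void *);

/* Model of the documented contract: call the callback for each callout in
   order; stop at the first non-zero return and pass that value back. */
static int demo_enumerate(const demo_callout_block *blocks, size_t count,
                          demo_callback callback, void *user_data)
{
  for (size_t i = 0; i < count; i++) {
    int rc = callback(&blocks[i], user_data);
    if (rc != 0) return rc;       /* scan stops, value is returned */
  }
  return 0;                       /* every callback reported success */
}

/* Example callback: count string callouts; give up with 42 when a callout
   has no string (callout_string is NULL). */
static int count_strings(const demo_callout_block *block, void *user_data)
{
  if (block->callout_string == NULL) return 42;
  (*(int *)user_data)++;
  return 0;
}
```

With the real library the same callback shape would be handed to
pcre2_callout_enumerate() together with a compiled pattern; the sketch only
shows how a non-zero return short-circuits the scan.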

Modified: code/trunk/doc/pcre2_compile.3
===================================================================
--- code/trunk/doc/pcre2_compile.3    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/doc/pcre2_compile.3    2015-06-18 16:39:25 UTC (rev 288)
@@ -37,7 +37,7 @@
 .sp
   PCRE2_ANCHORED           Force pattern anchoring
   PCRE2_ALT_BSUX           Alternative handling of \eu, \eU, and \ex
-  PCRE2_ALT_CIRCUMFLEX     Alternative handling of ^ in multiline mode 
+  PCRE2_ALT_CIRCUMFLEX     Alternative handling of ^ in multiline mode
   PCRE2_AUTO_CALLOUT       Compile automatic callouts
   PCRE2_CASELESS           Do caseless matching
   PCRE2_DOLLAR_ENDONLY     $ not to match newline at end
@@ -47,7 +47,7 @@
   PCRE2_FIRSTLINE          Force matching to be before newline
   PCRE2_MATCH_UNSET_BACKREF  Match unset back references
   PCRE2_MULTILINE          ^ and $ match newlines within data
-  PCRE2_NEVER_BACKSLASH_C  Lock out the use of \C in patterns 
+  PCRE2_NEVER_BACKSLASH_C  Lock out the use of \eC in patterns
   PCRE2_NEVER_UCP          Lock out PCRE2_UCP, e.g. via (*UCP)
   PCRE2_NEVER_UTF          Lock out PCRE2_UTF, e.g. via (*UTF)
   PCRE2_NO_AUTO_CAPTURE    Disable numbered capturing paren-


Modified: code/trunk/doc/pcre2api.3
===================================================================
--- code/trunk/doc/pcre2api.3    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/doc/pcre2api.3    2015-06-18 16:39:25 UTC (rev 288)
@@ -1045,7 +1045,7 @@
 binary zero character followed by z).
 .sp
   PCRE2_ALT_CIRCUMFLEX
-.sp   
+.sp
 In multiline mode (when PCRE2_MULTILINE is set), the circumflex metacharacter
 matches at the start of the subject (unless PCRE2_NOTBOL is set), and also
 after any internal newline. However, it does not match after a newline at the
@@ -1161,11 +1161,10 @@
 However, you can change this by setting the PCRE2_ALT_CIRCUMFLEX option. If
 there are no newlines in a subject string, or no occurrences of ^ or $ in a
 pattern, setting PCRE2_MULTILINE has no effect.
-
 .sp
   PCRE2_NEVER_BACKSLASH_C
 .sp
-This option locks out the use of \eC in the pattern that is being compiled. 
+This option locks out the use of \eC in the pattern that is being compiled.
 This escape can cause unpredictable behaviour in UTF-8 or UTF-16 modes, because
 it may leave the current matching point in the middle of a multi-code-unit
 character. This option may be useful in applications that process patterns from
@@ -1756,14 +1755,14 @@
 .B "  void *\fIuser_data\fP);"
 .fi
 .sp
-A script language that supports the use of string arguments in callouts might 
-like to scan all the callouts in a pattern before running the match. This can 
-be done by calling \fBpcre2_callout_enumerate()\fP. The first argument is a 
+A script language that supports the use of string arguments in callouts might
+like to scan all the callouts in a pattern before running the match. This can
+be done by calling \fBpcre2_callout_enumerate()\fP. The first argument is a
 pointer to a compiled pattern, the second points to a callback function, and
 the third is arbitrary user data. The callback function is called for every
 callout in the pattern in the order in which they appear. Its first argument is
 a pointer to a callout enumeration block, and its second argument is the
-\fIuser_data\fP value that was passed to \fBpcre2_callout_enumerate()\fP. The 
+\fIuser_data\fP value that was passed to \fBpcre2_callout_enumerate()\fP. The
 contents of the callout enumeration block are described in the
 .\" HREF
 \fBpcre2callout\fP
@@ -2330,7 +2329,7 @@
   PCRE2_ERROR_CALLOUT
 .sp
 This error is never generated by \fBpcre2_match()\fP itself. It is provided for
-use by callout functions that want to cause \fBpcre2_match()\fP or 
+use by callout functions that want to cause \fBpcre2_match()\fP or
 \fBpcre2_callout_enumerate()\fP to return a distinctive error code. See the
 .\" HREF
 \fBpcre2callout\fP


Modified: code/trunk/doc/pcre2build.3
===================================================================
--- code/trunk/doc/pcre2build.3    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/doc/pcre2build.3    2015-06-18 16:39:25 UTC (rev 288)
@@ -134,8 +134,8 @@
 request this by starting with (*UCP).
 .P
 The \eC escape sequence, which matches a single code unit, even in a UTF mode,
-can cause unpredictable behaviour because it may leave the current matching 
-point in the middle of a multi-code-unit character. It can be locked out by 
+can cause unpredictable behaviour because it may leave the current matching
+point in the middle of a multi-code-unit character. It can be locked out by
 setting the PCRE2_NEVER_BACKSLASH_C option.
 .
 .
@@ -417,8 +417,8 @@
 .sp
   --enable-debug
 .sp
-to the \fBconfigure\fP command, additional debugging code is included in the 
-build. This feature is intended for use by the PCRE2 maintainers.   
+to the \fBconfigure\fP command, additional debugging code is included in the
+build. This feature is intended for use by the PCRE2 maintainers.
 .
 .
 .SH "DEBUGGING WITH VALGRIND SUPPORT"


Modified: code/trunk/doc/pcre2callout.3
===================================================================
--- code/trunk/doc/pcre2callout.3    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/doc/pcre2callout.3    2015-06-18 16:39:25 UTC (rev 288)
@@ -204,11 +204,11 @@
   PCRE2_SIZE    \fIpattern_position\fP;
   PCRE2_SIZE    \fInext_item_length\fP;
   PCRE2_SIZE    \fIcallout_string_offset\fP;
-  PCRE2_SIZE    \fIcallout_string_length\fP; 
-  PCRE2_SPTR    \fIcallout_string\fP; 
+  PCRE2_SIZE    \fIcallout_string_length\fP;
+  PCRE2_SPTR    \fIcallout_string\fP;
 .sp
 The \fIversion\fP field contains the version number of the block format. The
-current version is 1; the three callout string fields were added for this 
+current version is 1; the three callout string fields were added for this
 version. If you are writing an application that might use an earlier release of
 PCRE2, you should check the version number before accessing any of these
 fields. The version number will increase in future if more fields are added,
@@ -247,7 +247,7 @@
 .SS "Fields for all callouts"
 .rs
 .sp
-The remaining fields in the callout block are the same for both kinds of 
+The remaining fields in the callout block are the same for both kinds of
 callout.
 .P
 The \fIoffset_vector\fP field is a pointer to the vector of capturing offsets
@@ -283,7 +283,7 @@
 always the case for the DFA matching functions.
 .P
 The \fIpattern_position\fP field contains the offset in the pattern string to
-the next item to be matched. 
+the next item to be matched.
 .P
 The \fInext_item_length\fP field contains the length of the next item to be
 matched in the pattern string. When the callout immediately precedes an
@@ -293,8 +293,8 @@
 .P
 The \fIpattern_position\fP and \fInext_item_length\fP fields are intended to
 help in distinguishing between different automatic callouts, which all have the
-same callout number. However, they are set for all callouts, and are used by 
-\fBpcre2test\fP to show the next item to be matched when displaying callout 
+same callout number. However, they are set for all callouts, and are used by
+\fBpcre2test\fP to show the next item to be matched when displaying callout
 information.
 .P
 In callouts from \fBpcre2_match()\fP the \fImark\fP field contains a pointer to
@@ -329,9 +329,9 @@
 .B "  void *\fIuser_data\fP);"
 .fi
 .sp
-A script language that supports the use of string arguments in callouts might 
-like to scan all the callouts in a pattern before running the match. This can 
-be done by calling \fBpcre2_callout_enumerate()\fP. The first argument is a 
+A script language that supports the use of string arguments in callouts might
+like to scan all the callouts in a pattern before running the match. This can
+be done by calling \fBpcre2_callout_enumerate()\fP. The first argument is a
 pointer to a compiled pattern, the second points to a callback function, and
 the third is arbitrary user data. The callback function is called for every
 callout in the pattern in the order in which they appear. Its first argument is
@@ -347,7 +347,7 @@
   \fIcallout_string_length\fP  Length of callout string
   \fIcallout_string\fP         Points to callout string or is NULL
 .sp
-The version number is currently 0. It will increase if new fields are ever 
+The version number is currently 0. It will increase if new fields are ever
 added to the block. The remaining fields are the same as their namesakes in the
 \fBpcre2_callout\fP block that is used for callouts during matching, as
 described
@@ -363,8 +363,8 @@
 /(a)(a)/. This means that the callout will be enumerated more than once, but
 with the same value for \fIpattern_position\fP in each case.
 .P
-The callback function should normally return zero. If it returns a non-zero 
-value, scanning the pattern stops, and that value is returned from 
+The callback function should normally return zero. If it returns a non-zero
+value, scanning the pattern stops, and that value is returned from
 \fBpcre2_callout_enumerate()\fP.
 .
 .
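
Editorial sketch, not part of the commit: the callback contract described in
the pcre2callout.3 text above (return zero to continue, non-zero to stop the
scan and have that value returned) can be illustrated without linking against
PCRE2. Everything prefixed demo_ or DEMO_ below is a hypothetical stand-in
written for this note; only the field names are taken from the
pcre2_callout_enumerate_block layout in this revision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_STOP 1   /* arbitrary non-zero value chosen for this sketch */

/* Stand-in for pcre2_callout_enumerate_block; only the fields used here. */
typedef struct demo_enumerate_block {
  uint32_t    version;
  size_t      pattern_position;
  uint32_t    callout_number;
  const char *callout_string;   /* NULL for a numerical callout */
} demo_enumerate_block;

typedef int (*demo_callback)(demo_enumerate_block *, void *);

/* Stand-in for the library's scan loop: call back for every callout in
   order; the first non-zero return stops the scan and is passed back. */
static int demo_enumerate(demo_enumerate_block *blocks, size_t count,
  demo_callback callback, void *user_data)
{
  assert(callback != NULL);
  for (size_t i = 0; i < count; i++)
    {
    int rc = callback(&blocks[i], user_data);
    if (rc != 0) return rc;
    }
  return 0;
}

/* Example callback state, passed through the user_data pointer. */
typedef struct demo_state {
  uint32_t max_number;    /* largest numerical callout accepted */
  int      strings_seen;  /* count of string callouts visited */
} demo_state;

/* Counts string callouts; stops the scan at a numerical callout whose
   number exceeds max_number. */
static int demo_check(demo_enumerate_block *b, void *user_data)
{
  demo_state *st = user_data;
  if (b->callout_string != NULL)
    {
    st->strings_seen++;
    return 0;
    }
  return (b->callout_number > st->max_number) ? DEMO_STOP : 0;
}
```

With the real library, a callback of the same shape would be passed to
pcre2_callout_enumerate() along with the compiled pattern and the user data
pointer.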


Modified: code/trunk/doc/pcre2pattern.3
===================================================================
--- code/trunk/doc/pcre2pattern.3    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/doc/pcre2pattern.3    2015-06-18 16:39:25 UTC (rev 288)
@@ -337,7 +337,7 @@
 in patterns in a visible manner. There is no restriction on the appearance of
 non-printing characters in a pattern, but when a pattern is being prepared by
 text editing, it is often easier to use one of the following escape sequences
-than the binary character it represents. In an ASCII or Unicode environment, 
+than the binary character it represents. In an ASCII or Unicode environment,
 these escapes are as follows:
 .sp
   \ea        alarm, that is, the BEL character (hex 07)
@@ -372,15 +372,15 @@
 \e? becomes either 255 (hex FF) or 95 (hex 5F).
 .P
 Thus, apart from \e?, these escapes generate the same character code values as
-they do in an ASCII environment, though the meanings of the values mostly 
+they do in an ASCII environment, though the meanings of the values mostly
 differ. For example, \eG always generates code value 7, which is BEL in ASCII
 but DEL in EBCDIC.
 .P
 The sequence \e? generates DEL (127, hex 7F) in an ASCII environment, but
-because 127 is not a control character in EBCDIC, Perl makes it generate the 
-APC character. Unfortunately, there are several variants of EBCDIC. In most of 
-them the APC character has the value 255 (hex FF), but in the one Perl calls 
-POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC 
+because 127 is not a control character in EBCDIC, Perl makes it generate the
+APC character. Unfortunately, there are several variants of EBCDIC. In most of
+them the APC character has the value 255 (hex FF), but in the one Perl calls
+POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
 values, PCRE2 makes \e? generate 95; otherwise it generates 255.
 .P
 After \e0 up to two further octal digits are read. If there are fewer than two
@@ -415,7 +415,7 @@
 following the discussion of
 .\" HTML <a href="#subpattern">
 .\" </a>
-parenthesized subpatterns. 
+parenthesized subpatterns.
 .\"
 Otherwise, up to three octal digits are read to form a character code.
 .P
@@ -1128,7 +1128,7 @@
 characters from the subject string. These two metacharacters are concerned with
 matching the starts and ends of lines. If the newline convention is set so that
 only the two-character sequence CRLF is recognized as a newline, isolated CR
-and LF characters are treated as ordinary data characters, and are not 
+and LF characters are treated as ordinary data characters, and are not
 recognized as newlines.
 .P
 Outside a character class, in the default matching mode, the circumflex
@@ -1220,7 +1220,7 @@
 byte; in the 16-bit library it is a 16-bit unit; in the 32-bit library it is a
 32-bit unit. Unlike a dot, \eC always matches line-ending characters. The
 feature is provided in Perl in order to match individual bytes in UTF-8 mode,
-but it is unclear how it can usefully be used. 
+but it is unclear how it can usefully be used.
 .P
 Because \eC breaks up characters into individual code units, matching one unit
 with \eC in UTF-8 or UTF-16 mode means that the rest of the string may start
@@ -1227,7 +1227,7 @@
 with a malformed UTF character. This has undefined results, because PCRE2
 assumes that it is matching character by character in a valid UTF string (by
 default it checks the subject string's validity at the start of processing
-unless the PCRE2_NO_UTF_CHECK option is used). An application can lock out the 
+unless the PCRE2_NO_UTF_CHECK option is used). An application can lock out the
 use of \eC by setting the PCRE2_NEVER_BACKSLASH_C option.
 .P
 PCRE2 does not allow \eC to appear in lookbehind assertions
@@ -1505,7 +1505,7 @@
 setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS and
 PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also
 permitted. If a letter appears both before and after the hyphen, the option is
-unset. An empty options setting "(?)" is allowed. Needless to say, it has no 
+unset. An empty options setting "(?)" is allowed. Needless to say, it has no
 effect.
 .P
 The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be changed in
@@ -1542,7 +1542,7 @@
   (?i:saturday|sunday)
   (?:(?i)saturday|sunday)
 .sp
-match exactly the same set of strings. 
+match exactly the same set of strings.
 .P
 \fBNote:\fP There are other PCRE2-specific options that can be set by the
 application when the compiling function is called. The pattern can contain
@@ -2907,14 +2907,14 @@
 .SS "Callouts with string arguments"
 .rs
 .sp
-A delimited string may be used instead of a number as a callout argument. The 
-starting delimiter must be one of ` ' " ^ % # $ { and the ending delimiter is 
-the same as the start, except for {, where the ending delimiter is }. If the 
-ending delimiter is needed within the string, it must be doubled. For 
+A delimited string may be used instead of a number as a callout argument. The
+starting delimiter must be one of ` ' " ^ % # $ { and the ending delimiter is
+the same as the start, except for {, where the ending delimiter is }. If the
+ending delimiter is needed within the string, it must be doubled. For
 example:
 .sp
   (?C'ab ''c'' d')xyz(?C{any text})pqr
-.sp   
+.sp
 The doubling is removed before the string is passed to the callout function.
 .
 .
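
Editorial sketch, not part of the commit: the delimiter rule in the
pcre2pattern.3 text above (start delimiter one of ` ' " ^ % # $ {, end
delimiter the same except that { ends with }, and a doubled end delimiter
standing for one literal copy) can be shown as a small standalone routine.
The function name undouble_callout_string is hypothetical and this is not
the code PCRE2 itself uses when compiling a pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE2. src points at the opening
   delimiter of a callout string argument, e.g. the ' in (?C'ab ''c'' d').
   The undoubled contents are written to dst, NUL-terminated.
   Returns the content length, or -1 for an unrecognized delimiter, a
   missing terminating delimiter, or a buffer that is too small. */
static long undouble_callout_string(const char *src, char *dst, size_t dstsize)
{
  static const char starts[] = "`'\"^%#${";
  char open, close;
  size_t n = 0;

  assert(src != NULL && dst != NULL);
  open = *src;
  if (open == '\0' || strchr(starts, open) == NULL) return -1;
  close = (open == '{') ? '}' : open;

  for (src++; *src != '\0'; src++)
    {
    if (*src == close)
      {
      if (src[1] != close)      /* single end delimiter: string is complete */
        {
        if (n >= dstsize) return -1;
        dst[n] = '\0';
        return (long)n;
        }
      src++;                    /* doubled end delimiter: keep one copy */
      }
    if (n + 1 >= dstsize) return -1;
    dst[n++] = *src;
    }
  return -1;                    /* no terminating delimiter found */
}
```

Applied to the two arguments in the example pattern, the routine yields
ab 'c' d for the first and any text for the second.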


Modified: code/trunk/doc/pcre2syntax.3
===================================================================
--- code/trunk/doc/pcre2syntax.3    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/doc/pcre2syntax.3    2015-06-18 16:39:25 UTC (rev 288)
@@ -49,7 +49,7 @@
 .\" HREF
 \fBpcre2pattern\fP
 .\"
-documentation, where details of escape processing in EBCDIC environments are 
+documentation, where details of escape processing in EBCDIC environments are
 also given.
 .P
 When \ex is not followed by {, from zero to two hexadecimal digits are read,


Modified: code/trunk/doc/pcre2test.1
===================================================================
--- code/trunk/doc/pcre2test.1    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/doc/pcre2test.1    2015-06-18 16:39:25 UTC (rev 288)
@@ -65,7 +65,7 @@
 contain binary zeroes, even though in Unix-like environments, \fBfgets()\fP
 treats any bytes other than newline as data characters. In some Windows
 environments character 26 (hex 1A) causes an immediate end of file, and no
-further data is read. 
+further data is read.
 .P
 For maximum portability, therefore, it is safest to avoid non-printing
 characters in \fBpcre2test\fP input files. There is a facility for specifying a
@@ -237,7 +237,7 @@
   #forbid_utf
 .sp
 Subsequent patterns automatically have the PCRE2_NEVER_UTF and PCRE2_NEVER_UCP
-options set, which locks out the use of the PCRE2_UTF and PCRE2_UCP options and 
+options set, which locks out the use of the PCRE2_UTF and PCRE2_UCP options and
 the use of (*UTF) and (*UCP) at the start of patterns. This command also forces
 an error if a subsequent pattern contains any occurrences of \eP, \ep, or \eX,
 which are still supported when PCRE2_UTF is not set, but which require Unicode
@@ -245,7 +245,7 @@
 .P
 This is a trigger guard that is used in test files to ensure that UTF or
 Unicode property tests are not accidentally added to files that are used when
-Unicode support is not included in the library. Setting PCRE2_NEVER_UTF and 
+Unicode support is not included in the library. Setting PCRE2_NEVER_UTF and
 PCRE2_NEVER_UCP as a default can also be obtained by the use of \fB#pattern\fP;
 the difference is that \fB#forbid_utf\fP cannot be unset, and the automatic
 options are not displayed in pattern information, to avoid cluttering up test
@@ -443,7 +443,7 @@
 .sp
       allow_empty_class         set PCRE2_ALLOW_EMPTY_CLASS
       alt_bsux                  set PCRE2_ALT_BSUX
-      alt_circumflex            set PCRE2_ALT_CIRCUMFLEX 
+      alt_circumflex            set PCRE2_ALT_CIRCUMFLEX
       anchored                  set PCRE2_ANCHORED
       auto_callout              set PCRE2_AUTO_CALLOUT
   /i  caseless                  set PCRE2_CASELESS
@@ -454,7 +454,7 @@
       firstline                 set PCRE2_FIRSTLINE
       match_unset_backref       set PCRE2_MATCH_UNSET_BACKREF
   /m  multiline                 set PCRE2_MULTILINE
-      never_backslash_c         set PCRE2_NEVER_BACKSLASH_C 
+      never_backslash_c         set PCRE2_NEVER_BACKSLASH_C
       never_ucp                 set PCRE2_NEVER_UCP
       never_utf                 set PCRE2_NEVER_UTF
       no_auto_capture           set PCRE2_NO_AUTO_CAPTURE
@@ -481,7 +481,7 @@
 .sp
       bsr=[anycrlf|unicode]     specify \eR handling
   /B  bincode                   show binary code without lengths
-      callout_info              show callout information 
+      callout_info              show callout information
       debug                     same as info,fullbincode
       fullbincode               show binary code with lengths
   /I  info                      show info about compiled pattern
@@ -559,9 +559,9 @@
 not necessarily the last character. These lines are omitted if no starting or
 ending code units are recorded.
 .P
-The \fBcallout_info\fP modifier requests information about all the callouts in 
-the pattern. A list of them is output at the end of any other information that 
-is requested. For each callout, either its number or string is given, followed 
+The \fBcallout_info\fP modifier requests information about all the callouts in
+the pattern. A list of them is output at the end of any other information that
+is requested. For each callout, either its number or string is given, followed
 by the item that follows it in the pattern.
 .
 .


Modified: code/trunk/doc/pcre2test.txt
===================================================================
--- code/trunk/doc/pcre2test.txt    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/doc/pcre2test.txt    2015-06-18 16:39:25 UTC (rev 288)
@@ -226,15 +226,21 @@
          #forbid_utf


        Subsequent  patterns  automatically  have   the   PCRE2_NEVER_UTF   and
-       PCRE2_NEVER_UCP options set, which locks out the use of UTF and Unicode
-       property features. This is a trigger guard that is used in  test  files
-       to ensure that UTF or Unicode property tests are not accidentally added
-       to files that are used when Unicode support  is  not  included  in  the
-       library.  This  effect can also be obtained by the use of #pattern; the
-       difference is that #forbid_utf  cannot  be  unset,  and  the  automatic
-       options  are  not displayed in pattern information, to avoid cluttering
-       up test output.
+       PCRE2_NEVER_UCP  options  set, which locks out the use of the PCRE2_UTF
+       and PCRE2_UCP options and the use of (*UTF) and (*UCP) at the start  of
+       patterns.  This  command  also  forces an error if a subsequent pattern
+       contains any occurrences of \P, \p, or \X, which  are  still  supported
+       when  PCRE2_UTF  is not set, but which require Unicode property support
+       to be included in the library.


+       This is a trigger guard that is used in test files to ensure  that  UTF
+       or  Unicode property tests are not accidentally added to files that are
+       used when Unicode support is  not  included  in  the  library.  Setting
+       PCRE2_NEVER_UTF  and  PCRE2_NEVER_UCP as a default can also be obtained
+       by the use of #pattern; the difference is that  #forbid_utf  cannot  be
+       unset,  and the automatic options are not displayed in pattern informa-
+       tion, to avoid cluttering up test output.
+
          #load <filename>


        This command is used to load a set of precompiled patterns from a file,
@@ -417,6 +423,7 @@


              allow_empty_class         set PCRE2_ALLOW_EMPTY_CLASS
              alt_bsux                  set PCRE2_ALT_BSUX
+             alt_circumflex            set PCRE2_ALT_CIRCUMFLEX
              anchored                  set PCRE2_ANCHORED
              auto_callout              set PCRE2_AUTO_CALLOUT
          /i  caseless                  set PCRE2_CASELESS
@@ -427,6 +434,7 @@
              firstline                 set PCRE2_FIRSTLINE
              match_unset_backref       set PCRE2_MATCH_UNSET_BACKREF
          /m  multiline                 set PCRE2_MULTILINE
+             never_backslash_c         set PCRE2_NEVER_BACKSLASH_C
              never_ucp                 set PCRE2_NEVER_UCP
              never_utf                 set PCRE2_NEVER_UTF
              no_auto_capture           set PCRE2_NO_AUTO_CAPTURE
@@ -1322,5 +1330,5 @@


REVISION

-       Last updated: 22 March 2015
+       Last updated: 20 May 2015
        Copyright (c) 1997-2015 University of Cambridge.


Modified: code/trunk/src/config.h.generic
===================================================================
--- code/trunk/src/config.h.generic    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/src/config.h.generic    2015-06-18 16:39:25 UTC (rev 288)
@@ -200,7 +200,7 @@
 #define PACKAGE_NAME "PCRE2"
 
 /* Define to the full name and version of this package. */
-#define PACKAGE_STRING "PCRE2 10.10"
+#define PACKAGE_STRING "PCRE2 10.20-RC1"
 
 /* Define to the one symbol short name of this package. */
 #define PACKAGE_TARNAME "pcre2"
@@ -209,7 +209,7 @@
 #define PACKAGE_URL ""
 
 /* Define to the version of this package. */
-#define PACKAGE_VERSION "10.10"
+#define PACKAGE_VERSION "10.20-RC1"
 
 /* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested
    parentheses (of any kind) in a pattern. This limits the amount of system
@@ -227,6 +227,9 @@
 #define PCRE2GREP_BUFSIZE 20480
 #endif
 
+/* Define to any value to include debugging code. */
+/* #undef PCRE2_DEBUG */
+
 /* If you are compiling for a system other than a Unix-like system or
    Win32, and it needs some magic to be inserted before the definition
    of a function that is exported by the library, define this macro to
@@ -287,7 +290,7 @@
 /* #undef SUPPORT_VALGRIND */
 
 /* Version number of package */
-#define VERSION "10.10"
+#define VERSION "10.20-RC1"
 
 /* Define to empty if `const' does not conform to ANSI C. */
 /* #undef const */

Modified: code/trunk/src/pcre2.h.generic
===================================================================
--- code/trunk/src/pcre2.h.generic    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/src/pcre2.h.generic    2015-06-18 16:39:25 UTC (rev 288)
@@ -42,9 +42,9 @@
 /* The current PCRE version information. */
 
 #define PCRE2_MAJOR          10
-#define PCRE2_MINOR          10
-#define PCRE2_PRERELEASE     
-#define PCRE2_DATE           2015-03-06
+#define PCRE2_MINOR          20
+#define PCRE2_PRERELEASE     -RC1
+#define PCRE2_DATE           2015-06-16
 
 /* When an application links to a PCRE DLL in Windows, the symbols that are
 imported have to be identified as such. When building PCRE2, the appropriate
@@ -118,6 +118,8 @@
 #define PCRE2_UCP                 0x00020000u  /* C J M D */
 #define PCRE2_UNGREEDY            0x00040000u  /* C       */
 #define PCRE2_UTF                 0x00080000u  /* C J M D */
+#define PCRE2_NEVER_BACKSLASH_C   0x00100000u  /* C       */
+#define PCRE2_ALT_CIRCUMFLEX      0x00200000u  /*   J M D */


/* These are for pcre2_jit_compile(). */

@@ -125,9 +127,10 @@
 #define PCRE2_JIT_PARTIAL_SOFT    0x00000002u
 #define PCRE2_JIT_PARTIAL_HARD    0x00000004u


-/* These are for pcre2_match() and pcre2_dfa_match(). Note that PCRE2_ANCHORED,
-and PCRE2_NO_UTF_CHECK can also be passed to these functions, so take care not
-to define synonyms by mistake. */
+/* These are for pcre2_match(), pcre2_dfa_match(), and pcre2_jit_match(). Note
+that PCRE2_ANCHORED and PCRE2_NO_UTF_CHECK can also be passed to these
+functions (though pcre2_jit_match() ignores the latter since it bypasses all
+sanity checks). */

 #define PCRE2_NOTBOL              0x00000001u
 #define PCRE2_NOTEOL              0x00000002u
@@ -337,8 +340,24 @@
   PCRE2_SIZE    current_position;  /* Where we currently are in the subject */ \
   PCRE2_SIZE    pattern_position;  /* Offset to next item in the pattern */ \
   PCRE2_SIZE    next_item_length;  /* Length of next item in the pattern */ \
+  /* ------------------- Added for Version 1 -------------------------- */ \
+  PCRE2_SIZE    callout_string_offset; /* Offset to string within pattern */ \
+  PCRE2_SIZE    callout_string_length; /* Length of string compiled into pattern */ \
+  PCRE2_SPTR    callout_string;    /* String compiled into pattern */ \
   /* ------------------------------------------------------------------ */ \
-} pcre2_callout_block;
+} pcre2_callout_block; \
+\
+typedef struct pcre2_callout_enumerate_block { \
+  uint32_t      version;           /* Identifies version of block */ \
+  /* ------------------------ Version 0 ------------------------------- */ \
+  PCRE2_SIZE    pattern_position;  /* Offset to next item in the pattern */ \
+  PCRE2_SIZE    next_item_length;  /* Length of next item in the pattern */ \
+  uint32_t      callout_number;    /* Number compiled into pattern */ \
+  PCRE2_SIZE    callout_string_offset; /* Offset to string within pattern */ \
+  PCRE2_SIZE    callout_string_length; /* Length of string compiled into pattern */ \
+  PCRE2_SPTR    callout_string;    /* String compiled into pattern */ \
+  /* ------------------------------------------------------------------ */ \
+} pcre2_callout_enumerate_block;



/* List the generic forms of all other functions in macros, which will be
@@ -406,6 +425,9 @@

 #define PCRE2_PATTERN_INFO_FUNCTIONS \
 PCRE2_EXP_DECL int       pcre2_pattern_info(const pcre2_code *, uint32_t, \
+                           void *); \
+PCRE2_EXP_DECL int       pcre2_callout_enumerate(const pcre2_code *, \
+                           int (*)(pcre2_callout_enumerate_block *, void *), \
                            void *);



@@ -534,15 +556,17 @@

/* Data blocks */

-#define pcre2_callout_block         PCRE2_SUFFIX(pcre2_callout_block_)
-#define pcre2_general_context       PCRE2_SUFFIX(pcre2_general_context_)
-#define pcre2_compile_context       PCRE2_SUFFIX(pcre2_compile_context_)
-#define pcre2_match_context         PCRE2_SUFFIX(pcre2_match_context_)
-#define pcre2_match_data            PCRE2_SUFFIX(pcre2_match_data_)
+#define pcre2_callout_block            PCRE2_SUFFIX(pcre2_callout_block_)
+#define pcre2_callout_enumerate_block  PCRE2_SUFFIX(pcre2_callout_enumerate_block_)
+#define pcre2_general_context          PCRE2_SUFFIX(pcre2_general_context_)
+#define pcre2_compile_context          PCRE2_SUFFIX(pcre2_compile_context_)
+#define pcre2_match_context            PCRE2_SUFFIX(pcre2_match_context_)
+#define pcre2_match_data               PCRE2_SUFFIX(pcre2_match_data_)



/* Functions: the complete list in alphabetical order */

+#define pcre2_callout_enumerate               PCRE2_SUFFIX(pcre2_callout_enumerate_)
 #define pcre2_code_free                       PCRE2_SUFFIX(pcre2_code_free_)
 #define pcre2_compile                         PCRE2_SUFFIX(pcre2_compile_)
 #define pcre2_compile_context_copy            PCRE2_SUFFIX(pcre2_compile_context_copy_)
@@ -550,7 +574,6 @@
 #define pcre2_compile_context_free            PCRE2_SUFFIX(pcre2_compile_context_free_)
 #define pcre2_config                          PCRE2_SUFFIX(pcre2_config_)
 #define pcre2_dfa_match                       PCRE2_SUFFIX(pcre2_dfa_match_)
-#define pcre2_match                           PCRE2_SUFFIX(pcre2_match_)
 #define pcre2_general_context_copy            PCRE2_SUFFIX(pcre2_general_context_copy_)
 #define pcre2_general_context_create          PCRE2_SUFFIX(pcre2_general_context_create_)
 #define pcre2_general_context_free            PCRE2_SUFFIX(pcre2_general_context_free_)
@@ -566,6 +589,7 @@
 #define pcre2_jit_stack_create                PCRE2_SUFFIX(pcre2_jit_stack_create_)
 #define pcre2_jit_stack_free                  PCRE2_SUFFIX(pcre2_jit_stack_free_)
 #define pcre2_maketables                      PCRE2_SUFFIX(pcre2_maketables_)
+#define pcre2_match                           PCRE2_SUFFIX(pcre2_match_)
 #define pcre2_match_context_copy              PCRE2_SUFFIX(pcre2_match_context_copy_)
 #define pcre2_match_context_create            PCRE2_SUFFIX(pcre2_match_context_create_)
 #define pcre2_match_context_free              PCRE2_SUFFIX(pcre2_match_context_free_)


Modified: code/trunk/src/pcre2.h.in
===================================================================
--- code/trunk/src/pcre2.h.in    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/src/pcre2.h.in    2015-06-18 16:39:25 UTC (rev 288)
@@ -129,7 +129,7 @@
 
 /* These are for pcre2_match(), pcre2_dfa_match(), and pcre2_jit_match(). Note
 that PCRE2_ANCHORED and PCRE2_NO_UTF_CHECK can also be passed to these
-functions (though pcre2_jit_match() ignores the latter since it bypasses all
+functions (though pcre2_jit_match() ignores the latter since it bypasses all
 sanity checks). */

 #define PCRE2_NOTBOL              0x00000001u


Modified: code/trunk/src/pcre2_auto_possess.c
===================================================================
--- code/trunk/src/pcre2_auto_possess.c    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/src/pcre2_auto_possess.c    2015-06-18 16:39:25 UTC (rev 288)
@@ -562,7 +562,7 @@
   cb          compile data block
   base_list   the data list of the base opcode
   base_end    the end of the data list
-  rec_limit   points to recursion depth counter  
+  rec_limit   points to recursion depth counter


 Returns:      TRUE if the auto-possessification is possible
 */
@@ -664,7 +664,7 @@


     while (*next_code == OP_ALT)
       {
-      if (!compare_opcodes(code, utf, cb, base_list, base_end, rec_limit)) 
+      if (!compare_opcodes(code, utf, cb, base_list, base_end, rec_limit))
         return FALSE;
       code = next_code + 1 + LINK_SIZE;
       next_code += GET(next_code, 1);


Modified: code/trunk/src/pcre2_dfa_match.c
===================================================================
--- code/trunk/src/pcre2_dfa_match.c    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/src/pcre2_dfa_match.c    2015-06-18 16:39:25 UTC (rev 288)
@@ -2632,7 +2632,7 @@
             if (code[LINK_SIZE + 1] == OP_CALLOUT)
               {
               cb.callout_number = code[2 + 3*LINK_SIZE];
-              cb.callout_string_offset = 0; 
+              cb.callout_string_offset = 0;
               cb.callout_string = NULL;
               cb.callout_string_length = 0;
               }
@@ -2639,7 +2639,7 @@
             else
               {
               cb.callout_number = 0;
-              cb.callout_string_offset = GET(code, 2 + 4*LINK_SIZE); 
+              cb.callout_string_offset = GET(code, 2 + 4*LINK_SIZE);
               cb.callout_string = code + (2 + 5*LINK_SIZE) + 1;
               cb.callout_string_length =
                 callout_length - (1 + 4*LINK_SIZE) - 2;
@@ -2663,7 +2663,7 @@


         /* The DEFINE condition is always false, and the assertion (?!) is
         converted to OP_FAIL. */
-        
+
         if (condcode == OP_FALSE || condcode == OP_FAIL)
           { ADD_ACTIVE(state_offset + codelink + LINK_SIZE + 1, 0); }


@@ -3001,7 +3001,7 @@
           if (*code == OP_CALLOUT)
             {
             cb.callout_number = code[1 + 2*LINK_SIZE];
-            cb.callout_string_offset = 0; 
+            cb.callout_string_offset = 0;
             cb.callout_string = NULL;
             cb.callout_string_length = 0;
             }
@@ -3008,7 +3008,7 @@
           else
             {
             cb.callout_number = 0;
-            cb.callout_string_offset = GET(code, 1 + 3*LINK_SIZE); 
+            cb.callout_string_offset = GET(code, 1 + 3*LINK_SIZE);
             cb.callout_string = code + (1 + 4*LINK_SIZE) + 1;
             cb.callout_string_length =
               callout_length - (1 + 4*LINK_SIZE) - 2;


Modified: code/trunk/src/pcre2_error.c
===================================================================
--- code/trunk/src/pcre2_error.c    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/src/pcre2_error.c    2015-06-18 16:39:25 UTC (rev 288)
@@ -145,9 +145,9 @@
   "different names for subpatterns of the same number are not allowed\0"
   "(*MARK) must have an argument\0"
   "non-hex character in \\x{} (closing brace missing?)\0"
-#ifndef EBCDIC   
+#ifndef EBCDIC
   "\\c must be followed by a printable ASCII character\0"
-#else   
+#else
   "\\c must be followed by a letter or one of [\\]^_?\0"
 #endif
   "\\k is not followed by a braced, angle-bracketed, or quoted name\0"
@@ -168,7 +168,7 @@
   "missing terminating delimiter for callout with string argument\0"
   "unrecognized string delimiter follows (?C\0"
   "using \\C is disabled by the application\0"
-  "(?| and/or (?J: or (?x: parentheses are too deeply nested\0" 
+  "(?| and/or (?J: or (?x: parentheses are too deeply nested\0"
   ;


/* Match-time and UTF error texts are in the same format. */

Modified: code/trunk/src/pcre2_internal.h
===================================================================
--- code/trunk/src/pcre2_internal.h    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/src/pcre2_internal.h    2015-06-18 16:39:25 UTC (rev 288)
@@ -1230,7 +1230,7 @@
 #define XCL_PROP      3    /* Unicode property (2-byte property code follows) */
 #define XCL_NOTPROP   4    /* Unicode inverted property (ditto) */
 
-/* Escape items that are just an encoding of a particular data value. These
+/* Escape items that are just an encoding of a particular data value. These
 appear in the escapes[] table in pcre2_compile.c as positive numbers. */
 
 #ifndef ESC_a
@@ -1262,7 +1262,7 @@
 
 /* These are escaped items that aren't just an encoding of a particular data
 value such as \n. They must have non-zero values, as check_escape() returns 0
-for a data character. In the escapes[] table in pcre2_compile.c their values
+for a data character. In the escapes[] table in pcre2_compile.c their values
 are negated in order to distinguish them from data values.
 
 They must appear here in the same order as in the opcode definitions below, up

Modified: code/trunk/src/pcre2_intmodedep.h
===================================================================
--- code/trunk/src/pcre2_intmodedep.h    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/src/pcre2_intmodedep.h    2015-06-18 16:39:25 UTC (rev 288)
@@ -662,7 +662,7 @@
   PCRE2_SPTR   name;          /* Points to the name in the pattern */
   uint32_t     number;        /* Group number */
   uint16_t     length;        /* Length of the name */
-  uint16_t     isdup;         /* TRUE if a duplicate */ 
+  uint16_t     isdup;         /* TRUE if a duplicate */
 } named_group;


/* Structure for passing "static" information around between the functions

Modified: code/trunk/src/pcre2_jit_compile.c
===================================================================
--- code/trunk/src/pcre2_jit_compile.c    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/src/pcre2_jit_compile.c    2015-06-18 16:39:25 UTC (rev 288)
@@ -6432,13 +6432,13 @@
   {
   value1 = 0;
   value2 = 0;
-  value3 = 0; 
+  value3 = 0;
   }
 else
   {
   value1 = (sljit_sw) (cc + (1 + 4*LINK_SIZE) + 1);
   value2 = (callout_length - (1 + 4*LINK_SIZE + 2));
-  value3 = (sljit_sw) (GET(cc, 1 + 3*LINK_SIZE)); 
+  value3 = (sljit_sw) (GET(cc, 1 + 3*LINK_SIZE));
   }


OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), CALLOUT_ARG_OFFSET(callout_string), SLJIT_IMM, value1);

Modified: code/trunk/src/pcre2_match.c
===================================================================
--- code/trunk/src/pcre2_match.c    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/src/pcre2_match.c    2015-06-18 16:39:25 UTC (rev 288)
@@ -2156,7 +2156,7 @@
     ecode++;
     break;


-    /* Multiline mode: start of subject unless notbol, or after any newline 
+    /* Multiline mode: start of subject unless notbol, or after any newline
     except for one at the very end, unless PCRE2_ALT_CIRCUMFLEX is set. */


     case OP_CIRCM:
@@ -2163,7 +2163,7 @@
     if ((mb->moptions & PCRE2_NOTBOL) != 0 && eptr == mb->start_subject)
       RRETURN(MATCH_NOMATCH);
     if (eptr != mb->start_subject &&
-        ((eptr == mb->end_subject && 
+        ((eptr == mb->end_subject &&
            (mb->poptions & PCRE2_ALT_CIRCUMFLEX) == 0) ||
          !WAS_NEWLINE(eptr)))
       RRETURN(MATCH_NOMATCH);
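Note on the OP_CIRCM hunk above: the two tests can be read as a single predicate on the current position. The sketch below restates them for a plain char subject. It assumes that a newline is the single character '\n' (the real WAS_NEWLINE macro handles other conventions), that pos is no greater than length, and the function and parameter names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sketch: does a multiline circumflex match at offset pos?
   notbol         stands in for PCRE2_NOTBOL in the match options
   alt_circumflex stands in for PCRE2_ALT_CIRCUMFLEX in the pattern options */
static bool circumflex_matches_multiline(const char *subject, size_t length,
  size_t pos, bool notbol, bool alt_circumflex)
{
  /* Start of subject: matches unless notbol is set. */
  if (pos == 0) return !notbol;

  /* Very end of subject: only allowed with the alternative behaviour. */
  if (pos == length && !alt_circumflex) return false;

  /* Elsewhere: must be immediately after a newline. */
  return subject[pos - 1] == '\n';
}
```

For the subject "ab\ncd\n" this matches at offsets 0 and 3, and at offset 6 only when alt_circumflex is true.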


Modified: code/trunk/src/pcre2_pattern_info.c
===================================================================
--- code/trunk/src/pcre2_pattern_info.c    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/src/pcre2_pattern_info.c    2015-06-18 16:39:25 UTC (rev 288)
@@ -239,7 +239,7 @@


 Returns:        0 when successfully completed
                 < 0 on local error
-               != 0 for callback error  
+               != 0 for callback error
 */


PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
@@ -270,7 +270,7 @@

 while (TRUE)
   {
-  int rc; 
+  int rc;
   switch (*cc)
     {
     case OP_END:
@@ -378,7 +378,7 @@
     cb.callout_string_length = 0;
     cb.callout_string = NULL;
     rc = callback(&cb, callout_data);
-    if (rc != 0) return rc; 
+    if (rc != 0) return rc;
     cc += PRIV(OP_lengths)[*cc];
     break;


@@ -391,7 +391,7 @@
       GET(cc, 1 + 2*LINK_SIZE) - (1 + 4*LINK_SIZE) - 2;
     cb.callout_string = cc + (1 + 4*LINK_SIZE) + 1;
     rc = callback(&cb, callout_data);
-    if (rc != 0) return rc; 
+    if (rc != 0) return rc;
     cc += GET(cc, 1 + 2*LINK_SIZE);
     break;
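Note on the pcre2_pattern_info.c hunks above: the enumeration loop calls the user's callback for each callout and returns at once if the callback yields a non-zero value. A self-contained sketch of that control flow follows. It walks a plain array rather than compiled pattern code, and demo_enumerate, demo_callback, and count_until_negative are hypothetical names, not part of the PCRE2 API.

```c
#include <stddef.h>

/* Hypothetical callback type: return 0 to continue, non-zero to stop. */
typedef int (*demo_callback)(int item, void *user_data);

/* Call the callback for each item; the first non-zero result is passed
   straight back to the caller, otherwise 0 is returned at the end. */
static int demo_enumerate(const int *items, size_t count,
  demo_callback callback, void *user_data)
{
  for (size_t i = 0; i < count; i++)
    {
    int rc = callback(items[i], user_data);
    if (rc != 0) return rc;
    }
  return 0;
}

/* Example callback: counts items and aborts with 42 on a negative one. */
static int count_until_negative(int item, void *user_data)
{
  int *counter = user_data;
  if (item < 0) return 42;
  (*counter)++;
  return 0;
}
```

Since the function's own errors are documented as negative, a callback that uses positive values for its own failures, as in this sketch, keeps the two sources distinguishable for the caller.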



Modified: code/trunk/src/pcre2_tables.c
===================================================================
--- code/trunk/src/pcre2_tables.c    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/src/pcre2_tables.c    2015-06-18 16:39:25 UTC (rev 288)
@@ -67,18 +67,18 @@
 const uint32_t PRIV(vspace_list)[] = { VSPACE_LIST };


/* These tables are the pairs of delimiters that are valid for callout string
-arguments. For each starting delimiter there must be a matching ending
+arguments. For each starting delimiter there must be a matching ending
delimiter, which in fact is different only for bracket-like delimiters. */

const uint32_t PRIV(callout_start_delims)[] = {
CHAR_GRAVE_ACCENT, CHAR_APOSTROPHE, CHAR_QUOTATION_MARK,
CHAR_CIRCUMFLEX_ACCENT, CHAR_PERCENT_SIGN, CHAR_NUMBER_SIGN,
- CHAR_DOLLAR_SIGN, CHAR_LEFT_CURLY_BRACKET, 0 };
+ CHAR_DOLLAR_SIGN, CHAR_LEFT_CURLY_BRACKET, 0 };

const uint32_t PRIV(callout_end_delims[]) = {
CHAR_GRAVE_ACCENT, CHAR_APOSTROPHE, CHAR_QUOTATION_MARK,
CHAR_CIRCUMFLEX_ACCENT, CHAR_PERCENT_SIGN, CHAR_NUMBER_SIGN,
- CHAR_DOLLAR_SIGN, CHAR_RIGHT_CURLY_BRACKET, 0 };
+ CHAR_DOLLAR_SIGN, CHAR_RIGHT_CURLY_BRACKET, 0 };


/*************************************************
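Note on the pcre2_tables.c hunk above: the two delimiter tables are parallel, so the index of a starting delimiter in the first table selects its ending delimiter in the second. The sketch below shows such a lookup. It assumes the ASCII values of the CHAR_ macros, and the demo_ names are hypothetical; how the compiler actually consults the tables is not shown in this diff.

```c
#include <stdint.h>

/* ASCII stand-ins for the two zero-terminated tables. */
static const uint32_t demo_start_delims[] = {
  '`', '\'', '"', '^', '%', '#', '$', '{', 0 };

static const uint32_t demo_end_delims[] = {
  '`', '\'', '"', '^', '%', '#', '$', '}', 0 };

/* Return the ending delimiter that pairs with c, or 0 if c is not a valid
   starting delimiter. */
static uint32_t demo_end_delimiter(uint32_t c)
{
  for (int i = 0; demo_start_delims[i] != 0; i++)
    {
    if (c == demo_start_delims[i]) return demo_end_delims[i];
    }
  return 0;
}
```

Only the curly bracket maps to a different character; every other delimiter closes with itself.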

Modified: code/trunk/src/pcre2test.c
===================================================================
--- code/trunk/src/pcre2test.c    2015-06-17 11:32:06 UTC (rev 287)
+++ code/trunk/src/pcre2test.c    2015-06-18 16:39:25 UTC (rev 288)
@@ -4492,9 +4492,9 @@
   fprintf(outfile, "\n");
   return PR_SKIP;
   }
-  
-/* If forbid_utf is non-zero, we are running a non-UTF test. UTF and UCP are 
-locked out at compile time, but we must also check for occurrences of \P, \p, 
+
+/* If forbid_utf is non-zero, we are running a non-UTF test. UTF and UCP are
+locked out at compile time, but we must also check for occurrences of \P, \p,
 and \X, which are only supported when Unicode is supported. */


 if (forbid_utf != 0)
@@ -4503,9 +4503,9 @@
     {
     fprintf(outfile, "** \\P, \\p, and \\X are not allowed after the "
       "#forbid_utf command\n");
-    return PR_SKIP;     
-    }     
-  }  
+    return PR_SKIP;
+    }
+  }


/* Remember the maximum lookbehind, for partial matching. */

@@ -5095,7 +5095,7 @@
#endif

/* Allocate a buffer to hold the data line; len+1 is an upper bound on
-the number of code units that will be needed (though the buffer may have to be
+the number of code units that will be needed (though the buffer may have to be
extended if replication is involved). */

needlen = (size_t)(len * code_unit_size);
@@ -5145,7 +5145,7 @@

     replen = CAST8VAR(q) - start_rep;
     needlen += replen * i;
-    
+
     if (needlen >= dbuffer_size)
       {
       while (needlen >= dbuffer_size) dbuffer_size *= 2;
@@ -5158,7 +5158,7 @@
       SETCASTPTR(q, dbuffer + qoffset);
       start_rep = dbuffer + rep_offset;
       }
-      
+
     while (i-- > 0)
       {
       memcpy(CAST8VAR(q), start_rep, replen);